Continue from conversation history
This endpoint allows you to continue a conversation from a given history of messages.
It takes your entire conversation history, uses the chat template to format the prompt, and generates tokens in response.
Endpoint
Method: POST
Path: /api/v1/continue_from_conversation_history
Payload
Parameters
add_generation_prompt
Whether to append the opening assistant prompt (the generation prompt from the chat template) to the conversation history.
conversation_history
Array of all the previous conversation messages.
enable_thinking
If you are using a model that supports thinking (such as DeepSeek or Qwen), this enables thinking mode.
If you enable this mode, you need to include the thinking part of the messages (the text between <think> and </think>) in the conversation_history array alongside the rest of the messages.
max_tokens
Maximum number of tokens to generate in the response. This is a hard limit; use it as a failsafe to prevent the model from generating too many tokens.
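Putting the parameters above together, a request payload might look like the following sketch. The message roles and overall shape follow common chat-API conventions and are assumptions, not this endpoint's confirmed schema:

```python
import json

# Hypothetical payload; the fields follow the parameters documented above,
# but the message format is an assumption based on common chat APIs.
payload = {
    "conversation_history": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "add_generation_prompt": True,   # open an assistant turn for the model
    "enable_thinking": False,        # only for models that support thinking
    "max_tokens": 256,               # hard failsafe limit on generated tokens
}

body = json.dumps(payload)
print(body)
```

This body would then be sent as a POST request to /api/v1/continue_from_conversation_history with a JSON content type.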
Response
Success
Stream of tokens in the response body. Each token is a JSON object:
The last token that ends the stream is:
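The exact token object is not reproduced here. As an illustration, assuming a newline-delimited JSON stream where each token carries a text field and the final object carries a stop flag (both field names are hypothetical), the stream could be consumed like this:

```python
import json

# Simulated response body: newline-delimited JSON tokens.
# The "text" and "stop" field names are assumptions for illustration only;
# check the actual token objects your server emits.
raw_stream = "\n".join([
    '{"text": "Hello", "stop": false}',
    '{"text": ", world!", "stop": false}',
    '{"text": "", "stop": true}',
])

generated = []
for line in raw_stream.splitlines():
    token = json.loads(line)
    if token["stop"]:   # the last token ends the stream
        break
    generated.append(token["text"])

result = "".join(generated)
print(result)  # Hello, world!
```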
Error
In case of an error, the response will be:
Sending requests with function calling
To use function calling with this endpoint, define your functions in the optional tools parameter.
An example payload with the function calling might look like this:
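As a sketch, a payload defining a get_weather function that matches the response below might be built like this. The tools schema shown is the widely used OpenAI-style format, which is an assumption here; check your server's exact format:

```python
import json

# Hypothetical function-calling payload. The nested tools schema is the
# common OpenAI-style format and is an assumption about this endpoint.
payload = {
    "conversation_history": [
        {"role": "user", "content": "What's the weather in New York City?"},
    ],
    "add_generation_prompt": True,
    "max_tokens": 512,
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a location.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {"type": "string"},
                        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                    },
                    "required": ["location"],
                },
            },
        }
    ],
}

print(json.dumps(payload, indent=2))
```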
And the possible response:
<tool_call>
{
"name": "get_weather",
"arguments": {
"location": "New York City, NY",
"unit": "fahrenheit"
}
}
</tool_call>
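A response like the one above can be handled by extracting the JSON between the <tool_call> tags; a minimal parsing sketch:

```python
import json
import re

# The tool-call response from the example above.
response_text = """<tool_call>
{
  "name": "get_weather",
  "arguments": {
    "location": "New York City, NY",
    "unit": "fahrenheit"
  }
}
</tool_call>"""

# Extract every JSON body wrapped in <tool_call>...</tool_call> tags.
calls = [
    json.loads(m.group(1))
    for m in re.finditer(r"<tool_call>\s*(.*?)\s*</tool_call>", response_text, re.DOTALL)
]

call = calls[0]
print(call["name"], call["arguments"]["location"])
```

From here, the extracted name and arguments can be dispatched to your own function implementation, and the result sent back in a follow-up request.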