Continue from raw prompt
This endpoint allows you to send a raw prompt to the LLM without applying any chat templates and receive tokens in response.
Because no chat template is applied, whatever you pass as the raw prompt is sent to the LLM verbatim.
Generally, use the other completion endpoint (Continue from conversation history) instead, because it is much safer and offers some guarantee of the quality of the response.
If you need to force your own chat template, prefer to do that through the Balancer desired state endpoint instead.
Endpoint
Method: POST
Path: /api/v1/continue_from_raw_prompt
Payload
Parameters
max_tokens
Maximum number of tokens to generate in the response. This is a hard limit; use it as a failsafe to prevent the model from generating too many tokens.
raw_prompt
String with the raw prompt to send to the LLM.
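Assuming the payload is sent as a JSON object (the values below are only illustrative), a request body could look like this:

```json
{
  "max_tokens": 512,
  "raw_prompt": "Once upon a time, in a distant kingdom,"
}
```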
Response
Success
Stream of tokens in the response body. Each token is a JSON object.
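Purely as an illustrative sketch (the field name is an assumption, not part of the documented schema), a single streamed token object might look like:

```json
{ "token": " distant" }
```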
The stream is ended by a special last token.
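Again as a sketch with an assumed field, the terminating object could be something along the lines of:

```json
{ "done": true }
```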
Error
In case of an error, the response will contain an error description instead.
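As a hypothetical example (the real error format may differ), an error response could look like:

```json
{ "error": "<description of what went wrong>" }
```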