Generate embedding batch
This endpoint generates embeddings for a batch of documents. Before using it, make sure embeddings are enabled (see: How to enable embeddings for more information).
Give each document a unique ID, because the embeddings are not returned in the same order as the input: the agents generate them in parallel. To match an embedding with its original document, compare the source_document_id field of each result with the id field of your input document.
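The matching step can be sketched as follows. This is a minimal illustration, not Paddler's client code: it assumes the results have already been parsed into dictionaries, and the `content` and `embedding` field names are hypothetical placeholders. Only `id` and `source_document_id` come from the description above.

```python
def match_embeddings(documents, results):
    """Pair each result with its input document by ID, regardless of arrival order.

    `documents` is the list sent as the input batch; `results` is a list of
    parsed result objects. The "content" and "embedding" keys are assumptions
    made for this sketch.
    """
    by_id = {doc["id"]: doc for doc in documents}
    if len(by_id) != len(documents):
        raise ValueError("input document IDs must be unique")

    matched = {}
    for result in results:
        doc_id = result["source_document_id"]
        if doc_id not in by_id:
            raise KeyError(f"result refers to unknown document ID: {doc_id}")
        matched[doc_id] = (by_id[doc_id]["content"], result["embedding"])
    return matched
```

Building a lookup table keyed by ID keeps the matching independent of the order in which the agents finish their work.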
Paddler splits the input batch into chunks of roughly context size and distributes them evenly among the available agents.
These requests use Paddler's buffer, so if you need to generate a lot of embeddings, you will likely need to raise the balancer's --max-buffered-requests and --buffered-request-timeout parameters substantially.
Endpoint
Method: POST
Path: /api/v1/generate_embedding_batch
Payload
Parameters
input_batch
An array of objects, each containing an ID and content for which the embedding should be generated.
The ID is what will allow you to match the generated embedding with the original content later.
normalization_method
This is how Paddler normalizes the generated embedding before it's sent back to you. Possible values are:
"L2"
"None"
{ "RmsNorm": { "epsilon": 0.001 } }
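To illustrate what the two non-trivial options usually mean, here is a sketch using the conventional definitions of L2 and RMS normalization. Whether Paddler computes them exactly this way (in particular, where epsilon enters the RmsNorm formula) is an assumption; the function names are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale the vector to unit Euclidean length (conventional L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        return list(vector)  # a zero vector cannot be scaled; return it unchanged
    return [x / norm for x in vector]


def rms_normalize(vector, epsilon):
    """Divide by the root mean square of the components (conventional RMS norm).

    Assumes a non-empty vector. Epsilon is added under the square root to
    avoid division by zero; this placement is an assumption.
    """
    mean_square = sum(x * x for x in vector) / len(vector)
    scale = math.sqrt(mean_square + epsilon)
    return [x / scale for x in vector]
```

With "None", the embedding would be returned as the model produced it.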
Response
Success
The last token that ends the stream is:
Error
The enable_embeddings option must be enabled when you send the request. Otherwise, you will get a 501 (Not Implemented) error.
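Putting the pieces together, a request could be built and sent roughly as shown below. This is a sketch under several assumptions: the base URL is a placeholder for your own Paddler address, the `content` field name is inferred from the parameter description, and the response body is returned as raw bytes because its exact stream format is not covered here. The helper names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def build_request(base_url, documents, normalization_method="L2"):
    """Build a POST request for /api/v1/generate_embedding_batch.

    `documents` maps a unique ID to the text to embed. The "content" field
    name is an assumption made for this sketch.
    """
    payload = {
        "input_batch": [
            {"id": doc_id, "content": content}
            for doc_id, content in documents.items()
        ],
        "normalization_method": normalization_method,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/generate_embedding_batch",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the raw response body.

    A 501 status is translated into a clearer error, since it means
    embeddings are not enabled.
    """
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        if error.code == 501:
            raise RuntimeError(
                "embeddings are not enabled (501 Not Implemented)"
            ) from error
        raise
```

Using a dictionary keyed by ID for `documents` guarantees that the IDs in the batch are unique.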