WebSocket endpoint
Generally, conversational applications require a lot of back-and-forth communication between the client and the server. WebSocket allows for a persistent connection, which reduces the overhead of establishing a new connection for each request.
Paddler's inference endpoint multiplexes and demultiplexes multiple requests over a single connection, so you can reuse the same socket for multiple requests at the same time.
Endpoint
Method: WebSocket
Path: /api/v1/inference_socket
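As a rough sketch of how a client might derive the socket URL from an HTTP base address: the path below comes from this page, while the host and port in the usage note (`127.0.0.1:8061`) and the helper name `inference_socket_url` are placeholders chosen for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Path taken from the endpoint description above.
INFERENCE_SOCKET_PATH = "/api/v1/inference_socket"


def inference_socket_url(base_url: str) -> str:
    """Turn an HTTP(S) base address into the WebSocket endpoint URL."""
    parts = urlsplit(base_url)
    # Map http -> ws and https -> wss; any other scheme is left untouched.
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme, parts.scheme)
    # Keep host and port, replace the path, drop query and fragment.
    return urlunsplit((scheme, parts.netloc, INFERENCE_SOCKET_PATH, "", ""))
```

Calling `inference_socket_url("http://127.0.0.1:8061")` yields a `ws://` URL on the same host and port, which can then be handed to the WebSocket client library of your choice.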
Protocol
Paddler's protocol is similar to JSON-RPC in the sense that it follows the general idea of giving each request a unique ID and returning a response with the same ID.
The primary difference is that, to simplify the protocol, Paddler takes the same arguments as the HTTP API, wrapped in an additional envelope that carries the ID of the request. For example:
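A minimal sketch of the wrapping step, under stated assumptions: the envelope keys `id` and `request` and the helper `wrap_request` are hypothetical names used only for illustration, and the real field names should be taken from Paddler's own reference.

```python
import json
import uuid
from typing import Optional


def wrap_request(params: dict, request_id: Optional[str] = None) -> str:
    """Wrap HTTP-style parameters in an envelope that carries a request ID.

    NOTE: the keys "id" and "request" are assumptions, not Paddler's
    documented field names.
    """
    envelope = {
        # Generate a fresh unique ID unless the caller supplies one.
        "id": request_id or uuid.uuid4().hex,
        # The body is whatever you would have sent to the HTTP API.
        "request": params,
    }
    return json.dumps(envelope)
```

Generating the ID on the client keeps the server stateless with respect to naming: the caller decides how to label a request and later recognizes its responses by that same label.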
The response envelope looks like this:
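Because every response carries the ID of the request it belongs to, a client can sort interleaved messages from one socket back into per-request streams. The following is a sketch only: it assumes each response is a JSON object with a hypothetical top-level `id` key, and `ResponseRouter` is an illustrative name, not part of Paddler.

```python
import json
from collections import defaultdict


class ResponseRouter:
    """Group incoming socket messages by the request ID they echo back."""

    def __init__(self) -> None:
        self._by_id = defaultdict(list)

    def feed(self, raw_message: str) -> str:
        """Parse one message, file it under its ID, and return that ID."""
        message = json.loads(raw_message)
        request_id = message["id"]  # hypothetical key name
        self._by_id[request_id].append(message)
        return request_id

    def messages_for(self, request_id: str) -> list:
        """Return the messages received so far for one request, in order."""
        return list(self._by_id.get(request_id, []))
```

Feeding every message read from the socket into `feed` and reading results back with `messages_for` lets several requests share the connection without their responses getting mixed up.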
Supported methods
Continue from conversation history
This method works exactly the same as its HTTP counterpart. You only need to wrap the request in the envelope and pass it to the WebSocket endpoint.
After that, you will receive a stream of responses, each containing a generated token. The response format is described in the HTTP docs.
Continue from raw prompt
Same as above, but follow the input/output specification of the Continue from raw prompt HTTP endpoint instead:
You know that the request is done when you receive the "Done" token in the response stream:
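A sketch of how a client might consume one request's stream until completion. Only the literal "Done" comes from this page; the message layout is an assumption: each message is taken to hold its payload under a hypothetical `token` key, which is either `{"Token": text}` or the bare string `"Done"`.

```python
from typing import Iterable, Iterator

# The marker name comes from the docs; where it sits in a message is assumed.
DONE_MARKER = "Done"


def tokens_until_done(messages: Iterable[dict]) -> Iterator[str]:
    """Yield generated token texts and stop at the first "Done" marker.

    NOTE: the "token" and "Token" keys are hypothetical placeholders.
    """
    for message in messages:
        payload = message["token"]
        if payload == DONE_MARKER:
            # The request is finished; ignore anything that follows.
            return
        yield payload["Token"]
```

Joining the yielded strings, for example with `"".join(tokens_until_done(stream))`, reassembles the generated text for that request.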