Going beyond a single device
So far, we've started a Paddler fleet with several agents, but everything was set up locally on a single device. Let's now go through setting up a multi-agent fleet on several devices.
Starting the balancer
As before, we will start the balancer with the Inference service and the Management service. They need to run on the same device, but on different network interfaces.
The Inference service should be exposed externally, because it must be reachable by the products and applications that send it requests for tokens and embeddings.
The Management service should be exposed internally for agents only.
Here's an example of running the balancer this way (assuming that 192.168.1.0 and 10.0.0.0 are two different subnets with no routing between them):
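A minimal sketch of such an invocation, assuming the Inference service binds to an address on the external subnet and the Management service binds to an address on the agents-only subnet. The flag names and ports here are illustrative assumptions, not taken from this page; verify them against `paddler balancer --help` for your version:

```shell
# Assumed flag names -- check `paddler balancer --help`.
# Inference service: externally reachable (192.168.1.0 subnet).
# Management service: agents-only subnet (10.0.0.0), no external routing.
paddler balancer \
  --inference-addr 192.168.1.10:8061 \
  --management-addr 10.0.0.1:8060
```

Applications then send their requests to `192.168.1.10:8061`, while agents register against `10.0.0.1:8060`.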
Running agents
Agents need to be able to reach the Management service, so we will put them and the Management service in one isolated subnet with no external traffic at the routing level.
The optimal approach is to run each agent on a separate device and give each agent several slots to handle concurrent requests. That is exactly what we will do in our multi-device setup.
Let's run agent-1 on one separate device:
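A hedged sketch of the agent invocation, assuming the agent only needs a name, the Management service address, and a slot count; the flag names are assumptions and may differ across Paddler versions, so confirm them with `paddler agent --help`:

```shell
# Assumed flag names -- check `paddler agent --help`.
# Registers this agent with the Management service on the isolated subnet
# and advertises 4 slots for concurrent requests.
paddler agent \
  --name agent-1 \
  --management-addr 10.0.0.1:8060 \
  --slots 4
```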
And the second agent, agent-2, on another device:
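The second device runs the same command with a different agent name (again, flag names are assumptions to be checked against `paddler agent --help`):

```shell
# Same assumed flags as for agent-1; only the name changes.
paddler agent \
  --name agent-2 \
  --management-addr 10.0.0.1:8060 \
  --slots 4
```

Both agents report to the same Management service address, so the balancer can distribute incoming requests across all of their slots.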
Additionally, if you want the agents to use models from Hugging Face, make sure they have access to the internet (or at least to Hugging Face).