Production-ready agent deployment and scaling strategies
min-agents (default: 0 if unspecified)
Minimum number of agent instances maintained in a pool at all times.
The min-agents configuration determines the number of agent instances that should be kept warm in the deployment pool. A warm instance is kept running and can immediately be used to serve an active session.
Maintaining a minimum number of agent instances is important to keep agent start times fast and reduce cold starts.
Setting --min-agents to 1 or greater will incur charges even when the agent is not in use.
max-agents
Maximum agents is the hard limit on the number of agents in your pool.
The max-agents configuration limits the number of agent instances that your pool can contain.
This exists as a cost control measure, allowing developers to limit the total number of active sessions that can be run at any one time.
The maximum instance count is a hard limit, meaning requests made to a pool that is at capacity will receive a 429 response. See starting sessions for more information on how to handle this in your application code.
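One way to handle a 429 from a pool at capacity is to retry the start request with exponential backoff. The sketch below is illustrative only: `start_with_backoff` and `send_start` are hypothetical names, and `send_start` stands in for whatever call your application makes to start a session and is assumed to return the HTTP status code.

```python
import time
from typing import Callable


def start_with_backoff(
    send_start: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Retry a start request while the pool reports it is at capacity (429).

    send_start is a hypothetical callable that performs the start request
    and returns the HTTP status code.
    """
    for attempt in range(max_attempts):
        status = send_start()
        if status != 429:
            # Any non-429 status ends the retry loop; only 2xx counts as success.
            return 200 <= status < 300
        if attempt < max_attempts - 1:
            # Wait 0.5s, 1s, 2s, ... before asking the pool again.
            sleep(base_delay * (2 ** attempt))
    return False
```

If the function returns False after exhausting its attempts, the application can surface a "no capacity" message to the user instead of retrying indefinitely.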
Pool Initialization
min-agents configuration (defaults to 0)
Session Assignment
429 response from the start request
Auto-scaling
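The relationship between warm instances, cold starts, and the 429 limit can be pictured with a toy model. This is a simplified sketch, not the platform's actual scheduler: `AgentPool` and its fields are hypothetical, and the model does not replenish warm instances the way an auto-scaler might.

```python
from dataclasses import dataclass, field


@dataclass
class AgentPool:
    """Toy model of a pool with min-agents and max-agents (assumed behavior)."""

    min_agents: int = 0
    max_agents: int = 10
    warm: int = field(default=0, init=False)    # idle instances ready to serve
    active: int = field(default=0, init=False)  # instances serving a session

    def __post_init__(self) -> None:
        # Pool initialization: start with min_agents warm instances.
        self.warm = self.min_agents

    def start_session(self) -> str:
        if self.warm > 0:
            # A warm instance can serve the session immediately.
            self.warm -= 1
            self.active += 1
            return "warm"
        if self.active < self.max_agents:
            # No warm instance, but room in the pool: a new instance cold-starts.
            self.active += 1
            return "cold"
        # The pool is at its hard limit.
        return "429"
```

With `AgentPool(min_agents=1, max_agents=2)`, three consecutive start requests are served warm, served after a cold start, and rejected with 429, in that order.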
How long does a cold start take?
Mitigation strategies
min-agents) in your pool to ensure that there are always agent instances available to handle requests.
max-agents) and issue capacity notifications in your application.
/start endpoint without worrying about configuring additional capacity
min-agents value than your peak traffic requirements
min-agents values is for extremely rapid traffic spikes (tens or hundreds of calls per second) where the buffer can't be provisioned fast enough.
Set min-agents to 0 initially and test how the system performs for your specific use case. Many applications work well without any pre-warmed agent instances.
A higher min-agents value can help prevent cold starts during critical periods. Consider scheduling higher min-agents values only during your peak usage hours.
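As a rough sketch of peak-hour scheduling, a small helper can decide which min-agents value should be in effect at a given time. The function name, the peak window, and the values below are all hypothetical; applying the result to your deployment (for example through the --min-agents setting) is left out because it depends on your tooling.

```python
from datetime import datetime


def scheduled_min_agents(
    now: datetime,
    peak_start_hour: int = 9,
    peak_end_hour: int = 18,
    peak_min_agents: int = 3,
    off_peak_min_agents: int = 0,
) -> int:
    """Return the min-agents value to use at `now` (hypothetical schedule)."""
    # The peak window is half-open: it includes the start hour, excludes the end hour.
    in_peak = peak_start_hour <= now.hour < peak_end_hour
    return peak_min_agents if in_peak else off_peak_min_agents
```

Running this from a scheduled job shortly before and after the peak window keeps warm capacity, and its associated charges, limited to the hours that need it.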
Set the --min-agents parameter to 1. However, if your application experiences fluctuations in traffic, you may need to plan for additional warm capacity to ensure your agents are always ready to respond immediately.
/start endpoint (or CLI or SDK equivalent). The active session ends when your pipeline shuts down.
Reserved session minutes are the time your warm agent instances are kept running, even if they are not handling active sessions. When active sessions start, the auto-scaler may provision further warm agent instances to support the next incoming request. Reserved session minutes are optional and controlled by setting --min-agents in your deployment configuration.