Retries and Timeouts
LLM APIs fail in ways normal APIs rarely do: requests hang for 60+ seconds, providers return intermittent 5xx errors under load, and a "successful" response can still be unparseable. Hand-rolling retry loops around every call is the boilerplate llm-exe is built to remove — timeouts, retries, and backoff are configuration, not code.
Step 1 - Configure the LLM
All three options are generic options — they work the same for every provider:
import {
  createChatPrompt,
  createLlmExecutor,
  createParser,
  isLlmExeError, // used by the error handling in Step 2
  useLlm,
} from "llm-exe";

export function createSummarizer() {
  const llm = useLlm("openai.gpt-4o-mini", {
    timeout: 15000, // fail any single API call after 15 seconds
    numOfAttempts: 3, // make up to 3 attempts before throwing
    maxDelay: 5000, // cap the backoff wait between attempts at 5 seconds
  });
  return createLlmExecutor({
    name: "summarize",
    llm,
    prompt: createChatPrompt<{ text: string }>(
      "Summarize the following in one sentence: {{text}}"
    ),
    parser: createParser("string"),
  });
}

- timeout: maximum time for a single API call, in milliseconds
- numOfAttempts: total attempts before the executor throws
- maxDelay: cap on the backoff wait between attempts
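
To make the three options concrete, here is a rough sketch of the kind of hand-rolled loop this configuration replaces. It is not llm-exe's implementation: the helper names (backoffDelay, withTimeout, callWithRetries), the doubling schedule, and the 100 ms starting delay are all assumptions chosen for illustration.

```typescript
// Illustrative only: NOT llm-exe internals. All names below are hypothetical.
interface RetryOptions {
  timeout: number; // ms allowed for a single attempt
  numOfAttempts: number; // total attempts, including the first
  maxDelay: number; // upper bound on the wait between attempts, in ms
}

// Wait before the next attempt: doubles each time, capped at maxDelay.
// The 100 ms starting delay is an assumption made for this sketch.
function backoffDelay(attempt: number, maxDelay: number, baseDelay = 100): number {
  return Math.min(baseDelay * 2 ** (attempt - 1), maxDelay);
}

// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); }
    );
  });
}

// Try `call` up to numOfAttempts times, sleeping between failed attempts.
async function callWithRetries<T>(call: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= opts.numOfAttempts; attempt++) {
    try {
      return await withTimeout(call(), opts.timeout);
    } catch (err) {
      lastError = err;
      if (attempt < opts.numOfAttempts) {
        const wait = backoffDelay(attempt, opts.maxDelay);
        await new Promise((resolve) => setTimeout(resolve, wait));
      }
    }
  }
  throw lastError; // every attempt failed: surface the last error
}
```

With maxDelay set to 5000, the waits in this sketch would grow 100, 200, 400, ... and never exceed 5000 ms, which is the role the cap plays regardless of the exact schedule a library uses.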
Step 2 - Observe Failures and Handle the Final Error
Retries handle the transient failures silently. For the failures that survive all attempts, attach an onError hook for telemetry, and catch the typed error at the call site to decide the fallback:
export async function summarize(text: string): Promise<string | null> {
  const summarizer = createSummarizer();
  summarizer.on("onError", (exec, meta) => {
    console.error(`${meta.name} failed after retries:`, exec.errorMessage);
  });
  try {
    return await summarizer.execute({ text });
  } catch (error) {
    if (isLlmExeError(error)) {
      // error.code identifies what failed (timeout, provider error,
      // parse failure) so you can decide the right fallback per case.
      return null;
    }
    throw error;
  }
}

The hook observes; the try/catch decides. isLlmExeError narrows the error so you can read error.code and error.category to distinguish a timeout from a parse failure from a provider error. See Error Handling for the full list of codes.
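
As one possible shape for the "decide the fallback" step, the sketch below maps an error category to an action. The category strings ("timeout", "provider", "parser"), the CodedError interface, and the Fallback names are placeholders invented for this example, not values taken from llm-exe; substitute the real codes and categories from Error Handling.

```typescript
// Structural stand-in for the narrowed error; the real type comes from llm-exe.
interface CodedError {
  code: string;
  category: string;
}

// Hypothetical fallback actions for a summarizer.
type Fallback = "use-cached-summary" | "retry-later" | "report-bug";

// HYPOTHETICAL category names: replace them with the documented values.
function chooseFallback(error: CodedError): Fallback {
  switch (error.category) {
    case "timeout":
      return "use-cached-summary"; // the user is waiting, so serve something now
    case "provider":
      return "retry-later"; // likely an outage or rate limit on the provider side
    case "parser":
      return "report-bug"; // the prompt or parser needs fixing, not another retry
    default:
      return "report-bug"; // unknown failures should be looked at by a person
  }
}
```

Keeping this mapping in one small function means the catch block stays short and the policy can be unit-tested without calling an LLM.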
What to set the values to
There is no universal right answer, but the trade-offs are consistent:
- User-facing requests: short timeout (10–15s), 2 attempts. A user won't wait 90 seconds; fail fast and show a fallback.
- Background jobs: longer timeout (30–60s), 3+ attempts. Nobody is waiting, so trade latency for reliability.
- Parse failures are rarely transient. Retrying the identical prompt after a parse failure sometimes helps (sampling varies), but if a prompt fails to parse consistently, fix the prompt or the parser; don't raise numOfAttempts to paper over it.
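
A quick way to sanity-check a set of values is to compute the longest a caller could wait. The sketch below assumes every attempt runs for the full timeout and every wait between attempts hits maxDelay, which gives an upper bound rather than a typical latency. The preset names and their maxDelay values are illustrative assumptions, not recommendations from the library.

```typescript
interface RetryBudget {
  timeout: number; // ms per attempt
  numOfAttempts: number; // total attempts
  maxDelay: number; // cap on the wait between attempts, in ms
}

// Upper bound: every attempt times out and every wait is the maximum.
function worstCaseMs({ timeout, numOfAttempts, maxDelay }: RetryBudget): number {
  return numOfAttempts * timeout + (numOfAttempts - 1) * maxDelay;
}

// Hypothetical presets following the trade-offs listed above.
const userFacing: RetryBudget = { timeout: 15000, numOfAttempts: 2, maxDelay: 2000 };
const background: RetryBudget = { timeout: 60000, numOfAttempts: 4, maxDelay: 10000 };

console.log(worstCaseMs(userFacing)); // 32000
console.log(worstCaseMs(background)); // 270000
```

By the same arithmetic, the Step 1 configuration (15000 ms timeout, 3 attempts, 5000 ms maxDelay) can take up to 55 seconds before it throws, which is worth knowing before putting it behind a user-facing request.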
Related
- Generic LLM Options: full reference for timeout, numOfAttempts, maxDelay, and more
- Executor Hooks: logging, latency tracking, and alerting on every call
- Error Handling: LlmExeError codes and categories
- Get a Yes/No Decision from an LLM: a small typed function worth making production-safe
