Embed a Batch of Strings in One Call
You have a list of strings to embed -- documents to index, queries to compare, passages to cluster -- and you want one gRPC call to return one vector per input, in order, without writing per-string fetch loops or queueing logic. This page walks through the bounded task of sending a batch and reading the response, so callers can focus on what to do with the vectors instead of how to fetch them.
For the full setup including architecture and connection details, see Embed Text Using a Shared Service.
Prerequisites
- Completed the Embed Text Using a Shared Service guide -- you have @forwardimpact/librpc and @forwardimpact/libtype installed, the embedding service is running, and createClient("embedding") connects successfully.
Connect
import { createClient, createTracer } from "@forwardimpact/librpc";
import { createLogger } from "@forwardimpact/libtelemetry";
import { embedding } from "@forwardimpact/libtype";
const logger = createLogger("my-product");
const tracer = await createTracer("my-product");
const embeddingClient = await createClient("embedding", logger, tracer);
Embed a batch
Pass every input in a single EmbeddingsRequest:
const inputs = [
  "Reset the database connection pool on each restart.",
  "Pool restarts force every active query to reissue.",
  "Coffee beans roast best at 215 degrees Celsius.",
];
const request = embedding.EmbeddingsRequest.fromObject({ input: inputs });
const result = await embeddingClient.CreateEmbeddings(request);
The response preserves order: result.data[i] corresponds to inputs[i], so you can zip them back together without tracking IDs:
const pairs = inputs.map((text, i) => ({
  text,
  vector: result.data[i].values,
}));
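Once the inputs and vectors are zipped, comparing them is plain arithmetic. The sketch below ranks the remaining entries of a `pairs`-shaped array against the first one by cosine similarity. It assumes each `vector` is an indexable sequence of numbers; `cosineSimilarity` and `rankAgainstFirst` are hypothetical helper names, not part of any @forwardimpact package.

```javascript
// Hypothetical helper: cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error(`Vector length mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: score every pair after the first against the first,
// highest similarity first. Expects items shaped like { text, vector }.
function rankAgainstFirst(pairs) {
  const [anchor, ...rest] = pairs;
  return rest
    .map((p) => ({
      text: p.text,
      score: cosineSimilarity(anchor.vector, p.vector),
    }))
    .sort((x, y) => y.score - x.score);
}
```

With the three example inputs, `rankAgainstFirst(pairs)` would be expected to place the second connection-pool sentence above the coffee sentence, though the exact scores depend on the model.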
Why batch in one call
The service issues one HTTP request to the TEI sidecar per gRPC call, regardless of how many inputs the call carries. Calling CreateEmbeddings once with 50 strings is faster than calling it 50 times with one string each -- you pay the gRPC round trip and the per-request TEI overhead once instead of 50 times. The TEI backend also batches internally on the inference side.
Practical batch-size guidance:
- For typical short text (titles, queries, log lines), batches of 32-128 strings move smoothly through the default bge-small-en-v1.5 model on a CPU host.
- For long documents, split into smaller batches first; TEI imposes a per-request token limit that the default model enforces at 512 tokens.
- For online queries that need low tail latency, send one input at a time even though batching would be more throughput-efficient -- the round-trip cost is small at single-input size.
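When a list is larger than the batch size you settle on, one option is to split it and issue one CreateEmbeddings call per slice. The following is a minimal sketch under stated assumptions: `chunk` and `embedInBatches` are hypothetical helpers, the default of 64 is only an illustrative value inside the 32-128 range above, and `buildRequest` is a caller-supplied callback so the sketch does not depend on any particular request type.

```javascript
// Hypothetical helper: split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical helper: embed `inputs` one batch at a time and return the
// vectors in input order. Assumes `client.CreateEmbeddings(request)` resolves
// to { data: [{ values }, ...] } as described on this page.
async function embedInBatches(client, buildRequest, inputs, batchSize = 64) {
  const vectors = [];
  for (const batch of chunk(inputs, batchSize)) {
    const result = await client.CreateEmbeddings(buildRequest(batch));
    if (result.data.length !== batch.length) {
      throw new Error(
        `Expected ${batch.length} vectors, got ${result.data.length}`,
      );
    }
    for (const item of result.data) {
      vectors.push(item.values);
    }
  }
  return vectors;
}
```

With the client from the Connect section, a call might look like `await embedInBatches(embeddingClient, (batch) => embedding.EmbeddingsRequest.fromObject({ input: batch }), inputs)`. The batches run sequentially here to keep ordering trivial; whether to run them concurrently depends on how much load the shared service should absorb.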
Handle a partial failure
The TEI backend either returns all vectors or fails the entire request. If the call throws, none of the vectors are usable and the request needs to be retried (or split if a specific input is the cause).
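// Embed one batch; the call either returns every vector or throws.
async function embedBatch(request) {
  try {
    const result = await embeddingClient.CreateEmbeddings(request);
    return result.data;
  } catch (err) {
    // Whole batch failed. Retry or split inputs to isolate the offending one.
    throw err;
  }
}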
The service does not attempt to recover by re-running individual inputs; that policy belongs in the caller because it depends on the feature using the embeddings.
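One caller-side policy for the "split" option is to bisect: when a batch fails, retry each half separately until the failing input is isolated. The sketch below is one possible shape for that, not something the service provides. `embedWithBisect` and `buildRequest` are hypothetical names, and the sketch assumes a failure is caused by specific inputs rather than by an outage.

```javascript
// Hypothetical helper: embed a batch and, on failure, split it in half and
// retry each half. Returns { vectors, failed } where vectors[i] is null for
// any input that still fails when sent alone, and `failed` lists
// { index, error } for each such input (index relative to the original array).
async function embedWithBisect(client, buildRequest, inputs, offset = 0) {
  if (inputs.length === 0) return { vectors: [], failed: [] };
  try {
    const result = await client.CreateEmbeddings(buildRequest(inputs));
    return { vectors: result.data.map((d) => d.values), failed: [] };
  } catch (err) {
    if (inputs.length === 1) {
      // Cannot split further: this input fails on its own.
      return { vectors: [null], failed: [{ index: offset, error: err }] };
    }
    const mid = Math.ceil(inputs.length / 2);
    const left = await embedWithBisect(
      client, buildRequest, inputs.slice(0, mid), offset,
    );
    const right = await embedWithBisect(
      client, buildRequest, inputs.slice(mid), offset + mid,
    );
    return {
      vectors: [...left.vectors, ...right.vectors],
      failed: [...left.failed, ...right.failed],
    };
  }
}
```

If the service itself is unavailable, bisecting only multiplies the number of failing calls, so a plain retry with backoff is usually the better first step and bisecting is worth reaching for once a retry of the same batch fails again.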
Verify
You have reached the outcome of this guide when:
- A single CreateEmbeddings call returns one EmbeddingVector per input in the request array, in the same order.
- Batches in the 32-128 range complete in a single gRPC round trip without client-side queuing.
- A whole-batch failure surfaces as a thrown error, not as partial data.
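The first criterion can be partly checked in code. The sketch below is a hypothetical `checkBatchResponse` helper that inspects a response for the right count and for non-empty vectors of a consistent dimension. It assumes the `{ data: [{ values }] }` shape used earlier on this page, and it cannot confirm ordering on its own; that still relies on the order guarantee described above.

```javascript
// Hypothetical helper: return a list of problems found in a batch response.
// An empty list means the response has one non-empty vector per input and
// every vector has the same dimension.
function checkBatchResponse(inputs, result) {
  if (!result || !Array.isArray(result.data)) {
    return ["response has no data array"];
  }
  const problems = [];
  if (result.data.length !== inputs.length) {
    problems.push(
      `expected ${inputs.length} vectors, got ${result.data.length}`,
    );
  }
  const dimensions = new Set();
  result.data.forEach((item, i) => {
    const values = item && item.values;
    if (!values || values.length === 0) {
      problems.push(`data[${i}] has no values`);
    } else {
      dimensions.add(values.length);
    }
  });
  if (dimensions.size > 1) {
    problems.push("vectors differ in dimension");
  }
  return problems;
}
```

A call such as `checkBatchResponse(inputs, result)` returning an empty array is consistent with the outcome above; any entries in the array describe what to investigate.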