Look Up Context in an Index
You need to find resources in a growing index -- not by semantic
similarity or graph traversal, but by structural properties like
type prefix or identifier pattern. Loading the entire dataset into
your application is wasteful when you only need a filtered subset.
@forwardimpact/libindex provides a JSONL-backed index
with lazy loading and built-in filters that keep memory use
proportional to results, not to corpus size.
For the full workflow of building a grounded context pipeline, see Ground Agents in Context.
Prerequisites
- Node.js 18+
- @forwardimpact/libindex installed:
npm install @forwardimpact/libindex
Create an index
An IndexBase instance needs a storage backend and an
optional index key (defaults to index.jsonl):
import { IndexBase } from "@forwardimpact/libindex";
import { createStorage } from "@forwardimpact/libstorage";
const storage = createStorage("my-index");
const index = new IndexBase(storage);
The index file does not need to exist yet. On first access,
IndexBase checks for the file and initializes an empty
in-memory map if the file is missing.
Add items
Each item requires an id string and an
identifier object. The id is the map key;
the identifier carries the typed resource metadata:
import { resource } from "@forwardimpact/libtype";
const identifier = new resource.Identifier({
type: "common.Message",
name: "a1b2c3",
parent: "",
});
identifier.tokens = 42;
await index.add({
id: String(identifier),
identifier,
});
Each add call appends one JSON line to the storage file
and updates the in-memory map. The index is immediately queryable
after the write.
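To make the append-and-map behavior concrete, here is a minimal in-memory sketch. It is not the library's implementation: appendLine and loadMap are hypothetical helpers, and the assumption that each stored line is a JSON object with id and identifier fields comes only from the item shape passed to add above.

```javascript
// Hypothetical model of a JSONL-backed index: one JSON object per line.
// Assumes each stored line has the same { id, identifier } shape passed to add().

// Append one item as a single JSON line to the accumulated file content.
function appendLine(fileContent, item) {
  return fileContent + JSON.stringify(item) + "\n";
}

// Rebuild the id -> item map by parsing every non-empty line.
function loadMap(fileContent) {
  const map = new Map();
  for (const line of fileContent.split("\n")) {
    if (line.trim() === "") continue; // skip the trailing blank line
    const item = JSON.parse(line);
    map.set(item.id, item); // in this model, a repeated id replaces the earlier entry
  }
  return map;
}

let file = "";
file = appendLine(file, {
  id: "common.Message.a1b2c3",
  identifier: { type: "common.Message", name: "a1b2c3", tokens: 42 },
});
file = appendLine(file, {
  id: "common.Message.d4e5f6",
  identifier: { type: "common.Message", name: "d4e5f6", tokens: 17 },
});

const map = loadMap(file);
console.log(map.size); // 2
console.log(map.get("common.Message.a1b2c3").identifier.tokens); // 42
```

Because every write is a self-contained line, appending never requires rewriting earlier entries, which is what makes the format a natural fit for a growing index.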
Query with filters
The queryItems method scans the in-memory index and
applies three filters in sequence: prefix, limit, and token budget.
Filter by prefix
Return only identifiers whose string representation starts with a given prefix:
const messages = await index.queryItems({ prefix: "common.Message" });
console.log(messages.length);
12
Limit the result count
Cap the number of returned identifiers:
const first5 = await index.queryItems({ prefix: "common.Message", limit: 5 });
console.log(first5.length);
5
Cap by token budget
When the downstream consumer has a context window to respect, use
max_tokens to stop accumulating results before the total
token count would exceed the budget. Every identifier must carry a
tokens field -- the filter throws if one is missing:
const budgeted = await index.queryItems({
prefix: "common.Message",
max_tokens: 200,
});
const totalTokens = budgeted.reduce((sum, id) => sum + id.tokens, 0);
console.log(`${budgeted.length} items, ${totalTokens} tokens`);
4 items, 187 tokens
The filter walks items in index order, adding each identifier's token count until the next item would exceed the budget. It does not optimize for the maximum number of items -- it preserves insertion order.
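The walk described above can be modeled in a few lines. This is a sketch of the documented behavior, not the library's source: applyTokenBudget is a hypothetical name, and the plain objects stand in for real identifiers.

```javascript
// Hypothetical stand-in for the token-budget step; not the library's source.
function applyTokenBudget(identifiers, maxTokens) {
  const kept = [];
  let total = 0;
  for (const identifier of identifiers) {
    if (typeof identifier.tokens !== "number") {
      throw new Error(`Identifier ${identifier.name} has no tokens field`);
    }
    // Stop as soon as the next item would push the total past the budget.
    if (total + identifier.tokens > maxTokens) break;
    kept.push(identifier);
    total += identifier.tokens;
  }
  return kept;
}

const sample = [
  { name: "a", tokens: 120 },
  { name: "b", tokens: 90 },
  { name: "c", tokens: 10 },
];
const kept = applyTokenBudget(sample, 200);
console.log(kept.map((id) => id.name)); // [ 'a' ]
```

In this model, item c would have fit within the remaining budget, but the walk stops at b. That is the trade-off of preserving insertion order instead of packing as many items as possible.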
Combine filters
All three filters compose. The index applies them in order: prefix first, then limit, then token budget:
const results = await index.queryItems({
prefix: "common.Message",
limit: 10,
max_tokens: 500,
});
This returns at most 10 common.Message identifiers,
stopping earlier if the next identifier would push the cumulative token count past 500.
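The order of application matters, because the limit is taken before the token budget is counted. The sketch below models that order with plain arrays. queryModel and the key field are hypothetical, introduced only for illustration, and the real queryItems may differ in detail.

```javascript
// Hypothetical model of the documented filter order: prefix, then limit, then token budget.
function queryModel(identifiers, { prefix, limit, max_tokens } = {}) {
  let results = identifiers;
  if (prefix !== undefined) {
    results = results.filter((id) => id.key.startsWith(prefix));
  }
  if (limit !== undefined) {
    results = results.slice(0, limit);
  }
  if (max_tokens !== undefined) {
    const kept = [];
    let total = 0;
    for (const id of results) {
      if (total + id.tokens > max_tokens) break; // next item would exceed the budget
      kept.push(id);
      total += id.tokens;
    }
    results = kept;
  }
  return results;
}

const items = [
  { key: "common.Message.m1", tokens: 100 },
  { key: "common.Note.n1", tokens: 50 },
  { key: "common.Message.m2", tokens: 100 },
  { key: "common.Message.m3", tokens: 100 },
  { key: "common.Message.m4", tokens: 100 },
];

const modelResults = queryModel(items, {
  prefix: "common.Message",
  limit: 3,
  max_tokens: 250,
});
console.log(modelResults.map((id) => id.key));
// [ 'common.Message.m1', 'common.Message.m2' ]
```

Here the prefix step drops the note, the limit keeps the first three messages, and the budget then trims the third because it would bring the total to 300.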
Check existence and retrieve by ID
Use has to check whether an item exists without loading
its content, and get to retrieve identifiers by their
IDs:
const exists = await index.has("common.Message.a1b2c3");
console.log(exists); // true
const found = await index.get(["common.Message.a1b2c3", "common.Message.d4e5f6"]);
console.log(found.length); // 2
Missing IDs are silently skipped -- the result array may be shorter than the input.
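Since nothing signals a skipped ID, a caller that needs to know which lookups failed has to compare the result against the request. The helper below is one way to do that. findMissing and makeIdentifier are hypothetical, and the sketch assumes String(identifier) yields the same ID used when the item was added, as in the add example above.

```javascript
// Hypothetical helper: report which requested IDs are absent from a get() result.
// Assumes String(identifier) yields the same ID that was used as the map key.
function findMissing(requestedIds, found) {
  const foundIds = new Set(found.map((identifier) => String(identifier)));
  return requestedIds.filter((id) => !foundIds.has(id));
}

// Stand-in identifiers with a toString(), in place of resource.Identifier instances.
const makeIdentifier = (type, name) => ({
  type,
  name,
  toString: () => `${type}.${name}`,
});

const requested = ["common.Message.a1b2c3", "common.Message.zzz999"];
const foundItems = [makeIdentifier("common.Message", "a1b2c3")];
console.log(findMissing(requested, foundItems)); // [ 'common.Message.zzz999' ]
```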
Use buffered writes for high volume
When adding many items in a tight loop, the default
IndexBase writes one JSON line per
add call. BufferedIndex batches writes and
flushes periodically or when the buffer fills:
import { BufferedIndex } from "@forwardimpact/libindex";
import { createStorage } from "@forwardimpact/libstorage";
const storage = createStorage("bulk-index");
const index = new BufferedIndex(storage, "index.jsonl", {
flush_interval: 5000, // flush every 5 seconds
max_buffer_size: 1000, // or when 1000 items accumulate
});
for (const item of largeDataset) {
await index.add(item); // buffered, not written yet
}
await index.shutdown(); // flush remaining items and clear timer
Items are queryable immediately after add -- they enter
the in-memory map at once -- but the storage write is deferred until
the next flush. Always call shutdown() before the
process exits to avoid losing buffered data.
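One way to make sure shutdown() always runs, even when the loop throws, is a try/finally wrapper. This is a sketch, not part of the library: withIndex is a hypothetical helper, and fakeIndex is a stand-in that only assumes the add and shutdown methods shown above.

```javascript
// Hypothetical wrapper: guarantees shutdown() runs even when the work throws.
// Assumes only that the index exposes an async shutdown(), as BufferedIndex does above.
async function withIndex(index, work) {
  try {
    return await work(index);
  } finally {
    await index.shutdown(); // flush remaining items before the caller continues
  }
}

// Stand-in index that records calls, so the sketch runs without the library.
const fakeIndex = {
  added: [],
  shutdownCalls: 0,
  async add(item) {
    this.added.push(item);
  },
  async shutdown() {
    this.shutdownCalls += 1;
  },
};

const done = withIndex(fakeIndex, async (index) => {
  for (const item of [{ id: "a" }, { id: "b" }]) {
    await index.add(item);
  }
  return index.added.length;
});

done.then((count) => console.log(count, fakeIndex.shutdownCalls)); // 2 1
```

With a real BufferedIndex, the same wrapper would take the place of the bare loop and the trailing shutdown() call.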
Both IndexBase and BufferedIndex defer
loading until the first read. If the storage file does not exist,
the index initializes empty rather than throwing.
Related
- Ground Agents in Context -- the end-to-end workflow for building and querying a context pipeline.
- Query a Graph -- when the question is about relationships between entities, not flat lookups.
- Search Semantically -- when you need ranked similarity rather than prefix-based filtering.
- @forwardimpact/libindex on npm -- installation and changelog.