# Ground Agents in Typed, Retrievable Knowledge You need agents that can answer questions about relationships between entities, look up context by identifier, and find semantically related content. Right now, the knowledge is either trapped in untyped files or requires an external search engine to retrieve. Four libraries -- `@forwardimpact/libresource`, `@forwardimpact/libgraph`, `@forwardimpact/libindex`, and `@forwardimpact/libvector` -- give you a self-contained knowledge infrastructure that runs locally without external databases. The pipeline flows in three stages: ingest HTML into typed resources, extract RDF triples into a graph, and generate vector embeddings for semantic retrieval. Each stage produces a JSONL-backed index that agents can query directly. ## Prerequisites - Node.js 18+ - An embedding endpoint (any OpenAI-compatible `/v1/embeddings` API) for vector indexing - HTML files with [schema.org](https://schema.org/) microdata markup in a `data/knowledge/` directory Install all four libraries: ```sh npm install @forwardimpact/libresource @forwardimpact/libgraph @forwardimpact/libindex @forwardimpact/libvector ``` ## How the pipeline fits together Each library owns one stage. The output of one stage feeds the next: ```text data/knowledge/*.html | v libresource --> data/resources/*.json (typed resources) | +-------+ | | v v libgraph libvector | | v v data/graphs/ data/vectors/ index.jsonl index.jsonl ontology.ttl ``` `libindex` provides the `IndexBase` class that both `GraphIndex` and `VectorIndex` extend. It handles JSONL persistence, lazy loading, prefix filtering, and token budgeting so the specialized indexes inherit that behavior without reimplementing it. ## 1. Prepare the knowledge directory Create `data/knowledge/` and add HTML files with schema.org microdata. The resource processor extracts typed entities from `itemscope` / `itemtype` / `itemprop` attributes: ```html

Alice Chen Senior Engineer

Acme Corp

``` The `` element sets the IRI for all relative references in the document. Without it, the processor falls back to the `--base` flag or a default URI. ## 2. Ingest HTML into typed resources Run the resource processor to parse every HTML file in `data/knowledge/` and store each entity as a typed `Message` resource: ```sh npx fit-process-resources --base=https://example.com/ ``` The processor: 1. Finds all `.html` files in `data/knowledge/` 2. Sanitizes the DOM (normalizes whitespace, encodes stray characters) 3. Extracts RDF quads from microdata using the streaming parser 4. Skolemizes blank nodes into content-hashed URIs for cross-document deduplication 5. Serializes each entity's triples as Turtle RDF 6. Stores the result in `data/resources/` as a JSON file with a content-hashed identifier When the same entity appears in multiple HTML files, the processor merges triples using RDF union semantics -- new properties are added, existing identical triples are deduplicated. After processing, verify the resources exist: ```sh ls data/resources/ ``` ```text common.Message.a1b2c3d4.json common.Message.e5f6g7h8.json ``` Each file contains the entity's typed identifier, its role (`system`), and the RDF content as a Turtle string. ## 3. Build the RDF graph With resources in place, extract their RDF content into a graph index and generate the ontology: ```sh npx fit-process-graphs ``` The graph processor: 1. Reads all resource identifiers from `data/resources/` 2. Filters to `common.Message` resources (which contain RDF content) 3. Parses each resource's Turtle content back into quads 4. Adds quads to the in-memory N3 triple store, keyed by resource identifier 5. Writes the graph index to `data/graphs/index.jsonl` 6. Builds a SHACL ontology from all observed types and predicates 7. Writes the ontology to `data/graphs/ontology.ttl` The ontology file describes the shape of the data -- which types exist, what properties each type has, and how types relate to each other. Agents read this file to understand what questions the graph can answer before writing queries. Verify the graph was built: ```sh npx fit-subjects ``` ```text https://example.com/team#alice https://schema.org/Person https://example.com/org/acme https://schema.org/Organization ``` Each line shows a subject URI and its type. Run a triple-pattern query to test a relationship: ```sh npx fit-query "?" schema:worksFor "?" ``` ```text common.Message.a1b2c3d4 ``` The output is the resource identifier containing the matching triple. The query uses the `subject predicate object` pattern where `?` is a wildcard. Prefixed names like `schema:worksFor` expand using the standard prefix map (`schema:` -> `https://schema.org/`). ## 4. Generate vector embeddings The vector processor takes each resource's text content, sends it to an embedding endpoint, and stores the resulting vectors: ```sh npx fit-process-vectors ``` This requires an OpenAI-compatible embedding endpoint. Configure the endpoint and token through environment variables or `config/vectors.yaml`: ```yaml # config/vectors.yaml embeddingBaseUrl: http://localhost:8080 ``` The processor: 1. Reads all resource identifiers from `data/resources/` 2. Filters out conversations and tool functions 3. Batches resource content for efficient embedding API calls 4. Stores each vector alongside its resource identifier in `data/vectors/index.jsonl` After processing, test a semantic search: ```sh npx fit-search "senior engineering role" ``` ```text common.Message.a1b2c3d4 0.8712 common.Message.e5f6g7h8 0.6543 ``` Results are ranked by dot-product score (cosine similarity for normalized vectors). Higher scores indicate closer semantic matches. ## 5. Query from code The CLIs are thin wrappers around the library APIs. For programmatic access, use the libraries directly: ```js import { createGraphIndex, parseGraphQuery } from "@forwardimpact/libgraph"; import { createResourceIndex } from "@forwardimpact/libresource"; // Query the graph for all Person entities const graph = createGraphIndex("graphs"); const pattern = parseGraphQuery("? rdf:type schema:Person"); const identifiers = await graph.queryItems(pattern); // Resolve matched identifiers to full resources const resources = createResourceIndex("resources"); const items = await resources.get(identifiers.map(String)); for (const item of items) { console.log(item.id.type, item.id.name); console.log(item.content); // Turtle RDF string } ``` The `createGraphIndex("graphs")` call reads from `data/graphs/`; the `createResourceIndex("resources")` call reads from `data/resources/`. Both use the `data//` convention. Pass a different prefix to point at a different directory. For vector search from code: ```js import { VectorIndex } from "@forwardimpact/libvector/index/vector.js"; import { createStorage } from "@forwardimpact/libstorage"; const storage = createStorage("vectors"); const vectorIndex = new VectorIndex(storage); // Assume you have a query vector from your embedding API const queryVector = [0.12, -0.34, 0.56, /* ... */]; const results = await vectorIndex.queryItems([queryVector], { limit: 5, threshold: 0.5, }); for (const id of results) { console.log(String(id), id.score?.toFixed(4)); } ``` Both `queryItems` methods accept a filter object with `prefix`, `limit`, and `max_tokens` to scope results by identifier prefix, cap the count, or stay within a token budget. ## Verify After running all three stages, confirm the full pipeline produced the expected artifacts: ```sh ls data/resources/ # Typed resource JSON files ls data/graphs/ # index.jsonl + ontology.ttl ls data/vectors/ # index.jsonl with embeddings npx fit-subjects # All subjects and types npx fit-query "?" rdf:type schema:Person # Graph query npx fit-search "team member" # Semantic search ``` Each command should return results drawn from the HTML files you ingested. If a command returns nothing, check that the previous stage completed: resources must exist before graphs, and resources must exist before vectors. ## What's next Each query mode has a dedicated guide for deeper work: - [Query the Graph](/docs/libraries/ground-agents/query-graph/) -- write triple-pattern queries, filter by type, and traverse relationships in the RDF graph. - [Look Up Context](/docs/libraries/ground-agents/lookup-context/) -- retrieve resources by identifier, apply prefix filters, and manage token budgets with the index API. - [Resolve a Resource](/docs/libraries/ground-agents/resolve-resource/) -- load a typed resource by identifier with access control and inspect its content and metadata. - [Search Semantically](/docs/libraries/ground-agents/search-semantically/) -- embed a query, score against the vector index, and rank results by relevance.