
bootstrap

Bulk-ingest your existing SQL, dbt models, CLAUDE.md notes, data dictionaries or codebooks, and semantic-layer artifacts so your agent is grounded from day one instead of starting against a blank store.

bootstrap solves the cold-start problem. A fresh ClariLayer context store is empty, so your agent has nothing to recall in its first session. Rather than typing definitions in by hand, you point your agent at work you already have (SQL files, dbt models, an existing CLAUDE.md, a data dictionary, or a semantic-layer artifact) and it ingests everything in one batch. You get value on day one instead of a cold, empty store.

bootstrap works from your files, never your data. Your agent reads each artifact locally and passes what it reads to the bootstrap tool: raw text for some kinds, structured rows or models for others. ClariLayer never connects to or reads your warehouse, holds no credentials, and executes no SQL server-side, so you stay in control of exactly what is shared.

What it ingests

bootstrap accepts five source kinds today. Three are structured (sql, dictionary, and semantic_model); the other two (dbt and claude_md) are stored largely as raw notes:

sql — a SELECT query. This is the only source kind structured by server-side parsing of raw text: ClariLayer validates the SQL and deterministically extracts a structured shape (tables, joins, group-bys, time grain) so the resulting entry is genuinely queryable context, not just stored text. (dictionary and semantic_model are also structured, but via a different mechanism — see below.)
dbt — a dbt model file. The content is imported and stored as a schema note (raw content plus light metadata). It is not parsed into structure the way SQL is.
claude_md — a CLAUDE.md or freeform notes. Imported and stored as a note so your agent can recall it later.
dictionary — a column / variable dictionary (a codebook, data dictionary, CSV header + labels, df.dtypes, a Looker view, or a SAS/SPSS/Stata export). This is another structured source kind, but it is structured differently from sql: there is no server-side dictionary parser. Instead your agent maps whatever artifact it has into a structured rows payload, and ClariLayer fans those agent-supplied rows out to one schema note per variable. This is the cold-start path when your context lives in a codebook rather than a SQL repo. See the dictionary section below.
semantic_model — a semantic-layer artifact (a Databricks Metric View, dbt semantic models, a Snowflake semantic view, and so on). Like dictionary, there is no server-side vendor parser: your agent normalizes the artifact into structured models and ClariLayer fans those out to one metric definition per model, each carrying a structured metric contract. This is the path when your definitions already live in a semantic layer. See Use ClariLayer with Databricks for a worked example.

`dictionary` — a codebook or data dictionary

If your data context lives in a codebook or data dictionary (common for survey, research, or spreadsheet-first work with no SQL repo behind it), point your agent at it. Unlike the raw-text kinds (sql, dbt, claude_md), you do not pass file text: your agent maps the artifact into structured rows, one row per column or variable, and ClariLayer expands them into one schema note per row. There is no per-format parser; the agent does the mapping, so any codebook shape works (a CSV header plus labels, df.dtypes, the column docs in a dbt schema.yml, or a SAS/SPSS/Stata dictionary export).

Each row carries a required variable and optional label, type, value_labels (a coded-value → label map), and notes. An optional dataset on the source names the table. A small worked example — a customer-survey codebook:

```json
{
  "kind": "dictionary",
  "name": "Customer survey 2026",
  "dataset": "survey_2026",
  "rows": [
    {
      "variable": "respondent_id",
      "label": "Stable per-respondent identifier (hashed owner email)",
      "type": "string"
    },
    {
      "variable": "nps_bucket",
      "label": "Net Promoter Score bucket from the q3 survey",
      "type": "categorical",
      "value_labels": { "0": "Detractor (0-6)", "1": "Passive (7-8)", "2": "Promoter (9-10)" }
    },
    {
      "variable": "csat_score",
      "label": "Customer satisfaction rating, 1-5 Likert",
      "type": "integer",
      "notes": "null when the respondent skipped the item"
    }
  ]
}
```

Each row becomes one recall-addressable schema note named {dataset}.{variable}, so the example above produces survey_2026.respondent_id, survey_2026.nps_bucket, and survey_2026.csat_score. When dataset is omitted, the source name is used as the prefix instead. Afterwards, asking your agent "what does survey_2026.nps_bucket mean?" recalls the variable with its label and decoded value labels. The per-call bounds described under How to run it still apply: each row is size-checked, and a very large dictionary is split across multiple calls.
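A minimal sketch of the naming rule, plus one way to split a large dictionary, assuming the source is a Python dict shaped like the JSON example. note_names and split_dictionary are hypothetical helpers, not part of ClariLayer. split_dictionary copies name and dataset unchanged so every part still yields the same {dataset}.{variable} (or {name}.{variable}) note names; it assumes that sending the same source name across several calls is acceptable.

```python
def note_names(source: dict) -> list[str]:
    """Names of the schema notes a dictionary source fans out to: {dataset}.{variable}."""
    prefix = source.get("dataset") or source["name"]  # fall back to the source name
    names = []
    for row in source["rows"]:
        variable = row.get("variable")
        if not isinstance(variable, str) or not variable:
            raise ValueError("every row needs a non-empty 'variable'")
        names.append(f"{prefix}.{variable}")
    return names


def split_dictionary(source: dict, max_rows: int) -> list[dict]:
    """Split one large dictionary source into parts of at most max_rows rows each.

    name and dataset are copied unchanged so every part yields the same note names.
    """
    rows = source["rows"]
    return [{**source, "rows": rows[i:i + max_rows]} for i in range(0, len(rows), max_rows)]


survey = {
    "kind": "dictionary",
    "name": "Customer survey 2026",
    "dataset": "survey_2026",
    "rows": [{"variable": "respondent_id"}, {"variable": "nps_bucket"}, {"variable": "csat_score"}],
}
print(note_names(survey))
# ['survey_2026.respondent_id', 'survey_2026.nps_bucket', 'survey_2026.csat_score']
print(note_names({"kind": "dictionary", "name": "Customer survey 2026", "rows": [{"variable": "q1"}]}))
# ['Customer survey 2026.q1']
```

Splitting by row count is only a proxy for the byte caps; a real agent would size each part against those caps instead.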

How to run it

Ask your agent to bootstrap from a directory or a set of files. A natural prompt:

Bootstrap my ClariLayer context from ./analytics/sql and my dbt models in ./models.

Or, when your context lives in a codebook instead of a SQL repo:

Bootstrap my ClariLayer context from the data dictionary in ./docs/survey_2026_codebook.csv.

Your agent reads those files and calls bootstrap with their content (mapping a codebook into the dictionary source's rows). The tool is bounded so a single call cannot flood your store: each source is capped (around 50 KB per source), and the whole call is capped (around 200 KB and 200 sources). Oversized sources are reported back, never silently truncated.
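The caps are easiest to respect by planning batches up front. The sketch below is hypothetical: it measures each source as the UTF-8 byte length of its JSON form and treats the approximate limits as 50,000 bytes per source, 200,000 bytes per call, and 200 sources per call, so ClariLayer's exact accounting may differ. Oversized sources are set aside to report, mirroring the tool's never-truncate behavior, and the content field in the example sources is a placeholder rather than a documented field name.

```python
import json

# Assumptions: the page gives these caps only approximately ("around"), and the
# byte accounting (UTF-8 length of the JSON-serialized source) is a guess.
MAX_SOURCE_BYTES = 50_000
MAX_CALL_BYTES = 200_000
MAX_CALL_SOURCES = 200


def source_size(source: dict) -> int:
    return len(json.dumps(source).encode("utf-8"))


def plan_batches(sources: list[dict]) -> tuple[list[list[dict]], list[dict]]:
    """Group sources into bootstrap calls that stay under the caps.

    Returns (batches, oversized). Oversized sources are set aside to report,
    never truncated.
    """
    batches: list[list[dict]] = []
    oversized: list[dict] = []
    current: list[dict] = []
    current_bytes = 0
    for source in sources:
        size = source_size(source)
        if size > MAX_SOURCE_BYTES:
            oversized.append(source)
            continue
        # start a new call if adding this source would break either per-call cap
        if current and (current_bytes + size > MAX_CALL_BYTES or len(current) >= MAX_CALL_SOURCES):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(source)
        current_bytes += size
    if current:
        batches.append(current)
    return batches, oversized


# "content" is a placeholder field name for the raw SQL text, not a documented field.
big = [{"kind": "sql", "name": f"q{i}", "content": "x" * 45_000} for i in range(5)]
huge = {"kind": "sql", "name": "too_big", "content": "x" * 60_000}
batches, oversized = plan_batches(big + [huge])
print([len(b) for b in batches], [s["name"] for s in oversized])
# [4, 1] ['too_big']
```

Because every accepted source is below the per-source cap, each one always fits in a call of its own, so the greedy loop never produces an empty or over-limit batch.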

What you get back

bootstrap returns a summary of what happened: how many entries were created, updated, and skipped, a breakdown by source type, and any sources it dropped (for example, an oversized file or an unparseable SQL statement). Ingestion is deduped, so re-running bootstrap over the same artifacts updates existing entries rather than creating duplicates.

After a bootstrap, the entries it created are stored with status asserted and with provenance that records where they came from: SQL uses sql_import, dbt uses dbt, dictionary rows use dictionary, semantic models use semantic_model, and CLAUDE.md / freeform notes use you. claude_md entries get you, not agent: the separate agent provenance marks facts an AI agent extracted through an agent-originated write path. asserted is the honest baseline: the entry has not been contradicted, but it has not been verified either. To verify a definition that reconcile supports, such as a stored SQL definition or a HubSpot CRM contract, run reconcile against it.
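The kind-to-provenance mapping above can be restated as a lookup table, for example to check what you expect to see after a bootstrap. This is a sketch; the dict keys status and provenance are a hypothetical local representation, not ClariLayer's response format.

```python
# Kind -> provenance, as listed on this page. Bootstrap entries start as "asserted".
BOOTSTRAP_PROVENANCE = {
    "sql": "sql_import",
    "dbt": "dbt",
    "dictionary": "dictionary",
    "semantic_model": "semantic_model",
    "claude_md": "you",  # not "agent": that provenance is for agent-originated writes
}


def expected_metadata(kind: str) -> dict[str, str]:
    """What a freshly bootstrapped entry of this kind should carry (hypothetical shape)."""
    return {"status": "asserted", "provenance": BOOTSTRAP_PROVENANCE[kind]}


for kind in BOOTSTRAP_PROVENANCE:
    print(kind, expected_metadata(kind))
```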

Bootstrap, then recall

Once your store has content, your agent can recall it in-flow. The typical first loop is:

bootstrap the relevant SQL, dbt, CLAUDE.md notes, dictionary rows, or semantic models so the store is not empty.
Ask a question; your agent recalls the relevant definitions and answers from them.
When your agent learns something new or you correct it, remember it so the next session benefits.
When a number looks off, reconcile a stored SQL definition against a warehouse actual_sample, or a HubSpot CRM contract against row-free crm_evidence.

Privacy posture

ClariLayer receives only what your agent chooses to send: the artifact text (your SQL, dbt models, CLAUDE.md) or, for the schema-first kinds, the structured input your agent maps (a dictionary source's rows, a semantic_model source's models). It never connects to your warehouse, holds source credentials, executes SQL server-side, or calls a CRM provider. The sql structuring is a static parse of query text, not a live data read. reconcile is a separate flow with its own posture: a warehouse actual_sample can include optional preview rows containing real values, while HubSpot crm_evidence is row-free by contract.