Memory¶
OpenClaw memory is plain Markdown in the agent workspace. The files are the source of truth; the model only "remembers" what gets written to disk.
Memory search tools are provided by the active memory plugin (default:
memory-core). Disable memory plugins with plugins.slots.memory = "none".
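For example, turning the memory plugin slot off looks like this (a minimal sketch of that one setting):

plugins: {
  slots: {
    memory: "none"
  }
}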
Memory files (Markdown)¶
The default workspace layout uses two memory layers:
- memory/YYYY-MM-DD.md: daily log (append-only). Read today + yesterday at session start.
- MEMORY.md (optional): curated long-term memory. Only load it in the main, private session (never in group contexts).
These files live under the workspace (agents.defaults.workspace, default
~/.openclaw/workspace). See Agent workspace for the full layout.
When to write memory¶
- Decisions, preferences, and durable facts go to MEMORY.md.
- Day-to-day notes and running context go to memory/YYYY-MM-DD.md.
- If someone says "remember this," write it down (do not keep it in RAM).
- This area is still evolving. It helps to remind the model to store memories; it will know what to do.
- If you want something to stick, ask the bot to write it into memory.
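For example, if a user asks the agent to remember that staging deploys happen on Fridays, the agent can append a line to today's log (illustrative filename and content; the files are free-form Markdown):

memory/2026-01-15.md
- Staging deploys happen on Fridays (user asked to remember this).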
Automatic memory flush (pre-compaction ping)¶
When a session is close to auto-compaction, OpenClaw triggers a silent,
agentic turn that reminds the model to write durable memory before the
context is compacted. The default prompts explicitly say the model may reply,
but usually NO_REPLY is the correct response so the user never sees this turn.
This is controlled by agents.defaults.compaction.memoryFlush:
{
agents: {
defaults: {
compaction: {
reserveTokensFloor: 20000,
memoryFlush: {
enabled: true,
softThresholdTokens: 4000,
systemPrompt: "Session nearing compaction. Store durable memories now.",
prompt: "Write any lasting notes to memory/YYYY-MM-DD.md; reply with NO_REPLY if nothing to store.",
},
},
},
},
}
Details:
- Soft threshold: flush triggers when the session token estimate crosses contextWindow - reserveTokensFloor - softThresholdTokens.
- Silent by default: prompts include NO_REPLY so nothing is delivered.
- Two prompts: a user prompt plus a system prompt append the reminder.
- One flush per compaction cycle (tracked in sessions.json).
- Workspace must be writable: if the session runs sandboxed with workspaceAccess: "ro" or "none", the flush is skipped.
For the full compaction lifecycle, see Session management + compaction.
Vector memory search¶
OpenClaw can build a small vector index over MEMORY.md and memory/*.md so
semantic queries can find related notes even when wording differs.
Defaults:
- Enabled by default.
- Watches memory files for changes (debounced).
- Configure memory search under agents.defaults.memorySearch (not top-level memorySearch).
- Uses remote embeddings by default. If memorySearch.provider is not set, OpenClaw auto-selects:
  - local if a memorySearch.local.modelPath is configured and the file exists.
  - openai if an OpenAI key can be resolved.
  - gemini if a Gemini key can be resolved.
  - voyage if a Voyage key can be resolved.
  - Otherwise memory search stays disabled until configured.
- Local mode uses node-llama-cpp and may require pnpm approve-builds.
- Uses sqlite-vec (when available) to accelerate vector search inside SQLite.
Remote embeddings require an API key for the embedding provider. OpenClaw
resolves keys from auth profiles, models.providers.*.apiKey, or environment
variables. Codex OAuth only covers chat/completions and does not satisfy
embeddings for memory search. For Gemini, use GEMINI_API_KEY or
models.providers.google.apiKey. For Voyage, use VOYAGE_API_KEY or
models.providers.voyage.apiKey. When using a custom OpenAI-compatible endpoint,
set memorySearch.remote.apiKey (and optional memorySearch.remote.headers).
QMD backend (experimental)¶
Set memory.backend = "qmd" to swap the built-in SQLite indexer for
QMD: a local-first search sidecar that combines
BM25 + vectors + reranking. Markdown stays the source of truth; OpenClaw shells
out to QMD for retrieval. Key points:
Prereqs
- Disabled by default. Opt in per-config (memory.backend = "qmd").
- Install the QMD CLI separately (bun install -g https://github.com/tobi/qmd or grab a release) and make sure the qmd binary is on the gateway's PATH.
- QMD needs an SQLite build that allows extensions (brew install sqlite on macOS).
- QMD runs fully locally via Bun + node-llama-cpp and auto-downloads GGUF models from HuggingFace on first use (no separate Ollama daemon required).
- The gateway runs QMD in a self-contained XDG home under ~/.openclaw/agents/<agentId>/qmd/ by setting XDG_CONFIG_HOME and XDG_CACHE_HOME.
- OS support: macOS and Linux work out of the box once Bun + SQLite are installed. Windows is best supported via WSL2.
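The smallest opt-in is just the backend switch (a minimal sketch; the fuller example below shows the rest of the config surface):

memory: {
  backend: "qmd"
}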
How the sidecar runs
- The gateway writes a self-contained QMD home under ~/.openclaw/agents/<agentId>/qmd/ (config + cache + sqlite DB).
- Collections are created via qmd collection add from memory.qmd.paths (plus default workspace memory files), then qmd update + qmd embed run on boot and on a configurable interval (memory.qmd.update.interval, default 5m).
- The gateway now initializes the QMD manager on startup, so periodic update timers are armed even before the first memory_search call.
- Boot refresh now runs in the background by default so chat startup is not blocked; set memory.qmd.update.waitForBootSync = true to keep the previous blocking behavior.
- Searches run via memory.qmd.searchMode (default qmd search --json; also supports vsearch and query). If the selected mode rejects flags on your QMD build, OpenClaw retries with qmd query. If QMD fails or the binary is missing, OpenClaw automatically falls back to the builtin SQLite manager so memory tools keep working.
- OpenClaw does not expose QMD embed batch-size tuning today; batch behavior is controlled by QMD itself.
- First search may be slow: QMD may download local GGUF models (reranker/query expansion) on the first qmd query run.
- OpenClaw sets XDG_CONFIG_HOME/XDG_CACHE_HOME automatically when it runs QMD.
- If you want to pre-download models manually (and warm the same index OpenClaw uses), run a one-off query with the agent's XDG dirs.
OpenClaw's QMD state lives under your state dir (defaults to ~/.openclaw). You can point qmd at the exact same index by exporting the same XDG vars OpenClaw uses:

# Pick the same state dir OpenClaw uses
STATE_DIR="${OPENCLAW_STATE_DIR:-$HOME/.openclaw}"
export XDG_CONFIG_HOME="$STATE_DIR/agents/main/qmd/xdg-config"
export XDG_CACHE_HOME="$STATE_DIR/agents/main/qmd/xdg-cache"

# (Optional) force an index refresh + embeddings
qmd update
qmd embed

# Warm up / trigger first-time model downloads
qmd query "test" -c memory-root --json >/dev/null 2>&1
Config surface (memory.qmd.*)
- command (default qmd): override the executable path.
- searchMode (default search): pick which QMD command backs memory_search (search, vsearch, query).
- includeDefaultMemory (default true): auto-index MEMORY.md + memory/**/*.md.
- paths[]: add extra directories/files (path, optional pattern, optional stable name).
- sessions: opt into session JSONL indexing (enabled, retentionDays, exportDir).
- update: controls refresh cadence and maintenance execution (interval, debounceMs, onBoot, waitForBootSync, embedInterval, commandTimeoutMs, updateTimeoutMs, embedTimeoutMs).
- limits: clamp recall payload (maxResults, maxSnippetChars, maxInjectedChars, timeoutMs).
- scope: same schema as session.sendPolicy. Default is DM-only (deny all, allow direct chats); loosen it to surface QMD hits in groups/channels.
  - match.keyPrefix matches the normalized session key (lowercased, with any leading agent:<id>: stripped). Example: discord:channel:.
  - match.rawKeyPrefix matches the raw session key (lowercased), including agent:<id>:. Example: agent:main:discord:.
  - Legacy: match.keyPrefix: "agent:..." is still treated as a raw-key prefix, but prefer rawKeyPrefix for clarity.
  - When scope denies a search, OpenClaw logs a warning with the derived channel/chatType so empty results are easier to debug.
- Snippets sourced outside the workspace show up as qmd/<collection>/<relative-path> in memory_search results; memory_get understands that prefix and reads from the configured QMD collection root.
- When memory.qmd.sessions.enabled = true, OpenClaw exports sanitized session transcripts (User/Assistant turns) into a dedicated QMD collection under ~/.openclaw/agents/<id>/qmd/sessions/, so memory_search can recall recent conversations without touching the builtin SQLite index.
- memory_search snippets now include a Source: <path#line> footer when memory.citations is auto/on; set memory.citations = "off" to keep the path metadata internal (the agent still receives the path for memory_get, but the snippet text omits the footer and the system prompt warns the agent not to cite it).
Example
memory: {
backend: "qmd",
citations: "auto",
qmd: {
includeDefaultMemory: true,
update: { interval: "5m", debounceMs: 15000 },
limits: { maxResults: 6, timeoutMs: 4000 },
scope: {
default: "deny",
rules: [
{ action: "allow", match: { chatType: "direct" } },
// Normalized session-key prefix (strips `agent:<id>:`).
{ action: "deny", match: { keyPrefix: "discord:channel:" } },
// Raw session-key prefix (includes `agent:<id>:`).
{ action: "deny", match: { rawKeyPrefix: "agent:main:discord:" } },
]
},
paths: [
{ name: "docs", path: "~/notes", pattern: "**/*.md" }
]
}
}
Citations & fallback
- memory.citations applies regardless of backend (auto/on/off).
- When qmd runs, we tag status().backend = "qmd" so diagnostics show which engine served the results. If the QMD subprocess exits or JSON output can't be parsed, the search manager logs a warning and falls back to the builtin provider (existing Markdown embeddings) until QMD recovers.
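For instance, a minimal sketch of keeping citation footers out of snippet text, on any backend:

memory: {
  citations: "off"
}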
Additional memory paths¶
If you want to index Markdown files outside the default workspace layout, add explicit paths:
agents: {
defaults: {
memorySearch: {
extraPaths: ["../team-docs", "/srv/shared-notes/overview.md"]
}
}
}
Notes:
- Paths can be absolute or workspace-relative.
- Directories are scanned recursively for .md files.
- Only Markdown files are indexed.
- Symlinks are ignored (files or directories).
Gemini embeddings (native)¶
Set the provider to gemini to use the Gemini embeddings API directly:
agents: {
defaults: {
memorySearch: {
provider: "gemini",
model: "gemini-embedding-001",
remote: {
apiKey: "YOUR_GEMINI_API_KEY"
}
}
}
}
Notes:
- remote.baseUrl is optional (defaults to the Gemini API base URL).
- remote.headers lets you add extra headers if needed.
- Default model: gemini-embedding-001.
If you want to use a custom OpenAI-compatible endpoint (OpenRouter, vLLM, or a proxy),
you can use the remote configuration with the OpenAI provider:
agents: {
defaults: {
memorySearch: {
provider: "openai",
model: "text-embedding-3-small",
remote: {
baseUrl: "https://api.example.com/v1/",
apiKey: "YOUR_OPENAI_COMPAT_API_KEY",
headers: { "X-Custom-Header": "value" }
}
}
}
}
If you don't want to set an API key, use memorySearch.provider = "local" or set
memorySearch.fallback = "none".
Fallbacks:
- memorySearch.fallback can be openai, gemini, local, or none.
- The fallback provider is only used when the primary embedding provider fails.
Batch indexing (OpenAI + Gemini + Voyage):
- Disabled by default. Set agents.defaults.memorySearch.remote.batch.enabled = true to enable for large-corpus indexing (OpenAI, Gemini, and Voyage).
- Default behavior waits for batch completion; tune remote.batch.wait, remote.batch.pollIntervalMs, and remote.batch.timeoutMinutes if needed.
- Set remote.batch.concurrency to control how many batch jobs we submit in parallel (default: 2).
- Batch mode applies when memorySearch.provider = "openai" or "gemini" and uses the corresponding API key.
- Gemini batch jobs use the async embeddings batch endpoint and require Gemini Batch API availability.
Why OpenAI batch is fast + cheap:
- For large backfills, OpenAI is typically the fastest option we support because we can submit many embedding requests in a single batch job and let OpenAI process them asynchronously.
- OpenAI offers discounted pricing for Batch API workloads, so large indexing runs are usually cheaper than sending the same requests synchronously.
- See the OpenAI Batch API docs and pricing for details:
- https://platform.openai.com/docs/api-reference/batch
- https://platform.openai.com/pricing
Config example:
agents: {
defaults: {
memorySearch: {
provider: "openai",
model: "text-embedding-3-small",
fallback: "openai",
remote: {
batch: { enabled: true, concurrency: 2 }
},
sync: { watch: true }
}
}
}
Tools:
- memory_search: returns snippets with file + line ranges.
- memory_get: reads memory file content by path.
Local mode:
- Set agents.defaults.memorySearch.provider = "local".
- Provide agents.defaults.memorySearch.local.modelPath (GGUF or hf: URI).
- Optional: set agents.defaults.memorySearch.fallback = "none" to avoid remote fallback.
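Putting those together (a minimal sketch; the modelPath shown is the default model listed under Local embedding auto-download, so substitute your own GGUF path or hf: URI if needed):

agents: {
  defaults: {
    memorySearch: {
      provider: "local",
      local: {
        modelPath: "hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf"
      },
      fallback: "none"
    }
  }
}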
How the memory tools work¶
- memory_search semantically searches Markdown chunks (~400-token target, 80-token overlap) from MEMORY.md + memory/**/*.md. It returns snippet text (capped at ~700 chars), file path, line range, score, provider/model, and whether we fell back from local → remote embeddings. No full file payload is returned.
- memory_get reads a specific memory Markdown file (workspace-relative), optionally from a starting line and for N lines. Paths outside MEMORY.md / memory/ are rejected.
- Both tools are enabled only when memorySearch.enabled resolves true for the agent.
What gets indexed (and when)¶
- File type: Markdown only (MEMORY.md, memory/**/*.md).
- Index storage: per-agent SQLite at ~/.openclaw/memory/<agentId>.sqlite (configurable via agents.defaults.memorySearch.store.path, supports the {agentId} token).
- Freshness: a watcher on MEMORY.md + memory/ marks the index dirty (debounce 1.5s). Sync is scheduled on session start, on search, or on an interval, and runs asynchronously. Session transcripts use delta thresholds to trigger background sync.
- Reindex triggers: the index stores the embedding provider/model + endpoint fingerprint + chunking params. If any of those change, OpenClaw automatically resets and reindexes the entire store.
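For example, a sketch of overriding the index location with the {agentId} token (the value shown simply mirrors the documented default):

agents: {
  defaults: {
    memorySearch: {
      store: {
        path: "~/.openclaw/memory/{agentId}.sqlite"
      }
    }
  }
}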
Hybrid search (BM25 + vector)¶
When enabled, OpenClaw combines:
- Vector similarity (semantic match, wording can differ)
- BM25 keyword relevance (exact tokens like IDs, env vars, code symbols)
If full-text search is unavailable on your platform, OpenClaw falls back to vector-only search.
Why hybrid?¶
Vector search is great at “this means the same thing”:
- “Mac Studio gateway host” vs “the machine running the gateway”
- “debounce file updates” vs “avoid indexing on every write”
But it can be weak at exact, high-signal tokens:
- IDs (a828e60, b3b9895a…)
- code symbols (memorySearch.query.hybrid)
- error strings ("sqlite-vec unavailable")
BM25 (full-text) is the opposite: strong at exact tokens, weaker at paraphrases. Hybrid search is the pragmatic middle ground: use both retrieval signals so you get good results for both “natural language” queries and “needle in a haystack” queries.
How we merge results (the current design)¶
Implementation sketch:
- Retrieve a candidate pool from both sides:
  - Vector: top maxResults * candidateMultiplier by cosine similarity.
  - BM25: top maxResults * candidateMultiplier by FTS5 BM25 rank (lower is better).
- Convert BM25 rank into a 0..1-ish score:
  - textScore = 1 / (1 + max(0, bm25Rank))
- Union candidates by chunk id and compute a weighted score:
  - finalScore = vectorWeight * vectorScore + textWeight * textScore
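A minimal TypeScript sketch of that merge (illustrative names only, not the actual OpenClaw source; it assumes each candidate carries a stable chunk id):

type Candidate = { id: string; vectorScore?: number; bm25Rank?: number };

function mergeHybrid(
  vectorHits: Candidate[],   // top maxResults * candidateMultiplier by cosine similarity
  bm25Hits: Candidate[],     // top maxResults * candidateMultiplier by FTS5 rank (lower is better)
  vectorWeight = 0.7,
  textWeight = 0.3,
  maxResults = 6,
): { id: string; score: number }[] {
  const pool = new Map<string, { vectorScore: number; textScore: number }>();
  for (const hit of vectorHits) {
    pool.set(hit.id, { vectorScore: hit.vectorScore ?? 0, textScore: 0 });
  }
  for (const hit of bm25Hits) {
    // Convert the BM25 rank into a 0..1-ish score: 1 / (1 + max(0, rank)).
    const textScore = 1 / (1 + Math.max(0, hit.bm25Rank ?? 0));
    const prev = pool.get(hit.id) ?? { vectorScore: 0, textScore: 0 };
    pool.set(hit.id, { ...prev, textScore });
  }
  // Union by chunk id, then weight: finalScore = vectorWeight * vectorScore + textWeight * textScore.
  return [...pool.entries()]
    .map(([id, s]) => ({ id, score: vectorWeight * s.vectorScore + textWeight * s.textScore }))
    .sort((a, b) => b.score - a.score)
    .slice(0, maxResults);
}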
Notes:
- vectorWeight + textWeight is normalized to 1.0 in config resolution, so the weights behave as percentages.
- If embeddings are unavailable (or the provider returns a zero-vector), we still run BM25 and return keyword matches.
- If FTS5 can’t be created, we keep vector-only search (no hard failure).
This isn’t “IR-theory perfect”, but it’s simple, fast, and tends to improve recall/precision on real notes. If we want to get fancier later, common next steps are Reciprocal Rank Fusion (RRF) or score normalization (min/max or z-score) before mixing.
Config:
agents: {
defaults: {
memorySearch: {
query: {
hybrid: {
enabled: true,
vectorWeight: 0.7,
textWeight: 0.3,
candidateMultiplier: 4
}
}
}
}
}
Embedding cache¶
OpenClaw can cache chunk embeddings in SQLite so reindexing and frequent updates (especially session transcripts) don't re-embed unchanged text.
Config:
Session memory search (experimental)¶
You can optionally index session transcripts and surface them via memory_search.
This is gated behind an experimental flag.
agents: {
defaults: {
memorySearch: {
experimental: { sessionMemory: true },
sources: ["memory", "sessions"]
}
}
}
Notes:
- Session indexing is opt-in (off by default).
- Session updates are debounced and indexed asynchronously once they cross delta thresholds (best-effort).
- memory_search never blocks on indexing; results can be slightly stale until background sync finishes.
- Results still include snippets only; memory_get remains limited to memory files.
- Session indexing is isolated per agent (only that agent's session logs are indexed).
- Session logs live on disk (~/.openclaw/agents/<agentId>/sessions/*.jsonl). Any process/user with filesystem access can read them, so treat disk access as the trust boundary. For stricter isolation, run agents under separate OS users or hosts.
Delta thresholds (defaults shown):
agents: {
defaults: {
memorySearch: {
sync: {
sessions: {
deltaBytes: 100000, // ~100 KB
deltaMessages: 50 // JSONL lines
}
}
}
}
}
SQLite vector acceleration (sqlite-vec)¶
When the sqlite-vec extension is available, OpenClaw stores embeddings in a
SQLite virtual table (vec0) and performs vector distance queries in the
database. This keeps search fast without loading every embedding into JS.
Configuration (optional):
agents: {
defaults: {
memorySearch: {
store: {
vector: {
enabled: true,
extensionPath: "/path/to/sqlite-vec"
}
}
}
}
}
Notes:
- enabled defaults to true; when disabled, search falls back to in-process cosine similarity over stored embeddings.
- If the sqlite-vec extension is missing or fails to load, OpenClaw logs the error and continues with the JS fallback (no vector table).
- extensionPath overrides the bundled sqlite-vec path (useful for custom builds or non-standard install locations).
Local embedding auto-download¶
- Default local embedding model: hf:ggml-org/embeddinggemma-300m-qat-q8_0-GGUF/embeddinggemma-300m-qat-Q8_0.gguf (~0.6 GB).
- When memorySearch.provider = "local", node-llama-cpp resolves modelPath; if the GGUF is missing it auto-downloads to the cache (or local.modelCacheDir if set), then loads it. Downloads resume on retry.
- Native build requirement: run pnpm approve-builds, pick node-llama-cpp, then pnpm rebuild node-llama-cpp.
- Fallback: if local setup fails and memorySearch.fallback = "openai", we automatically switch to remote embeddings (openai/text-embedding-3-small unless overridden) and record the reason.
Custom OpenAI-compatible endpoint example¶
agents: {
defaults: {
memorySearch: {
provider: "openai",
model: "text-embedding-3-small",
remote: {
baseUrl: "https://api.example.com/v1/",
apiKey: "YOUR_REMOTE_API_KEY",
headers: {
"X-Organization": "org-id",
"X-Project": "project-id"
}
}
}
}
}
Notes:
- remote.* takes precedence over models.providers.openai.*.
- remote.headers merge with the OpenAI headers; remote wins on key conflicts. Omit remote.headers to use the OpenAI defaults.