Configuration

Every subcommand resolves the same config file. Pass --config <path> to load a specific file (it must exist). Without --config, Rover loads the default config: the file named by ROVER_CONFIG if set, otherwise the platform config file (~/.config/rover/rover.toml on Linux/macOS) or a project-local ./rover.toml, whichever exists first. If none exists, built-in defaults apply.

This is uniform across fetch, mcp, cache, task, batch, doctor, and config show / set — so a file written by rover config set is read back by rover fetch and rover mcp without --config. rover config show prints the effective settings; rover config set <dotted.key> <value> changes one. See the CLI reference.

Every section and key is optional, and the defaults below apply when a key is absent. Unknown keys are rejected at load time (deny_unknown_fields), so a typo fails loudly instead of being silently ignored. Durations parse via humantime: "1h", "5m", "7d", "500ms".
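As a small illustration of the two rules above, the fragment below sets one duration with a humantime string and shows, commented out, the kind of misspelled key that load-time validation would reject. The misspelling is invented for the example.

```toml
[cache]
# Durations are humantime strings, not bare integers.
default_ttl = "15m"

# A misspelled key such as the one below is an unknown field, so loading
# the file would fail instead of silently falling back to the default.
# defualt_ttl = "15m"
```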

[fetch]

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| user_agent | string | Rover/<version> (+https://github.com/aaronbassett/rover) | UA header on every outbound HTTP request. |
| timeout_secs | integer | 15 | Per-request timeout in seconds. Must be > 0. |

[ssrf]

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| level | enum | "strict" | One of strict, loopback, project, lan, none. |
| project_root | path | "." | Used when level = "project". The descendant root for file:// URLs after std::fs::canonicalize resolves symlinks. |

Level semantics

Each level is a superset of the one above it.

| Level | Allows |
| --- | --- |
| strict | Public IPs only; http / https only. |
| loopback | Strict + 127.0.0.0/8 + ::1. |
| project | Loopback + file:// URLs descendant of project_root after symlink resolution. |
| lan | Project + RFC1918 + IPv6 ULAs (fc00::/7). |
| none | Trust the user. The always-floor (link-local, multicast, broadcast, 0.0.0.0, ::) is still blocked. |

Unknown level strings are rejected the first time the SSRF policy is consulted (typed SsrfError::UnknownLevel). The always-floor blocks the dangerous ranges at every level, none included. See Security & threat model.
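A minimal sketch of choosing a level, assuming a setup where the agent needs to reach a dev server on localhost but has no need for file:// URLs or LAN hosts:

```toml
[ssrf]
# "loopback" adds 127.0.0.0/8 and ::1 on top of "strict".
# file:// URLs and RFC1918 addresses stay blocked at this level.
level = "loopback"
```

Moving up to "project" would additionally allow file:// URLs under project_root, and "lan" would add private network ranges on top of that.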

[prompt_injection]

These keys tune the prompt-injection detectors for the content-returning MCP tools: fetch, summarize, get_metadata, and transitively batch_fetch. They act on output only. They do not control the nonce wrapper, which is always on and can't be turned off. See Trust & prompt injection for the model behind the layers and MCP tools for the wire contract.

```toml
[prompt_injection]
level = "moderate" # strict | high | moderate | low | disabled (output side)
model = "disabled" # disabled | deberta-base | deberta-small | prompt-guard-2-86m | prompt-guard-2-22m | <hf-id>
model_threshold = 0.9 # model malicious-score threshold

[prompt_injection.allowlist] # URL globs; a matching URL skips that method on OUTPUT
wrap = [] # e.g. ["https://*.internal.example.com/*"] ("*" disables entirely)
patterns = []
model = []

[prompt_injection.agent_overrides] # grant the agent per-call `security` control (default: all deny)
wrap = false
patterns = false
model = false
level = false
```

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| level | enum | "moderate" | Output action on a detector hit. One of strict, high, moderate, low, disabled. |
| model | string | "disabled" | Model detector. One of disabled, the presets deberta-base, deberta-small, prompt-guard-2-86m, prompt-guard-2-22m, or a custom HuggingFace owner/repo id. |
| model_threshold | float | 0.9 | Malicious-score threshold above which a model window is flagged. |

[prompt_injection.allowlist] holds per-method URL globs (wrap, patterns, model). A URL matching a method's list skips that method on output for that URL, and a bare "*" disables the method entirely.

[prompt_injection.agent_overrides] grants the agent per-call control through the MCP security arg. Each of wrap, patterns, model, level defaults to false. A security field is honored only when its grant here is true; otherwise it is ignored and recorded in the response's prompt_injection.overrides_attempted.
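The two sub-tables can be combined. The sketch below is one possible setup, not a recommended policy: it skips only the pattern detector for an internal host (the glob is the placeholder used above) and lets the agent change the output level per call while every other override stays denied.

```toml
[prompt_injection]
level = "high"

[prompt_injection.allowlist]
# Skip only the pattern detector for URLs matching this placeholder glob.
patterns = ["https://*.internal.example.com/*"]

[prompt_injection.agent_overrides]
# The agent may set `level` through the MCP security arg. wrap, patterns,
# and model keep their default of false, so those security fields are
# ignored and recorded in prompt_injection.overrides_attempted.
level = true
```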

Both the level and model strings are validated at first use (guard::GuardConfig::from_config), which surfaces a typed error rather than a serde error. The TOML parser accepts unknown values; the guard rejects them when it's built.

The model detector needs Rover compiled with the injection-model feature, and it downloads an ONNX DeBERTa classifier from HuggingFace on first use. Verify the build with rover doctor. Without the feature, a configured model is ignored with a warning, but the structural wrapper and pattern detector are always compiled in, so coverage never drops to zero. Internal-inference hardening, which cleans the content Rover feeds to its own summarizer and caption models, is always on and can't be turned off. See Optional features for the feature-flag matrix.

[cache]

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| default_ttl | duration | "15m" | TTL used when an upstream response has no Cache-Control: max-age. Kept short so cache-poisoned or stale content has a small blast radius; origins can opt into longer caching via response headers. |
| min_ttl | duration | "5m" | Floor for any TTL derived from an upstream header. Must be <= default_ttl. |
| max_ttl | duration | "7d" | Ceiling for any TTL. Must be >= default_ttl. |
| stale_while_revalidate_window | duration | "5m" | How long after expires_at an entry stays eligible for the stale-while-revalidate fast-path. Inside the window, fetch may return the stale row immediately and queue a background revalidate task. Beyond it, the row is treated as a cache miss and refetched synchronously, so callers never receive arbitrarily old content. |
| override_no_store | bool | false | When true, cache responses even if they sent Cache-Control: no-store. |
| override_no_store_domains | array<string> | [] | Per-domain allowlist for override_no_store. Lowercased on load. |
| store_raw_html | bool | false | When true, store the zstd-compressed raw HTML alongside the extracted Markdown. Enables the raw_html field in count_tokens mode=estimates. |

Stale-while-revalidate behaviour

What happens when a cache entry's expires_at has passed depends on how far past it you are.

Within stale_while_revalidate_window, the MCP server returns the stale row immediately and enqueues a background revalidate task. The agent sees cache_status: "stale" and a revalidation block it can monitor.

Beyond stale_while_revalidate_window, the row is treated as a miss. fetch refetches synchronously and writes through the cache.

The CLI (rover fetch) always revalidates synchronously, regardless of the window. A one-shot CLI process has no in-process scheduler to drain the background task queue, so it can't rely on SWR at all. Set default_ttl and max_ttl to how fresh you need cached content; set stale_while_revalidate_window to bound how stale the SWR fast-path may serve. See Caching & freshness for how these interact with upstream Cache-Control.
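To make the timing concrete, here is the [cache] block with its default values and a worked timeline in the comments. The clock times and the upstream max-age values are invented for illustration; the clamping comments follow the floor and ceiling descriptions of min_ttl and max_ttl.

```toml
[cache]
default_ttl = "15m"   # used when upstream sends no Cache-Control: max-age
min_ttl = "5m"        # an upstream max-age of 60 seconds is raised to 5m
max_ttl = "7d"        # an upstream max-age of 30 days is capped at 7d
stale_while_revalidate_window = "5m"

# Timeline for an entry whose expires_at is 12:15 under these settings:
#   12:10  not yet expired, so the cached row is served
#   12:18  inside the window: MCP fetch returns the stale row and queues a
#          background revalidate; rover fetch (CLI) revalidates synchronously
#   12:25  beyond the window: treated as a miss and refetched synchronously
```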

[tokenizer]

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| default | enum | "o200k" | Default tokenizer family for token counting. One of o200k (GPT-4o, the default), cl100k (GPT-4), claude, llama3, qwen3. Pick the family that matches the model you actually pay for. Token counts differ between families, so the wrong tokenizer estimates the wrong budget. |

[mcp]

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| heartbeat_interval | duration | "5s" | Per-task heartbeat write cadence. Must be > 0. |
| reap_threshold | duration | "60s" | If a task hasn't written a heartbeat within this window, its owning process is considered dead and the task is marked failed. Must be > 0. |
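The two keys are easiest to read together. With the defaults shown below, the reap threshold spans twelve heartbeat intervals, so a task is only marked failed after a full minute of silence. Whether Rover enforces any ordering between the two values is not stated here, so keeping reap_threshold well above heartbeat_interval is an assumption of this sketch rather than a documented rule.

```toml
[mcp]
heartbeat_interval = "5s"   # each task writes a heartbeat on this cadence
reap_threshold = "60s"      # no heartbeat for this long marks the task failed
```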

[output]

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| dir | path | unset | Where extracted assets (downloaded images, table CSVs) are written. When unset, falls back to ROVER_OUTPUT_DIR if set, otherwise ${data_local_dir}/rover/output. |

[rate_limit]

All HTTP-bound code paths share a single Pacer built from this block at startup.

| Key | Type | Default | Constraints | Description |
| --- | --- | --- | --- | --- |
| requests_per_minute_per_domain | integer | 60 | 1..=6000 | Per-host RPM budget. |
| per_domain_concurrency | integer | 2 | >= 1 | Max simultaneous in-flight requests per host. |
| global_concurrency | integer | 8 | >= 1 | Max simultaneous in-flight requests across all hosts. |
| max_retries | integer | 3 | <= 10 | Retries on transient failures (network, 5xx, 429). |
| initial_backoff | duration | "500ms" | <= max_backoff | First backoff after a transient failure. |
| max_backoff | duration | "30s" | | Backoff ceiling. |
| retry_after_ceiling | duration | "5m" | > 0 | Maximum Retry-After value Rover will respect inline. |
| jitter_seed | integer or unset | unset | | Deterministic seed for backoff jitter. Set it in tests for reproducible timing; entropy otherwise. |
| deferred_retry_threshold_secs | integer | 30 | | A server-provided Retry-After above this threshold turns a synchronous fetch into a deferred retry task instead of sleeping in-line. |
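A sketch of a more conservative profile follows, with each constraint from the table repeated as a comment. The specific values are illustrative choices rather than recommendations, and the 45-second Retry-After is an invented example.

```toml
[rate_limit]
requests_per_minute_per_domain = 30   # allowed range is 1..=6000
per_domain_concurrency = 1            # must be >= 1
global_concurrency = 4                # must be >= 1
max_retries = 5                       # must be <= 10
initial_backoff = "1s"                # must be <= max_backoff
max_backoff = "1m"
retry_after_ceiling = "5m"            # must be > 0
deferred_retry_threshold_secs = 30    # a Retry-After of, say, 45s exceeds this,
                                      # so the fetch becomes a deferred retry task
jitter_seed = 42                      # fixed seed for reproducible tests; leave unset otherwise
```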

[robots]

Rover is an agent's browser, not a spider — it fetches the page that was explicitly requested, one at a time — so the robots.txt gate is off by default. robots.txt governs automated crawling; set respect = true to opt into enforcement.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| respect | bool | false | When true, every fetch is gated on the host's robots.txt. Off by default (Rover is an agent browser, not a crawler). |
| ignore_domains | array<string> | [] | Hosts for which robots.txt is not fetched and rules are not enforced. Lowercased on load. |
| default_ttl | duration | "24h" | TTL used when the robots.txt response has no Cache-Control: max-age. |
| failure_ttl | duration | "5m" | TTL used when the robots.txt fetch fails (5xx, transport error). Fail-closed. Must be <= default_ttl. |

[summarization]

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| default_backend | string | "default" | Name of a [backends.<name>] block to use when no backend arg is supplied. Must exist in [backends] (or be the implicit default extractive). |
| default_mode | string | "abstractive" | One of extractive, abstractive, headlines. |
| default_style | string | "prose" | One of bullet, prose, executive. |
| fallback_to_extractive | bool | true | When a cloud backend fails (auth, rate-limit, model error, invalid request), retry the request through an extractive backend. Requires at least one extractive backend to exist. |

[summarization.tables]

Per-table defaults consumed by the tables: {mode: "summarize"} hook in the MCP fetch tool.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| target_tokens | integer | 150 | Target token count for each generated table summary. |
| focus | string | "Describe what this table shows. Highlight any extreme values or notable rows." | Focus prompt passed to the summarizer for every table. |

[image_captions]

Defaults for caption generation when images.mode = "caption" is set in the MCP fetch tool. See Images & captioning.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| default | string | unset | Name of a configured [captioners.<name>] block to use when no captioner override is supplied in the MCP tool call. When unset and exactly one captioner is configured, that one is used, so a single-captioner setup needs no default. |
| max_tokens | integer | 50 | Maximum token count per caption. Captions are descriptions, not summaries; a tight cap keeps them from eating the budget the page itself needs. |
| max_per_page | integer | 10 | Maximum number of images to caption per page. Captions are generated for the first N images; the rest are dropped. |
| min_width | integer | 200 | Skip images narrower than this (pixels). Filters out spacers, icons, and tracking pixels before they cost a caption. |
| min_height | integer | 200 | Skip images shorter than this (pixels). |
| max_bytes | integer | 10485760 | Skip images larger than this (bytes; default 10 MiB). Accepts raw bytes or a humansize string ("10MiB"). |
| max_concurrent | integer | 2 | Number of concurrent caption-generation tasks. |

Example:

```toml
[image_captions]
default = "cloud"
max_tokens = 80
max_per_page = 5
min_width = 64
min_height = 64
max_bytes = "1MiB"
max_concurrent = 4
```

[captioners.<name>]

A free-form section: repeat it for each named captioner. It mirrors the [backends.<name>] structure.

| Key | Type | Required | Description |
| --- | --- | --- | --- |
| kind | string | yes | cloud. (The former local kind, a native mistralrs vision backend, was removed; for local inference use provider = "openai_compat" pointed at a local server.) |
| provider | string | yes | One of openai, anthropic, gemini, xai, groq, deepseek, together, fireworks, openai_compat. |
| model | string | yes | Model id: a cloud model (e.g. gpt-4o-mini) or the model your openai_compat server hosts. |
| base_url | string | yes for openai_compat; unused otherwise | Custom endpoint. For openai_compat, auto-normalized to end in /v1/. |
| api_key_env | string | no | Env var holding the API key. When unset for cloud providers, the genai library falls back to its provider-default env var. Keyless local servers can omit it. |

Example:

```toml
[captioners.cloud]
kind = "cloud"
provider = "openai"
model = "gpt-4o-mini"
api_key_env = "OPENAI_API_KEY"

# Local inference via an OpenAI-compatible server (ollama, LM Studio, vLLM, ...):
[captioners.ollama]
kind = "cloud"
provider = "openai_compat"
model = "llama3.2-vision"
base_url = "http://localhost:11434/v1"
```

[headless]

The headless browser renderer needs Rover compiled with the headless feature. Without it, these keys are inert and a requested headless render falls back to the static fetch. See JavaScript & dynamic pages for when a page actually needs a real browser.

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| auto_detect_spa | bool | true | When true and MCP fetch does not specify headless.mode, auto-detect single-page apps via heuristics and render via headless. |
| launch_delay_secs | integer | 2 | In Auto mode, seconds to wait after detecting that a render is needed (an unrendered SPA, or a bot-protection challenge on the HTTP fetch) and before launching the browser. It acts as a breather between the lightweight fetch and the heavier browser hit. 0 disables it. Not applied in On mode, which has no detection step. |
| default_wait | string | "domcontentloaded" | When to consider a render done. domcontentloaded returns as soon as the initial HTML is parsed. networkidle0 additionally waits until the network fully settles (zero requests in flight for a continuous 500 ms, bounded by timeout). Slower, but the right choice for SPAs that fetch their content over XHR after load; a single pending XHR still blocks completion. |
| timeout | duration | "30s" | Per-render timeout. |
| max_concurrent | integer | 4 | Number of concurrent headless render tasks. |
| chrome_executable | string | unset | Path to the Chrome/Chromium executable. When unset, attempts auto-detection (searches PATH, common install locations). |

The MCP fetch tool accepts a typed headless argument (see MCP tools) that overrides auto_detect_spa, default_wait, and timeout per call.
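For reference, a [headless] block tuned for XHR-heavy single-page apps might look like the sketch below. The values are illustrative, and the Chromium path is a hypothetical location that will differ per machine.

```toml
[headless]
auto_detect_spa = true
launch_delay_secs = 0                     # 0 disables the pre-launch pause in Auto mode
default_wait = "networkidle0"             # wait for the network to settle, not just DOM parse
timeout = "45s"                           # per-render timeout
max_concurrent = 2
chrome_executable = "/usr/bin/chromium"   # hypothetical path; omit to auto-detect
```

These are only defaults: a per-call headless argument on the MCP fetch tool can still override auto_detect_spa, default_wait, and timeout.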

[backends.<name>]

A free-form section: repeat it for each named backend. See Summarisation backends for the full reference and worked examples, and Summarising pages for how backends, modes, and styles combine at call time.

| Key | Type | Required | Description |
| --- | --- | --- | --- |
| kind | string | yes | extractive, cloud, or local. The local kind runs an in-process model and needs Rover compiled with the local-inference feature. |
| provider | string | yes (cloud) | One of openai, anthropic, gemini, xai, groq, deepseek, together, fireworks, openai_compat. Ignored for kind = "local". |
| model | string | yes | For cloud, a literal model id (e.g. gpt-4o-mini). For local, a HuggingFace repo id (requires the local-inference feature). |
| base_url | string | yes for openai_compat; unused otherwise | Custom endpoint. For openai_compat, auto-normalized to end in /v1/. |
| api_key_env | string | no | Env var holding the API key. When unset for cloud providers, the genai library falls back to its provider-default env var (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.). Unused for kind = "local". |

When the [backends] map is empty, Rover installs an implicit default extractive backend, so a fresh install works offline without any configuration. Add any explicit [backends.*] block and that implicit injection is disabled.
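One consequence worth spelling out: because declaring any backend disables the implicit extractive default, a config that relies on fallback_to_extractive has to declare its own extractive backend. A minimal sketch, in which the backend names fast and offline are arbitrary labels chosen for this example:

```toml
[summarization]
default_backend = "fast"
fallback_to_extractive = true

[backends.fast]
kind = "cloud"
provider = "openai"
model = "gpt-4o-mini"
# api_key_env omitted: the genai library falls back to OPENAI_API_KEY

# An explicit extractive backend, so the fallback has somewhere to go now
# that the implicit default is no longer injected.
[backends.offline]
kind = "extractive"
```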

[debug]

| Key | Type | Default | Description |
| --- | --- | --- | --- |
| har_path | string | "" (off) | Path to write a HAR-1.2 file capturing every request/response. Empty disables HAR recording. |
| har_body_cap | integer or humansize | 65536 (64 KiB) | Maximum response body bytes captured per entry. Accepts raw bytes (65536) or humansize strings ("64KiB", "1MiB", "1GiB"). |
| log_level | string | "info" | Default tracing filter. Overridden by RUST_LOG when set. |

Environment overrides

| Env var | Overrides | Notes |
| --- | --- | --- |
| ROVER_CONFIG | config file path | Overrides the default config location for every subcommand when no --config is passed. |
| ROVER_DATA_DIR | data dir (cache db, downloads) | |
| ROVER_OUTPUT_DIR | output.dir | |
| ROVER_LOG_LEVEL | debug.log_level | |
| RUST_LOG | tracing filter | Takes precedence over debug.log_level. |

rover config show annotates every leaf with its effective source: defaults, file, or env. Only the 25 leaves listed by provenance::known_leaves() are tracked. They cover every section above except the dynamic [backends.*] map and the rate-limit timing knobs (initial_backoff, max_backoff, retry_after_ceiling, jitter_seed, deferred_retry_threshold_secs, max_retries).

Worked example

```toml
[fetch]
timeout_secs = 30

[ssrf]
level = "project"
project_root = "/Users/me/code"

[prompt_injection]
level = "moderate"
model = "disabled" # set to e.g. "deberta-small" with --features injection-model

[cache]
default_ttl = "6h"
store_raw_html = true

[rate_limit]
requests_per_minute_per_domain = 30
per_domain_concurrency = 2

[robots]
respect = true
ignore_domains = ["staging.internal"]

[summarization]
default_backend = "fast"
default_mode = "abstractive"

[summarization.tables]
target_tokens = 200

[backends.fast]
kind = "cloud"
provider = "openai"
model = "gpt-4o-mini"
api_key_env = "OPENAI_API_KEY"

[backends.local]
kind = "cloud"
provider = "openai_compat"
base_url = "http://localhost:1234" # auto-normalized to /v1/
model = "qwen2.5-0.5b-instruct"

[backends.default]
kind = "extractive"

[debug]
har_path = "/tmp/rover.har"
har_body_cap = "256KiB"
log_level = "info"
```