Extraction methods

Every monitor needs to know what to pull out of the response. Verid supports six extraction methods — pick the one that matches the shape of your source.

Method            Best for                               Anchor
CSS selector      Rendered HTML pages                    #css
XPath             Complex HTML / XML                     #xpath
JSONPath          JSON APIs                              #json_path
Regex             Plain text, sitemaps, raw bodies       #regex
Full-page hash    "Anything changed?" mode               #full_page
AI / LLM prompt   Hard-to-scrape or unstructured pages   #prompt

Named fields

For css, xpath, json_path, and regex you provide a map of field name → expression. Each field is tracked independently in the diff — predicates reference fields by name, e.g. { "type": "field_changes", "field": "price" }.

"extract_config": {
  "method": "css",
  "fields": {
    "price": "[data-test=product-price]",
    "title": "h1"
  }
}

Field names can be anything (price, tag_name, incidents). Keep them short and descriptive — they show up in diffs and webhook payloads.
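To make "tracked independently" concrete, here is a minimal Python sketch of a per-field diff between two extraction snapshots. It is an illustration only, not Verid's implementation: the `changed_fields` helper and the sample values are hypothetical.

```python
def changed_fields(previous: dict, current: dict) -> dict:
    """Return {field: (old, new)} for every field whose value differs."""
    names = previous.keys() | current.keys()
    return {
        name: (previous.get(name), current.get(name))
        for name in sorted(names)
        if previous.get(name) != current.get(name)
    }


# Two hypothetical snapshots using the field names from the example above.
before = {"price": "$19.99", "title": "Blue Widget"}
after = {"price": "$17.99", "title": "Blue Widget"}

print(changed_fields(before, after))  # {'price': ('$19.99', '$17.99')}
```

Under this model, a predicate on "price" would see a change here, while one on "title" would not.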

CSS selector

method: "css" — standard CSS selectors run against the rendered DOM.

{
  "method": "css",
  "fields": {
    "headline": "h1.article-title",
    "price": "[data-test=price]"
  }
}

Each selector returns the first matching element, so use attribute selectors or :nth-child when several elements could match. Selectors run against the fully rendered page when fetch_mode is "browser", or when auto falls back to the browser.

XPath

method: "xpath" — XPath 1.0 expressions, useful when CSS can't express the structure you need (parent/ancestor traversal, text-content predicates, etc.).

{
  "method": "xpath",
  "fields": {
    "status": "//div[@id='status']/text()",
    "count":  "//table[@id='users']//tr"
  }
}

JSONPath

method: "json_path" — for JSON APIs.

{
  "method": "json_path",
  "fields": {
    "tag_name":     "$.tag_name",
    "published_at": "$.published_at"
  }
}

The response body is parsed as JSON. Use $ for the root, . to step into a child key, and [*] to iterate over an array. If your API returns an array at the root, start with an index such as $[0] to address the first element.
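To check what a path should resolve to before saving a monitor, a small Python sketch like the one below can help. It handles only `$`, `.key`, and `[index]` steps (not `[*]`), and the `resolve` helper and sample payloads are hypothetical; Verid's own JSONPath engine may behave differently at the edges.

```python
import json
import re


def resolve(path: str, document):
    """Follow a simple JSONPath made only of `$`, `.key`, and `[index]` steps."""
    if not path.startswith("$"):
        raise ValueError("path must start with $")
    node = document
    # Each step is either .key (first group) or [index] (second group).
    for key, index in re.findall(r"\.(\w+)|\[(\d+)\]", path[1:]):
        node = node[key] if key else node[int(index)]
    return node


# Hypothetical API responses: an object at the root, then an array at the root.
release = json.loads('{"tag_name": "v1.4.0", "published_at": "2024-05-01T12:00:00Z"}')
releases = json.loads('[{"tag_name": "v1.4.0"}, {"tag_name": "v1.3.2"}]')

print(resolve("$.tag_name", release))      # v1.4.0
print(resolve("$[0].tag_name", releases))  # v1.4.0
```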

Regex

method: "regex" — for plain text and quick counts.

{
  "method": "regex",
  "fields": {
    "url_count": "<loc>"
  }
}

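Each pattern is run against the raw response body, and the number of matches is what gets stored, so predicates like field_increases_by_absolute are a good fit for "more results appeared". If you need a captured value instead of a count, add a capture group and the first group is used: "price": "\\$([0-9.]+)". The doubled backslash is JSON string escaping; the regex itself is \$([0-9.]+).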
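The difference between a match count and a captured value can be sketched in Python as follows. This assumes the rule is "first capture group if the pattern has one, otherwise the number of matches"; the `extract` helper and sample bodies are hypothetical, and Verid's regex dialect may differ from Python's `re`.

```python
import re


def extract(pattern: str, body: str):
    """Return the first capture group if the pattern has one, else the match count."""
    compiled = re.compile(pattern)
    if compiled.groups:
        match = compiled.search(body)
        return match.group(1) if match else None
    return len(compiled.findall(body))


sitemap = (
    "<urlset>"
    "<url><loc>https://example.com/a</loc></url>"
    "<url><loc>https://example.com/b</loc></url>"
    "</urlset>"
)

print(extract("<loc>", sitemap))  # 2
# In Python a raw string needs one backslash; in JSON config it is written "\\$".
print(extract(r"\$([0-9.]+)", "Now only $17.99 while stocks last"))  # 17.99
```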

Full-page hash

method: "full_page" — no selectors. Verid hashes the entire response and triggers when the hash changes.

{ "method": "full_page" }

Use this when you don't care what changed, only that something did. Pair with any_field_changes for the simplest possible monitor. The downside: false positives from rotating ads, timestamps, or session tokens.
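The false-positive risk is easy to reproduce. The Python sketch below hashes two hypothetical response bodies that differ only in a rendered timestamp. SHA-256 is an assumption here; the hash function Verid uses is not specified.

```python
import hashlib


def page_hash(body: bytes) -> str:
    """Hash the whole response body (SHA-256 chosen for illustration)."""
    return hashlib.sha256(body).hexdigest()


# Same product content, but the footer timestamp ticks forward by one second.
v1 = b"<html><body><p>Price: $19.99</p><footer>Rendered 10:00:01</footer></body></html>"
v2 = b"<html><body><p>Price: $19.99</p><footer>Rendered 10:00:02</footer></body></html>"

print(page_hash(v1) == page_hash(v1))  # True
print(page_hash(v1) == page_hash(v2))  # False
```

If a page behaves like this, a named field on the part you care about is usually a better fit than a full-page hash.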

AI / LLM prompt

method: "prompt" — describe in plain English what to extract.

{
  "method": "prompt",
  "prompt": "Extract the product name, current price as a number, and availability status from this page.",
  "schema": {
    "name": "string",
    "price": "number",
    "available": "boolean"
  }
}

Best for unstructured pages, marketing sites that re-layout often, or anywhere selectors would be brittle. schema is optional — when provided, the model is forced to return JSON matching that shape, and the keys become field names you can reference in predicates.
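If you consume these fields downstream, you may want to check that a payload has the shape you declared. The Python sketch below validates a JSON string against the schema from the example above. The type names ("string", "number", "boolean") come from that example; the `matches_schema` helper and the mapping to Python types are assumptions, not part of Verid's API.

```python
import json

# Assumed mapping from the schema's type names to Python checks.
# bool is excluded from "number" because Python treats bool as a subclass of int.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def matches_schema(reply: str, schema: dict) -> bool:
    """True if reply is a JSON object with exactly the schema's keys and types."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return False
    if not isinstance(data, dict) or data.keys() != schema.keys():
        return False
    return all(TYPE_CHECKS[kind](data[name]) for name, kind in schema.items())


schema = {"name": "string", "price": "number", "available": "boolean"}

good = '{"name": "Blue Widget", "price": 17.99, "available": true}'
bad = '{"name": "Blue Widget", "price": "17.99", "available": true}'

print(matches_schema(good, schema))  # True
print(matches_schema(bad, schema))   # False
```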

LLM extractions count against your plan's monthly LLM-call quota — check /usage on the API or the Billing page in the dashboard.

Fetch mode

Independent of extraction method, every monitor has a fetch_mode:

  • auto (default) — try a static HTTP fetch first; fall back to a headless browser if the response looks empty or JS-heavy.
  • browser — always render in a stealth headless browser. Slower, but required for SPAs and pages that block bots.

Most monitors should leave this on auto. Switch to browser if your selectors return empty values despite being correct in DevTools.

Related

  • Predicates — Once you've extracted fields, choose when to fire
  • Webhooks — Receive and verify the resulting deliveries
  • Recipes — Worked examples per method