Start a crawl

POST /api/v1/crawl

Start an asynchronous crawl of a site, up to the `maxPages` budget. The call returns a `jobId` immediately (HTTP 202); poll the summary endpoint until the job is done, then pull per-page data and duplicate reports. The crawl stays on the start URL's host.

Parameters

Name	Type	Description
`url` required	string	The site to crawl (query string or body). The crawl stays on this host.
`maxPages` optional	integer	Max pages to crawl. Default 50, max 200.

Request
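
A minimal sketch of building the request in Python, passing both parameters in the query string (the parameter table allows query string or body). The base URL is an assumption taken from the `links.summary` value in the sample response, `build_crawl_request` is a hypothetical helper name, and the lower bound of 1 on `maxPages` is assumed. Any authentication the API requires is not covered in this section and is not shown.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: host taken from links.summary in the sample response.
BASE_URL = "https://www.ranknibbler.com"


def build_crawl_request(url: str, max_pages: int = 50) -> Request:
    """Build (but do not send) a POST /api/v1/crawl request."""
    # Documented maximum is 200; the lower bound of 1 is an assumption.
    if not 1 <= max_pages <= 200:
        raise ValueError("maxPages must be between 1 and 200")
    query = urlencode({"url": url, "maxPages": max_pages})
    return Request(f"{BASE_URL}/api/v1/crawl?{query}", method="POST")


req = build_crawl_request("https://example.com", max_pages=50)
print(req.get_method(), req.full_url)
```

To send it, pass `req` to `urllib.request.urlopen` and expect a 202 response with a JSON body.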

Response

202 · application/json

{
  "jobId": "crw_828120c927867e4244a921a0",
  "status": "queued",
  "startUrl": "https://example.com",
  "maxPages": 50,
  "links": {
    "summary": "https://www.ranknibbler.com/api/v1/crawl/crw_828120c927867e4244a921a0/summary"
  }
}

Response fields

Field	Description
`jobId`	Opaque id; pass it to the crawl result endpoints.
`status`	`queued`, then `running`, then `done` or `error`.
`startUrl`	The resolved URL the crawl started from.
`maxPages`	The effective page budget for this crawl.
`links.summary`	A ready-made URL to poll for progress and results.
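
As an illustration, the fields above could be read into a small typed record. This is only a sketch: `CrawlJob` and `parse_crawl_response` are hypothetical names, and the input is the sample 202 body from this page.

```python
import json
from dataclasses import dataclass


@dataclass
class CrawlJob:
    job_id: str       # opaque id for the crawl result endpoints
    status: str       # queued, running, done, or error
    start_url: str    # resolved URL the crawl started from
    max_pages: int    # effective page budget
    summary_url: str  # ready-made URL to poll


def parse_crawl_response(body: str) -> CrawlJob:
    """Map the 202 JSON body onto a CrawlJob (hypothetical helper)."""
    data = json.loads(body)
    return CrawlJob(
        job_id=data["jobId"],
        status=data["status"],
        start_url=data["startUrl"],
        max_pages=data["maxPages"],
        summary_url=data["links"]["summary"],
    )


sample = """{
  "jobId": "crw_828120c927867e4244a921a0",
  "status": "queued",
  "startUrl": "https://example.com",
  "maxPages": 50,
  "links": {
    "summary": "https://www.ranknibbler.com/api/v1/crawl/crw_828120c927867e4244a921a0/summary"
  }
}"""

job = parse_crawl_response(sample)
print(job.job_id, job.status, job.summary_url)
```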

Polling the crawl

Poll /api/v1/crawl/{jobId}/summary (the URL returned in `links.summary`) until status is `done` or `error`. Once it is `done`, read the result endpoints:

Crawl summary — /api/v1/crawl/{jobId}/summary, site-wide metrics and progress.
Crawl pages — /api/v1/crawl/{jobId}/pages, every crawled URL with checks.
Duplicate tags (/duplicate-tags) and Duplicate content (/duplicate-content).
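
The polling step can be sketched as a simple loop. This assumes the summary response is JSON with a top-level `status` field using the same values as the start response. The poll interval and attempt limit are arbitrary choices, not documented limits, and `wait_for_crawl` and `fetch_json` are hypothetical helper names.

```python
import json
import time
from typing import Callable
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """GET a URL and decode the JSON body."""
    with urlopen(url) as resp:
        return json.load(resp)


def wait_for_crawl(
    summary_url: str,
    fetch: Callable[[str], dict] = fetch_json,
    interval: float = 2.0,   # assumed delay between polls, in seconds
    max_polls: int = 150,    # assumed upper bound on attempts
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the summary URL until the job is done; raise on error or timeout."""
    for _ in range(max_polls):
        summary = fetch(summary_url)
        status = summary.get("status")
        if status == "done":
            return summary
        if status == "error":
            raise RuntimeError(f"crawl failed: {summary}")
        # Still queued or running: wait before the next poll.
        sleep(interval)
    raise TimeoutError(f"crawl not finished after {max_polls} polls")
```

Pass the `links.summary` value from the start response as `summary_url`. The `fetch` and `sleep` parameters exist so the loop can be exercised without network access or real delays.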