Start a crawl
POST /api/v1/crawl
Start an asynchronous crawl of a site, limited to maxPages pages. The call returns a jobId immediately (HTTP 202); poll the summary endpoint until the job finishes, then pull per-page data and duplicate reports. The crawl stays on the start URL's host.
Parameters
| Name | Type | Description |
|---|---|---|
| url (required) | string | The site to crawl (query string or body). The crawl stays on this host. |
| maxPages (optional) | integer | Maximum number of pages to crawl. Default 50, max 200. |
Request
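A minimal sketch of the request in Python, using only the standard library. It assumes the API is served from https://www.ranknibbler.com (the host that appears in the sample response's links.summary) and that maxPages can travel in the query string alongside url; the parameter table states this explicitly only for url. The helper name build_crawl_request and the lower bound of 1 for maxPages are illustrative assumptions. This section does not describe authentication, so no auth headers are set.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: base URL inferred from the host in the sample links.summary value.
BASE_URL = "https://www.ranknibbler.com"
MAX_PAGES_LIMIT = 200  # documented maximum for maxPages


def build_crawl_request(url: str, max_pages: Optional[int] = None) -> Request:
    """Build (but do not send) a POST /api/v1/crawl request.

    When max_pages is None the parameter is omitted, so the server
    applies its documented default of 50.
    """
    params = {"url": url}
    if max_pages is not None:
        # The lower bound of 1 is an assumption; only the maximum is documented.
        if not 1 <= max_pages <= MAX_PAGES_LIMIT:
            raise ValueError(f"maxPages must be between 1 and {MAX_PAGES_LIMIT}")
        params["maxPages"] = str(max_pages)
    # Parameters go in the query string, which the table allows for url.
    return Request(f"{BASE_URL}/api/v1/crawl?{urlencode(params)}", method="POST")
```

Passing the result to urllib.request.urlopen sends the request; a successful call should answer with HTTP 202 and a JSON body containing the jobId.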
Response
202 · application/json
{
  "jobId": "crw_828120c927867e4244a921a0",
  "status": "queued",
  "startUrl": "https://example.com",
  "maxPages": 50,
  "links": {
    "summary": "https://www.ranknibbler.com/api/v1/crawl/crw_828120c927867e4244a921a0/summary"
  }
}
Response fields
| Field | Description |
|---|---|
| jobId | Opaque id; pass it to the crawl result endpoints. |
| status | queued, then running, then done or error. |
| startUrl | The resolved URL the crawl started from. |
| maxPages | The effective page budget for this crawl. |
| links.summary | A ready-made URL to poll for progress and results. |
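As an illustration of how these fields might be consumed, the sketch below maps the 202 body onto a small Python dataclass. The names CrawlJob and parse_crawl_response are hypothetical, not part of the API; the JSON is the sample response from this page.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlJob:
    """Typed view of the 202 response (hypothetical helper type)."""
    job_id: str
    status: str
    start_url: str
    max_pages: int
    summary_url: str


def parse_crawl_response(body: str) -> CrawlJob:
    """Decode the JSON body and pick out the documented fields."""
    data = json.loads(body)
    return CrawlJob(
        job_id=data["jobId"],
        status=data["status"],
        start_url=data["startUrl"],
        max_pages=data["maxPages"],
        summary_url=data["links"]["summary"],
    )


# The sample response shown above.
sample = """{
  "jobId": "crw_828120c927867e4244a921a0",
  "status": "queued",
  "startUrl": "https://example.com",
  "maxPages": 50,
  "links": {
    "summary": "https://www.ranknibbler.com/api/v1/crawl/crw_828120c927867e4244a921a0/summary"
  }
}"""

job = parse_crawl_response(sample)
print(job.job_id, job.status)  # prints: crw_828120c927867e4244a921a0 queued
```

Keeping links.summary as returned, rather than rebuilding the URL from jobId, avoids hard-coding the path layout on the client.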
Polling the crawl
Poll /api/v1/crawl/{jobId}/summary until status is done (or error, if the crawl failed), then read the result endpoints:
- Crawl summary: /api/v1/crawl/{jobId}/summary, site-wide metrics and progress.
- Crawl pages: /api/v1/crawl/{jobId}/pages, every crawled URL with checks.
- Duplicate tags (/duplicate-tags) and Duplicate content (/duplicate-content).
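The polling step can be sketched in Python as follows. This assumes the summary endpoint is read with GET and returns JSON carrying the same status field as the 202 response. The function names, the 5-second interval, and the attempt cap are illustrative choices, not documented values.

```python
import json
import time
from typing import Callable
from urllib.request import urlopen

# Terminal states, per the status field description: done or error.
TERMINAL_STATUSES = {"done", "error"}


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body (assumes no auth is needed)."""
    with urlopen(url) as resp:
        return json.load(resp)


def wait_for_crawl(
    summary_url: str,
    fetch: Callable[[str], dict] = fetch_json,
    interval: float = 5.0,      # assumed polling interval, in seconds
    max_attempts: int = 120,    # assumed cap so the loop cannot run forever
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll the summary URL until status is done or error, then return it."""
    for _ in range(max_attempts):
        summary = fetch(summary_url)
        if summary.get("status") in TERMINAL_STATUSES:
            return summary
        sleep(interval)
    raise TimeoutError(f"crawl still unfinished after {max_attempts} polls")
```

A caller would pass the links.summary value from the 202 response and check the returned status before requesting the pages or duplicate reports, since a job that ended in error may have no results to read. The fetch and sleep parameters exist so the loop can be exercised without network access.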