Content
GET /api/v1/content
Strip the boilerplate and return a page's real content — headings, main body text, word and character counts, six readability scores and the content-to-HTML ratio — as structured JSON or clean Markdown, ready for analysis or to feed an LLM.
Query parameters
| Name | Type | Description |
|---|---|---|
| url (required) | string | The page to parse. |
| format (optional) | string | json (default) or markdown. |
Request
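The request is a plain GET with the target page passed in the url query parameter. The following is a minimal Python sketch of building that request URL. The host `https://api.example.com` and the helper name `build_content_url` are placeholders chosen for illustration; the real base URL and any authentication are not shown in this section.

```python
from urllib.parse import urlencode

# Placeholder host: an assumption for illustration, not the service's real base URL.
BASE_URL = "https://api.example.com"


def build_content_url(page_url: str, fmt: str = "json") -> str:
    """Build the GET /api/v1/content request URL (hypothetical helper)."""
    if fmt not in ("json", "markdown"):
        raise ValueError("format must be 'json' or 'markdown'")
    # urlencode percent-encodes the target URL so it travels as one parameter.
    query = urlencode({"url": page_url, "format": fmt})
    return f"{BASE_URL}/api/v1/content?{query}"


print(build_content_url("https://example.com"))
# → https://api.example.com/api/v1/content?url=https%3A%2F%2Fexample.com&format=json
```

The resulting string can be passed to any HTTP client, for example `urllib.request.urlopen`, once the placeholder host is replaced and whatever credentials the service requires are added.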
Response
200 · application/json
{
  "url": "https://example.com",
  "title": "Example Domain",
  "headings": [ { "level": 1, "text": "Example Domain" } ],
  "text": "This domain is for use in illustrative examples...",
  "metrics": {
    "wordCount": 412,
    "characterCount": 2180,
    "sentenceCount": 24,
    "paragraphCount": 9,
    "avgWordsPerSentence": 17.2,
    "contentToHtmlRatio": 18.4
  },
  "readability": {
    "fleschKincaidGrade": 9.8,
    "fleschReadingEase": 54.2,
    "automatedReadabilityIndex": 10.1,
    "colemanLiauIndex": 11.2,
    "gunningFog": 12.4,
    "smogIndex": 11.6,
    "fleschKincaid": 9.8
  },
  "consistency": {
    "titleToContent": 85.7,
    "descriptionToContent": 72.0
  },
  "flags": { "isLoremIpsum": false },
  "wordCount": 412,
  "markdown": null
}
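Once decoded, the response is an ordinary nested object. The sketch below parses a trimmed copy of the sample with Python's `json` module and reads a few fields. The `summarize` helper and its output keys are hypothetical and only illustrate field access; they are not part of the API.

```python
import json

# Trimmed copy of the sample response shown above.
SAMPLE = """
{
  "url": "https://example.com",
  "title": "Example Domain",
  "headings": [ { "level": 1, "text": "Example Domain" } ],
  "metrics": { "wordCount": 412, "sentenceCount": 24, "avgWordsPerSentence": 17.2 },
  "readability": { "fleschKincaidGrade": 9.8, "fleschKincaid": 9.8 },
  "flags": { "isLoremIpsum": false },
  "markdown": null
}
"""


def summarize(payload: dict) -> dict:
    """Pick out a few fields from a decoded response (hypothetical helper)."""
    metrics = payload["metrics"]
    readability = payload["readability"]
    return {
        "h1": [h["text"] for h in payload["headings"] if h["level"] == 1],
        "words": metrics["wordCount"],
        # Recomputed locally to show how the average relates to the two counts.
        "avg_words": round(metrics["wordCount"] / metrics["sentenceCount"], 1),
        # Prefer the canonical name; fall back to the fleschKincaid alias.
        "grade": readability.get("fleschKincaidGrade", readability.get("fleschKincaid")),
        "placeholder": payload["flags"]["isLoremIpsum"],
        "has_markdown": payload["markdown"] is not None,
    }


print(summarize(json.loads(SAMPLE)))
```

In the sample, 412 words over 24 sentences rounds to 17.2, which matches the reported avgWordsPerSentence.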
Response fields
| Field | Type | Description |
|---|---|---|
| headings[] | array | Heading level (1–6) and text, in document order. |
| text | string | Main body text with boilerplate removed. |
| metrics.wordCount | integer | Words in the main body text. |
| metrics.characterCount | integer | Non-whitespace characters. |
| metrics.sentenceCount / metrics.paragraphCount | integer | Sentence and paragraph counts of the body text. |
| metrics.avgWordsPerSentence | number | Average words per sentence. |
| metrics.contentToHtmlRatio | number | Percentage of the raw HTML that is visible text (the plain-text-to-HTML ratio); low values can indicate thin or bloated pages. |
| readability | object | Six standard indices: fleschKincaidGrade, fleschReadingEase, automatedReadabilityIndex, colemanLiauIndex, gunningFog, smogIndex. fleschKincaid is kept as an alias of fleschKincaidGrade. |
| consistency.titleToContent / consistency.descriptionToContent | number (0–100) | Percentage of the meaningful words in the title / meta description that appear in the body copy. Higher values mean the tag better reflects the content; null if the tag is absent. |
| flags.isLoremIpsum | boolean | true if placeholder text was detected. |
| markdown | string or null | Markdown rendering of the content when format=markdown; null in the JSON sample above. |
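To make the consistency scores concrete, here is a rough Python sketch of how a score of this kind could be computed. The tokenization, the stop-word list, and the function names are all assumptions chosen for illustration; the service's actual rules are not documented in this section, so real scores may differ.

```python
import re
from typing import Optional

# Assumed stop-word list; the service's real list is not documented here.
STOP_WORDS = {"a", "an", "and", "at", "for", "in", "of", "on", "or", "the", "to", "with"}


def meaningful_words(text: str) -> set:
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def tag_to_content(tag_text: Optional[str], body: str) -> Optional[float]:
    """Percentage of the tag's meaningful words that also occur in the body."""
    if not tag_text:
        return None  # mirrors the documented null when the tag is absent
    tag_words = meaningful_words(tag_text)
    if not tag_words:
        return None  # assumption: nothing meaningful to compare
    found = tag_words & meaningful_words(body)
    return round(100 * len(found) / len(tag_words), 1)


title = "Fast guide to brewing coffee at home"
body = "This guide covers brewing great coffee in your own home kitchen."
print(tag_to_content(title, body))  # → 80.0
```

Here four of the five meaningful title words (guide, brewing, coffee, home) appear in the body, giving 80.0; only "fast" is missing.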
See Errors for status codes.