Content

GET /api/v1/content

Strip the boilerplate and return a page's real content (headings, main body text, word and character counts, six readability scores, and the content-to-HTML ratio) as structured JSON or clean Markdown, ready for analysis or for use as LLM input.

Query parameters

| Name | Type | Description |
| --- | --- | --- |
| url (required) | string | The page to parse. |
| format (optional) | string | json (default) or markdown. |

Request
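A minimal sketch of how a request URL for this endpoint might be assembled in Python. The host `api.example.com` and the helper name `build_content_url` are placeholders for illustration, not part of the documented API, and any authentication the service may require is not shown.

```python
from urllib.parse import urlencode

# Placeholder host (an assumption): substitute the real API base URL.
BASE_URL = "https://api.example.com"


def build_content_url(page_url: str, fmt: str = "json") -> str:
    """Return the GET URL for /api/v1/content with encoded query parameters."""
    # The documented values for `format` are json (default) and markdown.
    if fmt not in ("json", "markdown"):
        raise ValueError("format must be 'json' or 'markdown'")
    # urlencode percent-encodes the target page URL so it is safe as a query value.
    query = urlencode({"url": page_url, "format": fmt})
    return f"{BASE_URL}/api/v1/content?{query}"


print(build_content_url("https://example.com", "markdown"))
```

The resulting string can be passed to any HTTP client, for example `urllib.request.urlopen`, once the placeholder host is replaced.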

Response

200 · application/json
{
  "url": "https://example.com",
  "title": "Example Domain",
  "headings": [ { "level": 1, "text": "Example Domain" } ],
  "text": "This domain is for use in illustrative examples...",
  "metrics": {
    "wordCount": 412,
    "characterCount": 2180,
    "sentenceCount": 24,
    "paragraphCount": 9,
    "avgWordsPerSentence": 17.2,
    "contentToHtmlRatio": 18.4
  },
  "readability": {
    "fleschKincaidGrade": 9.8,
    "fleschReadingEase": 54.2,
    "automatedReadabilityIndex": 10.1,
    "colemanLiauIndex": 11.2,
    "gunningFog": 12.4,
    "smogIndex": 11.6,
    "fleschKincaid": 9.8
  },
  "consistency": {
    "titleToContent": 85.7,
    "descriptionToContent": 72.0
  },
  "flags": { "isLoremIpsum": false },
  "wordCount": 412,
  "markdown": null
}
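
As a rough illustration of consuming this response, the sketch below parses a trimmed copy of the sample and pulls out a few fields. The helper name `summarize` and the keys of its return value are hypothetical, and `descriptionToContent` is set to null here only to show the absent-tag case.

```python
import json

# Trimmed copy of the sample response; descriptionToContent is changed to
# null (an assumption for illustration) to mimic a page with no meta description.
sample = """
{
  "url": "https://example.com",
  "title": "Example Domain",
  "metrics": {"wordCount": 412, "sentenceCount": 24, "contentToHtmlRatio": 18.4},
  "readability": {"fleschKincaidGrade": 9.8, "fleschReadingEase": 54.2},
  "consistency": {"titleToContent": 85.7, "descriptionToContent": null},
  "flags": {"isLoremIpsum": false},
  "markdown": null
}
"""


def summarize(payload: dict) -> dict:
    """Pick out a few fields, tolerating null consistency scores."""
    consistency = payload.get("consistency", {})
    return {
        "words": payload["metrics"]["wordCount"],
        "grade": payload["readability"]["fleschKincaidGrade"],
        # JSON null becomes None in Python; callers should check for it.
        "title_match": consistency.get("titleToContent"),
        "description_match": consistency.get("descriptionToContent"),
        "placeholder": payload["flags"]["isLoremIpsum"],
    }


summary = summarize(json.loads(sample))
print(summary)
```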

Response fields

| Field | Type | Description |
| --- | --- | --- |
| headings[] | array | Heading level (1–6) and text, in document order. |
| text | string | Main body text with boilerplate removed. |
| metrics.wordCount | integer | Words in the main body text. |
| metrics.characterCount | integer | Non-whitespace characters. |
| metrics.sentenceCount / metrics.paragraphCount | integer | Sentence and paragraph counts of the body text. |
| metrics.avgWordsPerSentence | number | Average words per sentence. |
| metrics.contentToHtmlRatio | number | % of the raw HTML that is visible text (the plain-text-to-HTML ratio); low values can indicate thin or bloated pages. |
| readability | object | Six standard indices: fleschKincaidGrade, fleschReadingEase, automatedReadabilityIndex, colemanLiauIndex, gunningFog, smogIndex. (fleschKincaid is kept as an alias of the grade.) |
| consistency.titleToContent / consistency.descriptionToContent | number (0–100) | How many of the meaningful words in the title / meta description actually appear in the body copy. Higher = the tag reflects the content (null if the tag is absent). |
| flags.isLoremIpsum | boolean | true if placeholder text was detected. |
| markdown | string | Markdown rendering when format=markdown. |
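
The consistency scores can be pictured as a word-overlap percentage. The sketch below illustrates that idea only and is not the service's actual algorithm: the tokenizer, the stop-word list, the zero score for tags with no meaningful words, and the function name `tag_to_content_score` are all assumptions.

```python
import re
from typing import Optional

# Assumed stop-word list; how the service defines "meaningful" is not documented.
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}


def meaningful_words(text: str) -> set:
    """Lowercase the text, split on non-alphanumerics, and drop stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}


def tag_to_content_score(tag: Optional[str], body: str) -> Optional[float]:
    """Percentage (0-100) of the tag's meaningful words that occur in the body.

    Returns None when the tag is absent, mirroring the documented null.
    """
    if tag is None:
        return None
    tag_words = meaningful_words(tag)
    if not tag_words:
        return 0.0  # assumption: a tag with no meaningful words scores zero
    hits = tag_words & meaningful_words(body)
    return round(100 * len(hits) / len(tag_words), 1)


body = "This domain is for use in illustrative examples in documents."
print(tag_to_content_score("Example Domain", body))  # 50.0
print(tag_to_content_score(None, body))              # None
```

Exact matching is used here, so "example" in the title does not match "examples" in the body; a real implementation may well stem or normalize words differently.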

See Errors for status codes.