Doctype & Charset Checker: Verify HTML5 and UTF-8 on Any Page
RankNibbler verifies the doctype declaration and character encoding on any URL. Catch missing doctypes that force quirks-mode rendering, legacy XHTML or HTML4 declarations, and encoding mismatches that corrupt special characters. Free, instant, no signup.
What Is a Doctype?
The doctype declaration is the first line of any HTML document: <!DOCTYPE html>. It tells the browser which version of HTML the page uses and which rendering mode to apply. HTML5 uses the simple <!DOCTYPE html>; older standards (HTML 4.01, XHTML 1.0) used long, verbose declarations with DTD references.
Without a doctype, browsers fall back to "quirks mode" — a legacy rendering mode designed for 1990s websites. In quirks mode, the CSS box model works differently, layouts break unpredictably, and some modern features are disabled. The result is a page that looks fine in some browsers and broken in others. Every modern webpage should start with <!DOCTYPE html>.
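The doctype test is simple enough to script yourself. A minimal sketch in Python (fetching is omitted; pass in the raw HTML you already have):

```python
def has_html5_doctype(html: str) -> bool:
    """Return True if the document begins with the HTML5 doctype.

    The check is case-insensitive and tolerates leading whitespace,
    roughly matching how browsers treat the declaration.
    """
    head = html.lstrip().lower()
    return head.startswith("<!doctype html>")

print(has_html5_doctype("<!DOCTYPE html>\n<html></html>"))  # True
print(has_html5_doctype("<html></html>"))                   # False: quirks mode
```

Anything that returns False here is at risk of quirks-mode rendering; legacy HTML4/XHTML doctypes would also fail this strict HTML5 check.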
Doctype Declarations Compared
| Doctype | Status | Declaration |
|---|---|---|
| HTML5 | Current — use this | <!DOCTYPE html> |
| HTML 4.01 Strict | Legacy | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"...> |
| HTML 4.01 Transitional | Legacy | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"...> |
| XHTML 1.0 Strict | Legacy | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"...> |
| XHTML 1.0 Transitional | Legacy | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"...> |
| No doctype | Quirks mode — fix immediately | (missing) |
If your site uses anything other than <!DOCTYPE html>, the page will still render but may do so unpredictably across browsers. Upgrading is a one-line change with no downside.
What Is Character Encoding?
Character encoding tells the browser which character set the page uses — that is, how bytes map to letters and symbols. UTF-8 is the universal modern standard because it supports every character in every language, from ASCII to Chinese to emoji, in a variable-width format that is backwards compatible with ASCII.
The declaration lives in the <head>:
<meta charset="UTF-8">
It should be the first thing after the opening <head> tag, before any other meta tag or content; the HTML specification requires the encoding declaration to appear within the first 1024 bytes of the document. Why first? Because the browser has to know the encoding before it can parse any other text. If the declaration comes later, the bytes that precede it may be misinterpreted.
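One way to enforce early placement is to look for the declaration inside the first 1024 bytes, the window the HTML spec allows for it. A sketch in Python:

```python
import re

def charset_within_1024_bytes(html_bytes: bytes) -> bool:
    """Check that a UTF-8 <meta charset> declaration appears within the
    first 1024 bytes of the raw document, per the HTML spec's limit."""
    head = html_bytes[:1024].decode("ascii", errors="replace").lower()
    return re.search(r'<meta\s+charset\s*=\s*["\']?utf-8', head) is not None

page = b'<!DOCTYPE html><html><head><meta charset="UTF-8"><title>Hi</title>'
print(charset_within_1024_bytes(page))  # True
```

Note the check runs on bytes, not decoded text: the whole point is that the encoding is not yet known at this stage, which is why the declaration itself must be plain ASCII.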
Common Character Encoding Problems
Missing Charset Declaration
Without <meta charset>, browsers guess based on HTTP headers or content heuristics. Guesses are usually correct for English content but unreliable for international characters. Always declare explicitly.
Using a Legacy Encoding
ISO-8859-1, Windows-1252, Shift-JIS, and other legacy encodings still exist in old content. They work for their specific language but break any time a non-native character appears. Migrate to UTF-8 by re-saving files with the correct encoding.
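The migration itself is a decode-then-re-encode step. A hypothetical helper, assuming you already know (or have detected) the file's current encoding:

```python
from pathlib import Path

def transcode_to_utf8(path: str, source_encoding: str = "cp1252") -> None:
    """Re-save a text file as UTF-8 (illustrative helper; the source
    encoding must be known or detected beforehand, or characters will
    be silently mangled)."""
    p = Path(path)
    text = p.read_text(encoding=source_encoding)  # decode the legacy bytes
    p.write_text(text, encoding="utf-8")          # write back as UTF-8
```

Getting `source_encoding` wrong produces valid UTF-8 containing the wrong characters, so spot-check non-ASCII content after converting.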
Encoding Mismatch
The HTTP header says one encoding, the meta tag says another, the file was actually saved as a third. Browsers pick one and characters break. The fix is to ensure your server, file, and declaration all agree — standardise on UTF-8 across the stack.
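Spotting a header/meta disagreement only needs the two declared values side by side. A sketch, assuming you already have the Content-Type header and page source in hand:

```python
import re

def declared_charsets(content_type_header: str, html: str):
    """Extract the charset from an HTTP Content-Type header and from the
    page's <meta charset> tag, so the two can be compared. Returns a
    (header_charset, meta_charset) pair; None means not declared."""
    header_match = re.search(r"charset=([\w-]+)", content_type_header, re.I)
    meta_match = re.search(r'<meta\s+charset\s*=\s*["\']?([\w-]+)', html, re.I)
    header_cs = header_match.group(1).lower() if header_match else None
    meta_cs = meta_match.group(1).lower() if meta_match else None
    return header_cs, meta_cs

header_cs, meta_cs = declared_charsets(
    "text/html; charset=ISO-8859-1",
    '<head><meta charset="UTF-8"></head>',
)
print(header_cs, meta_cs)    # iso-8859-1 utf-8
print(header_cs == meta_cs)  # False -> mismatch to fix
```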
BOM (Byte-Order Mark) Issues
Some editors save UTF-8 files with a byte-order mark at the start. For HTML files, this sometimes causes parsing issues (the BOM gets treated as invisible whitespace). Save UTF-8 without BOM whenever possible.
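The UTF-8 BOM is a fixed three-byte prefix, so detecting and stripping it is mechanical. A small sketch:

```python
UTF8_BOM = b"\xef\xbb\xbf"  # the three bytes editors may prepend

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte-order mark if present."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data

raw = b"\xef\xbb\xbf<!DOCTYPE html>"
print(strip_utf8_bom(raw))  # b'<!DOCTYPE html>'
```

In Python, reading a file with encoding="utf-8-sig" achieves the same thing transparently on decode.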
Mojibake / Garbled Characters
"Â£" showing up where "£" should appear, or "â€™" instead of a right single quotation mark, are classic encoding problems. The underlying cause is always a mismatch somewhere between file, server, and declaration.
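Mojibake is easy to reproduce, which makes the mechanism clear: text is encoded correctly as UTF-8 but then decoded with a single-byte legacy codec, so each UTF-8 byte becomes a separate (wrong) character:

```python
# Encode correctly as UTF-8, then decode with the wrong codec (Windows-1252):
pound = "£".encode("utf-8").decode("cp1252")
quote = "\u2019".encode("utf-8").decode("cp1252")  # right single quotation mark

print(pound)  # Â£
print(quote)  # â€™
```

The fix is never to "repair" the garbled text downstream but to make the producing and consuming sides agree on UTF-8.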
Why This Matters for SEO
Doctype and charset are not direct ranking factors, but they affect rankings through several indirect paths:
- Quirks mode breaks mobile layouts. Mobile-first indexing means Google crawls and ranks pages based on mobile rendering. A page that breaks on mobile in quirks mode will rank worse.
- Encoding errors damage content indexing. If character encoding is wrong, Googlebot indexes garbled text. Pages with garbled content get flagged as low quality.
- Accessibility fails with wrong encoding. Screen readers depend on correct encoding to announce text properly. Broken encoding breaks accessibility.
- Encoding problems break structured data. JSON-LD schema must be correctly encoded; encoding issues can invalidate your structured data, killing rich-result eligibility.
How to Fix Doctype and Encoding Issues
- Add the HTML5 doctype. Ensure <!DOCTYPE html> is the first line.
- Add a UTF-8 charset. Place <meta charset="UTF-8"> as the first element inside <head>.
- Save files as UTF-8. Configure your editor (VS Code, Sublime, etc.) to save files as UTF-8 without BOM by default.
- Configure the server Content-Type. Ensure your web server sends Content-Type: text/html; charset=utf-8 headers.
- Test. Use this checker to verify, then view source to confirm manually.
- Audit sitewide. Use Site Audit to flag any page missing the declarations.
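The per-page checks in the steps above can be bundled into one local report. A sketch in Python, run against raw HTML bytes (an illustrative approximation; RankNibbler's own checks may differ):

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"

def check_page(raw: bytes) -> dict:
    """Run basic doctype/charset checks on raw HTML bytes."""
    text = raw.decode("utf-8", errors="replace")
    return {
        "has_bom": raw.startswith(UTF8_BOM),
        "html5_doctype": text.lstrip().lower().startswith("<!doctype html>"),
        "charset_in_first_1024_bytes": bool(
            re.search(rb'<meta\s+charset\s*=\s*["\']?utf-8', raw[:1024], re.I)
        ),
    }

page = b'<!DOCTYPE html><html><head><meta charset="UTF-8"></head><body></body></html>'
print(check_page(page))
# {'has_bom': False, 'html5_doctype': True, 'charset_in_first_1024_bytes': True}
```

A full audit would also fetch each URL and compare the HTTP Content-Type header against the meta tag, as described above.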
Related Technical SEO Tools
- Deprecated HTML checker — related HTML hygiene audit.
- HTTPS checker — another technical baseline.
- Tech stack checker — identify CMS and framework.
- Site Audit — bulk-verify doctype/charset plus 30+ other checks.