Doctype & Charset Checker: Verify HTML5 and UTF-8 on Any Page
RankNibbler verifies the doctype declaration and character encoding on any URL. Catch missing doctypes that force quirks-mode rendering, legacy XHTML or HTML4 declarations, and encoding mismatches that corrupt special characters. Free, instant, no signup.
What Is a Doctype?
The doctype declaration is the first line of any HTML document: <!DOCTYPE html>. It tells the browser which version of HTML the page uses and which rendering mode to apply. HTML5 uses the simple <!DOCTYPE html>; older standards (HTML 4.01, XHTML 1.0) used long, verbose declarations with DTD references.
Without a doctype, browsers fall back to "quirks mode" — a legacy rendering mode designed for 1990s websites. In quirks mode, the CSS box model works differently, layouts break unpredictably, and some modern features are disabled. The result is a page that looks fine in some browsers and broken in others. Every modern webpage should start with <!DOCTYPE html>.
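The doctype test is simple enough to script yourself. A minimal sketch in Python (fetching is omitted; pass in the raw HTML you already have):

```python
def has_html5_doctype(html: str) -> bool:
    """Return True if the document begins with the HTML5 doctype.

    The check is case-insensitive and tolerates leading whitespace,
    roughly matching how browsers treat the declaration.
    """
    head = html.lstrip().lower()
    return head.startswith("<!doctype html>")

print(has_html5_doctype("<!DOCTYPE html>\n<html></html>"))  # True
print(has_html5_doctype("<html></html>"))                   # False: quirks mode
```

Anything that returns False here is at risk of quirks-mode rendering; legacy HTML4/XHTML doctypes would also fail this strict HTML5 check.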
Doctype Declarations Compared
| Doctype | Status | Declaration |
|---|---|---|
| HTML5 | Current — use this | <!DOCTYPE html> |
| HTML 4.01 Strict | Legacy | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"...> |
| HTML 4.01 Transitional | Legacy | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"...> |
| XHTML 1.0 Strict | Legacy | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"...> |
| XHTML 1.0 Transitional | Legacy | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"...> |
| No doctype | Quirks mode — fix immediately | (missing) |
If your site uses anything other than <!DOCTYPE html>, the page will still render but may do so unpredictably across browsers. Upgrading is a one-line change with no downside.
What Is Character Encoding?
Character encoding tells the browser which character set the page uses — that is, how bytes map to letters and symbols. UTF-8 is the universal modern standard because it supports every character in every language, from ASCII to Chinese to emoji, in a variable-width format that is backwards compatible with ASCII.
The declaration lives in the <head>:
<meta charset="UTF-8">
It should be the first thing after the opening <head> tag, before any other meta tag or content; the HTML specification requires the encoding declaration to appear within the first 1024 bytes of the document. Why first? Because the browser has to know the encoding before it can parse any other text. If the declaration comes later, the bytes that precede it may be misinterpreted.
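One way to enforce early placement is to look for the declaration inside the first 1024 bytes, the window the HTML spec allows for it. A sketch in Python:

```python
import re

def charset_within_1024_bytes(html_bytes: bytes) -> bool:
    """Check that a UTF-8 <meta charset> declaration appears within the
    first 1024 bytes of the raw document, per the HTML spec's limit."""
    head = html_bytes[:1024].decode("ascii", errors="replace").lower()
    return re.search(r'<meta\s+charset\s*=\s*["\']?utf-8', head) is not None

page = b'<!DOCTYPE html><html><head><meta charset="UTF-8"><title>Hi</title>'
print(charset_within_1024_bytes(page))  # True
```

Note the check runs on bytes, not decoded text: the whole point is that the encoding is not yet known at this stage, which is why the declaration itself must be plain ASCII.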
Common Character Encoding Problems
Missing Charset Declaration
Without <meta charset>, browsers guess based on HTTP headers or content heuristics. Guesses are usually correct for English content but unreliable for international characters. Always declare explicitly.
Using a Legacy Encoding
ISO-8859-1, Windows-1252, Shift-JIS, and other legacy encodings still exist in old content. They work for their specific language but break any time a non-native character appears. Migrate to UTF-8 by re-saving files with the correct encoding.
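The migration itself is a decode-then-re-encode step. A hypothetical helper, assuming you already know (or have detected) the file's current encoding:

```python
from pathlib import Path

def transcode_to_utf8(path: str, source_encoding: str = "cp1252") -> None:
    """Re-save a text file as UTF-8 (illustrative helper; the source
    encoding must be known or detected beforehand, or characters will
    be silently mangled)."""
    p = Path(path)
    text = p.read_text(encoding=source_encoding)  # decode the legacy bytes
    p.write_text(text, encoding="utf-8")          # write back as UTF-8
```

Getting `source_encoding` wrong produces valid UTF-8 containing the wrong characters, so spot-check non-ASCII content after converting.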
Encoding Mismatch
The HTTP header says one encoding, the meta tag says another, the file was actually saved as a third. Browsers pick one and characters break. The fix is to ensure your server, file, and declaration all agree — standardise on UTF-8 across the stack.
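Spotting a header/meta disagreement only needs the two declared values side by side. A sketch, assuming you already have the Content-Type header and page source in hand:

```python
import re

def declared_charsets(content_type_header: str, html: str):
    """Extract the charset from an HTTP Content-Type header and from the
    page's <meta charset> tag, so the two can be compared. Returns a
    (header_charset, meta_charset) pair; None means not declared."""
    header_match = re.search(r"charset=([\w-]+)", content_type_header, re.I)
    meta_match = re.search(r'<meta\s+charset\s*=\s*["\']?([\w-]+)', html, re.I)
    header_cs = header_match.group(1).lower() if header_match else None
    meta_cs = meta_match.group(1).lower() if meta_match else None
    return header_cs, meta_cs

header_cs, meta_cs = declared_charsets(
    "text/html; charset=ISO-8859-1",
    '<head><meta charset="UTF-8"></head>',
)
print(header_cs, meta_cs)    # iso-8859-1 utf-8
print(header_cs == meta_cs)  # False -> mismatch to fix
```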
BOM (Byte-Order Mark) Issues
Some editors save UTF-8 files with a byte-order mark at the start. For HTML files, this sometimes causes parsing issues (the BOM gets treated as invisible whitespace). Save UTF-8 without BOM whenever possible.
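The UTF-8 BOM is a fixed three-byte prefix, so detecting and stripping it is mechanical. A small sketch:

```python
UTF8_BOM = b"\xef\xbb\xbf"  # the three bytes editors may prepend

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte-order mark if present."""
    if data.startswith(UTF8_BOM):
        return data[len(UTF8_BOM):]
    return data

raw = b"\xef\xbb\xbf<!DOCTYPE html>"
print(strip_utf8_bom(raw))  # b'<!DOCTYPE html>'
```

In Python, reading a file with encoding="utf-8-sig" achieves the same thing transparently on decode.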
Mojibake / Garbled Characters
"Â£" showing up where "£" should appear, or "â€™" instead of a right single quotation mark, are classic encoding problems. The underlying cause is always a mismatch somewhere between file, server, and declaration.
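Mojibake is easy to reproduce, which makes the mechanism clear: text is encoded correctly as UTF-8 but then decoded with a single-byte legacy codec, so each UTF-8 byte becomes a separate (wrong) character:

```python
# Encode correctly as UTF-8, then decode with the wrong codec (Windows-1252):
pound = "£".encode("utf-8").decode("cp1252")
quote = "\u2019".encode("utf-8").decode("cp1252")  # right single quotation mark

print(pound)  # Â£
print(quote)  # â€™
```

The fix is never to "repair" the garbled text downstream but to make the producing and consuming sides agree on UTF-8.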
Why This Matters for SEO
Doctype and charset are not direct ranking factors, but they affect rankings through several indirect paths:
- Quirks mode breaks mobile layouts. Mobile-first indexing means Google crawls and ranks pages based on mobile rendering. A page that breaks on mobile in quirks mode will rank worse.
- Encoding errors damage content indexing. If character encoding is wrong, Googlebot indexes garbled text. Pages with garbled content get flagged as low quality.
- Accessibility fails with wrong encoding. Screen readers depend on correct encoding to announce text properly. Broken encoding breaks accessibility.
- Encoding problems break structured data. JSON-LD schema must be correctly encoded; encoding issues can invalidate your structured data, killing rich-result eligibility.
How to Fix Doctype and Encoding Issues
- Add the HTML5 doctype. Ensure <!DOCTYPE html> is the first line.
- Add a UTF-8 charset. Place <meta charset="UTF-8"> as the first element inside <head>.
- Save files as UTF-8. Configure your editor (VS Code, Sublime, etc.) to save files as UTF-8 without BOM by default.
- Configure the server Content-Type. Ensure your web server sends Content-Type: text/html; charset=utf-8 headers.
- Test. Use this checker to verify, then view source to confirm manually.
- Audit sitewide. Use Site Audit to flag any page missing the declarations.
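The per-page checks in the steps above can be bundled into one local report. A sketch in Python, run against raw HTML bytes (an illustrative approximation; RankNibbler's own checks may differ):

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"

def check_page(raw: bytes) -> dict:
    """Run basic doctype/charset checks on raw HTML bytes."""
    text = raw.decode("utf-8", errors="replace")
    return {
        "has_bom": raw.startswith(UTF8_BOM),
        "html5_doctype": text.lstrip().lower().startswith("<!doctype html>"),
        "charset_in_first_1024_bytes": bool(
            re.search(rb'<meta\s+charset\s*=\s*["\']?utf-8', raw[:1024], re.I)
        ),
    }

page = b'<!DOCTYPE html><html><head><meta charset="UTF-8"></head><body></body></html>'
print(check_page(page))
# {'has_bom': False, 'html5_doctype': True, 'charset_in_first_1024_bytes': True}
```

A full audit would also fetch each URL and compare the HTTP Content-Type header against the meta tag, as described above.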
Related Technical SEO Tools
- Deprecated HTML checker — related HTML hygiene audit.
- HTTPS checker — another technical baseline.
- Tech stack checker — identify CMS and framework.
- Site Audit — bulk-verify doctype/charset plus 30+ other checks.