ASCII & Unicode Lookup

Get decimal, hex, binary, HTML entity, CSS escape and UTF-8 bytes for any character or string

Quick Reference

  • A–Z: 65–90 (0x41–0x5A)
  • a–z: 97–122 (0x61–0x7A)
  • 0–9: 48–57 (0x30–0x39)
  • Space: 32 (0x20)
  • Tab: 9 (0x09)
  • Newline: 10 (0x0A)
  • Null: 0 (0x00)
  • DEL: 127 (0x7F)
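The ranges above can be checked with Python's built-in ord() and chr(). This is a minimal sketch; the helper name classify and its category labels are illustrative and not part of the tool.

```python
def classify(ch: str) -> str:
    """Classify one character using the ASCII ranges from the quick reference."""
    code = ord(ch)
    if 65 <= code <= 90:
        return "uppercase"
    if 97 <= code <= 122:
        return "lowercase"
    if 48 <= code <= 57:
        return "digit"
    if code == 32:
        return "space"
    if code < 32 or code == 127:
        return "control"
    if code < 128:
        return "punctuation"
    return "non-ascii"


print(ord("A"), hex(ord("A")))  # 65 0x41
print(chr(0x7A))                # z
print(classify("\t"))           # control
```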

Encodings

  • ASCII: 0–127 (7-bit)
  • Latin-1: 0–255 (8-bit)
  • UTF-8: variable 1–4 bytes
  • UTF-16: 2 or 4 bytes
  • Unicode: 0–10FFFF (hex)

ASCII & Unicode Character Lookup Tool

The definitive reference for character encoding. Look up any character's decimal, hexadecimal, binary, HTML entity, CSS escape, JavaScript string, UTF-8 byte sequence and UTF-16 representation instantly. Includes the full ASCII table, extended character sets and Unicode block reference.

Features

12 Properties Per Char

Decimal, hex, binary, HTML entity, CSS escape, JS string, UTF-8 bytes, UTF-16, Unicode code point and more.

String Breakdown

Enter any string to see a complete encoding table for every individual character.
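A per-character breakdown of this kind could be sketched in Python roughly as follows. The function name breakdown and the chosen column names are assumptions made for illustration; they cover only a subset of the properties listed and are not the tool's actual implementation.

```python
def breakdown(text: str) -> list[dict[str, str]]:
    """Return one row of encoding details per character in text."""
    rows = []
    for ch in text:
        cp = ord(ch)
        rows.append({
            "char": ch,
            "code_point": f"U+{cp:04X}",
            "decimal": str(cp),
            "hex": f"0x{cp:X}",
            "binary": f"{cp:08b}",
            "html_entity": f"&#{cp};",   # numeric character reference
            "css_escape": f"\\{cp:X}",   # backslash followed by hex digits
            "utf8_bytes": ch.encode("utf-8").hex(" ").upper(),
            # Group the big-endian UTF-16 bytes into 16-bit code units.
            "utf16_units": ch.encode("utf-16-be").hex(" ", 2).upper(),
        })
    return rows


for row in breakdown("A\u20ac"):
    print(row)
```

Each row is a plain dictionary, so the result can be passed to csv.DictWriter or printed as a table without further conversion.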

Full ASCII Table

Browse the complete ASCII table (0–127) and extended set (128–255) organised by type.

Unicode Blocks

Reference guide to 12 major Unicode blocks with clickable navigation.

CSV Export

Export your string's character breakdown as a CSV file for offline reference.

Dual Input

Look up by typing a character, pasting text, or entering a decimal or hex code point.

Who Uses This Tool?

Web Developers: Find HTML entities for special characters and verify UTF-8 encoding of content.
Writers: Identify the correct code for typographic characters: em dash, curly quotes, ellipsis.
Security Researchers: Analyse Unicode homoglyphs and control characters used in phishing and injection attacks.
CS Students: Study character encoding, the UTF-8 encoding algorithm and Unicode code point structure.

Frequently Asked Questions

What is the difference between ASCII and Unicode?
ASCII is a 7-bit encoding standard for 128 characters (English letters, digits and control codes). Unicode is a universal standard covering over 140,000 characters from all the world's writing systems. UTF-8 is the most common way of encoding Unicode.
What is UTF-8 and why is it standard?
UTF-8 is a variable-length encoding: ASCII characters use 1 byte, most European characters use 2 bytes, and other scripts use 3–4 bytes. Its backward compatibility with ASCII and space efficiency made it the dominant encoding on the web (98%+ of websites).
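The byte-length rule can be made concrete with a hand-written encoder. The sketch below is for study only (Python's str.encode already does this); the function name utf8_encode is illustrative.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 bytes."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:       # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:      # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    print(f"U+{cp:04X}", utf8_encode(cp).hex(" "))
```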
What is an HTML entity?
An HTML entity is a text representation of a character using an ampersand prefix and semicolon suffix (e.g., &amp; for &, &lt; for <). Entities are used to display reserved HTML characters and characters that are not easily typed on a keyboard.
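In Python, the standard html module converts in both directions. A minimal sketch, where the helper numeric_entity is an illustrative name introduced here:

```python
import html


def numeric_entity(ch: str) -> str:
    """Return the decimal numeric character reference for one character."""
    return f"&#{ord(ch)};"


escaped = html.escape('<a href="x">Tom & Jerry</a>')
print(escaped)  # &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;

# Named and numeric references decode back to the same characters.
print(html.unescape("&copy; &#8212; &#x2014;"))

print(numeric_entity("\u00a9"))  # &#169;
```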
What are control characters (0–31)?
Control characters are non-printable ASCII codes that control text flow: Tab (9), Line Feed/newline (10), Carriage Return (13), Null (0), Bell (7) etc. They are invisible but affect how text is processed and displayed.
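Because control characters are invisible, it often helps to make them explicit when inspecting text. The sketch below uses Python's unicodedata module; the function name show_controls and the <0xNN> notation are illustrative choices. Note that the Unicode category "Cc" also covers DEL (127) and the C1 range 128–159, not only codes 0–31.

```python
import unicodedata


def show_controls(text: str) -> str:
    """Replace each control character with a visible <0xNN> marker."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cc":
            out.append(f"<0x{ord(ch):02X}>")
        else:
            out.append(ch)
    return "".join(out)


print(show_controls("a\tb\r\n"))  # a<0x09>b<0x0D><0x0A>
```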

Pro Tip

When debugging text encoding issues, check for the UTF-8 BOM (Byte Order Mark, U+FEFF) at the start of files — it can cause "mystery characters" to appear. Also watch for the difference between a regular hyphen (-), en dash (–) and em dash (—), which look similar but have different code points.
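Both checks are easy to script. The following sketch uses Python's codecs and unicodedata modules; strip_utf8_bom is an illustrative helper name and the sample bytes are made up for the example.

```python
import codecs
import unicodedata


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (bytes EF BB BF), if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


raw = codecs.BOM_UTF8 + "id,name".encode("utf-8")
print(repr(raw.decode("utf-8")))      # the BOM survives as '\ufeff'
print(repr(raw.decode("utf-8-sig")))  # the 'utf-8-sig' codec drops it

# Hyphen-minus, en dash and em dash are three distinct code points.
for ch in ["-", "\u2013", "\u2014"]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```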

Did You Know?

128
Original ASCII Characters
ASCII (American Standard Code for Information Interchange) defined only 128 characters in 1963, which was sufficient for English. When computers went global, 128 characters proved woefully inadequate for the world's 6,500+ languages. This limitation created chaos: countries and vendors each invented their own, mutually incompatible "extended ASCII" sets.
149,813
Unicode Characters (v15.1)
Unicode 15.1 (2023) defines 149,813 characters covering virtually every writing system on Earth, plus historical scripts, mathematical symbols, musical notation, emoji and more. The Unicode code space has room for 1,114,112 code points in total (U+0000 to U+10FFFF).
98%
of Websites Use UTF-8
UTF-8 is used by over 98% of websites — it is the dominant text encoding on the internet. UTF-8 is brilliant: it is backward-compatible with ASCII (the first 128 characters are identical), and efficiently encodes most common characters in 1–2 bytes while supporting all 1.1 million Unicode code points.

ASCII Control Characters Quick Reference

Dec   Hex    Abbrev   Meaning           Usage
0     0x00   NUL      Null              String terminator in C
7     0x07   BEL      Bell              Audio alert (rarely used)
8     0x08   BS       Backspace         Delete previous character
9     0x09   HT       Horizontal Tab    Indentation, TSV files
10    0x0A   LF       Line Feed         Unix/Mac newline (\n)
13    0x0D   CR       Carriage Return   Windows uses CR+LF (\r\n)
27    0x1B   ESC      Escape            ANSI terminal sequences
32    0x20   SP       Space             Word separator
127   0x7F   DEL      Delete            Originally punched tape erasure

More Questions

Why do I sometimes see "?" or black diamonds when text is copied?
These replacement characters (often □ or ?) appear when text encoded in one character set is interpreted using a different one. Most commonly: UTF-8 text read as Latin-1, or vice versa. The solution is always to specify encoding explicitly. In web development, always include in HTML and set database connection charset to utf8mb4.
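The mismatch is easy to reproduce. This Python sketch encodes a string as UTF-8 and decodes it with the wrong codec, then shows how the replacement character U+FFFD arises; the sample word is arbitrary.

```python
text = "caf\u00e9"  # "café"

# UTF-8 bytes misread as Latin-1: each byte becomes its own character.
garbled = text.encode("utf-8").decode("latin-1")
print(garbled)  # cafÃ©

# While no bytes have been lost, the damage can be reversed.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == text)  # True

# Latin-1 bytes misread as UTF-8: the invalid byte becomes U+FFFD.
replaced = b"caf\xe9".decode("utf-8", errors="replace")
print(repr(replaced))  # 'caf\ufffd'
```

The last case loses information: once a byte has been replaced by U+FFFD, the original character cannot be recovered from the decoded text.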
What is the difference between UTF-8, UTF-16 and UTF-32?
All three encode Unicode code points but use different byte representations. UTF-8: variable 1–4 bytes, ASCII-compatible, web standard. UTF-16: variable 2 or 4 bytes, used internally by Windows, Java and JavaScript strings. UTF-32: fixed 4 bytes per character, simple but wasteful. UTF-8 dominates for storage and transmission; UTF-16 is common in memory representations.
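The size differences can be measured directly. In this Python sketch, the helper name encoded_sizes is illustrative; the little-endian codec variants are used so that no byte order mark is added to the counts.

```python
def encoded_sizes(ch: str) -> dict[str, int]:
    """Return the number of bytes ch occupies in UTF-8, UTF-16 and UTF-32."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", encoded_sizes(ch))
```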
What are emoji and how are they encoded?
Emojis are Unicode characters assigned code points like any other character. The smiling face 😀 is U+1F600. In UTF-8, it encodes to 4 bytes: F0 9F 98 80. The visual appearance (Apple, Google, Microsoft and Samsung emoji look different) is determined by each platform's font; the Unicode standard only defines the meaning, not the appearance. Skin tone modifiers (U+1F3FB to U+1F3FF) are separate code points that directly follow a compatible base emoji to form an emoji modifier sequence, which supporting fonts render as a single glyph.
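These encodings can be verified in a few lines of Python. The sketch below is illustrative; the waving hand (U+1F44B) and the medium skin tone modifier (U+1F3FD) are example choices not taken from the text above.

```python
grin = "\U0001F600"
print(grin.encode("utf-8").hex(" "))          # f0 9f 98 80
print(grin.encode("utf-16-be").hex(" ", 2))   # d83d de00 (a surrogate pair)

# A skin tone modifier is a second code point placed after the base emoji.
wave = "\U0001F44B" + "\U0001F3FD"
print(len(wave))                  # 2 code points
print(len(wave.encode("utf-8")))  # 8 bytes, 4 per code point
```

String length in Python counts code points, so a modified emoji that displays as one symbol still has a length of 2.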

Common Mistakes

Not setting charset="UTF-8" in HTML
Without explicit charset declaration, browsers use heuristics that can misinterpret encoding, causing garbled text (mojibake) for non-ASCII characters.
Always include <meta charset="UTF-8"> as the first element inside <head>.
Using Latin-1 database columns for user content
Latin-1 (latin1 in MySQL) cannot store emoji, Chinese characters or most non-European languages. Attempting to store them causes silent data truncation or errors.
Use utf8mb4 (not utf8!) in MySQL/MariaDB. The "utf8" character set in MySQL is an alias for utf8mb3, which stores at most 3 bytes per character and therefore cannot hold emoji or other 4-byte characters.
Hardcoding character comparisons without normalisation
The same visual character (e.g., "é") can be encoded as a single code point (U+00E9) or as "e" + combining accent (U+0065 + U+0301). These look identical but are byte-different.
Apply Unicode normalisation (NFC or NFD) before comparing or storing user text.
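In Python, normalisation is available through unicodedata.normalize. A minimal sketch, where same_text is an illustrative helper name and NFC is chosen as the comparison form by assumption:

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


composed = "\u00e9"     # single code point U+00E9
decomposed = "e\u0301"  # U+0065 followed by combining acute U+0301

print(composed == decomposed)           # False
print(same_text(composed, decomposed))  # True
print(len(unicodedata.normalize("NFD", composed)))  # 2
```

Whichever form is chosen, applying the same one on both write and read paths is what makes comparisons reliable.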