
Magic Numbers: How File Signatures Identify Any File

The few bytes at the start of a file reveal what it really is — far more reliably than its extension. How magic numbers work, why they matter, and how to read them.

Every file on your computer is a stream of bytes. Nothing about that stream inherently says “I am a photo” or “I am a spreadsheet” — except, for almost every well-designed format, a small fingerprint deliberately placed at the very beginning. That fingerprint is called a magic number (or file signature), and it is the single most reliable way to know what a file actually is.

Understanding magic numbers explains a lot of everyday computing: why renaming a file rarely “converts” it, how an upload form rejects the wrong file type, how forensic tools recover deleted images, and how malware disguises itself. This guide covers what signatures are, why they beat file extensions, and how to read them yourself with the File Inspector.

What is a magic number?

A magic number is a constant, predefined sequence of bytes that a file format requires at a known position — almost always the first few bytes. When a program creates a PNG image, it writes the exact bytes 89 50 4E 47 0D 0A 1A 0A before anything else. Every PNG ever made starts that way. So when any program later wants to check “is this a PNG?”, it does not look at the name — it reads those first eight bytes and compares.
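To make the comparison concrete, here is a minimal Python sketch of that check. The helper names `read_header` and `is_png` are hypothetical, chosen for illustration. The point is that the decision rests on the first eight bytes and never on the filename.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # 89 50 4E 47 0D 0A 1A 0A


def read_header(path: str, n: int = 8) -> bytes:
    """Read the first n bytes in binary mode; the filename is never consulted."""
    with open(path, "rb") as f:
        return f.read(n)


def is_png(header: bytes) -> bool:
    """True if the bytes begin with the eight-byte PNG signature."""
    return header.startswith(PNG_SIGNATURE)
```

For example, `is_png(read_header("holiday.jpg"))` is True whenever that file's bytes really are a PNG, whatever it happens to be called.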

The term comes from early Unix, where developers picked recognisable constant values to tag files and data structures. Many signatures are human-readable on purpose: a PDF begins with %PDF, a ZIP archive with PK (the initials of Phil Katz, who created the format), and a Java class file with the playful CAFEBABE. Others are non-printable byte values chosen to be unlikely to occur by accident.

Why the extension can’t be trusted

A file’s extension (the .jpg or .pdf on the end) is just text in the filename, stored in the filesystem’s directory entry rather than inside the file. It is a hint for humans and a shortcut for the OS to pick a default app. It is trivially editable: rename report.pdf to report.jpg and the bytes inside are completely unchanged.

This gap between the claimed type and the real type is a security problem. A common malware trick is to give an executable a harmless-looking name and icon — invoice.pdf.exe, or a .jpg whose contents actually start with 4D 5A (the MZ signature of a Windows program). A human glancing at the icon is fooled; a tool reading the magic number is not. That is exactly why upload validators, antivirus, and forensic tools all identify files by signature and treat the extension as an untrusted label.

⚠️ When the real signature and the extension disagree, treat the file as suspicious. The File Inspector raises a mismatch warning for precisely this case.
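A mismatch check of this kind can be sketched in a few lines. This is a hypothetical illustration, not the File Inspector’s actual logic: `SIGNATURES` holds only four entries, `ALIASES` is an assumed mapping of equivalent extensions, and a file whose content matches nothing is given the benefit of the doubt.

```python
import os
from typing import Optional

# Illustrative subset only; tools such as libmagic use thousands of rules.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpg",
    b"%PDF": "pdf",
    b"MZ": "exe",  # Windows executables and DLLs
}

# Hypothetical table of extensions treated as equivalent to a detected type.
ALIASES = {"jpeg": "jpg", "dll": "exe"}


def detect(header: bytes) -> Optional[str]:
    """Return the type implied by the leading bytes, or None if no signature matches."""
    for signature, kind in SIGNATURES.items():
        if header.startswith(signature):
            return kind
    return None


def extension_mismatch(filename: str, header: bytes) -> bool:
    """True when the detected content type disagrees with the filename's last extension."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    ext = ALIASES.get(ext, ext)
    actual = detect(header)
    # Unknown content cannot be judged, so only a positive detection can disagree.
    return actual is not None and actual != ext
```

Note that `invoice.pdf.exe` passes this check: `os.path.splitext` sees only the final `.exe`, which matches the MZ content. That trick relies on the user reading the wrong part of the name rather than on a mismatch.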

A tour of common signatures

Here are the magic numbers you will run into most often, shown as hexadecimal bytes and, where printable, as text:

Format               Hex signature              ASCII
PNG image            89 50 4E 47 0D 0A 1A 0A    .PNG....
JPEG image           FF D8 FF
GIF image            47 49 46 38                GIF8
PDF document         25 50 44 46                %PDF
ZIP / DOCX / JAR     50 4B 03 04                PK..
GZIP archive         1F 8B
Windows EXE/DLL      4D 5A                      MZ
ELF (Linux binary)   7F 45 4C 46                .ELF
Java class           CA FE BA BE
SQLite database      53 51 4C 69 74 65          SQLite
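The table doubles as a lookup structure. The sketch below parses the hex strings as written and checks the longest signatures first, so a longer, more specific prefix would win if two ever overlapped. The `printable` helper reproduces the ASCII column’s convention of showing non-printable bytes as a dot. Both function names are hypothetical, and a real rule set such as libmagic’s is far larger and also tests offsets other than zero.

```python
from typing import Optional

# Signatures as listed in the table: (format, hex bytes).
SIGNATURE_TABLE = [
    ("PNG image", "89 50 4E 47 0D 0A 1A 0A"),
    ("JPEG image", "FF D8 FF"),
    ("GIF image", "47 49 46 38"),
    ("PDF document", "25 50 44 46"),
    ("ZIP / DOCX / JAR", "50 4B 03 04"),
    ("GZIP archive", "1F 8B"),
    ("Windows EXE/DLL", "4D 5A"),
    ("ELF (Linux binary)", "7F 45 4C 46"),
    ("Java class", "CA FE BA BE"),
    ("SQLite database", "53 51 4C 69 74 65"),
]

# Parse once (bytes.fromhex ignores the spaces) and sort longest-first.
SIGNATURES = sorted(
    ((bytes.fromhex(hex_sig), name) for name, hex_sig in SIGNATURE_TABLE),
    key=lambda pair: len(pair[0]),
    reverse=True,
)


def identify(header: bytes) -> Optional[str]:
    """Return the first table entry whose signature prefixes the header, else None."""
    for signature, name in SIGNATURES:
        if header.startswith(signature):
            return name
    return None


def printable(sig: bytes) -> str:
    """Render bytes like the ASCII column: non-printable bytes become '.'."""
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in sig)
```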

Not just the start: footers and trailers

Some formats also define bytes at the end of the file, called a footer or trailer. A JPEG, for instance, should end with FF D9 (end-of-image), a PNG with its IEND chunk, and a PDF with %%EOF (often followed by a line break). Footers matter for two reasons. First, they let a tool detect truncation: if the start says PNG but the IEND chunk is missing, the file is incomplete or corrupt. Second, the region after a valid footer is a favourite hiding place: many viewers stop reading at the end marker, so extra data appended past it stays invisible to casual inspection but is easily found by a tool that knows to look.
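PNG makes the footer check easy to sketch, because a PNG is a sequence of length-prefixed chunks that must finish with an IEND chunk. The hypothetical `png_footer_report` below walks the chunks rather than searching for the marker, so it can report both truncation and appended bytes. It does not verify chunk CRCs. JPEG and PDF need more care: a JPEG can embed an EXIF thumbnail with its own FF D9, and a PDF saved with incremental updates contains several %%EOF markers.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_footer_report(data: bytes) -> str:
    """Walk a PNG's chunks to its IEND chunk and report truncation or appended bytes.

    Each chunk is: 4-byte big-endian length, 4-byte type, `length` data bytes, 4-byte CRC.
    """
    if not data.startswith(PNG_SIGNATURE):
        return "not a PNG"
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        pos += 8 + length + 4  # skip the length/type header, the chunk data, and the CRC
        if chunk_type == b"IEND":
            if pos > len(data):
                return "truncated: IEND chunk cut short"
            extra = len(data) - pos
            return f"{extra} byte(s) appended after IEND" if extra else "complete"
    return "truncated: no IEND chunk"
```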

Containers and shared signatures

One signature does not always mean one format. Many modern document formats are containers: a wrapper format holding a structured set of files inside. DOCX, XLSX, PPTX, JAR, APK, and EPUB are all ZIP archives with a particular internal layout, so they all begin with the same PK signature. Identifying which one you actually have means opening the archive and inspecting its directory: a [Content_Types].xml entry signals Office Open XML, an AndroidManifest.xml signals an Android APK, and a META-INF/MANIFEST.MF signals a Java JAR (APKs can carry one too, so that check comes last). This is why robust identification looks past the first four bytes and parses the structure, a job the File Inspector’s format parsers handle automatically.
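Telling containers apart can be sketched with Python’s standard `zipfile` module. The rules below are assumptions for illustration, based on conventional layouts: Office files carry `[Content_Types].xml` with their main part under `word/`, `xl/` or `ppt/`; an EPUB stores the text `application/epub+zip` in an entry named `mimetype`; and an APK has an `AndroidManifest.xml`, checked before the JAR rule because signed APKs can also carry `META-INF/MANIFEST.MF`. The function name is hypothetical.

```python
import io
import zipfile
from typing import Optional


def identify_zip_container(data: bytes) -> Optional[str]:
    """Classify a PK-signature file by the entries inside it (illustrative rules)."""
    if not data.startswith(b"PK\x03\x04"):
        return None  # not a ZIP-based file at all
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            names = set(zf.namelist())
            if "[Content_Types].xml" in names:
                # Office Open XML; by convention the main part's folder names the application.
                for folder, kind in (("word/", "DOCX"), ("xl/", "XLSX"), ("ppt/", "PPTX")):
                    if any(name.startswith(folder) for name in names):
                        return kind
                return "Office Open XML (other)"
            if "mimetype" in names and zf.read("mimetype").strip() == b"application/epub+zip":
                return "EPUB"
            if "AndroidManifest.xml" in names:
                return "APK"  # before JAR: signed APKs may also hold META-INF/MANIFEST.MF
            if "META-INF/MANIFEST.MF" in names:
                return "JAR"
            return "ZIP"
    except zipfile.BadZipFile:
        return "damaged ZIP"  # PK header present but no readable central directory
```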

Carving, polyglots, and spoofing

Because signatures are predictable, you can scan an entire blob of bytes for them and pull out embedded files, a forensic technique called file carving. It is how investigators recover photos from a raw disk image: find every FF D8 FF, read up to the next FF D9, and, provided the image was stored in one contiguous run, you have reconstructed a JPEG with no filesystem at all.
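A naive version of that header-to-footer scan fits in one function. It is a sketch under the same assumptions as the description: each JPEG sits in one contiguous run of bytes and contains no earlier FF D9 (an embedded EXIF thumbnail would violate this and end the carve early). Dedicated carving tools handle fragmentation and validate what they extract; `carve_jpegs` is a hypothetical name.

```python
from typing import List

SOI = b"\xff\xd8\xff"  # FF D8 (start-of-image) plus the FF that begins the next marker
EOI = b"\xff\xd9"      # end-of-image marker


def carve_jpegs(blob: bytes) -> List[bytes]:
    """From each FF D8 FF, take bytes up to and including the next FF D9."""
    found = []
    start = blob.find(SOI)
    while start != -1:
        end = blob.find(EOI, start + len(SOI))
        if end == -1:
            break  # a header with no footer: truncated or overwritten
        found.append(blob[start:end + len(EOI)])
        start = blob.find(SOI, end + len(EOI))  # resume scanning after this image
    return found
```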

The flip side is deliberate trickery. A polyglot is a single file crafted to be valid as two formats at once — famously a file that is simultaneously a working PNG and a working ZIP, because the two formats look for their structures in different places. Attackers exploit the same flexibility to smuggle payloads past filters that check only one signature. Signature-aware tools defend against this by carving for all known signatures rather than trusting the first one they see.
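Scanning for every signature, rather than stopping at offset zero, can be sketched as below. The signature subset is an assumption; very short signatures such as MZ are left out because two-byte patterns turn up by chance. A hit is a lead rather than a verdict: a PDF legitimately embeds JPEG image streams, so a parser or an analyst still has to decide whether the extra format is expected. Both function names are hypothetical.

```python
from typing import List, Set, Tuple

# Illustrative subset; two-byte signatures such as MZ are omitted because they match by chance.
SIGNATURES = {
    "PNG": b"\x89PNG\r\n\x1a\n",
    "JPEG": b"\xff\xd8\xff",
    "PDF": b"%PDF",
    "ZIP": b"PK\x03\x04",
    "ELF": b"\x7fELF",
}


def find_signatures(data: bytes) -> List[Tuple[int, str]]:
    """Return (offset, format) for every occurrence of every signature, ordered by offset."""
    hits = []
    for name, signature in SIGNATURES.items():
        pos = data.find(signature)
        while pos != -1:
            hits.append((pos, name))
            pos = data.find(signature, pos + 1)
    return sorted(hits)


def unexpected_formats(data: bytes) -> Set[str]:
    """Formats found after offset 0 that differ from the format the file starts as."""
    hits = find_signatures(data)
    leading = next((name for offset, name in hits if offset == 0), None)
    return {name for offset, name in hits if offset > 0 and name != leading}
```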

How software puts this to work

The classic implementation is the Unix file command and its libmagic database, which matches files against thousands of signature rules. Web frameworks use the same idea to validate uploads by content rather than name; browsers sniff the first bytes of a response to decide how to render it; and operating systems consult signatures when an extension is missing. The principle is always the same: the bytes are the truth, the name is just a label.

See it for yourself

The fastest way to make this concrete is to look at real bytes. Drop any file into the File Inspector and read the first row of the hex view — you will see the signature immediately, with the format identified beside it and any extension mismatch flagged. To go the other way and look up what a given extension or MIME type should be, use the MIME Lookup. Once you have read a few signatures by hand, you will never fully trust a file extension again.
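If you want to reproduce that first row outside the tool, a classic hex-dump line is easy to format: the offset, sixteen hex bytes, then the printable characters with everything else shown as a dot. `hex_row` is a hypothetical helper, and the layout mimics common hex viewers in general rather than any specific one.

```python
def hex_row(data: bytes, offset: int = 0, width: int = 16) -> str:
    """Format one hex-dump row: 8-digit offset, hex bytes, then printable ASCII."""
    chunk = data[offset:offset + width]
    hex_part = " ".join(f"{b:02X}" for b in chunk)
    ascii_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
    # Pad the hex column to a full row so the ASCII column always lines up.
    return f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {ascii_part}"
```

Calling `hex_row` on the first 16 bytes of a PDF shows `25 50 44 46` at the start of the hex column and `%PDF` at the start of the text column.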

Frequently asked questions

What is a magic number in a file?

A short, fixed sequence of bytes at the very start of a file that identifies its format — for example PNG files always begin with the bytes 89 50 4E 47. Software reads these bytes to know what a file really is, independent of its name.

Can two formats share the same magic number?

Yes. DOCX, XLSX, JAR, APK and EPUB all begin with the ZIP signature 50 4B 03 04 because they are ZIP archives internally. Identifying them requires looking deeper at the archive contents, not just the first four bytes.

Why not just trust the file extension?

The extension is part of the filename and anyone can change it with a rename; it carries no guarantee. The magic number is written inside the file by the program that created it, and the rest of the file must follow that format for software to open it, so a rename cannot change what the file really is.

How do I see a file’s magic number?

Open it in a hex viewer such as the File Inspector and look at the first row of bytes, or check the identity badges, which compare the real signature against the claimed extension and flag any mismatch.
