A file is just a sequence of bytes. The icon, the extension, the thumbnail your operating system shows you — those are conveniences layered on top, and every one of them can be wrong or deliberately faked. File inspection (also called file forensics) is the practice of reading those raw bytes directly to answer the questions that really matter: What is this file actually? Where did it come from? Is it intact? Is anything hidden inside it? Is it safe?
The File Inspector packs a full forensics workbench — 23 analysis tabs, 56 format parsers, an editable hex view, entropy visualisations, executable analysis, and threat-intelligence exports — into a single page that runs entirely in your browser. This guide walks through the concepts behind each capability so you know not just what a tab does, but why it matters.
Magic bytes vs. the file extension
This is the single most important idea in file forensics, so it comes first. There are two completely different ways to claim “what a file is”:
- The extension (.jpg, .pdf, .exe) is just text at the end of the filename. It is metadata stored by the filesystem, not by the file. Renaming invoice.pdf to invoice.jpg changes nothing inside the file.
- The magic bytes (or “file signature”) are the first few bytes inside the file, written by whatever program created it. They are the format’s fingerprint.
Almost every format begins with a recognisable signature:
| Format | First bytes (hex) | As text |
|---|---|---|
| PNG | 89 50 4E 47 0D 0A 1A 0A | .PNG.... |
| JPEG | FF D8 FF | — |
| PDF | 25 50 44 46 | %PDF |
| ZIP / DOCX / JAR | 50 4B 03 04 | PK.. |
| Windows EXE | 4D 5A | MZ |
| ELF (Linux) | 7F 45 4C 46 | .ELF |
| GIF | 47 49 46 38 | GIF8 |
When the inspector reads a file, it checks the real signature against the claimed extension. If they disagree it raises a mismatch warning. A .jpg that begins with MZ is not a photo — it is a Windows executable wearing a costume, and that is exactly the trick used to slip malware past humans who only glance at the icon.
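The mismatch check described above can be sketched in a few lines of Python. This is a minimal illustration, not the inspector's own code: the signatures come from the table, while the extension sets and the function names `identify` and `extension_mismatch` are assumptions made for the example.

```python
from pathlib import Path

# Signatures from the table above; the extension sets are illustrative only.
SIGNATURES = [
    (bytes.fromhex("89504E470D0A1A0A"), "PNG", {".png"}),
    (bytes.fromhex("FFD8FF"), "JPEG", {".jpg", ".jpeg"}),
    (b"%PDF", "PDF", {".pdf"}),
    (b"PK\x03\x04", "ZIP", {".zip", ".docx", ".jar"}),
    (b"MZ", "Windows EXE", {".exe", ".dll", ".sys"}),
    (b"\x7fELF", "ELF", {"", ".so"}),
    (b"GIF8", "GIF", {".gif"}),
]


def identify(head: bytes):
    """Return (format name, expected extensions) for the first matching signature."""
    for magic, name, extensions in SIGNATURES:
        if head.startswith(magic):
            return name, extensions
    return None, set()


def extension_mismatch(filename: str, head: bytes) -> bool:
    """True when the signature is recognised but the extension does not fit it."""
    name, extensions = identify(head)
    if name is None:
        return False  # unknown signature: nothing to compare against
    return Path(filename).suffix.lower() not in extensions
```

Calling `extension_mismatch("holiday.jpg", b"MZ\x90\x00")` returns `True`, which is the "executable wearing a costume" case.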
DOCX and JAR files share the ZIP signature (PK..) because they are ZIP archives with a specific internal layout. Identifying a file often means looking past the first four bytes to the structure inside, which is what the format parsers do.
Identifying the file: MIME types and 800+ formats
Once it has the magic bytes, the inspector resolves the file against a database of more than 800 file types (1,600+ extension entries) to give you three layers of identity:
- Type & description — the human-readable format name and what it is used for.
- MIME type — the standardised type/subtype label (e.g. image/png, application/pdf) that browsers and servers use to decide how to handle content.
- Category — image, audio, video, archive, executable, document, and so on, used to choose which deep-analysis tabs are relevant.
The Extension Lookup box at the bottom of the tool works in reverse: type any extension (.dwg, .pyc, .heic) to learn what it is without even having a file to hand.
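For a rough feel of a reverse lookup, Python's standard `mimetypes` module maps extensions to MIME types. Its table is far smaller than the 800+ formats mentioned above and varies by platform, so treat this only as a sketch; `lookup_extension` is a hypothetical helper name.

```python
import mimetypes


def lookup_extension(ext: str):
    """Map an extension such as '.png' or 'png' to a MIME type, or None if unknown."""
    ext = ext if ext.startswith(".") else "." + ext
    mime, _encoding = mimetypes.guess_type("file" + ext.lower())
    return mime
```

Less common extensions may return `None` here, which is exactly the gap a dedicated format database fills.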
Cryptographic hashes: fingerprints for bytes
A hash is a fixed-length fingerprint computed from a file’s contents. Change a single bit anywhere and the hash changes completely. The Hashes tab computes SHA-256, SHA-1, and SHA-512 via the browser’s Web Crypto API. Hashes answer several questions at once:
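- Integrity — does the file you downloaded match the hash the publisher posted? If so, and the published hash itself came from a source you trust, the file arrived uncorrupted and untampered.
- Identity — two files with the same SHA-256 are, for all practical purposes, byte-for-byte identical. Antivirus vendors and threat-intel feeds track malware by hash, so you can paste a file’s SHA-256 into a service like VirusTotal to see if it is known. Practical collisions have been demonstrated for SHA-1, so prefer SHA-256 or SHA-512 when identity really matters.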
- Deduplication — spot identical files that have been renamed.
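Outside the browser, the same three digests can be computed with Python's `hashlib`. A minimal sketch; the function names are illustrative and the data is passed in as bytes rather than read from a path.

```python
import hashlib


def file_hashes(data: bytes) -> dict:
    """Compute the three digests named above as lowercase hex strings."""
    return {name: hashlib.new(name, data).hexdigest()
            for name in ("sha1", "sha256", "sha512")}


def matches_published(data: bytes, published_sha256: str) -> bool:
    """Compare against a publisher's hash, ignoring case and stray whitespace."""
    return hashlib.sha256(data).hexdigest() == published_sha256.strip().lower()
```

For a real download you would pass `open(path, "rb").read()`, or feed the hash object in chunks for very large files.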
The hex editor: reading and editing raw bytes
The Hex Editor is the heart of any forensics tool. It shows the file as a grid of bytes in hexadecimal (base-16), the compact notation where one byte = two hex digits (00 to FF). A classic hex view has three columns:
- Offset — the address of each row, in hex, counting bytes from the start of the file. 0x00000010 is byte 16.
- Hex pane — the byte values themselves.
- ASCII pane — the same bytes shown as text where printable, with dots for non-printable values. This is where embedded strings, headers, and human-readable fragments jump out.
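The three-column layout is easy to reproduce. The generator below is a minimal sketch of a classic hex dump, assuming 16 bytes per row and a dot for anything outside printable ASCII.

```python
def hexdump(data: bytes, width: int = 16):
    """Yield one text line per row: offset, hex pane, ASCII pane."""
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        # Pad the hex pane so the ASCII pane lines up on a short final row.
        hex_pane = " ".join(f"{b:02X}" for b in row).ljust(width * 3 - 1)
        ascii_pane = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        yield f"{offset:08X}  {hex_pane}  {ascii_pane}"
```

Running it over the eight-byte PNG signature produces a row whose ASCII pane reads `.PNG....`, matching the signature table earlier in this guide.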
What you can do in the hex view
- Inspect any byte — click a byte and the Data Inspector shows how those bytes decode as int8/16/32, float, and timestamp, in both little- and big-endian. Shift-click to select a range.
- Search — find a hex pattern (FF D8 FF) or an ASCII string, jump to an offset (Go to 0x…), or Find All occurrences.
- Edit inline — change bytes, with full undo/redo, then Download Modified to save the altered file. This is how you repair a broken header or redact bytes.
- Structure templates — overlay a known layout (PNG signature, PE DOS header, ELF header, ZIP local header…) so each field is labelled instead of being a wall of hex.
- Navigation aids — bookmarks, a column ruler, a minimap of the whole file, “Copy as C array”, and Rainbow mode that colours bytes by value to reveal structure at a glance.
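Two of these features are simple enough to sketch: decoding the same bytes in both byte orders, and finding every occurrence of a pattern. The helper names `inspect_bytes` and `find_all` are hypothetical, and only a few of the data types are shown.

```python
import struct


def inspect_bytes(data: bytes, offset: int) -> dict:
    """Decode the bytes at offset as several types, little- and big-endian."""
    out = {}
    for label, fmt, size in (("int8", "b", 1), ("uint16", "H", 2),
                             ("uint32", "I", 4), ("float32", "f", 4)):
        chunk = data[offset:offset + size]
        if len(chunk) < size:
            continue  # not enough bytes left for this width
        out[label + "_le"] = struct.unpack("<" + fmt, chunk)[0]
        out[label + "_be"] = struct.unpack(">" + fmt, chunk)[0]
    return out


def find_all(data: bytes, pattern: bytes) -> list:
    """Return every offset where pattern occurs, including overlapping matches."""
    hits, start = [], data.find(pattern)
    while start != -1:
        hits.append(start)
        start = data.find(pattern, start + 1)
    return hits
```

The bytes `01 02` decode as 513 little-endian and 258 big-endian, which is why an inspector shows both.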
Entropy: spotting compression and encryption
Shannon entropy measures how unpredictable the bytes are, on a scale of 0 to 8 bits per byte. It is one of the most powerful signals in forensics because different kinds of data have characteristic entropy:
| Entropy | Typical content |
|---|---|
| Low (0–4) | Plain text, source code, simple/uncompressed data — lots of repetition |
| Medium (4–7) | Structured binaries, executables, office documents |
| High (7.5–8) | Compressed (ZIP, JPEG) or encrypted data — looks like pure noise |
A single number for the whole file is useful, but the real insight comes from where entropy changes. The inspector’s Entropy Window slides across the file and plots entropy as a line: flat low stretches are plaintext or padding; sudden peaks are compressed or encrypted regions. A small high-entropy island inside an otherwise ordinary file is a textbook indicator of a hidden or appended payload — exactly what you would see when something is smuggled into the slack space of a document.
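Shannon entropy over bytes is H = -Σ p(b) · log2 p(b), summed over the byte values that actually occur. The sketch below computes it for a whole buffer and for a sliding window; the window and step sizes are arbitrary choices for illustration, not the inspector's settings.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    n = len(data)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(data).values())


def entropy_windows(data: bytes, window: int = 256, step: int = 128) -> list:
    """Return (offset, entropy) pairs for a window sliding across the data."""
    last_start = max(len(data) - window, 0)
    return [(off, shannon_entropy(data[off:off + window]))
            for off in range(0, last_start + 1, step)]
```

A run of zero bytes followed by every byte value once would show up as a step from 0 to 8 in the window plot, the same shape as the "high-entropy island" described above.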
Seeing structure: the visualisation gallery
Bytes are abstract; pictures are not. The Visualize tab renders the file several ways, each surfacing a different property:
- Byte frequency — a histogram of how often each value 0x00–0xFF appears. Text clusters in the ASCII range; encrypted data is flat.
- Entropy heatmap — the file split into blocks, brighter = higher entropy. Encrypted sections light up.
- Pixel map — one byte per pixel, coloured by value. Repeating patterns, headers, and embedded regions become visible shapes.
- Digram plot — byte N on the X axis, byte N+1 on the Y axis. Different file types produce recognisably different constellations.
- Hilbert curve — a space-filling layout that keeps nearby bytes near each other, so regions of similar data stay clustered.
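The data behind the first and fourth views is just counting. A minimal sketch, with hypothetical function names, of the histogram and digram tallies that a renderer would then draw:

```python
from collections import Counter


def byte_histogram(data: bytes) -> list:
    """Count how often each byte value 0x00 to 0xFF appears."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    return counts


def digram_counts(data: bytes) -> Counter:
    """Count (byte N, byte N+1) pairs; each pair is one point on the digram plot."""
    return Counter(zip(data, data[1:]))
```

Plotting each key of `digram_counts` at (x, y) with brightness proportional to its count gives the "constellation" for a file.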
Strings and indicators of compromise
Buried inside almost any binary are runs of readable text: URLs, file paths, error messages, library names, and configuration. The Strings tab extracts every printable sequence above a minimum length and auto-classifies them into categories — URLs, emails, IP addresses, file paths, and more.
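String extraction boils down to a regular expression over printable ASCII runs, followed by pattern-based classification. The sketch below assumes a minimum length of four and uses three deliberately simple classifier patterns; a real tool would use many more categories and stricter rules.

```python
import re

# Deliberately loose patterns for illustration; expect false positives.
CLASSIFIERS = [
    ("url", re.compile(r"https?://[^\s\"'<>]+")),
    ("email", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("ipv4", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
]


def extract_strings(data: bytes, min_length: int = 4) -> list:
    """Return every run of printable ASCII at least min_length bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def classify(text: str) -> list:
    """Return the labels of every classifier that matches the string."""
    return [label for label, rx in CLASSIFIERS if rx.search(text)]
```

This only finds single-byte ASCII text; strings stored as UTF-16, common in Windows binaries, need a second pass with a different pattern.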
For investigators these strings are indicators of compromise (IOCs): a hard-coded URL might be a malware command-and-control server; an embedded path might leak the author’s username. The Intel tab goes further and scans specifically for high-value secrets:
- Credentials — AWS keys, GitHub/Slack tokens, Google API keys, private keys, JWTs, connection strings, Bearer tokens.
- PII / privacy — SSNs, credit-card numbers, IBANs, phone numbers, plus tracking pixels and browser-fingerprinting scripts.
- Blockchain — Bitcoin, Ethereum, and Monero wallet addresses.
- Network intel — webhook URLs (Discord/Slack/Teams), hard-coded DNS, user-agents, and API endpoints.
.env secrets baked into a binary are one of the most common real-world security mistakes.
File carving: finding files inside files
Because every format has a magic signature, you can scan a file’s entire body for those signatures and discover other files embedded inside it. This is file carving, and the Embedded tab does it automatically, then lets you extract and hash each find.
Carving reveals:
- Polyglots — a single file that is validly two formats at once (a famous trick is a file that is both a working PNG and a working ZIP).
- Appended data — content tacked on after a format’s official end marker, where most viewers stop reading.
- Recovered files — images and documents pulled out of disk images or memory dumps.
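The scanning half of carving can be sketched as a search for known signatures at every offset. This example only reports where candidates start; deciding where each embedded file ends needs the format parsers. The two-byte MZ signature is left out here because it matches by chance far too often, and the function name is hypothetical.

```python
CARVE_SIGNATURES = {
    "PNG": bytes.fromhex("89504E470D0A1A0A"),
    "JPEG": bytes.fromhex("FFD8FF"),
    "PDF": b"%PDF",
    "ZIP": b"PK\x03\x04",
    "ELF": b"\x7fELF",
    "GIF": b"GIF8",
}


def carve_offsets(data: bytes, skip_start: bool = True) -> list:
    """Return sorted (offset, format) pairs for every signature hit in data."""
    hits = []
    for name, magic in CARVE_SIGNATURES.items():
        pos = data.find(magic)
        while pos != -1:
            # Offset 0 is the host file itself, so it is usually not interesting.
            if not (skip_start and pos == 0):
                hits.append((pos, name))
            pos = data.find(magic, pos + 1)
    return sorted(hits)
```

A PNG with a ZIP appended shows a single hit at the offset where `PK` begins. Hits inside compressed data are often coincidences, so each one should be validated by parsing.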
Format parsers: reading the structure
Identification tells you what a file is; a format parser tells you how it is built. The Format tab includes 56 parsers that walk a file’s internal structure and label each part — PNG chunks, RIFF/WAV sub-chunks, MP4 boxes, ID3 tags, ZIP central directories, SQLite page headers, certificate ASN.1 fields, and dozens more.
This matters because structure is where the truth and the anomalies live. A parser can show that a PNG’s declared dimensions don’t match its pixel data, that a chunk’s length runs past the end of the file (corruption, or a hiding spot), or that a document contains an unexpected object. The covered families span images, audio, video, documents, archives, executables, structured data (SQLite, CSV, YAML, TOML), crypto material (PEM, DER, SSH keys, certificates), game ROMs, and developer files.
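As one concrete case, a PNG is an eight-byte signature followed by chunks, each laid out as a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 over the type and data. The walker below is a minimal sketch of that layout and flags both bad CRCs and chunks that run past the end of the file.

```python
import struct
import zlib

PNG_SIGNATURE = bytes.fromhex("89504E470D0A1A0A")


def png_chunks(data: bytes):
    """Yield (offset, type, length, ok) per chunk; ok is False on bad CRC or overrun."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG signature")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        end = pos + 8 + length + 4
        if end > len(data):
            yield pos, ctype, length, False  # declared length overruns the file
            return
        body = data[pos + 8:pos + 8 + length]
        (stored_crc,) = struct.unpack(">I", data[end - 4:end])
        yield pos, ctype, length, zlib.crc32(ctype + body) == stored_crc
        pos = end
        if ctype == b"IEND":
            return  # anything after IEND is trailing data, not a chunk
```

If the loop stops at IEND before reaching the end of the buffer, the remaining bytes are appended data worth a closer look.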
Executable analysis: PE and ELF
Executables get their own dedicated tabs because they are the files most worth scrutinising.
PE Analysis (Windows .exe / .dll / .sys)
The Portable Executable format is the structure of every Windows program. The PE tab breaks out its sections (code, data, resources), its imports (which system functions it calls — a strong hint at behaviour: networking, crypto, process injection), and security flags like ASLR and DEP. It also checks 31 packer signatures. A “packer” compresses or encrypts a program so its real code is hidden until it runs; legitimate software occasionally uses packers, but malware uses them constantly to evade scanners, so a packed-plus-high-entropy executable is a red flag.
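The first few header fields are straightforward to read by hand. This sketch follows the documented PE layout: the DOS header's e_lfanew field at offset 0x3C points to the "PE\0\0" signature, the COFF header follows it, and DllCharacteristics sits 70 bytes into the optional header. It reads only those fields and does no section, import, or packer analysis.

```python
import struct

DYNAMIC_BASE = 0x0040  # DllCharacteristics bit for ASLR
NX_COMPAT = 0x0100     # DllCharacteristics bit for DEP


def pe_summary(data: bytes) -> dict:
    """Read machine type, section count, and the ASLR/DEP flags from a PE header."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing DOS header")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    machine, n_sections = struct.unpack_from("<HH", data, e_lfanew + 4)
    optional_header = e_lfanew + 24  # 4-byte signature + 20-byte COFF header
    (dll_chars,) = struct.unpack_from("<H", data, optional_header + 70)
    return {
        "machine": machine,
        "sections": n_sections,
        "aslr": bool(dll_chars & DYNAMIC_BASE),
        "dep": bool(dll_chars & NX_COMPAT),
    }
```

A truncated file makes `struct.unpack_from` raise `struct.error`, which a caller would treat as a malformed header.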
ELF Analysis (Linux / Unix binaries)
ELF is the Linux equivalent. The tab shows its sections and linked libraries, plus the hardening flags — NX (non-executable stack), PIE (position-independent, enables ASLR), and stack canaries — that tell you how defensively the binary was compiled.
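The ELF header exposes some of this in its first 20 bytes. The sketch below reads the class, byte order, type, and machine fields. It treats type ET_DYN (3) only as a hint of PIE, because shared libraries use the same type, and it does not inspect the program headers that record NX.

```python
import struct

ELF_TYPES = {1: "relocatable", 2: "executable", 3: "shared object or PIE", 4: "core"}


def elf_summary(data: bytes) -> dict:
    """Read the word size, byte order, file type, and machine from an ELF header."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    bits = {1: 32, 2: 64}.get(data[4])        # EI_CLASS
    endian = {1: "<", 2: ">"}.get(data[5])    # EI_DATA
    if bits is None or endian is None:
        raise ValueError("invalid EI_CLASS or EI_DATA")
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    return {
        "bits": bits,
        "little_endian": endian == "<",
        "type": ELF_TYPES.get(e_type, hex(e_type)),
        "machine": e_machine,
        "maybe_pie": e_type == 3,
    }
```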
Document forensics
Office documents and PDFs are a leading malware-delivery vector because they can carry active content. The Documents tab looks specifically for the dangerous parts: VBA macros inside Office files, and JavaScript or auto-launch actions inside PDFs. Seeing that an “invoice” contains an auto-executing macro tells you everything you need to know before you ever open it in Word.
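For PDFs, a crude first pass is to count the name tokens associated with active content. The sketch below is a naive byte search over an assumed token list: it misses names that are hex-escaped or stored inside compressed object streams, and short tokens such as /AA can match unrelated names, so a hit is a prompt to look closer rather than a verdict.

```python
PDF_RISK_TOKENS = [b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch"]


def pdf_active_content(data: bytes) -> dict:
    """Return {token: count} for each risky PDF name that appears in the raw bytes."""
    return {tok.decode("ascii"): data.count(tok)
            for tok in PDF_RISK_TOKENS if tok in data}
```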
Steganography detection
Steganography is hiding data inside other data so its very existence is concealed — the opposite of encryption, which hides meaning but not presence. The Security tab includes several detectors:
- LSB bit-plane viewer — images can hide a payload in the least-significant bit of each colour value, invisible to the eye. Toggling through bit planes 0–7 makes hidden patterns appear.
- Zero-width & whitespace stego — invisible Unicode characters or trailing spaces used to encode hidden text in documents.
- Homoglyph detection — letters from different alphabets that look identical (Latin “a” vs Cyrillic “а”), the basis of look-alike phishing domains.
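Each detector reduces to a small test. The sketch below shows one possible version of all three: extracting a single bit plane from raw colour values, locating a handful of zero-width characters, and flagging words that mix scripts. Using the first word of the Unicode character name as the script is a rough heuristic chosen for the example, and the zero-width set is not exhaustive.

```python
import unicodedata

ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def bit_plane(values: bytes, bit: int) -> list:
    """Extract one bit (0 = least significant) from every colour value."""
    return [(v >> bit) & 1 for v in values]


def zero_width_offsets(text: str) -> list:
    """Return the index of every zero-width character in the text."""
    return [i for i, ch in enumerate(text) if ch in ZERO_WIDTH]


def scripts_used(word: str) -> set:
    """Approximate the scripts in a word via the first word of each letter's name."""
    return {unicodedata.name(ch, "UNKNOWN").split(" ")[0]
            for ch in word if ch.isalpha()}


def looks_mixed_script(word: str) -> bool:
    """True when letters from more than one script appear in the same word."""
    return len(scripts_used(word)) > 1
```

The word `p\u0430ypal` (with a Cyrillic "а") is flagged as mixed, while plain `paypal` is not.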
Integrity checks and file comparison
The Integrity tab validates a file against its own format rules — checking, for example, that a PNG’s chunk CRCs are correct, that a JPEG ends with its proper marker, and whether extra data is hidden after the official end of file (a common, low-effort hiding place). The Compare tab takes a second file and produces a side-by-side hex diff with a similarity score, so you can see exactly which bytes differ between two versions or spot a tampered copy.
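A positional comparison is the simplest form of hex diff. The sketch below lists differing offsets and defines similarity as the share of positions that match, measured against the longer file. That definition is an assumption for the example; other tools score similarity differently, and a positional diff does not realign after an insertion.

```python
def byte_diff(a: bytes, b: bytes):
    """Return (differing offsets, similarity between 0.0 and 1.0)."""
    longest = max(len(a), len(b))
    # Slicing yields b"" past the end, so missing bytes count as differences.
    differing = [i for i in range(longest) if a[i:i + 1] != b[i:i + 1]]
    similarity = 1.0 if longest == 0 else 1 - len(differing) / longest
    return differing, similarity
```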
Reporting and threat-intel exports
Analysis is only useful if you can act on it. The inspector exports its findings in eight formats, including formats built for security tooling:
- HTML / Text / JSON / Markdown — human and machine-readable reports for tickets and notebooks.
- YARA — a rule that matches this file’s signature, so scanners can find other copies.
- Sigma — generic detection rules for SIEM log searches.
- Snort — network intrusion-detection signatures.
- STIX 2.1 — the standard structured-threat-intelligence bundle for sharing IOCs between teams and platforms.
Smart Copy shortcuts let you grab all hashes, all IOCs as CSV, or the analysis summary in one click.
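To make the YARA export concrete, here is a sketch that builds a small rule from a file's leading bytes and records its SHA-256 as metadata. The rule layout is an illustration of YARA syntax, not the inspector's actual output, and the caller is responsible for passing a valid rule identifier.

```python
def yara_rule(name: str, magic: bytes, sha256: str) -> str:
    """Build a YARA rule that matches files starting with the given bytes."""
    hex_bytes = " ".join(f"{b:02X}" for b in magic)
    return (
        f"rule {name}\n"
        "{\n"
        "    meta:\n"
        f'        sha256 = "{sha256}"\n'
        "    strings:\n"
        f"        $magic = {{ {hex_bytes} }}\n"
        "    condition:\n"
        "        $magic at 0\n"
        "}\n"
    )
```

A rule keyed only on a format signature matches every file of that format, so a useful rule would add strings that are distinctive to the sample.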
Practical workflows
Putting it together, here is how the tabs combine for common tasks:
- “What is this unknown file?” — Drop it in, read the identity badges and Format tab. If the extension and magic bytes disagree, stop and investigate.
- “Is my download legit?” — Compare the SHA-256 in the Hashes tab against the one on the publisher’s site.
- “Is anything hidden in this image?” — Check the Entropy Window for spikes, run the Embedded carve, and view the LSB bit planes.
- “Triage a suspicious attachment.” — Hashes → Documents (macros/JS) → Strings/Intel (IOCs) → export a STIX bundle — without ever executing the file.
- “Repair or redact a file.” — Use the hex editor to fix a header or zero-out sensitive bytes, then Download Modified.
Because all of this runs locally, you can do it on evidence, on secrets, and on live malware without sending a single byte anywhere. Open the File Inspector and drop a file in to see every tab populate in real time.
Frequently asked questions
Does the File Inspector upload my file?
No. Every byte is read and analysed inside your browser tab using the File and Web Crypto APIs. The file never touches a server, which is why it also works offline and is safe for confidential documents and malware samples.
Extension vs magic bytes — which one is correct?
Magic bytes win. The extension is just text in the filename that anyone can rename; the magic bytes are the real signature written inside the file by the program that created it. The inspector flags a mismatch when the two disagree.
What does high entropy mean?
Entropy measures randomness from 0 to 8 bits per byte. Values near 8 mean the data is compressed or encrypted (or both); low, uneven values mean structured data like text or code. A sudden entropy spike inside an otherwise low-entropy file is a classic sign of an embedded or hidden payload.
Is it safe to open a suspicious file in the inspector?
Reading a file as data is far safer than executing it. The inspector parses bytes — it never runs the file. That said, treat unknown executables with care and analyse them on a machine you can afford to wipe.
What is file carving?
Scanning a file for the magic-byte signatures of other formats hidden inside it, then extracting those embedded files. It is how you pull a ZIP hidden at the end of a PNG, or recover images from a memory dump.