Sound, Frequency & the FFT: How a Spectrum Analyzer Sees Audio

Here is a small miracle you perform constantly without noticing: a sound reaches your ear as a single wiggling change in air pressure, one value at each instant — and yet you hear it as a rich blend of separate pitches, a bass note and a melody and a cymbal’s hiss, all at once. Pulling that single wave apart into the individual frequencies hiding inside it is one of the most useful ideas in all of science. This guide explains how it works, and what a spectrum analyzer is actually showing you.

See it live in the Audio Spectrum Analyzer & Spectrogram.

Sound is a wave; frequency is pitch

A sound is a wave of pressure travelling through the air — regions squeezed slightly tighter and pulled slightly looser, rushing past at the speed of sound. The frequency of that wiggle, how many times per second it repeats, is what we hear as pitch, measured in hertz (Hz). A slow wiggle is a low note; a fast one is a high note. Humans hear roughly 20 Hz to 20,000 Hz, from a felt rumble to a thin whistle.

The crucial fact is that pitch follows frequency ratios, not frequency differences: going up one octave always means doubling the frequency. The note A appears at 55, 110, 220, 440, 880 Hz and so on, each an octave above the last. This is why musicians and analyzers alike often draw frequency on a logarithmic scale, where each octave takes the same amount of space, just like the keys of a piano.
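The doubling rule is easy to check numerically. The short sketch below (the function name `octaves` is illustrative, not part of the analyzer) lists the octaves of A and confirms that on a log₂ axis they sit exactly one unit apart:

```python
import math

def octaves(base_hz, count):
    """Return `count` frequencies, each one octave (a doubling) above the last."""
    return [base_hz * 2 ** k for k in range(count)]

a_notes = octaves(55.0, 5)  # 55, 110, 220, 440, 880 Hz

# On a log2 axis every octave occupies the same amount of space.
log_positions = [math.log2(f) for f in a_notes]
log_steps = [b - a for a, b in zip(log_positions, log_positions[1:])]

print(a_notes)
print(log_steps)
```

The gaps in hertz grow (55, 110, 220, 440), while the gaps on the log scale stay constant, which is why a logarithmic axis matches what the ear hears.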

Why a violin and a flute playing the same note sound different

Play the note A (440 Hz) on a flute and on a violin and you can instantly tell them apart, even though the pitch is identical. The reason is harmonics. A real instrument doesn’t produce a single pure frequency; it produces the main one (the fundamental) plus a stack of quieter frequencies at exact whole-number multiples: 440, 880, 1320, 1760 Hz… The particular recipe of how strong each harmonic is gives each instrument its character, or timbre.

💡 A pure sine wave has only the fundamental, so it sounds smooth and hollow. A square wave contains only the odd harmonics (3, 5, 7… times the fundamental) and sounds buzzy; a sawtooth has every harmonic and sounds bright and brassy. Switch the analyzer’s tone generator between these and watch the comb of harmonics appear and disappear: that is timbre made visible.
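A minimal sketch of these harmonic recipes, assuming idealised textbook waveforms in which each harmonic's strength falls off as 1/n. The function names are hypothetical, and the analyzer's own tone generator may be built differently:

```python
import math

def harmonic_recipe(shape, count):
    """Relative strength of harmonics 1..count for an idealised waveform."""
    if shape == "sine":       # fundamental only
        return [1.0 if n == 1 else 0.0 for n in range(1, count + 1)]
    if shape == "square":     # odd harmonics only, falling off as 1/n
        return [1.0 / n if n % 2 == 1 else 0.0 for n in range(1, count + 1)]
    if shape == "sawtooth":   # every harmonic, falling off as 1/n
        return [1.0 / n for n in range(1, count + 1)]
    raise ValueError(f"unknown shape: {shape}")

def synthesize(fundamental_hz, recipe, sample_rate, num_samples):
    """Additive synthesis: sum one sine per harmonic, weighted by the recipe."""
    samples = []
    for i in range(num_samples):
        t = i / sample_rate
        samples.append(sum(amp * math.sin(2 * math.pi * fundamental_hz * n * t)
                           for n, amp in enumerate(recipe, start=1)))
    return samples

square = harmonic_recipe("square", 6)
harmonic_freqs = [440 * n for n in range(1, 7)]   # 440, 880, 1320, ... Hz
wave = synthesize(440.0, square, 48000, 256)
```

Changing only the recipe, while keeping the same 440 Hz fundamental, changes the timbre without changing the pitch.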

The Fourier transform: taking the wave apart

So a sound is a single wave that is secretly a sum of many simple sine waves. The tool that reverses the sum, taking the messy combined wave and telling you exactly which frequencies, at which strengths, are inside it, is the Fourier transform, named after the French mathematician Joseph Fourier. His radical claim, made in the early 1800s, was that any repeating waveform, however jagged, can be built by adding up enough plain sine waves. The Fourier transform finds that ingredient list.
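To make the "ingredient list" concrete, here is a sketch of the naive discrete Fourier transform in plain Python. The test signal is made up for illustration: 64 samples holding 4 cycles of a loud sine plus 10 cycles of a quieter one.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive discrete Fourier transform; returns magnitudes for bins 0..N/2."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        # Correlate the signal with a complex sinusoid that completes k cycles.
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        mags.append(abs(total))
    return mags

N = 64
signal = [1.0 * math.sin(2 * math.pi * 4 * i / N)
          + 0.5 * math.sin(2 * math.pi * 10 * i / N) for i in range(N)]

mags = dft_magnitudes(signal)
strongest = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:2]
print(sorted(strongest))  # [4, 10]
```

The transform recovers both ingredients: bins 4 and 10 stand out, and the bin 4 peak is twice as tall as the bin 10 peak, matching the 1.0 and 0.5 amplitudes that went in.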

Computing it the naive way is slow. The breakthrough was the FFT (Fast Fourier Transform), an algorithm popularised in 1965 that computes the same answer far faster — fast enough that your browser can analyse live audio in real time. Every spectrum analyzer, every audio app’s “visualiser,” and a huge amount of modern technology (from MP3 compression to Wi-Fi) rests on the FFT.
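As an illustration of the idea, here is a sketch of the textbook recursive radix-2 form of the FFT, checked against the naive transform. It is one simple variant for power-of-two lengths, not necessarily what any particular browser or analyzer uses internally.

```python
import cmath
import math

def fft(samples):
    """Recursive radix-2 Cooley-Tukey FFT; length must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    if n < 1 or n % 2:
        raise ValueError("length must be a power of two")
    even = fft(samples[0::2])   # transform of the even-indexed samples
    odd = fft(samples[1::2])    # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def naive_dft(samples):
    """Direct evaluation of the transform, for comparison only."""
    n = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples)) for k in range(n)]

signal = [math.sin(2 * math.pi * 3 * i / 32) for i in range(32)]
max_diff = max(abs(a - b) for a, b in zip(fft(signal), naive_dft(signal)))
print(max_diff)  # tiny: only floating-point rounding separates the two
```

The speed-up comes from splitting the problem in half at every level. For a 2,048-sample frame the naive method needs on the order of 2,048² ≈ 4.2 million multiply-adds, against roughly 2,048 × 11 ≈ 22,500 for the FFT.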

Spectrum vs spectrogram

Run the FFT once and you get a spectrum: a graph with frequency along the bottom and, for each frequency, how much energy the sound has there right now. A pure tone is a single spike. A musical note is a comb of spikes (its harmonics). Hiss or noise is a broad smear. It is a snapshot of this instant.

Run the FFT over and over and stack the snapshots side by side and you get a spectrogram: time on one axis, frequency on the other, and brightness showing how much energy is at each frequency at each moment. Now you can watch sound change — see a voice glide between vowels, a siren sweep up and down, or a song’s notes scroll past like a player-piano roll. The analyzer draws the newest spectrum on the right and scrolls the history to the left.
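The stacking step can be sketched in a few lines. This simplified version uses non-overlapping frames, a slow naive transform, and a made-up signal whose tone jumps from one bin to another halfway through. Real analyzers typically overlap frames and apply a window function, both of which are omitted here.

```python
import cmath
import math

def magnitudes(frame):
    """Magnitude spectrum of one frame (naive transform, bins 0..N/2)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))) for k in range(n // 2 + 1)]

def spectrogram(samples, frame_size, hop):
    """One spectrum per frame; columns[t][k] is the level at time step t, bin k."""
    columns = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        columns.append(magnitudes(samples[start:start + frame_size]))
    return columns

FRAME = 32
first = [math.sin(2 * math.pi * 3 * i / FRAME) for i in range(4 * FRAME)]
second = [math.sin(2 * math.pi * 9 * i / FRAME) for i in range(4 * FRAME)]
columns = spectrogram(first + second, FRAME, FRAME)

# The brightest bin in each column traces the tone's movement over time.
peak_track = [max(range(len(col)), key=lambda k: col[k]) for col in columns]
print(peak_track)  # [3, 3, 3, 3, 9, 9, 9, 9]
```

Drawing each column as a vertical strip of pixels, with brightness taken from the magnitudes, gives the scrolling picture described above.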

Bins, Nyquist, and the great trade-off

The FFT does not give a perfectly continuous spectrum; it divides the frequency range into a fixed number of bins, like the slots of a sorting machine. Two numbers govern everything:

frequency per bin = sample rate ÷ FFT size
highest frequency = sample rate ÷ 2 (Nyquist)

The sample rate is how many times per second the audio was measured (commonly 44,100 or 48,000 samples per second). The Nyquist limit says you can only ever see frequencies up to half the sample rate, which conveniently lands just above the top of human hearing: 22,050 Hz or 24,000 Hz for those two rates. The FFT size sets how many bins those frequencies are split into: an FFT of size N yields about N ÷ 2 bins between 0 Hz and the Nyquist limit.
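A quick worked example of the two formulas, using a 48,000 Hz sample rate and an FFT size of 2,048. Both values are chosen purely for illustration, and the helper names are hypothetical.

```python
def bin_width_hz(sample_rate, fft_size):
    """Frequency spacing between neighbouring bins."""
    return sample_rate / fft_size

def nyquist_hz(sample_rate):
    """Highest frequency that can be represented at this sample rate."""
    return sample_rate / 2

def bin_center_hz(bin_index, sample_rate, fft_size):
    """Frequency at the centre of a given bin."""
    return bin_index * sample_rate / fft_size

width = bin_width_hz(48000, 2048)         # 23.4375 Hz per bin
top = nyquist_hz(48000)                   # 24000.0 Hz
bin_19 = bin_center_hz(19, 48000, 2048)   # 445.3125 Hz, the bin nearest A4

print(width, top, bin_19)
```

At this setting A4 (440 Hz) and the A♯ above it (about 466 Hz) are only about one bin apart, which hints at why FFT size matters for telling neighbouring notes apart.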

⚠️ Here is the catch that defines all spectral analysis: a bigger FFT gives finer frequency bins (you can separate two close notes) but needs a longer stretch of audio to fill it, so it reacts slowly and smears fast events. A smaller FFT is quick in time but coarse in frequency. You cannot have perfect time and perfect frequency resolution at once, so pick where to sit on the trade-off for the job at hand. The smoothing control simply averages each bin with its recent values to steady the picture.
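The trade-off can be put in numbers. The sketch below tabulates bin width against the length of audio each FFT needs, assuming a 48,000 Hz sample rate. The `smooth` helper models the smoothing control as a simple exponential average, which is an assumption: the analyzer's actual smoothing formula is not specified here.

```python
def tradeoff(sample_rate, fft_size):
    """Return (frequency resolution in Hz, window length in milliseconds)."""
    return sample_rate / fft_size, 1000.0 * fft_size / sample_rate

def smooth(previous, current, factor):
    """Blend each bin with its previous smoothed value (factor 0 = no smoothing)."""
    return [factor * p + (1.0 - factor) * c for p, c in zip(previous, current)]

for size in (256, 1024, 4096, 16384):
    hz, ms = tradeoff(48000, size)
    print(f"FFT {size:>5}: {hz:8.2f} Hz per bin, {ms:7.2f} ms of audio")

small = tradeoff(48000, 256)      # coarse in frequency, quick in time
large = tradeoff(48000, 16384)    # fine in frequency, slow in time
smoothed = smooth([0.0, 1.0], [1.0, 1.0], 0.8)
```

Whatever size you pick, the bin width in hertz multiplied by the window length in seconds is always exactly 1: halving one doubles the other.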

From frequency to musical note

Because pitch is logarithmic, turning a measured frequency into a musical note is a clean piece of maths. Modern instruments use equal temperament, dividing each octave into twelve equal semitones. Taking A4 = 440 Hz as the anchor, any frequency’s position is:

note number n = 69 + 12 · log₂(frequency ÷ 440)

Round n to the nearest whole number to get the closest note. The leftover fraction, multiplied by 100, tells you how far off it is in cents (hundredths of a semitone): sharp if positive, flat if negative. That is exactly how a chromatic tuner works, and why the analyzer can name the loudest frequency and tell you it is, say, “A4, +6 cents.” A bigger FFT size sharpens that reading.
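The formula translates almost directly into code. This sketch assumes the common convention in which note number 60 is called C4 (so 69 is A4) and spells the black keys as sharps. The function name `nearest_note` is illustrative.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(frequency_hz, a4_hz=440.0):
    """Return (note name with octave, offset in cents) for a frequency."""
    n = 69 + 12 * math.log2(frequency_hz / a4_hz)
    nearest = round(n)
    cents = 100 * (n - nearest)   # positive = sharp, negative = flat
    # Assumed naming convention: note number 60 is C4.
    name = NOTE_NAMES[nearest % 12] + str(nearest // 12 - 1)
    return name, cents

print(nearest_note(440.0))     # ('A4', 0.0)
name, cents = nearest_note(440.0 * 2 ** (6 / 1200))   # a tone 6 cents sharp of A4
print(name, round(cents, 3))
```

A reading is only as good as the frequency fed in, so a coarse FFT bin can leave the cents value off by a wide margin even though the arithmetic itself is exact.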

In practice

Once you can read a spectrum, sound stops being mysterious. A steady bright line at a fixed frequency is a hum or whine: read its value to identify a 50 or 60 Hz mains buzz (which one depends on your region’s power grid) or a coil whine. A moving stack of evenly-spaced lines is a musical note and its harmonics. A broad wash is noise or breath. Open the Audio Spectrum Analyzer, feed it your voice, a song, or a generated tone, and watch the invisible structure of sound paint itself in front of you.

Frequently asked questions

What is frequency, in plain terms?

Frequency is how many times per second a sound wave repeats, measured in hertz (Hz). A faster repeat is a higher pitch: 440 Hz is the A that orchestras tune to, while a deep bass note might be 50 Hz. Doubling the frequency raises the pitch by exactly one octave, which is why 220, 440, and 880 Hz are all heard as the note A.

What is the Fourier transform / FFT?

The Fourier transform is the mathematical recipe for taking a complicated wave and figuring out which simple sine-wave frequencies, at which strengths, add up to make it. The FFT (Fast Fourier Transform) is a clever algorithm that computes it extremely fast, fast enough to re-analyse live audio many times a second, which is what a real-time spectrum analyzer does.

What is the difference between a spectrum and a spectrogram?

A spectrum is a single snapshot: how loud each frequency is at one instant. A spectrogram is a stack of spectra over time — frequency on one axis, time on the other, and brightness showing energy — so you can watch how the sound's frequencies change. Speech, music, and sirens all leave distinctive shapes on a spectrogram.

Why can't a spectrum analyzer be precise in both time and frequency?

Because measuring frequency accurately requires watching the wave for a while, and watching for a while means you can't pin down a fast change to an exact moment. A larger FFT gives finer frequency detail but blurs time; a smaller FFT is quick but coarse in frequency. This time-frequency trade-off is fundamental — it is the audio cousin of the uncertainty principle.
