Mel Spectrogram From Scratch
Key Insight
Libraries like torchaudio hand you a mel spectrogram in a single call, but rebuilding one by hand (windowing the raw waveform, running a Short-Time Fourier Transform, then applying a mel filterbank) shows there is no magic inside. The filterbank is just a fixed matrix of triangular weights, so the famous "perceptual" step is a single matrix multiply: it folds the STFT's many evenly spaced frequency rows down into a much smaller set of mel bands, spaced roughly the way human hearing resolves pitch, with narrow bands at low frequencies and wide bands at high ones. Doing it from scratch on a 10-second clip turns the everyday habit of treating audio as an image into something you understand rather than take on trust.
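The three steps above can be sketched in plain NumPy. This is a minimal illustration, not a reproduction of torchaudio's output. Several choices here are assumptions that do not come from the text: the HTK-style mel formula, the Hann window, the parameter values (n_fft=1024, hop=256, n_mels=64), a 16 kHz sample rate, and a synthetic 440 Hz sine standing in for a real 10-second clip. It also skips padding, filter normalization, and log scaling, which libraries often apply.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel formula (an assumed choice; other variants exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Fixed (n_mels, n_fft // 2 + 1) matrix of triangular weights."""
    n_freqs = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_freqs)  # evenly spaced in Hz
    # n_mels + 2 edge points, evenly spaced in mel, mapped back to Hz
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, n_freqs))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))  # triangle
    return fb

def mel_spectrogram(wave, sr, n_fft=1024, hop=256, n_mels=64):
    # 1. Window the raw waveform into overlapping frames
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    # 2. STFT: one real FFT per frame, kept as a power spectrum
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2  # (n_frames, n_freqs)
    # 3. The "perceptual" step: a single matrix multiply
    return mel_filterbank(sr, n_fft, n_mels) @ power.T  # (n_mels, n_frames)

# Synthetic 10-second test tone (stand-in for a real clip)
sr = 16000
t = np.arange(10 * sr) / sr
wave = np.sin(2 * np.pi * 440.0 * t)

mel = mel_spectrogram(wave, sr)
print(mel.shape)  # (64, 622): 513 linear frequency rows folded into 64 mel bands
```

In practice the result is usually passed through a logarithm (for example `np.log(mel + 1e-10)`) before being viewed or fed to a model as an image, since raw power values span many orders of magnitude.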