Ivan Schutt - Software Engineer & Systems Analyst

An audio filler sound detection and removal tool built for voice-over, podcast, and audio post-production workflows. Upload audio, detect unwanted sounds using three independent algorithms, review detections on an interactive waveform, and export clean audio.

Github Repository: https://github.com/ivan-schutt/voiceanalysis-showcase

The Idea

As someone working in audio post-production, I constantly deal with filler sounds, breaths, and unwanted noises in voice recordings. I wanted a tool that could automatically detect these sounds using multiple approaches and let me review and remove them from a single interface — without switching between different software.

The core concept: independent detection algorithms feed into a unified review interface, where detections appear as labeled regions on an interactive waveform. Select what to remove, and export clean audio.

---

Acoustic Fingerprinting

The first detection mode uses acoustic fingerprinting. You build a reference database of sounds you want to detect , the system extracts 39-dimensional MFCC features and uses Dynamic Time Warping to find similar sounds in your audio, regardless of speed or pitch variations.

The idea is to capture profiles from clients and use them to detect repeated sounds across different projects.

Detections appear as labeled regions on the waveform with confidence scores. Click any detection to jump to it and hear the audio. Drag region boundaries to fine-tune start/end points.

---

Whisper Transcription

The second mode uses OpenAI's Whisper model to transcribe audio and find spoken filler words by keyword matching.

---

Breath Detection

The third mode detects breath sounds in non-speech regions using Silero VAD inversion combined with spectral flatness analysis.

---

Word Search

Beyond detection, the tool includes a word search feature. Upload multiple audio takes, transcribe them all via Whisper, then search for specific words or phrases across every take.

---

Audio Editing

The editor silences segments in place rather than cutting them , this preserves total audio duration and avoids sync issues in multi-track workflows. Configurable crossfade transitions ensure smooth edits.

VoiceAnalysis