ATC recording using SDR - deeper analysis - raw signal processing and SNR estimation
Check out our previous blog posts:
- Blog 1: Basic terminology and hardware setup description for ATC listening
- Blog 2: Where to place your antenna for ATC recordings
- Blog 3: What is the best SDR hardware choice for ATC
- Blog 4: How to setup your SDR for clean ATC audio
This blog post is more technical than the previous ones. In the following paragraphs we
describe the raw signal processing pipeline. The rtl-airband software is configured to output
the raw data coming from the SDR hardware in cs16 format (complex samples stored as interleaved signed 16-bit I/Q values).
Converting the raw signal into the audio format
The resulting cs16 files are processed with:
cat ${signalfile}.cs16 | csdr convert_s16_f | csdr amdemod_cf | csdr fastdcblock_ff | csdr gain_ff 3 | csdr limit_ff | csdr convert_f_s16 > ${signalfile}.raw
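which performs the following steps:
- conversion from 16-bit integer to float samples (convert_s16_f),
- AM demodulation (amdemod_cf),
- DC offset removal (fastdcblock_ff),
- signal level adjustment: a fixed gain of 3 followed by limiting (gain_ff 3, limit_ff),
- conversion back to 16-bit integers (convert_f_s16),
- saving as raw PCM.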
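To make the individual steps more concrete, here is a minimal Python sketch of the same chain. It is not how csdr implements it: we assume little-endian samples, approximate the DC removal by subtracting the mean of the whole buffer, and the function name am_demodulate_cs16 is ours.

```python
import math
import struct


def am_demodulate_cs16(raw: bytes, gain: float = 3.0) -> bytes:
    """Turn interleaved 16-bit I/Q bytes into 16-bit mono PCM bytes."""
    n_values = len(raw) // 2
    n_values -= n_values % 2  # keep whole I/Q pairs only
    ints = struct.unpack(f"<{n_values}h", raw[: n_values * 2])

    # int16 -> float, then AM demodulation: the envelope is the
    # magnitude of each complex (I, Q) sample.
    envelope = [
        math.hypot(ints[k] / 32768.0, ints[k + 1] / 32768.0)
        for k in range(0, n_values, 2)
    ]
    if not envelope:
        return b""

    # DC removal (assumption: plain mean subtraction over the buffer).
    mean = sum(envelope) / len(envelope)

    out = []
    for value in envelope:
        sample = (value - mean) * gain          # fixed gain
        sample = max(-1.0, min(1.0, sample))    # hard limit to [-1, 1]
        out.append(int(sample * 32767))         # float -> int16
    return struct.pack(f"<{len(out)}h", *out)
```

The output is headerless 16-bit PCM, like the .raw file written by the pipeline above. The audio sample rate equals the rate of the complex input samples.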
Next, we drop all segments shorter than 1 second, as they do not contain any meaningful signal. You may have noticed that we are not using automatic gain control (AGC). The reason is that AGC distorts the signal: it changes the volume rapidly, and with it the noise level. Since we have the whole recording and can process it offline, we implemented a segment-based gainer instead.
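The length filter can be sketched in a few lines of Python. This assumes that every segment is stored as its own 16-bit mono raw PCM file and that the sample rate is known; the function names and the 8000 Hz rate in the usage note are only illustrative.

```python
def segment_duration_s(n_bytes, sample_rate_hz, bytes_per_sample=2):
    """Duration in seconds of a mono raw PCM segment of n_bytes bytes."""
    return n_bytes / (bytes_per_sample * sample_rate_hz)


def keep_long_segments(segment_sizes, sample_rate_hz, min_duration_s=1.0):
    """Return the names of segments that are at least min_duration_s long.

    segment_sizes maps a segment name to its size in bytes.
    """
    return [
        name
        for name, n_bytes in segment_sizes.items()
        if segment_duration_s(n_bytes, sample_rate_hz) >= min_duration_s
    ]
```

For example, with a hypothetical sample rate of 8000 Hz, a 16000-byte file is exactly 1 second long and is kept, while anything smaller is dropped. The size dictionary can be built with os.path.getsize.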
Segment-based gainer
We detect push-to-talk clicks using a wavelet transform and use them to identify the individual utterances in the audio. We amplify each segment so that it does not exceed 95% of the maximum level of the wav file (1.0 in our case). Peak levels are ignored. See the figure below:
The original raw signal is on top; the amplified signal is on the bottom.
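The gain step can be illustrated with a short Python sketch. It takes the segment boundaries as given (the wavelet-based click detection is not shown) and makes one assumption of ours about how peaks are ignored: the loudest fraction of samples is skipped when the reference level is chosen, and whatever still exceeds full scale after amplification is clipped. The names amplify_segment, amplify_by_segments and ignore_fraction are hypothetical.

```python
def amplify_segment(samples, target=0.95, ignore_fraction=0.001):
    """Scale float samples in [-1, 1] so the reference level hits target."""
    magnitudes = sorted(abs(s) for s in samples)
    if not magnitudes:
        return []
    # Skip the loudest samples (assumed to be isolated peaks) when
    # picking the level that the gain is computed from.
    n_ignored = int(len(magnitudes) * ignore_fraction)
    reference = magnitudes[max(0, len(magnitudes) - 1 - n_ignored)]
    if reference == 0.0:
        return list(samples)  # silent segment, nothing to amplify
    gain = target / reference
    # Ignored peaks may exceed full scale after the gain; clip them.
    return [max(-1.0, min(1.0, s * gain)) for s in samples]


def amplify_by_segments(samples, boundaries, target=0.95,
                        ignore_fraction=0.001):
    """Apply an independent gain to each (start, end) sample range."""
    out = list(samples)
    for start, end in boundaries:
        out[start:end] = amplify_segment(
            samples[start:end], target, ignore_fraction
        )
    return out
```

Because the gain is constant within a segment, the ratio between speech and noise inside one utterance is preserved, unlike with AGC.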
Voice activity detection
We detect the speech parts of the audio so that they can be used to estimate the Signal-to-Noise Ratio (SNR) reliably. The Voice Activity Detector (VAD) is based on a neural network with 2 hidden layers and 2 output classes. It was trained on 1366 hours of a multilingual telephone speech corpus. The neural network output is smoothed by averaging over a 5-frame window, and the detection threshold can be adjusted to control the amount of detected speech. The figure below shows the detected speech in the recording (red parts).
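The post-processing of the network output can be sketched as follows. The neural network itself is not shown: we assume it yields one speech probability per frame, and the centred window, the default threshold of 0.5 and the function names are our own choices for illustration.

```python
def smooth_and_threshold(speech_probs, window=5, threshold=0.5):
    """Average per-frame speech probabilities and threshold the result."""
    half = window // 2
    n = len(speech_probs)
    decisions = []
    for i in range(n):
        # Centred window, shortened at the edges of the recording.
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        average = sum(speech_probs[lo:hi]) / (hi - lo)
        decisions.append(average >= threshold)
    return decisions


def frames_to_segments(decisions):
    """Turn per-frame booleans into (start, end) frame ranges, end exclusive."""
    segments = []
    start = None
    for i, is_speech in enumerate(decisions):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(decisions)))
    return segments
```

Lowering the threshold marks more frames as speech, raising it marks fewer. The smoothing suppresses single-frame spikes that would otherwise create very short spurious segments.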
Signal-to-Noise Ratio estimation
The SNR estimation technique is based on waveform amplitude distribution analysis (Chanwoo Kim and Richard M. Stern, "Robust Signal-to-Noise Ratio Estimation Based on Waveform Amplitude Distribution Analysis", Interspeech 2008). The method assumes that the amplitude distribution of noise is Gaussian, while the amplitude distribution of speech follows a Gamma distribution. We can “guess the SNR by estimating where we are between Gaussian and Gamma distributions” for our signal.
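To give an idea of what "where we are between Gaussian and Gamma" means in practice, here is a minimal Python sketch of the amplitude statistic that, as far as we understand the paper, the method is built on: the log of the mean absolute amplitude minus the mean of the log absolute amplitude. The final step, mapping this value to an SNR in dB, relies on a lookup table from the paper that is not reproduced here, and the function name wada_statistic is ours.

```python
import math


def wada_statistic(samples):
    """Return ln(E[|z|]) - E[ln|z|] for a sequence of samples.

    Exact zeros are skipped because their logarithm is undefined.
    The statistic does not depend on the overall signal level.
    """
    magnitudes = [abs(s) for s in samples if s != 0.0]
    if not magnitudes:
        raise ValueError("need at least one non-zero sample")
    mean_abs = sum(magnitudes) / len(magnitudes)
    mean_log = sum(math.log(m) for m in magnitudes) / len(magnitudes)
    return math.log(mean_abs) - mean_log
```

For pure Gaussian noise the statistic is about 0.41. For a two-sided Gamma distribution with shape parameter 0.4 (the value we assume for clean speech) it is about 1.65. A noisy speech signal falls between the two, and the closer it is to the Gaussian end, the lower the SNR.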
To estimate the SNR reliably, we select only the speech segments and discard all non-speech parts. The SNR estimation technique is then applied to each detected speech segment, giving one SNR estimate per segment.