Hello all,

Several people have asked how the pitch estimation in zita-at1 works.

The basic method is to look at the autocorrelation of the signal. This is a measure of how similar a signal is to a time-shifted version of itself. It can be computed efficiently as the inverse FFT of the power spectrum. In many cases the strongest autocorrelation peak corresponds to the fundamental period. But this can easily get ambiguous, as there will also be peaks at integer multiples of that period, and at submultiples when there are strong harmonics. To avoid errors it is necessary to also look at the signal spectrum and level, and combine all that info in some way. How exactly is mostly a matter of trial and error, which is why I need more examples.

Have a look at

<http://kokkinizita.linuxaudio.org/linuxaudio/pitchdet1.png>

This is a test of the pitch detection algorithm used in zita-at1. The X-axis is time in seconds; a new pitch estimate is made every 10.667 ms (512 samples at 48 kHz). Vertically we have the autocorrelation, with the Y-axis in samples. Red is positive, blue negative. The green dots are the detected pitch period; zero means unvoiced. The blue line on top is the signal level in dB.

Note how this singer has a habit of letting the pitch 'droop', by up to an octave, at the end of a note. He is probably not aware of it. This happens at 28.7s, again at 30.8s, and in fact during the entire track. What should an autotuner do with this? Turn the glide into a chromatic scale? The real solution here would be to edit the recording, adding a fast fadeout just before the 'droop'. Even a minimal amount of reverb will hide this.

The fragment from 29.7 to 30.3s is an example of a vowel with very strong harmonics, which show up as the red bands below the real pitch period. In this case the 2nd and 3rd harmonics were actually about 20 dB stronger than the fundamental. This is resolved because the autocorrelation is still strongest at the fundamental period.

The very last estimate in the next fragment (at 30.85s) is an example of where this goes wrong: the algorithm selects twice the real pitch period, assuming the first autocorrelation peak is the 2nd harmonic. This happens because there was significant energy at the subharmonic, actually leakage from another track via the headphones used by the singer. The false 'voiced' detection at 30.39s is also the result of a signal leaking in via the headphones.

Ciao,

--
FA
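P.S. For those who asked what the computation looks like: below is a minimal Python/numpy sketch of the basic scheme only, i.e. autocorrelation via the inverse FFT of the power spectrum, followed by a peak search over the plausible period range. This is not the zita-at1 code, which adds the spectrum and level heuristics described above; the pitch range and voicing threshold are just illustrative values.

import numpy as np

def estimate_period(frame, fs=48000, fmin=60.0, fmax=1000.0, threshold=0.5):
    """Rough pitch-period estimate for one analysis frame.

    The autocorrelation is computed as the inverse FFT of the power
    spectrum.  Returns the period in samples, or 0 for 'unvoiced'.
    The frame should span at least a few periods of the lowest pitch.
    """
    n = len(frame)
    # Zero-pad to 2n so we get a linear, not circular, autocorrelation.
    spec = np.fft.rfft(frame, 2 * n)
    acf = np.fft.irfft(spec * np.conj(spec))[:n]
    acf /= acf[0] + 1e-30                  # normalise: acf[0] == 1

    # Search only lags corresponding to the expected pitch range.
    lo = int(fs / fmax)
    hi = min(int(fs / fmin), n - 1)
    k = lo + int(np.argmax(acf[lo:hi]))

    # Below the threshold, call the frame unvoiced.  A real detector
    # would also check the spectrum and signal level here, and guard
    # against picking a multiple or submultiple of the true period.
    return k if acf[k] > threshold else 0

if __name__ == '__main__':
    fs = 48000
    t = np.arange(1024) / fs
    frame = np.sin(2 * np.pi * 220.0 * t)  # 220 Hz test tone
    print(estimate_period(frame, fs))      # ~218 samples = 48000 / 220

Calling something like this every 512 samples at 48 kHz would give the 10.667 ms estimate spacing seen in the plot; the analysis frame itself has to be longer than the hop, so that it covers the lowest expected pitch period.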