On Sun, Mar 20, 2016 at 8:24 PM, ORL <orl@xxxxxxxx> wrote:
I've been trying several things with pitch correction and shifting on vocals these last few days. It's for live use, and for rap music, so I need something with fairly low latency.
What I need is basically a pitch shift up to 1.2 and/or down to 0.8. If possible, I'd like to get audible autotune correction as well, and I would also like to be able to shift further from time to time.
Hi Orl,
I'm sure you're aware there are a few approaches to doing pitch-shifting, and all have pros and cons:
Frequency domain:
The "phase-vocoder" takes a signal over time, analyses its frequency content using the FFT, and re-synthesizes the audio with a new pitch. This sounds pretty good, but a "window" of time is required for accurate re-synthesis. The biggest issue is latency - unless a very short window is used, the delay of the processed audio is very noticeable.
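To put rough numbers on the window/latency trade-off, here is a small Python sketch. The 48 kHz sample rate and the window sizes are assumptions picked for illustration, and the result is only a lower bound: a real phase vocoder adds hop-size and audio-buffer delay on top.

```python
def window_latency_ms(window_size: int, sample_rate: int) -> float:
    """Time in milliseconds needed just to fill one analysis window."""
    return 1000.0 * window_size / sample_rate


def bin_spacing_hz(window_size: int, sample_rate: int) -> float:
    """Distance between neighbouring FFT bins for that window."""
    return sample_rate / window_size


if __name__ == "__main__":
    rate = 48000  # assumed sample rate
    for size in (256, 1024, 4096):  # assumed window sizes
        print(f"{size:5d} samples: "
              f"{window_latency_ms(size, rate):6.2f} ms, "
              f"bins {bin_spacing_hz(size, rate):7.2f} Hz apart")
```

At 48 kHz a 1024-sample window already costs about 21 ms before any other buffering, with bins about 47 Hz apart; shrinking the window reduces the delay but coarsens the frequency resolution.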
Time domain:
Another technique is "synchronous overlap and add", also referred to as SOLA, or its pitch-synchronous variant PSOLA. These generally have less latency (due to very small "grains" of audio being manipulated); however, depending on the content of the audio stream and the settings used, they can sound quite bad.
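As a rough illustration of the "grain" idea, here is a minimal Python sketch of plain windowed overlap-add pitch shifting. It is deliberately NOT SOLA or PSOLA: the step that aligns grains with each other (or with the pitch period) is left out, which is exactly why this naive version tends to sound rough at most ratios. The function name, the grain size and the linear interpolation are all assumptions for the example, not the FAUST code mentioned in this mail.

```python
import math


def ola_pitch_shift(x, ratio, grain=256):
    """Pitch-shift x by `ratio` using plain overlap-add (no grain alignment).

    Each grain starts at the same time in input and output, so duration is
    kept, but inside the grain the input is read `ratio` times faster.
    """
    hop = grain // 2  # 50% overlap between neighbouring grains
    # Periodic Hann window: copies spaced by grain/2 sum to exactly 1.
    win = [0.5 - 0.5 * math.cos(2.0 * math.pi * n / grain) for n in range(grain)]
    y = [0.0] * len(x)
    for start in range(0, len(x) - grain + 1, hop):
        for n in range(grain):
            pos = start + n * ratio  # fractional read position in the input
            i = int(pos)
            if i + 1 >= len(x):
                break  # ran off the end of the input
            frac = pos - i
            sample = (1.0 - frac) * x[i] + frac * x[i + 1]  # linear interpolation
            y[start + n] += win[n] * sample
    return y
```

With `ratio=1.0` the overlapping windows sum to one and the input comes back unchanged (apart from the first and last half-grain), which is a handy sanity check. The delay is on the order of one grain, here 256 samples, rather than a full FFT window.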
While developing the live-looping program Luppp[1], I used the FAUST PSOLA pitch-shifter, tweaked the settings, and generated C++ for doing live pitch-shifting (i.e. very low latency, at the cost of quality).
In my experience, the quality at exactly +12 and -12 semitones is generally acceptable with PSOLA algorithms, as it is roughly equivalent to dropping every 2nd sample, or adding one in between each pair of samples. The worst aliasing/noise is generally heard at a few semitones, so when testing settings I recommend listening very carefully in the +2 to +5 semitone range :)
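To make the semitone arithmetic concrete, here is a short Python sketch. It assumes the 1.2 and 0.8 figures in the original question are frequency ratios, and the helper names are made up for the example. The octave functions are the naive "drop / insert a sample" picture only: they change the duration and do no filtering, unlike a real pitch shifter.

```python
import math


def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)


def ratio_to_semitones(ratio: float) -> float:
    """Shift in semitones for a given frequency ratio."""
    return 12.0 * math.log2(ratio)


def octave_up(x):
    """Drop every second sample: at the same playback rate, pitch doubles."""
    return x[::2]


def octave_down(x):
    """Insert the midpoint between neighbours: pitch halves (x non-empty)."""
    out = []
    for a, b in zip(x, x[1:]):
        out.extend((a, 0.5 * (a + b)))
    out.append(x[-1])
    return out


if __name__ == "__main__":
    for r in (1.2, 0.8):
        print(f"ratio {r} = {ratio_to_semitones(r):+.2f} semitones")
```

Under that assumption, 1.2 works out to roughly +3.2 semitones and 0.8 to roughly -3.9, so the requested range sits in the "few semitones" region where the artifacts are most audible.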
--
_______________________________________________ Linux-audio-user mailing list Linux-audio-user@xxxxxxxxxxxxxxxxxxxx http://lists.linuxaudio.org/listinfo/linux-audio-user