On Sat, Jun 07, 2014 at 10:15:41AM +0200, hermann meyer wrote:

> Thanks have to go to Stephan M. Bernsee from dspdimension as well.
> GxDetune is based on his work here:
> http://www.dspdimension.com/admin/pitch-shifting-using-the-ft/

This sort of works, but it's not what it claims to be.

The whole part that finds the exact frequency by comparing phases
is completely redundant. This information is never really used.
It just looks as if it is used.

For example, for one octave up, you could just as well take the
magnitude and phase of bin k, multiply the phase by 2 and put the
result in the input bin 2*k of the inverse FFT. The result would
be just the same. No frequency calculation is ever made.

The net result is also equivalent to:

- overlap
- windowing

(as in your code) but then:

- downsample by 2
- repeat the result so you get the original length
- add to output

Which doesn't even require an FFT.

The way to really use the computed frequencies would be quite
different. If you have a signal at some frequency F there will be
significant energy in a number of bins close to F. The correct
value of F can be found by comparing the phases as explained by
Bernsee. Given this F you need some way to determine which
contiguous group of bins is representative of that signal (one
way would be to look for minima in magnitude left and right).

Now for correct frequency scaling, you need to move that whole
group up or down (as determined by the ratio, e.g. 2 for one
octave up) *** but without scaling the group itself ***. In other
words, if bin k moves to 2*k, then bin k-1 moves to 2*k-1 etc.

This requires an *interpretation* of the signal: do bins that are
close together

1. represent a single frequency signal, or
2. multiple signals that are close together.

In case (1) the envelope of the signal is represented by the
relative magnitudes and phases of the adjacent bins. To preserve
this envelope (i.e. to correctly reproduce transient signals),
these bins need to remain adjacent.

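To make the difference concrete, here is a minimal sketch in Python of
moving a group of bins as a unit. It is only an illustration under
stated assumptions, not the GxDetune or Bernsee code: the names
find_groups and shift_groups are made up, the peak bin index stands in
for the frequency F that would really be estimated from the phases, and
the phases are copied unchanged, which a real implementation would have
to correct from frame to frame.

```python
def find_groups(mag):
    """Return (lo, peak, hi) bin ranges, one per local magnitude peak.

    The range is grown left and right of the peak while the magnitude
    keeps falling, so it ends at the nearest minimum on each side.
    A minimum between two peaks belongs to both neighbouring groups.
    """
    n = len(mag)
    groups = []
    for k in range(1, n - 1):
        if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1]:
            lo = k
            while lo > 0 and mag[lo - 1] < mag[lo]:
                lo -= 1
            hi = k
            while hi < n - 1 and mag[hi + 1] < mag[hi]:
                hi += 1
            groups.append((lo, k, hi))
    return groups


def shift_groups(spectrum, ratio):
    """Move each group so that its peak lands on round(peak * ratio).

    Every bin of a group gets the same offset, so neighbours stay
    neighbours: if bin k moves to 2*k, bin k-1 moves to 2*k-1.
    Bins that belong to no group are dropped in this sketch.
    """
    n = len(spectrum)
    mag = [abs(c) for c in spectrum]
    out = [0j] * n
    for lo, peak, hi in find_groups(mag):
        offset = round(peak * ratio) - peak  # assumption: peak bin ~ F
        for k in range(lo, hi + 1):
            j = k + offset
            if 0 <= j < n:
                out[j] += spectrum[k]  # phase copied as-is (simplification)
    return out


# One component centred on bin 4, with energy leaking into bins 3 and 5.
spectrum = [0j] * 16
spectrum[3] = 1 + 0j
spectrum[4] = 4j
spectrum[5] = -1 + 0j

shifted = shift_groups(spectrum, 2)
occupied = [k for k, c in enumerate(shifted) if c != 0]
print(occupied)
```

With a ratio of 2 the three bins end up in bins 7, 8 and 9, still
adjacent. Scaling each bin index independently would put them in bins
6, 8 and 10 instead, which spreads the group and changes its envelope.
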
Another way to state this is that any algorithm that does frequency
scaling (or time stretching) needs some way to decide if certain
features of the signal need to be interpreted as significant in
the time domain or in the frequency domain. The correct decision
depends on how a human listener would interpret that feature.

It is not even possible to *define* a frequency scaling or time
stretching algorithm without at least implicitly defining a way
to decide on this. The implicit assumption in the current
algorithm is that each bin is a separate feature in the frequency
domain, and thus needs to be scaled independently of all others.

Ciao,

-- 
FA

A world of exhaustive, reliable metadata would be an utopia.
It's also a pipe-dream, founded on self-delusion, nerd hubris
and hysterically inflated market opportunities. (Cory Doctorow)