On Sun, Feb 27, 2022 at 11:37:45AM +0100, Jeanette C. wrote:

> Hm, if such images are clean, I suppose a program can be written to
> translate the sonogram to values.

They are not always very clean and there are two reasons for this:

- the quality of the recording (filtering will help),
- the complexity of the sound.

> 2D representations of all kind are unfeasible really.

The problem here is that some bird sounds can only be represented
correctly in 2D parameter space.

Some contain a clear single frequency, usually sweeping and modulated.
Such modulation can be quite fast, in the tens of Hz region.

Some others contain very short broadband features, and the whole notion
of a single frequency is not valid at all. I've seen impulsive waveforms
of only a few milliseconds in some recordings.

And many bird sounds are a mix of those two extremes. A sonogram deals
with both of them, that is why it is useful.

So what would be needed is some form of analysis that produces less
output but is still able to handle both cases and everything in between.

Using classical analysis methods, there is a limit to the product of
resolution in time and frequency, similar to the uncertainty principle
in quantum physics. Human (and animal) hearing can in some cases go
beyond that limit - this is possible only by making some a-priori
assumptions about the signal.

The problem is similar to one that occurs in time-stretching of audio:
the algorithm must decide if some feature should be regarded as
significant in the time domain or in the frequency domain. Which is why
software such as rubberband has both user options and some not-so-simple
internal decision making.

As a simple example, take a 1 kHz sinewave that is amplitude modulated
by a 10 Hz signal. The actual frequencies present then are 990, 1000,
and 1010 Hz. Now how should this be analysed?

Option 1: as a modulated 1 kHz signal.
When time-stretched, e.g. by a factor of 2, the amplitude envelope keeps
its shape but is stretched in time, so the modulation frequency becomes
5 Hz, and the output frequencies are 995, 1000, and 1005 Hz.

Option 2: as three separate and unrelated frequencies.
Each of them is stretched separately, and the output is 990, 1000, and
1010 Hz. So this will still sound as 10 Hz modulation, just longer.

Which one is correct? The simple fact is that both are, it is just a
matter of interpretation. Deciding this is something our brains are good
at, based on experience and expectations.

Exactly the same question arises when trying to reduce a signal to
something that can be described by a 1D function.

Ciao,

-- 
FA
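
The 1 kHz / 10 Hz example above can be checked numerically. What follows is only a minimal sketch in plain Python, not how rubberband or any other stretcher works internally. The sample rate FS, the modulation depth DEPTH, and all function names are assumptions made for illustration; the text above gives only the carrier and modulation frequencies. It synthesises the modulated tone, measures the amplitude at a few candidate frequencies with a single-bin DFT, and lists the frequencies each interpretation predicts after stretching.

```python
import math

FS = 8000          # sample rate in Hz (assumed for this sketch)
CARRIER = 1000.0   # carrier frequency in Hz, from the example
MOD = 10.0         # modulation frequency in Hz, from the example
DEPTH = 0.5        # modulation depth (assumed, not given in the text)


def am_signal(carrier, mod, seconds):
    """Sine carrier whose amplitude follows 1 + DEPTH*cos(2*pi*mod*t)."""
    n = int(FS * seconds)
    return [
        (1.0 + DEPTH * math.cos(2 * math.pi * mod * i / FS))
        * math.sin(2 * math.pi * carrier * i / FS)
        for i in range(n)
    ]


def amplitude_at(signal, freq):
    """Amplitude of the sinusoidal component at `freq` (single-bin DFT)."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / FS)
             for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / FS)
             for i, x in enumerate(signal))
    return 2.0 * math.hypot(re, im) / n


def stretched_components(factor, interpretation):
    """Frequencies expected after stretching by `factor` under each reading."""
    if interpretation == "modulated":   # option 1: carrier plus slower envelope
        m = MOD / factor
        return [CARRIER - m, CARRIER, CARRIER + m]
    if interpretation == "separate":    # option 2: three unrelated sinusoids
        return [CARRIER - MOD, CARRIER, CARRIER + MOD]
    raise ValueError(interpretation)


# One second of the original signal: energy sits at 990, 1000 and 1010 Hz only.
original = am_signal(CARRIER, MOD, 1.0)
for f in (980.0, 990.0, 1000.0, 1010.0, 1020.0):
    print(f, round(amplitude_at(original, f), 3))

# What each interpretation predicts for a stretch by a factor of 2.
print(stretched_components(2, "modulated"))
print(stretched_components(2, "separate"))
```

The carrier comes out with amplitude 1 and each sideband with DEPTH/2, while 980 and 1020 Hz are essentially empty. Nothing in that spectrum says whether the three lines belong together as one modulated tone or are three unrelated sinusoids; that choice has to be supplied from outside, which is the point of the two options.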