Re: Tinkering with Compressed Speech

Janina Sajka <janina@xxxxxxxxxxx> · Wed, 6 Oct 2004 10:56:50 -0400

Changing the speed without changing the pitch is canonically called
"time scale modification." There are f/oss libraries for doing this that
you might want to use, or at least look at. Two that immediately come to
mind are:

libSoundTouch
Mario Lang's yatm uses an earlier version of this library.
It's designed to be a DAISY player, where time-scale modifications are
traditionally desired by users.

Also, there was the following posted this week on the Linux Audio User
list:
From: Dave Phillips <dlphilp@xxxxxxxxxx>

Greetings:

 Matt Flax sent these announcements to me recently, and since I've not
had time to update the Linuxsoundapps site I'm passing them on to the
lists for now:

MFFM Time Scale Modification for Audio
<http://sourceforge.net/projects/mffmtimescale/>
http://mffmtimescale.sourceforge.net/
This is an engine which stretches audio without changing its pitch. The
C++ headers are really easy to use in 3rd party apps.

MFFM Bit Stream <http://sourceforge.net/projects/mffmbitstream/>
http://mffmbitstream.sourceforge.net/
This is a library which allows one to stream bits to/from files ...
handy for those people who want to make an audio compression engine or
player.

MFFM Multimedia Time Code
<http://sourceforge.net/projects/mffmtimecode/>
http://mffmtimecode.sourceforge.net/
These C++ headers are great for managing time code ... it automatically
expresses time code in any base units ... which must be set up by the
user ... can run any type of SMPTE, CDDA, etc. ... It is templated so you
can handle any data type ... audio, video, audio/video ... It also
includes a type II filter for audio processing ... this is very handy
for filtering audio with FIR or IIR filters.

Martin McCormick writes:
> 	I tried an experiment this last weekend to see how hard it is
> to write code that compresses audio similarly to what the old APH
> pitch restoring speech compressors used to do.  In my case, mine still
> does, but it is almost 30 years old and I know it will one day bite
> the dust.
> 
> 	I was also trying for simplicity first so I went for a program
> that gives you exactly twice the speech speed out as went in.
> 
> 	For those of you who like to play with this sort of thing, the
> easiest way to get started is to use /dev/dsp in Linux.  It behaves
> just like a file but when you write to it with 8-bit audio at 8,000
> samples per second, you get sound from your sound card if it is
> working correctly.  If you read from it, you get an 8,000
> byte-per-second stream that you can direct in to a file or whatever
> you like to do with this stream.
> 
> 	What I did was to write a little program that opens a raw
> audio file and begins counting samples.  The old speech compressors
> were based on a 20-millisecond slice of sound, which works out to 50
> slices per second.  At 8,000 samples per second, each of those slices
> is 160 samples long.  What I did in my experiment was to pass the
> first 160 samples of audio from the file to the output and then throw
> away the next 160 samples.  When the counter hit 320, I reset it and
> began passing more audio.
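
To spell out the arithmetic: at 8,000 samples per second a 20-millisecond
slice is 8000 * 20 / 1000 = 160 samples, so keeping 160 and dropping 160
halves the playing time. Here is a minimal sketch of that keep/drop step
as a function working on a buffer in memory rather than on /dev/dsp. The
names samples_per_slice and keep_drop are made up for illustration and
are not from cp2x.

```c
#include <assert.h>
#include <stddef.h>

/* Samples in one slice: rate (samples per second) * length (ms) / 1000. */
size_t samples_per_slice(unsigned rate_hz, unsigned slice_ms)
{
    return (size_t)rate_hz * slice_ms / 1000;
}

/*
 * Copy `keep` samples, skip `drop` samples, and repeat until the input
 * runs out.  Returns the number of samples written to `out`, which must
 * have room for at least `n` samples.
 */
size_t keep_drop(const unsigned char *in, size_t n,
                 unsigned char *out, size_t keep, size_t drop)
{
    size_t period = keep + drop;
    size_t written = 0;
    size_t i;

    assert(in != NULL && out != NULL);
    if (period == 0)
        return 0;

    for (i = 0; i < n; i++) {
        /* Position inside the current keep+drop period decides the fate. */
        if (i % period < keep)
            out[written++] = in[i];
    }
    return written;
}
```

With keep and drop both set to samples_per_slice(8000, 20), the output is
half the length of the input. Other ratios, say keep 160 and drop 80,
would give other speeds with the same kind of cut at every join.
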
> 
> 	The result is audio at twice the normal tempo but still at the
> correct pitch.  It also has the distortion we find in the older
> devices.  What you actually hear is what sounds like static as the
> wave form of the voice gets cut in one place at the end of a slice
> and then resumes abruptly in a different part at the beginning of the
> next slice.
> 
> 	I guess my next experiment will be to try to make the slices
> start and end at slightly different times to attempt to preserve the
> wave form being compressed.  This should make the sound smoother,
> I hope.
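
One simple way to soften those cuts, offered as a rough sketch rather
than as what the APH box or any of the libraries above actually do:
crossfade the tail of each kept slice into the audio leading up to the
next kept slice, so the wave form arrives at the join without a jump.
Note that this is a plain linear crossfade, not the shifted slice
boundaries described above. SLICE, FADE, and compress2x_xfade are names
made up here, and the 16-sample fade length is an arbitrary guess.

```c
#include <assert.h>
#include <stddef.h>

#define SLICE 160  /* 20 ms at 8,000 samples per second */
#define FADE   16  /* crossfade length in samples; an arbitrary choice */

/*
 * 2x compression of unsigned 8-bit audio.  Keeps SLICE samples out of
 * every 2 * SLICE, but over the last FADE samples of each kept slice it
 * blends toward the FADE samples that come just before the next kept
 * slice.  Returns the number of samples written; `out` needs room for
 * at least `n` samples.
 */
size_t compress2x_xfade(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t written = 0;
    size_t start;

    assert(in != NULL && out != NULL);
    for (start = 0; start < n; start += 2 * SLICE) {
        size_t next = start + 2 * SLICE;  /* where the next kept slice begins */
        size_t i;

        for (i = 0; i < SLICE && start + i < n; i++) {
            unsigned a = in[start + i];

            if (i >= SLICE - FADE && next <= n) {
                /* j runs 0..FADE-1 across the fade region. */
                unsigned j = (unsigned)(i - (SLICE - FADE));
                unsigned b = in[next - FADE + j];

                /* Linear blend with rounding: weight shifts from a to b. */
                a = (a * (FADE - j) + b * j + FADE / 2) / FADE;
            }
            out[written++] = (unsigned char)a;
        }
    }
    return written;
}
```

The output is still half the length of the input. Whether it sounds any
better than the hard cuts is something to judge by ear, and a longer or
shorter FADE may suit speech better.
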
> 
> 	I remember in late high school or early college, which would
> have been the late sixties and early seventies for me, hearing about
> speech compressors that used a rotating head like a video recorder to
> do the audio slicing.  I may be exaggerating but the price of $50,000
> comes to mind.  These were probably modified video tape recorders
> originally built for television studios.  Only institutions with lots
> of dough could have bought one of those and I bet they were a real
> beast to maintain, kind of like the first Kurzweils.
> 
> 	When the first electronic speech compressors came out in the
> early seventies, I longed for one but they still cost over a thousand
> dollars.
> 
> 	Finally, the APH began selling their pitch-restorer device
> around 1975 at a price that was reasonable enough so that us common
> folk could buy them.
> 
> 	My test program which is definitely not a replacement yet for
> one of those devices is done totally with software and the existing
> sound card hardware.  It has less background noise than the APH box
> but the static I mentioned is pretty distracting so it exchanges one
> form of discomfort for another.
> 
> 	If you want to play with it, be sure your sound card works
> first.  If you have ALSA installed on your system, it should make your
> sound card work in the manner it is supposed to work in UNIX.  Here is
> the little program I wrote which I called cp2x.  Have fun, but don't
> blame me if your computer catches fire or eats your cat.  I don't have
> anything in here that is normally seen as dangerous.  Take the source
> code and compile it with
> 
> gcc -ocp2x cp2x.c
> 
> You could even compile with
> 
> gcc cp2x.c
> 
> and then your executable is called a.out.  Since gcc always makes
> a.out as the default executable file name, this isn't a very smart
> move if you plan to use it for more than a few seconds.  Cut here for
> source.
> 
> #include <stdio.h>
> #include <stdlib.h>	/* exit() */
> 
> #define SLICE 160	/* 20 ms of audio at 8,000 samples per second */
> 
> int main(int argc, char **argv)
> {
>     FILE *soundinput;
>     FILE *sounddev;
>     unsigned char c = 0;
>     int index = 0;
>     char s4[] = "/dev/dsp";
> 
>     /* The raw 8-bit audio file is the only argument. */
>     if (argc < 2) {
>         fprintf(stderr, "usage: %s rawfile\n", argv[0]);
>         exit(1);
>     }
> 
>     if ((soundinput = fopen(argv[1], "rb")) == NULL) {
>         perror(argv[1]);
>         exit(1);
>     }
> 
>     if ((sounddev = fopen(s4, "wb")) == NULL) {
>         perror(s4);
>         exit(1);
>     }
> 
>     /* Pass one slice to the sound card, throw the next away, repeat. */
>     while (fread(&c, sizeof(c), 1, soundinput))
>     { /*read loop*/
>         if (index < SLICE) putc(c, sounddev);
>         index++;
>         if (index == 2 * SLICE) index = 0;
>     } /*read loop*/
> 
>     fclose(sounddev);
>     fclose(soundinput);
>     return 0;
> }
> 
> _______________________________________________
> 
> Blinux-list@xxxxxxxxxx
> https://www.redhat.com/mailman/listinfo/blinux-list

-- 

				Janina Sajka, Chair
				Accessibility Workgroup
				Free Standards Group (FSG)

janina@xxxxxxxxxxxxxxxxx	Phone: +1 202.494.7040
