[linux-audio-user] Indexing the Read Pointer in /dev/dsp

linux-audio at paypc.com (Malcolm Baldridge) · Mon May 3 17:16:39 2004

Quoting Martin McCormick <martin@xxxxxxxxxxxxxxxxxx>:

> 	I have the beginnings of a program I have written that listens
> to the digital stream from /dev/dsp and records sound to a file when
> there is any sound.  Communications types call this a VOX or Voice
> Operated relay with X being the abbreviation for relay.

[snip]

Funny, I've written a simple program (derived from the Jack "simple_client")
recently to do something similar.

What it does (for now, I'm sure I'll be adding more to it) is:

1) Monitor the command-line specified input port for a sound above the
squelch level.

2) Apply a configurable DC Bias adjustment [squelch comparison is performed
after this]

3) Keep track of the peak samples processed to pass a "scaling value" to
normalise the sound data. [This is also post-DC Bias adjustment.]

3a) I'm also looking at various dynamic compression [gain reduction] strategies.

4) Log all data to a temporary .WAV file. Note, it's sample-perfect.  The
triggering sample is successfully captured.

5) After a command-line specifiable time where the samples are BELOW the
squelch level, the recording is "closed" and a file is written thusly:

  A) A sub-process is spawned to "wash" and LAME-encode (MP3) whilst
embedding tags into the resulting mp3 file to note the origination time/date
(fully specified with time-zone).  The scaling value from the recording
program is passed to LAME to pre-normalise the sound prior to encoding.  

  B) Since the sub-process can "take as long as it wants" without disturbing
the audio monitor threads, I have also tinkered with filtering, and have
found that for my application [recording voice calls from a MERLIN phone
system], a 100Hz-4000Hz bandpass works wonderfully.
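
For illustration, steps 1 to 5 above can be sketched roughly as follows. This is a
minimal Python sketch, not the actual program (which is derived from the Jack
simple_client and not shown here). The names vox_segments, scaling_value,
hang_samples and full_scale are hypothetical, and the sketch assumes the DC
bias is a constant added to each sample and that the quiet tail is kept in the
recording.

```python
def vox_segments(samples, squelch, hang_samples, dc_bias=0):
    """Yield (recorded_samples, peak) for each burst of sound above squelch.

    samples      -- iterable of signed integer PCM samples
    squelch      -- level the absolute corrected sample must exceed to count as sound
    hang_samples -- consecutive quiet samples after which the recording is closed
    dc_bias      -- constant added to each sample (assumed form of the adjustment)
    """
    recording = []   # corrected samples of the burst in progress
    quiet_run = 0    # consecutive samples at or below the squelch level
    peak = 0         # largest absolute corrected sample in this burst

    for raw in samples:
        s = raw + dc_bias                  # step 2: bias first, then compare
        loud = abs(s) > squelch            # step 1: squelch test
        if not recording and not loud:
            continue                       # idle, nothing to log yet
        recording.append(s)                # step 4: the triggering sample is kept
        if loud:
            peak = max(peak, abs(s))       # step 3: peak of the corrected data
            quiet_run = 0
        else:
            quiet_run += 1
            if quiet_run >= hang_samples:  # step 5: quiet for long enough
                yield recording, peak
                recording, quiet_run, peak = [], 0, 0
    if recording:                          # input ended mid-recording
        yield recording, peak


def scaling_value(peak, full_scale=32767):
    """Gain that would bring the recorded peak up to 16-bit full scale."""
    return full_scale / peak if peak else 1.0
```

With a command-line timeout in seconds, hang_samples would be that time
multiplied by the sample rate.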

I use 8-32Kbps MP3 VBR settings, which work very well.  Speaker
intelligibility is flawless.  I record at 16 bits, 11,025 Hz, FYI.
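
As a rough illustration of that recording format, the temporary file could be
written with Python's standard wave module as below. The function name
write_wav is hypothetical, and mono is an assumption (the channel count is not
stated above).

```python
import struct
import wave


def write_wav(path, samples, rate=11025):
    """Write integer samples as a 16-bit PCM WAV file at the given rate."""
    # Clamp to the signed 16-bit range so out-of-range values cannot wrap.
    clipped = [max(-32768, min(32767, int(s))) for s in samples]
    with wave.open(path, "wb") as w:
        w.setnchannels(1)      # assumption: mono
        w.setsampwidth(2)      # 2 bytes = 16 bits per sample
        w.setframerate(rate)   # 11,025 Hz
        w.writeframes(struct.pack("<%dh" % len(clipped), *clipped))
```

At 16 bits and 11,025 Hz, one mono channel is 176.4 kbit/s uncompressed, which
is the figure the 8-32 Kbps VBR range compares against.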

LAME command line:

voxencode is passed three variables:

$1 = a fully specified verbose (RFC-style) Date string
$2 = Filename prefix
$3 = scaling value [1.00 or no parameter means no scaling; this isn't
represented in the LAME command below].

lame -S --silent --nohist -q 2 -h --vbr-new -b 8 -B 32 \
--tt "Title Prefix $1" --ta "Your Name or Label" \
--tl 'Recording Source' --ty `date +%Y` --tc "Recorded $1" \
--add-id3v2 --pad-id3v2 --tg Speech "$2.wav" "$2.mp3"
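
Since the scaling value is not shown in that command, here is one way it could
be wired in, sketched in Python. This assumes LAME's --scale option is the
mechanism (the post does not say which option is used), and lame_args is a
hypothetical helper. The other options are copied from the command above.

```python
import time


def lame_args(date_string, prefix, scale=1.0, year=None):
    """Build the lame argument list, adding --scale only when scaling is wanted."""
    year = year or time.strftime("%Y")   # same value as `date +%Y`
    args = ["lame", "-S", "--silent", "--nohist", "-q", "2", "-h",
            "--vbr-new", "-b", "8", "-B", "32"]
    if scale != 1.0:
        # Assumption: --scale multiplies the PCM input before encoding.
        args += ["--scale", "%.4f" % scale]
    args += ["--tt", "Title Prefix " + date_string,
             "--ta", "Your Name or Label",
             "--tl", "Recording Source",
             "--ty", year,
             "--tc", "Recorded " + date_string,
             "--add-id3v2", "--pad-id3v2", "--tg", "Speech",
             prefix + ".wav", prefix + ".mp3"]
    return args
```

Passing the list to subprocess.run(args) avoids shell quoting problems with
the spaces in the date string.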

Is there a reason why you want to store the timestamp into the actual audio
stream?  There are many formats where you can place such data "out of band".

I generate the .wav and .mp3 filenames thusly:
"Prefix-YYYY-MM-DD-HH-MM-SS.wav" and .mp3 respectively.  My sub-process
spawns to a voxencode script which decouples the post-processing from the
main capturing application which is multi-threaded.  The capturing app runs
via nohup &, so it doesn't need a TTY.
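
A minimal Python sketch of that naming scheme and hand-off follows. The
function names and the "./voxencode" path are hypothetical, and the use of an
RFC 2822 date for $1 is an assumption based on the "RFC-style" description
above.

```python
import subprocess
from email.utils import format_datetime


def base_name(prefix, when):
    """Prefix-YYYY-MM-DD-HH-MM-SS, without extension."""
    return prefix + "-" + when.strftime("%Y-%m-%d-%H-%M-%S")


def recording_names(prefix, when):
    """Return the (.wav, .mp3) filename pair for a recording started at `when`."""
    base = base_name(prefix, when)
    return base + ".wav", base + ".mp3"


def encoder_argv(when, prefix, scale, script="./voxencode"):
    """Arguments in the $1, $2, $3 order that voxencode expects."""
    return [script, format_datetime(when), base_name(prefix, when), "%.4f" % scale]


def spawn_encoder(when, prefix, scale):
    """Start the encoder in its own session and return without waiting for it."""
    return subprocess.Popen(encoder_argv(when, prefix, scale), start_new_session=True)
```

Because Popen returns immediately, the capture threads carry on while the
encode runs for as long as it needs.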

If you only want one big file instead of a lot of files, I'm not sure how
you'd get what you want unless you wrote your own audio player.  I would
think that having lots of files wouldn't be problematic since most mp3/media
players can let you just "play all files in a directory" easily enough if
you just want to listen to the whole thing w/o having to do anything in the
UI between files.

Why do the Dwarves prize mithril above all else?  Quality.

=MB=

-- 
A focus on Quality.