Re: Open Source Audio Interface

On Mon, 1 Sep 2014, Len Ovens wrote:

On Tue, 2 Sep 2014, Kazakore wrote:

MADI can be sub-ms RT (round trip)

Really? Coax, optical, or both? I used MADI routers in my work, but that was more about sharing multi-channel audio across sites miles apart than low-latency monitoring... (I also have to sadly admit the MADI system is one of the ones I knew the least about by the time I left that job :( )

It depends on distance, but MADI has almost no overhead compared to most other formats (besides analog and AES3). It also depends on the DSP time used to split the stream up and route it.

After reading some more on Ethernet, it becomes easier to see why MADI can have a lot less latency than any Ethernet transport.

MADI uses a network physical standard, but it does not use some of the other layers. The MADI tx/rx buffer contains one whole MADI frame at a time, and the frame itself carries no extra data beyond the AES3 payload, so each channel is 32 bits long. There is no routing information or error correction to be calculated beyond the AES3 parity bit; MADI is a physical point-to-point protocol. MADI does not use OS drivers to deal with any of this, but rather uses its own hardware to handle the bit stream as it enters the card. This means that audio data from the ADC can reach a DAC at the other end within the same frame if the channel count is less than full, or by the next word clock if not. So the latency is effectively one audio word. When used as a computer interface, the card will of course store more than one frame, as the audio driver requires, but that is software latency and not required by the protocol itself.
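To put a number on "one audio word", here is my own quick arithmetic (assuming the usual 64-channel MADI frame of 32-bit AES3-style subframes at 48 kHz):

sample_rate = 48_000                  # samples per second
word_period_us = 1e6 / sample_rate    # one word period in microseconds
print(f"one word = {word_period_us:.1f} us")     # ~20.8 us

# One 64-channel MADI frame is 64 x 32-bit subframes of payload:
frame_bits = 64 * 32
print(f"payload per frame = {frame_bits} bits")  # 2048 bits per word clock

So the pass-through latency the protocol itself imposes is on the order of 20 microseconds.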

Ethernet, on the other hand, uses hardware that is not under our control; there needs to be an OS driver to make it work. Because of the variety of hardware and drivers, any audio protocol is at least layer 2. This includes routing information, and restricts data to 1500 bytes (46 audio channels) per packet. That in itself is not such a big deal, and would only add one word of latency if the whole thing were done in hardware. However, it is dealt with by the OS at both ends, which has other things it needs to do, so the latency of the OS affects this too. This includes latency going through switches (and their OS) as well as scheduling around other network traffic. The possibility of collisions also exists, so dealing with audio in chunks of words makes sense. What this means in practice is that, as a computer interface, there may be no difference between MADI and a layer-2 Ethernet transport.
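For scale, a back-of-envelope look at what a maximum-size layer-2 frame holds and how long it occupies the wire (the overhead figures are the standard Ethernet ones; the 46-channel figure above works out to roughly eight 32-bit samples per channel):

MTU = 1500                        # payload bytes
WIRE_OVERHEAD = 14 + 4 + 8 + 12   # header + FCS + preamble + interframe gap

samples = MTU // 4                # single 32-bit (4-byte) samples per packet
for mbit in (100, 1000):
    wire_us = (MTU + WIRE_OVERHEAD) * 8 / mbit   # 1 bit per Mbit/s = 1 us
    print(f"{mbit} Mbit/s: {samples} samples max, {wire_us:.0f} us on the wire")

At 100 Mbit/s a full-size frame sits on the wire for about 123 us, roughly six word periods at 48 kHz; at gigabit it is well under one word period, which matters for the scheme sketched further down.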

Layer 3 (IP-based) transport adds one more level of software. It expects to deal with other network traffic too. It has another OS-controlled level of software that checks for packet order and may delay a packet if it thinks it is out of order; it assumes data integrity is more important than latency. Convenience is a factor as well, because domain names can be used to set the link up without any other user input. This again increases latency. Latency can be tuned, but that takes user action. Netjack does this very successfully, but to work well, other traffic needs to be minimized.

In order to use Ethernet hardware in any standard fashion, layer 2 is the minimum the audio protocol can run at. This means the protocol needs to know the MAC of the other end point. While it would be possible for the user to enter this info, that would make the protocol hard to use and should be avoided; a method of discovering other audio end points would be better. The setup of the system does not have to take place at layer 2, but could be higher.
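As a minimal sketch of what layer-2 discovery could look like, assuming Linux AF_PACKET sockets, a made-up "AUDIO-HELLO" payload, and one of the IEEE local-experimental EtherTypes (none of this is part of any existing spec, and it needs root / CAP_NET_RAW):

import socket
import struct

ETH_P_EXP = 0x88B5        # IEEE 802 "local experimental" EtherType
IFACE = "eth0"            # assumption: the NIC to probe

s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_EXP))
s.bind((IFACE, 0))
my_mac = s.getsockname()[4]                    # our own MAC address

bcast = b"\xff" * 6
hello = bcast + my_mac + struct.pack("!H", ETH_P_EXP) + b"AUDIO-HELLO"
s.send(hello.ljust(60, b"\x00"))               # pad to minimum frame size

while True:
    frame = s.recv(2048)                       # whole frame incl. header
    src = frame[6:12]
    if src != my_mac and frame[14:].startswith(b"AUDIO-HELLO"):
        print("audio endpoint at", src.hex(":"))

Something at a higher layer would then do the actual channel/format negotiation.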

Capabilities of different physical Ethernet interfaces would need to be addressed. The protocol would need to know the speed of the slowest link if switches are involved. Any new installation will be using 1000 Mbit/s or higher, but the minimum would be a 100 Mbit/s link. A 10 Mbit/s link would not work, because the minimum packet size for Ethernet is 84 bytes (with guard space), which would limit a packet-per-word stream to about a 14k sample rate. By using a separate protocol it would be possible to use a 10 Mbit/s link at a higher latency (4 audio frames at a time). I suppose that, considering ALSA (or JACK anyway) seems to deal in 16*2 words at a time anyway, the whole protocol could work this way. Not so much to support 10 Mbit/s as to allow direct tunneling of IP traffic without splitting up packets.
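The 14k figure falls straight out of the minimum wire size (again my arithmetic):

# If every audio word needs its own minimum-size packet, the packet rate
# the link can sustain caps the sample rate.
MIN_WIRE_BYTES = 84    # 64-byte minimum frame + preamble + interframe gap
for mbit in (10, 100, 1000):
    pkts_per_s = mbit * 1e6 / (MIN_WIRE_BYTES * 8)
    print(f"{mbit} Mbit/s: {pkts_per_s:,.0f} minimum-size packets/s")
# 10 Mbit/s gives ~14,880/s, the ~14k sample-rate ceiling mentioned above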

My thought is something like this:
We control all network traffic. Let's try for 4 words of audio. For sync purposes, at each word boundary a short audio packet of 10 channels is sent. This would be close to the minimum Ethernet packet size. Then there should be room for one full-size Ethernet packet; in fact, even at 100 Mbit/s the small sync packet could contain more than 10 channels. (I have basically said 10 Mbit/s would not be supported, but if no network traffic was carried then 10 Mbit/s could do 3 or 4 channels with no word sync.) So:
Word 1 - audio sync plus 10 tracks - one full network traffic packet
Word 2 - audio sync plus 10 tracks - one full 40-track audio packet,
					split between words 1 and 2
Word 3 - audio sync plus 10 tracks - one full 40-track audio packet,
					split between words 2 and 3
Word 4 - audio sync plus 10 tracks - one full 40-track audio packet,
					split between words 3 and 4

This would allow network traffic at ~20 Mbit/s and 40 tracks with 4 words of latency. 1000 Mbit/s would allow much better network performance and more channels. I don't know how this would affect CPU use. I haven't mentioned MIDI or other control, but there is space, time-wise, to add it to the audio sync packet; as this is an open spec, I would probably use MIDI or OSC as the control for levels and routing. I have yet to run the numbers (see the sketch below), but the ten tracks with sync is not maxed out; it may go as high as 15 or 16 while still leaving one full network packet at each word for 4x the network speed. The thing is, this could be controlled on the fly: the user could choose how many channels they wish to use. The ice1712 driver gave 12/10 I/O on the Delta 44/66 as well as the Delta 1010, but in this case that would not happen; only the channels needed would show, and the user could choose to make physical inputs 7/8 look like ALSA 1/2 very easily.
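Running some of those numbers: a hedged budget check for the 4-word scheme, using my own arithmetic against standard Ethernet overheads, not anything from a spec:

# Per 48 kHz word period: bits the link offers vs. the wire cost of the
# sync packet and a 40-track audio packet from the scheme above.
SR = 48_000
OVERHEAD_BITS = 38 * 8        # header + FCS + preamble + gap
MIN_WIRE_BITS = 84 * 8        # minimum frame incl. guard space

def wire_bits(payload_bytes):
    return max(payload_bytes * 8 + OVERHEAD_BITS, MIN_WIRE_BITS)

sync = wire_bits(10 * 4)      # 10 channels x 4-byte samples
audio40 = wire_bits(40 * 4)   # 40 tracks, one sample each
for mbit in (100, 1000):
    per_word = mbit * 1e6 / SR
    print(f"{mbit} Mbit/s: {per_word:.0f} bits/word period, "
          f"sync = {sync} bits, 40-track packet = {audio40} bits")

On those assumptions a word period offers ~2083 bits at 100 Mbit/s and ~20833 at gigabit, while the sync packet costs 672 bits and a 40-track packet 1584. So at 100 Mbit/s the audio packets do indeed have to straddle word slots as laid out above, and whatever is left over is the budget for ordinary network traffic.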

The driver could store more than one network-traffic packet at a time, and if two packets are short enough to fit a single window, send both in the same window.

In this whole exercise, I am trading throughput to gain lower latency and (more) predictable timing. Because of hardware differences, and the fact that the actual hardware is serviced outside our control, I don't think the interface could be used as a sync source. As is the case now, two boxes would require external sync to truly be in sync. Daisy-chained boxes could be close enough without external sync to not need resampling, but not close enough to deal with two mics on the same audio.

Power over the Cat5 cable should not be included, IMO, but it may make sense to specify it anyway, so that if someone does implement it, it is interchangeable. :P

This is not meant to replace things like netjack, which encapsulates audio/MIDI/transport all in one, or remote content protocols. This is a local solution, meant generally for one room or hall. If other uses are found, that is a plus.

Does this make any sense? All calculations were done for 24-bit/48k audio. As with ADAT, AES3, and MADI, channels could be paired if 96k is required... though I think it is flexible enough on a 1000 Mbit/s (even 100 Mbit/s) line that higher rates could be sent natively.

Enough rambling.

--
Len Ovens
www.ovenwerks.net
