Re: real-time process

Momchil Velikov <velco@fadata.bg> · 18 Oct 2002 14:35:18 +0300

>>>>> "Jan" == Jan Hudec <bulb@ucw.cz> writes:

Jan> On Fri, Oct 18, 2002 at 12:03:25PM +0300, Momchil Velikov wrote:
>> >>>>> "Jan" == Jan Hudec <bulb@ucw.cz> writes:
>> 
Jan> On Thu, Oct 17, 2002 at 04:54:07PM +0300, Momchil Velikov wrote:
>> >> >>>>> "Momchil" == Momchil Velikov <velco@fadata.bg> writes:
>> >> 
>> >> >>>>> "Nagaraj" == Nagaraj  <nagaraj@smartyantra.com> writes:
Momchil> FWIW, the problem is the classic producer-consumer problem with
Momchil> solutions described in _any_ OS textbook.
>> 
Jan> No, it probably isn't. I expect that the camera is giving frames at
Jan> a constant rate and the video encoder wants to grab the latest one if
Jan> it's complete, but does not care if it misses some.
>> 
>> Does it make any difference if the camera or the encoder misses a
>> frame ?  If not (and I think not), it is exactly a producer/consumer
>> problem, where the producer simply stops producing when there are no
>> buffers available, i.e. misses a frame.  Note that it is also better
>> for the camera to miss a frame, because then the frame data does not
>> enter the computer at all, so it does not consume bus bandwidth,
>> memory bandwidth, cache space, etc.

Jan> But as far as I understood the code, the driver copies the frame and
Jan> discards it when it's not read... (Well, it might be wiser for the
Jan> driver not to initiate DMA at all, of course.)

Momchil> Thus, the right solution would be to use semaphores.  But, AFAIK,
Momchil> there are no semaphores shared between the userspace and the kernel.
>> 
Jan> This is rather oversimplified. Any OS textbook will tell you that
Jan> all the synchronization primitives are equivalent.
>> 
>> Which synchronization primitives are equivalent ? Is a barrier
>> equivalent to a condition variable ? Or is one of them not a
>> synchronization mechanism ?

Jan> In my "any OS book" (well, in my "any OS lecture"), they didn't
Jan> consider barriers. The other ones, that is semaphores, message queues
Jan> and condition variables (and mutual exclusion, but that's a binary
Jan> semaphore, which is a special case of a semaphore), are equivalent.

See, "semaphore", "message queue", "mutual exclusion" can mean many
things.  One has to think about concrete specifications in order to
compare them. What is a message queue ? System V message queue ? POSIX
message queue ? SOCK_DGRAM socket ?

Having said that, a POSIX binary semaphore (a concrete specification)
is not equivalent to a POSIX mutex (another concrete specification).

IMHO, when implementing a concurrent program, one has to choose the
synchronization primitives with the most constrained semantics that
suffice, because more general solutions tend to be more expensive. Of
course, customized synchronization that fits only the problem at hand
is most desirable from the performance point of view, but one has to
draw the line somewhere instead of implementing everything with
load-linked/store-conditional and memory barriers.

Semaphores look like a good compromise for this problem.
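For comparison, this is roughly what the general route looks like: a counting semaphore built only from a POSIX mutex and a condition variable. It is a minimal sketch (the cv_sem type and functions are hypothetical names, not a library API), and it shows where the extra cost can come from: every post and every wait takes the mutex and writes the shared count, which a native sem_t implementation may be able to avoid.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical counting semaphore built from a mutex and a condvar. */
struct cv_sem {
    pthread_mutex_t lock;
    pthread_cond_t nonzero;   /* signalled when count becomes > 0 */
    unsigned count;
};

void cv_sem_init(struct cv_sem *s, unsigned value)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->count = value;
}

void cv_sem_post(struct cv_sem *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;                       /* every post writes shared state */
    pthread_cond_signal(&s->nonzero); /* wake at most one waiter */
    pthread_mutex_unlock(&s->lock);
}

void cv_sem_wait(struct cv_sem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count == 0)             /* loop guards against spurious wakeups */
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Returns 1 if a unit was taken, 0 if the count was zero. */
int cv_sem_trywait(struct cv_sem *s)
{
    int ok;

    pthread_mutex_lock(&s->lock);
    ok = s->count > 0;
    if (ok)
        s->count--;
    pthread_mutex_unlock(&s->lock);
    return ok;
}
```

It is correct, but it is exactly the "implement everything with mutexes and conditions" approach: the mutex cache line is touched by every operation on both sides.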

>> True, everything can be implemented with mutexes and condition
>> variables, but anyone in the real world has to consider
>> quality-of-implementation issues too.

Jan> Yes, it should. But then a message queue is most appropriate, since it
Jan> can also pass the actual data along. That's what a file descriptor with
Jan> an appropriately implemented poll is.

It involves copying.  Theoretically it is possible to have the read
system call avoid copying for whole overwritten pages (by exchanging
page table entries (and flushing TLBs :-( )), but this may work well
on some systems, not so well on others, and not AT ALL when the
source buffer is actually device memory.

>> For example, native semaphores never perform worse than a
>> mutex/cond implementation (otherwise they would be implemented with
>> mutex/cond).  It is not at all accidental that POSIX has separate
>> semaphore primitives.  Just think of a broadcast on a condition
>> variable and how all the woken-up processes IN TURN lock the mutex,
>> polluting the mutex cache line, which begins wildly bouncing back and
>> forth between CPUs.  A semaphore post operation can be implemented
>> WITHOUT ANY WRITES to the semaphore count if there are waiters (the
>> post can hand the unit directly to a waiter).  You can't get much
>> more scalable.

Jan> ... oh well, in the kernel they ARE. In fact, the kernel has wait
Jan> queues that are properly atomic without the need for a mutex (spin
Jan> lock), because the condition can be tested between announcing going
Jan> to sleep and actually yielding. But the semaphore has to be
Jan> spin-locked anyway...

Jan> The sigwait mechanism should actually be correct synchronization.
>> 
Jan> Having POLLIN on the descriptor iff a complete buffer is ready would be
Jan> better.
>> 
Momchil> Probably futexes can do the work.
>> 
Jan> Why, when character devices are already perfect message queues?
>> 
>> Character devices are for I/O. ABUSING them for concurrency control
>> can be justified ONLY when there are no other primitives of adequate
>> performance.

Jan> You are doing it too ;-) An ioctl is an operation on a device.

Indeed, I'm abusing ioctls. But that's fine; they are used to being
abused :)

Jan> Well, what
Jan> really matters here is the context switch.

A context switch has negligible overhead compared to a frame copy. And,
of course, blocking does not equal a context switch. See, the driver has
no need for a separate thread. It performs its work in the context of
the encoder or in interrupt context, so no context switches are
involved.  Don't be misled by the driver pseudocode I posted: I said
"driver sequence of actions" precisely to describe the events that
happen, not the actual program.

Jan> Thus I still think there is
Jan> no performance gain in using ioctl over poll. And there is a convenience
Jan> gain in poll. (There is a performance loss with signals, however, because
Jan> signals are hell slow on Linux.)

There is a gain: no copy.

>> >> Alternatively (and better),
>> 
Jan> I don't agree with "better". It adds more ioctl crap. It would be
Jan> better if it were poll instead of an ad-hoc ioctl.
>> 
>> IOCTLs are crap, true.
>> 
Jan> But yes, mmap has advantage being no-copy.
>> 
>> That's what I mean by "better".  

Jan> In that, yes.

Jan> I still don't like the ioctls for synchronization, since the process
Jan> also has to poll for the network to accept the data. And this would
Jan> force it to have a helper thread just because ioctl does not integrate
Jan> with poll.
>> 
>> Hmm, how come the MPEG encoder has to poll the network on the
>> read side ?

Jan> No, the network write side. But the buffers in the TCP stack are not
Jan> unlimited. They can fill up, and then the stack can refuse to accept
Jan> more data for sending.
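The single-threaded structure described here can be sketched with poll(): one call waits both for a complete frame and for room in the socket's send buffer. This is a minimal sketch that assumes a driver which raises POLLIN only when a whole buffer is ready (as suggested earlier in the thread); wait_ready(), FRAME_READY and NET_WRITABLE are hypothetical names.

```c
#include <assert.h>
#include <poll.h>

#define FRAME_READY  1  /* device fd: a complete buffer can be read */
#define NET_WRITABLE 2  /* socket fd: the TCP stack accepts more data */

/* Waits until a frame is ready and/or the socket can take more data.
   want_net is nonzero only while encoded data is waiting to be sent.
   Returns a mask of the bits above (0 if neither is set, e.g. on
   timeout), or -1 if poll() itself failed. */
int wait_ready(int dev_fd, int sock_fd, int want_net, int timeout_ms)
{
    struct pollfd pfd[2] = {
        { .fd = dev_fd, .events = POLLIN },
        /* poll() ignores entries whose fd is negative. */
        { .fd = want_net ? sock_fd : -1, .events = POLLOUT },
    };
    int n = poll(pfd, 2, timeout_ms);
    int mask = 0;

    if (n < 0)
        return -1;
    if (pfd[0].revents & POLLIN)
        mask |= FRAME_READY;
    if (pfd[1].revents & POLLOUT)
        mask |= NET_WRITABLE;
    return mask;
}
```

A main loop would call this with want_net set whenever it has encoded data queued, fetch the next frame on FRAME_READY and send on NET_WRITABLE, so no helper thread is needed. Whether the same loop can also cover the ioctl-based scheme depends on the driver, which is the crux of the disagreement.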

>> [snip]
>> 
>> >> Have the driver allocate 2 buffers and mmap() them into the process.
>> >> Have the driver create 2 semaphores (initially zero) and let the app
>> >> post and wait on them with ioctls.
>> >> 
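>> >> #include <stdlib.h>
>> >> #include <sys/ioctl.h>
>> >> #include <sys/mman.h>
>> >> 
>> >> /* POSTSEM_0, WAITSEM_1, HALF_BUFFER_SIZE, done () and do_stuff ()
>> >>    are placeholders for the driver/application specifics.  */
>> >> char *buf[2];
>> >> int no;
>> >> 
>> >> /* Map both halves of the driver's buffer area into the process.  */
>> >> buf [0] = mmap (NULL, 2 * HALF_BUFFER_SIZE, PROT_READ, MAP_SHARED, fd, 0);
>> >> if (buf [0] == MAP_FAILED)
>> >>   abort ();
>> >> buf [1] = buf [0] + HALF_BUFFER_SIZE;
>> >> 
>> >> no = 0;
>> >> while (!done ())
>> >>   {
>> >>     /* Let the driver know a buffer is available.  DMA starts
>> >>        if not started already.  */
>> >>     ioctl (fd, POSTSEM_0);
>> >>     /* Wait until DMA interrupts and the interrupt handler signals the
>> >>        semaphore.  Driver continues filling the other buffer.  */
>> >>     ioctl (fd, WAITSEM_1);
>> >> 
>> >>     /* Data is in buffer, no copying needed.  */
>> >>     do_stuff (buf [no]);
>> >> 
>> >>     /* Switch buffers.  */
>> >>     no = !no;
>> >>   }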
>> 
>> 
Jan> You seem to have the semaphores wrong.
>> 
>> I DON'T THINK SO.
>> 
>> Can you describe a scenario where the system would deadlock ? Or do
>> you just not understand the above pseudocode ?
>> 
>> Here's the driver sequence of actions:
>> 
>> sem_wait (SEM_0);
>> 
>> fill_buffer (0);
>> sem_post (SEM_1);
>> 
>> fill_buffer (1);
>> sem_post (SEM_1);
>> 
>> no = 0;
>> while (!done ())
>>   {
>>     sem_wait (SEM_0);
>>     fill_buffer (no);
>>     sem_post (SEM_1);
>>     no = !no;
>>   }
>> 
>> See ?

Jan> Oh, I see. You have just one semaphore pair and believe that both sides
Jan> will always have the same idea of which buffer is current.

Well, yes, they have to independently keep track of the current buffer
to read/write. And that's good: one less cache line to share (as
opposed to having a couple of shared variables, ``next_read_idx'' and
``next_write_idx'').
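The protocol can be checked in user space with a small simulation: two threads stand in for the driver and the encoder, POSIX semaphores stand in for the hypothetical POSTSEM_0/WAITSEM_1 ioctls, and one int per buffer stands in for the frame data. Each side keeps its own private `no`; apart from the buffers, only the two semaphores are shared. This is a sketch under those assumptions, not driver code.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define NFRAMES 8         /* hypothetical run length; must be at least 2 */

static int buffer[2];     /* stand-in for the two mmap()ed halves */
static sem_t sem0;        /* SEM_0: "a buffer is free", posted by the app */
static sem_t sem1;        /* SEM_1: "a buffer is filled", posted by the driver */
static int seen[NFRAMES]; /* frame numbers in the order the app read them */

/* Driver side, following the quoted sequence of actions. */
static void *driver(void *arg)
{
    int frame = 0;
    int no = 0;           /* the driver's private idea of the current buffer */
    (void)arg;

    sem_wait(&sem0);      /* the application has started */
    buffer[0] = frame++;
    sem_post(&sem1);
    buffer[1] = frame++;
    sem_post(&sem1);

    while (frame < NFRAMES) {
        sem_wait(&sem0);  /* buffer `no` has been handed back */
        buffer[no] = frame++;
        sem_post(&sem1);
        no = !no;
    }
    return NULL;
}

/* Application side, following the quoted mmap()/ioctl() loop. */
static void *app(void *arg)
{
    int no = 0;           /* the application's private idea of the current buffer */
    (void)arg;

    for (int i = 0; i < NFRAMES; i++) {
        sem_post(&sem0);  /* first pass: start; later: buffer !no is free */
        sem_wait(&sem1);  /* buffer `no` has been filled */
        seen[i] = buffer[no];
        no = !no;
    }
    return NULL;
}

/* Runs both sides; returns 0 if every frame arrived once and in order. */
int run_simulation(void)
{
    pthread_t d, a;

    if (sem_init(&sem0, 0, 0) != 0 || sem_init(&sem1, 0, 0) != 0)
        return -1;
    pthread_create(&d, NULL, driver, NULL);
    pthread_create(&a, NULL, app, NULL);
    pthread_join(a, NULL);
    pthread_join(d, NULL);
    sem_destroy(&sem0);
    sem_destroy(&sem1);

    for (int i = 0; i < NFRAMES; i++)
        if (seen[i] != i)
            return -1;
    return 0;
}
```

The application's last post is never consumed, which is harmless. Both indices stay in step because each post on SEM_1 matches exactly one wait in the application, and each wait on SEM_0 in the driver's loop matches the application handing back the buffer it just finished.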

Jan> If they do at the start, they will, but I am paranoid and think
Jan> that in reality, impossible things happen.

Heh, impossible things do not happen, by definition :)

>> [irrelevant textbook example snipped]
>> 
Jan> But that is the proper solution for producer-consumer. This is NOT
Jan> producer-consumer.
>> 
>> See above. It is.

Jan> As the driver is written, it is not. But it can be, and the driver
Jan> probably should be modified so that it is, because that would save
Jan> some bus bandwidth.

Hmm, let's see. Two processes, one producing data, one consuming data,
both working at different rates and communicating through a bounded
buffer: yep, it is producer/consumer.
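In user space the same bounded buffer can be sketched with two POSIX counting semaphores. The producer uses sem_trywait() on the free-buffer count, so when every buffer is still in use it simply misses the frame instead of blocking, which is the "producer stops producing" behaviour argued for above. NBUF, struct frame_ring and the frame_* functions are hypothetical names for this sketch, and ints stand in for frame data.

```c
#include <assert.h>
#include <semaphore.h>

#define NBUF 2  /* hypothetical number of frame buffers */

/* Bounded buffer of frame numbers; a real driver would hold DMA buffers. */
struct frame_ring {
    int slot[NBUF];
    int head;          /* next slot to fill; touched only by the producer */
    int tail;          /* next slot to drain; touched only by the consumer */
    sem_t free_slots;  /* counts empty buffers, starts at NBUF */
    sem_t full_slots;  /* counts filled buffers, starts at 0 */
    unsigned dropped;  /* frames the producer missed (producer-private) */
};

int frame_ring_init(struct frame_ring *r)
{
    r->head = r->tail = 0;
    r->dropped = 0;
    if (sem_init(&r->free_slots, 0, NBUF) != 0)
        return -1;
    return sem_init(&r->full_slots, 0, 0);
}

/* Producer: never blocks.  Returns 1 if stored, 0 if the frame is missed. */
int frame_produce(struct frame_ring *r, int frame)
{
    if (sem_trywait(&r->free_slots) != 0) {
        r->dropped++;          /* no free buffer (normally EAGAIN) */
        return 0;
    }
    r->slot[r->head] = frame;
    r->head = (r->head + 1) % NBUF;
    sem_post(&r->full_slots);  /* hand the filled buffer to the consumer */
    return 1;
}

/* Consumer: blocks until a filled buffer is available. */
int frame_consume(struct frame_ring *r)
{
    int frame;

    while (sem_wait(&r->full_slots) != 0)
        ;                      /* retry if interrupted by a signal */
    frame = r->slot[r->tail];
    r->tail = (r->tail + 1) % NBUF;
    sem_post(&r->free_slots);  /* give the buffer back to the producer */
    return frame;
}
```

With NBUF = 2, producing three frames before the consumer runs stores the first two and drops the third. With one producer and one consumer, head and tail are each private to one side, so the two semaphores are the only synchronization the sides share.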

>> >> One can add buffers to compensate for jitter. On a real-time OS two
>> >> buffers ought to be enough.  Ok ?
>> 
Jan> That's the driver's choice! The driver allocates them (even here).
>> 
>> The only need for more than two buffers is to compensate for
>> scheduling delays in the consumer.

Jan> I agree. I just say that number of buffers is decided by the kernel
Jan> side.

Ok.

~velco
--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive:       http://mail.nl.linux.org/kernelnewbies/
FAQ:           http://kernelnewbies.org/faq/