>>>>> "Jan" == Jan Hudec <bulb@ucw.cz> writes:

Jan> On Fri, Oct 18, 2002 at 12:03:25PM +0300, Momchil Velikov wrote:
>> >>>>> "Jan" == Jan Hudec <bulb@ucw.cz> writes:
>>
Jan> On Thu, Oct 17, 2002 at 04:54:07PM +0300, Momchil Velikov wrote:
>> >> >>>>> "Momchil" == Momchil Velikov <velco@fadata.bg> writes:
>> >>
>> >> >>>>> "Nagaraj" == Nagaraj <nagaraj@smartyantra.com> writes:
Momchil> FWIW, the problem is the classic producer-consumer problem with
Momchil> solutions described in _any_ OS textbook.
>>
Jan> No, it probably isn't. I expect that the camera is giving frames at
Jan> constant rate and the video encoder wants to grab the latest if it's
Jan> completed, but does not care when it missed some.
>>
>> Does it make any difference if the camera or the encoder misses a
>> frame ? If not (and I think not) it is exactly a producer/consumer
>> problem, where the producer simply stops producing when there are no
>> buffers available, i.e. misses a frame. Note that it is also better
>> for the camera to miss a frame, because the frame data does not enter
>> the computer at all, thus it does not spend bus bandwidth, memory
>> bandwidth, caches, whatever, etc., all or any of them.

Jan> But as far as I understood the code, the driver copies and discards the
Jan> frame, when it's not read... (Well, it might be wiser of the driver not
Jan> to initiate DMA at all, of course).

Momchil> Thus, the right solution would be to use semaphores. But, AFAIK,
Momchil> there are no semaphores shared between the userspace and the kernel.
>>
Jan> This is rather oversimplified. Any OS textbook will tell you that
Jan> all the synchronization primitives are equivalent.
>>
>> Which synchronization primitives are equivalent ? Is a barrier
>> equivalent to a condition variable ? Or one of them is not a
>> synchronization mechanism ?

Jan> In my any OS book (well, in my any OS lecture), they didn't consider
Jan> barrier.
Jan> The other ones, that is semaphore, message queue and
Jan> condition variable (and mutual exclusion, but it's a binary semaphore
Jan> which is a special case of semaphore).

See, "semaphore", "message queue", "mutual exclusion" can mean many
things. One has to think about concrete specifications in order to
compare them. What is a message queue ? System V message queue ? POSIX
message queue ? SOCK_DGRAM socket ?

Having said that, a POSIX binary semaphore (a concrete specification)
is not equivalent to a POSIX mutex (another concrete specification).

IMHO, when implementing some concurrent program, one has to choose
the synchronization primitives with the most constrained semantics that
would suffice, because more general solutions tend to be more
expensive. Of course, customized synchronization that fits only the
problem at hand is most desirable from the performance point of view,
but one has to draw the line somewhere instead of implementing
everything with load-linked/store-conditional and memory barriers.
Semaphores look like a good compromise for this problem.

>> True, everything can be implemented with mutexes and conditions, but
>> anyone in the real world has to consider the quality of implementation
>> issues too.

Jan> Yes, it should. But then message queue is most appropriate since it can
Jan> also pass the actual data along. That's what a file descriptor with
Jan> appropriately implemented poll is.

It involves copying. Theoretically it is possible to have the read
system call avoid copying for whole overwritten pages (by exchanging
page table entries (and flushing TLBs :-( )), but this may work well on
some systems, work not so well on others and not work AT ALL when the
source buffer is actually device memory.

>> For example native semaphores are never of lower performance than
>> mutex/cond implementation (otherwise they would be implemented with
>> mutex/cond). It is not at all accidental that POSIX has separate
>> semaphore primitives.
>> Just think of a broadcast on a condition
>> variable and how all the woken up processes IN TURN lock the mutex,
>> polluting the mutex cache line, which begins wildly bouncing back and
>> forth between CPUs. A semaphore post operation can be implemented
>> WITHOUT ANY WRITES to the semaphore if there are waiters. You can't
>> get much more scalable.

Jan> ... oh well, in kernel they ARE. In fact, kernel has wait queues, that
Jan> are properly atomic without need for mutex (spin lock), because the
Jan> condition can be tested between announcing going to sleep and actually
Jan> yielding. But the semaphore has to be spin-locked anyway...

Jan> The sigwait mechanism should actually be correct synchronization.
>>
Jan> Having POLLIN on the descriptor iff a complete buffer is ready would be
Jan> better.
>>
Momchil> Probably futexes can do the work.
>>
Jan> Why when character devices already are perfect message queues?
>>
>> Character devices are for I/O. ABUSING them for concurrency control
>> can be justified ONLY when there are no other primitives of adequate
>> performance.

Jan> You are doing it too;-) Ioctl is operation on a device.

Indeed, I'm abusing ioctls. But, that's fine, they are accustomed to
being abused :)

Jan> Well, what
Jan> really matters here is the context switch.

A context switch has negligible overhead compared to a frame copy. And,
of course, blocking does not equal a context switch.

See, the driver has no need of a separate thread. It performs its work
in the context of the encoder or in interrupt context, thus no context
switches are involved. Don't get misled by the driver pseudocode I
posted - I said "driver sequence of actions" exactly in order to
describe the events that happen but not the actual program.

Jan> Thus I still think there is
Jan> no performance gain in using ioctl over poll. And there is a convenience
Jan> gain in poll. (There is a performance loss in signals however, because
Jan> signals are hell slow on linux).

No copy.
>> >> Alternatively (and better),
>>
Jan> Don't agree with better. It adds more ioctl crap. It would be
Jan> better if it was poll instead of ad-hoc ioctl.
>>
>> IOCTLs are crap, true.
>>
Jan> But yes, mmap has advantage being no-copy.
>>
>> That's what I mean by "better".

Jan> In that, yes.

Jan> I still don't like the ioctls for synchronization, since the process has
Jan> to also poll for the network to accept the data. And this would force it
Jan> to have a helper thread just because it does not integrate with poll.
>>
>> Hmm, how come that the MPEG encoder has to poll the network on the
>> read side ?

Jan> No,... the network write side... but buffers in TCP stack are not
Jan> unlimited. They can fill and then the stack can refuse to accept more
Jan> data for sending.

>> [snip]
>>
>> >> Have the driver allocate 2 buffers and mmap() them into the process.
>> >> Have the driver create 2 semaphores (initially zero) and let the app
>> >> post and wait on them with ioctls.
>> >>
>> >> void *buf[2];
>> >>
>> >> buf [0] = mmap (fd, ...);
>> >> buf [1] = buf [0] + HALF_BUFFER_SIZE; /* GCC extension :) */
>> >>
>> >> no = 0;
>> >> while (!done ())
>> >>   {
>> >>     /* Let the driver know a buffer is available. DMA starts
>> >>        if not started already. */
>> >>     ioctl (fd, POSTSEM_0);
>> >>     /* Wait until DMA interrupts and the interrupt handler signals the
>> >>        semaphore. Driver continues filling the other buffer. */
>> >>     ioctl (fd, WAITSEM_1);
>> >>
>> >>     /* Data is in buffer, no copying needed. */
>> >>     do_stuff (buf [no]);
>> >>
>> >>     /* Switch buffers. */
>> >>     no = !no;
>> >>   }
>> >>
Jan> You seem to have the semaphores wrong.
>>
>> I DON'T THINK SO.
>>
>> Can you describe a scenario where the system would deadlock ? Or you
>> just do not understand the above pseudocode ?
>>
>> Here's the driver sequence of actions:
>>
>> sem_wait (SEM_0);
>>
>> fill_buffer (0);
>> sem_post (SEM_1);
>>
>> fill_buffer (1);
>> sem_post (SEM_1);
>>
>> no = 0;
>> while (!done ())
>>   {
>>     sem_wait (SEM_0);
>>     fill_buffer (no);
>>     sem_post (SEM_1);
>>     no = !no;
>>   }
>>
>> See ?

Jan> Oh, see. You have just one semaphore pair and believe that both sides
Jan> will always have same idea of which buffer is current.

Well, yes, they have to independently keep track of the current buffer
to read/write. And that's good - one less cache line to share (as
opposed to having a couple of shared variables ``next_read_idx'',
``next_write_idx'').

Jan> If they do at the start, they will, but I am paranoid and think,
Jan> that in reality, impossible things happen.

Heh, impossible things do not happen by definition :)

>> [irrelevant textbook example snipped]
>>
Jan> But that is proper solution for producer-consumer. This is NOT
Jan> producer-consumer.
>>
>> See above. It is.

Jan> As the driver is written, it is not. But it can be and probably the
Jan> driver should be modified so it was, because that would save some bus
Jan> bandwidth.

Hmm, let's see - two processes, one produces data, one consumes data,
both work with different rates and communicate through a bounded
buffer - yep, it is producer/consumer.

>> >> One can add buffers to compensate for jitter. On a real-time OS two
>> >> buffers ought to be enough. Ok ?
>>
Jan> That's driver's choice! Driver allocates them (even here).
>>
>> The only need for more than two buffers is to compensate for
>> scheduling delays in the consumer.

Jan> I agree. I just say that number of buffers is decided by the kernel
Jan> side.

Ok.

~velco
--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive:       http://mail.nl.linux.org/kernelnewbies/
FAQ:           http://kernelnewbies.org/faq/
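
For anyone who wants to experiment with the double-buffer scheme argued over in this thread, here is a minimal userspace sketch in C. It is not the driver code and makes several assumptions: two pthreads stand in for the driver and the encoder, POSIX unnamed semaphores stand in for the ioctl-driven driver semaphores, and the names `struct pipeline`, `run_pipeline`, `free_bufs`, `full_bufs` and `FRAME_SIZE` are hypothetical. It is also the textbook variant, where the free-buffer count starts at 2, rather than the thread's variant where both semaphores start at zero and the application posts first. As in the thread, each side keeps its own buffer index and frames are consumed in place without copying.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <string.h>

#define NBUF 2          /* double buffering, as discussed in the thread */
#define FRAME_SIZE 64   /* hypothetical frame size, for illustration only */

struct pipeline {
    unsigned char buf[NBUF][FRAME_SIZE];
    sem_t free_bufs;        /* counts buffers the producer may fill */
    sem_t full_bufs;        /* counts buffers ready for the consumer */
    int frames;             /* number of frames to transfer */
    unsigned long checksum; /* written by the consumer thread only */
};

/* Stand-in for the driver: fills the buffers in strict rotation. */
static void *producer(void *arg)
{
    struct pipeline *p = arg;
    int no = 0;                       /* private index, not shared */

    for (int i = 0; i < p->frames; i++) {
        sem_wait(&p->free_bufs);      /* block until a buffer is free */
        memset(p->buf[no], i & 0xff, FRAME_SIZE);   /* "DMA" frame i */
        sem_post(&p->full_bufs);      /* tell the consumer it is ready */
        no = (no + 1) % NBUF;
    }
    return NULL;
}

/* Stand-in for the encoder: reads each frame in place. */
static void *consumer(void *arg)
{
    struct pipeline *p = arg;
    int no = 0;                       /* private index, not shared */

    for (int i = 0; i < p->frames; i++) {
        sem_wait(&p->full_bufs);      /* block until a frame is complete */
        for (int j = 0; j < FRAME_SIZE; j++)
            p->checksum += p->buf[no][j];
        sem_post(&p->free_bufs);      /* hand the buffer back */
        no = (no + 1) % NBUF;
    }
    return NULL;
}

/* Transfers `frames` frames and returns the sum of all bytes consumed. */
unsigned long run_pipeline(int frames)
{
    struct pipeline p = { .frames = frames, .checksum = 0 };
    pthread_t prod, cons;
    int rc;

    assert(frames >= 0);
    rc = sem_init(&p.free_bufs, 0, NBUF);   /* all buffers start free */
    assert(rc == 0);
    rc = sem_init(&p.full_bufs, 0, 0);      /* nothing filled yet */
    assert(rc == 0);
    rc = pthread_create(&prod, NULL, producer, &p);
    assert(rc == 0);
    rc = pthread_create(&cons, NULL, consumer, &p);
    assert(rc == 0);
    (void)rc;

    pthread_join(prod, NULL);
    pthread_join(cons, NULL);
    sem_destroy(&p.free_bufs);
    sem_destroy(&p.full_bufs);
    return p.checksum;
}
```

Calling `run_pipeline(8)` from a `main()` (built with `cc -pthread` on Linux) transfers eight frames. Since frame i is filled with the byte value i, the returned checksum is 64 * (0 + 1 + ... + 7) = 1792. Adding buffers to absorb consumer scheduling delays only means raising `NBUF`.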