On Fri, Oct 18, 2002 at 02:35:18PM +0300, Momchil Velikov wrote:
> >>>>> "Jan" == Jan Hudec <bulb@ucw.cz> writes:
>
> Jan> On Fri, Oct 18, 2002 at 12:03:25PM +0300, Momchil Velikov wrote:
> >> >>>>> "Jan" == Jan Hudec <bulb@ucw.cz> writes:
> >>
> Jan> On Thu, Oct 17, 2002 at 04:54:07PM +0300, Momchil Velikov wrote:
> >> >> >>>>> "Momchil" == Momchil Velikov <velco@fadata.bg> writes:
> >> >>
> >> >> >>>>> "Nagaraj" == Nagaraj <nagaraj@smartyantra.com> writes:
> Momchil> FWIW, the problem is the classic producer-consumer problem with
> Momchil> solutions described in _any_ OS textbook.
> >>
> Jan> No, it probably isn't. I expect that the camera is giving frames at
> Jan> constant rate and the video encoder wants to grab the latest if it's
> Jan> completed, but does not care when it missed some.
> >>
> >> Does it make any difference if the camera or the encoder misses a
> >> frame ? If not (and I think not) it is exactly a producer/consumer
> >> problem, where the produces simply stop producing when there are no
> >> buffers available, i.e. misses a frame. Note that it is also better
> >> for the camera to miss a frame, because the frame data does not enter
> >> the computer at all, thus it does not spend bus bandwith, memory
> >> bandwith, caches, whatever, etc., all or any of them.
>
> Jan> But as far as I understood the code, the driver copies and discards the
> Jan> frame, when it's not read... (Well, it might be wiser of the driver not
> Jan> to initiate DMA at all, of course).
>
> Momchil> Thus, the right solution would be to use semaphores. But, AFAIK,
> Momchil> there are no semaphores shared between the userspace and the kernel.
> >>
> Jan> This is rather oversimplified. That any OS textbook will tell you, that
> Jan> all the synchronization primitives are equivalent.
> >>
> >> Which synchronization primitives are equivalent ? Is a barrier
> >> equivalent to a condition variable ? Or one of them is not a
> >> synchronization mechanism ?
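To make the producer/consumer framing above concrete, here is a minimal userspace sketch, assuming a single "camera" thread and a single "encoder" thread in one process, POSIX unnamed semaphores (`sem_init` with `pshared = 0`), and a `memcpy` standing in for the DMA transfer. The names `frame_ring`, `camera_deliver`, `encoder_take`, `NBUF` and `FRAME_SIZE` are hypothetical. The producer never blocks: when no buffer is free it simply misses the frame, as described in the thread. As noted above, there are no semaphores shared between userspace and the kernel, so this only models the protocol.

```c
#include <errno.h>
#include <semaphore.h>
#include <string.h>

#define NBUF 4          /* hypothetical number of frame buffers */
#define FRAME_SIZE 16   /* hypothetical frame size, tiny for illustration */

struct frame_ring {
    unsigned char buf[NBUF][FRAME_SIZE];
    sem_t free_slots;   /* buffers the camera may fill */
    sem_t full_slots;   /* completed frames waiting for the encoder */
    unsigned head;      /* next slot to fill (producer only) */
    unsigned tail;      /* next slot to drain (consumer only) */
    unsigned dropped;   /* frames missed because no buffer was free */
};

static int ring_init(struct frame_ring *r)
{
    memset(r, 0, sizeof *r);
    if (sem_init(&r->free_slots, 0, NBUF) != 0)
        return -1;
    if (sem_init(&r->full_slots, 0, 0) != 0)
        return -1;
    return 0;
}

/* Producer: never blocks. Returns 1 if the frame was stored,
 * 0 if it was dropped because every buffer was in use. */
static int camera_deliver(struct frame_ring *r, const unsigned char *data)
{
    if (sem_trywait(&r->free_slots) != 0) {
        r->dropped++;                 /* the camera "misses" this frame */
        return 0;
    }
    memcpy(r->buf[r->head], data, FRAME_SIZE);
    r->head = (r->head + 1) % NBUF;
    sem_post(&r->full_slots);         /* publish the completed frame */
    return 1;
}

/* Consumer: blocks until a complete frame is available. */
static void encoder_take(struct frame_ring *r, unsigned char *out)
{
    while (sem_wait(&r->full_slots) != 0 && errno == EINTR)
        ;                             /* retry if interrupted by a signal */
    memcpy(out, r->buf[r->tail], FRAME_SIZE);
    r->tail = (r->tail + 1) % NBUF;
    sem_post(&r->free_slots);         /* hand the buffer back to the camera */
}
```

With one producer and one consumer, `head` and `tail` each have a single writer, and the semaphore operations order the buffer accesses between the two threads, so no extra lock is needed.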
>
> Jan> In my any OS book (well, in my any OS lecture), they didn't consider
> Jan> barrier. The other ones, that is semaphore, message queue and
> Jan> conditional variable (and mutual exclusion, but it's a binary semaphore
> Jan> which is a special case of semaphore).
>
> See, "semaphore", "message queue", "mutual exclusion" can mean many
> things. One have to think about concrete specifications in order to
> compare them. What is a message queue ? System V message queue ? POSIX
> message queue ? SOCK_DGRAM socket ?
>
> Having said that, a POSIX binary semaphore (a concrete specification)
> is not equivalent to a POSIX mutex (another concrete specification).

Yes, they are, in the sense that you can use a POSIX binary semaphore (as the only synchronization primitive) to implement the POSIX mutex interface, and vice versa. It will probably be less efficient.

> IMHO, when implementing some concurrent program, one have to choose
> the synchronization primitives will most constrained semantics that
> would suffice, because more general solutions tend to be most
> expensive. Of course, customized synchornization that fits only to the
> problem in hand is most desirable from the perfomance point of view,
> but one has to draw the line somewhere instead of implementing
> everything with load-linked/store-conditional and memory barriers.

... the problem is that with kernel-level threads you have to use kernel-level synchronization primitives, so you must rely on what is available in the system library.

> Semaphores look a good compromise for this problem.

Well, maybe they actually are, because poll is not good for synchronizing mmapped I/O... Damn, they are still talking about aio on lkml, but I haven't yet found out how it's used. It may include something useful in this case.

> >> True, everything can be implemented with mutexes and conditions, but
> >> anyone in the real world has to consider the quality of implementation
> >> issues too.
>
> Jan> Yes, it should.
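The equivalence claim can be illustrated in both directions. The sketch below is hypothetical (the names `sem_mutex` and `cv_sem` are made up) and uses only standard POSIX calls. The first half builds a lock from a semaphore initialized to 1; unlike an error-checking pthread mutex it does not track ownership. The second half builds a counting semaphore from a mutex plus a condition variable, because a POSIX mutex may only be unlocked by the thread that locked it, so a mutex alone cannot express "post from another thread".

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

/* Direction 1: a mutex-like lock built only from a POSIX semaphore. */
typedef struct { sem_t s; } sem_mutex;

static int sem_mutex_init(sem_mutex *m) { return sem_init(&m->s, 0, 1); }

static int sem_mutex_lock(sem_mutex *m)
{
    int rc;
    while ((rc = sem_wait(&m->s)) != 0 && errno == EINTR)
        ;                                  /* retry if interrupted by a signal */
    return rc;
}

static int sem_mutex_trylock(sem_mutex *m) { return sem_trywait(&m->s); }
static int sem_mutex_unlock(sem_mutex *m)  { return sem_post(&m->s); }

/* Direction 2: a counting semaphore built from a mutex and a condition variable. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t nonzero;
    unsigned value;
} cv_sem;

static void cv_sem_init(cv_sem *s, unsigned value)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->value = value;
}

static void cv_sem_wait(cv_sem *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->value == 0)                  /* loop guards against spurious wakeups */
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->value--;
    pthread_mutex_unlock(&s->lock);
}

/* Returns 0 on success, -1 if the count was zero (mirrors sem_trywait). */
static int cv_sem_trywait(cv_sem *s)
{
    int ok;
    pthread_mutex_lock(&s->lock);
    ok = s->value > 0;
    if (ok)
        s->value--;
    pthread_mutex_unlock(&s->lock);
    return ok ? 0 : -1;
}

static void cv_sem_post(cv_sem *s)
{
    pthread_mutex_lock(&s->lock);
    s->value++;
    pthread_cond_signal(&s->nonzero);      /* wake one waiter, not all of them */
    pthread_mutex_unlock(&s->lock);
}
```

Initializing `cv_sem` to 1 and pairing every post with a preceding wait gives the binary-semaphore (mutual exclusion) special case. Either direction works, but, as the thread points out, the emulated version is usually the slower one.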
> Jan> But then message queue is most appropriate since it can
> Jan> also pass the actual data along. That's what a file descriptor with
> Jan> appropriately implemented poll is.
>
> It involves copying. Theoretically it is possible to have the read
> system call avoid copying for whole overwriten pages (by exchanging
> page table entries (and flushing TLBs :-( )), but this may work well
> on some systems, work not so well on others and not work AT ALL when
> the source buffer is actually device memory.

No, it won't work, because the read buffer would have to have the same alignment as the data in the cache.

> >> For example native semaphores are never of lower performance than
> >> mutex/cond implementantion (otherwise they would be implemented with
> >> mutex/cond). It is not at all accidentally that POSIX has separate
> >> semaphore primitives. Just think of a broadcast on a condition
> >> variable and how all the woken up processes IN TURN lock the mutex,
> >> polluting the mutex cache line, which begins wildly bouncing back and
> >> forth between CPUs. A semaphore post operation can be implemented
> >> WITHOUT ANY WRITES to the semaphore if there are waiters. You can't
> >> get much more scalable.
>
> Jan> ... oh well, in kernel they ARE. In fact, kernel has wait queues, that
> Jan> are properly atomic without need for mutex (spin lock), because the
> Jan> condition can be tested between announcig going to sleep and actually
> Jan> yielding. But the semaphore has to be spin-locked anyway...
>
> Jan> The sigwait mechanizm should actually be correct synchronization.
> >>
> Jan> Having POLLIN on the descriptor iff a complete buffer is ready would be
> Jan> better.
> >>
> Momchil> Probably futexes can do the work.
> >>
> Jan> Why when character devices already are perfect message queues?
> >>
> >> Character devices are for I/O. ABUSING them for concurrency control
> >> can be justified ONLY when there are no other primitives of adequate
> >> performance.
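The "POLLIN iff a complete buffer is ready" idea can be modelled in userspace to show that only a small notification travels through the descriptor while the frame itself stays in a shared mapping. In this sketch, which is an assumption rather than any real driver interface, a pipe stands in for the hypothetical character device and an anonymous `MAP_SHARED` mapping stands in for the driver's mmapped frame buffers; `shared_frames`, `frame_publish` and `frame_next` are made-up names. Returning the slot to the producer after use is left out.

```c
#define _DEFAULT_SOURCE           /* for MAP_ANONYMOUS under strict -std modes */
#include <poll.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define NSLOTS 4                  /* hypothetical number of mapped frame slots */
#define FRAME_SIZE 64             /* hypothetical frame size */

struct shared_frames {
    unsigned char (*slots)[FRAME_SIZE]; /* stands in for the driver's mmapped buffers */
    int notify_rd, notify_wr;           /* pipe standing in for the device descriptor */
};

static int frames_open(struct shared_frames *f)
{
    int fds[2];
    void *p = mmap(NULL, NSLOTS * FRAME_SIZE, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if (pipe(fds) != 0) {
        munmap(p, NSLOTS * FRAME_SIZE);
        return -1;
    }
    f->slots = p;
    f->notify_rd = fds[0];
    f->notify_wr = fds[1];
    return 0;
}

/* "Driver" side: fill a slot in place, then announce its index (1 byte). */
static int frame_publish(struct shared_frames *f, unsigned char slot, unsigned char fill)
{
    memset(f->slots[slot], fill, FRAME_SIZE);
    return write(f->notify_wr, &slot, 1) == 1 ? 0 : -1;
}

/* Consumer: wait up to timeout_ms for POLLIN, read the slot index and
 * return a pointer into the mapping, so the frame data is not copied.
 * Returns NULL on timeout or error. */
static const unsigned char *frame_next(struct shared_frames *f, int timeout_ms)
{
    struct pollfd pfd = { .fd = f->notify_rd, .events = POLLIN };
    unsigned char slot;

    if (poll(&pfd, 1, timeout_ms) != 1 || !(pfd.revents & POLLIN))
        return NULL;
    if (read(f->notify_rd, &slot, 1) != 1 || slot >= NSLOTS)
        return NULL;
    return f->slots[slot];
}
```

Because the consumer only waits on an ordinary descriptor, the same `poll` call could also watch a network socket, which is the single-threaded convenience argued for in this thread.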
>
> Jan> You are doing it too;-) Ioctl is operation on a device.
>
> Indeed, I'm abusing ioctls. But, that's fine, they are accustomed to
> being abused :)
>
> Jan> Well, what
> Jan> really matters here is the context switch.
>
> Context switch has negligible overhead compared to a frame copy. And,
> of course, block does not equal context switch. See, the driver has no
> need of separate thread. It performs it's work in the context of the
> encoder or in interrupt context, thus no context switches are
> involved. Don't get mislead by the driver pseudocode I posted - I
> said "driver sequence of actions" exactly in order to describe the
> events that happen but not the actual program.

Agreed, a solution using mmap will definitely be faster than anything using read. But I can imagine a notifier using an auxiliary stream and poll that would be a little slower, but would allow handling several things in a single thread... (When it's producer/consumer, it can probably block on both input from the driver (in ioctl) and output to the net (write), so it shouldn't matter.)

> Jan> Thus I still think there is
> Jan> no performance gain in using ioctl over poll. And there is a convenience
> Jan> gain in poll. (There is a performance loss in signals however, because
> Jan> signals are hell slow on linux).
>
> No copy.

Well, you could have a signal stream on which you would poll, and have the actual data mmapped. Devices are free to do just about anything in their poll handler... I admit it's a bit more complicated than ioctl.

-------------------------------------------------------------------------------
Jan 'Bulb' Hudec <bulb@ucw.cz>
--
Kernelnewbies: Help each other learn about the Linux kernel.
Archive: http://mail.nl.linux.org/kernelnewbies/
FAQ: http://kernelnewbies.org/faq/