On Mon, 2009-11-23 at 12:09 -0500, Devin Heitmueller wrote:
> On Mon, Nov 23, 2009 at 7:12 AM, Andy Walls <awalls@xxxxxxxxx> wrote:
> > 5. If you don't give an MDL back to the firmware, it never uses it
> > again. That's why you see the sweep-up log messages. As soon as an MDL
> > is skipped *on the order of the depth* of q_busy times, when looking for
> > the currently DMA_DONE'd MDL, that skipped MDL must have been dropped.
> > It is picked up and put back into rotation then.
>
> Perhaps I am misinterpreting the definition of "sweep-up" in this
> context. Don't the buffers get forcefully returned to the pool at
> that point?

You've got it right.

> If so, why would I see the same error over and over long
> after the CPU utilization has dropped back to a reasonable level.
>
> I feel like I must be missing something here.
>
> 1. CPU load goes up (ok)
> 2. Packets start to get dropped (expected)
> 3. CPU load goes back down (ok)
> 4. Packets continue to get dropped and never recycled, even after
> minutes of virtually no CPU load?
>
> I can totally appreciate the notion that the video would look choppy
> when the system is otherwise under high load, but my expectation would
> be that once the load drops back to 0, the video should look fine
> (perhaps with some small window of time where it is still recovering).

OK, the messages are coming from the sweep-up implementation, so let's
assume it's not working right (which seems reasonable to me).

The sweep-up algorithm relies on an assumption.

Assumption: The firmware uses MDLs on a FIFO basis, in the order in
which we submitted the MDLs to the firmware. Thus, the order of MDLs in
the q_busy linked list should match the order in which the firmware
returns them.

The decision to perform a sweep-up is based on the absolute current
depth of q_busy (was the buffer skipped q_busy.depth - 1 times or
more?). Given that, it turns out that

a. For large numbers of MDLs on q_busy, the assumption only needs to be
   approximately true.

b. For very small numbers of MDLs on q_busy (i.e. 2), the assumption
   needs to be strictly true, or errant sweep-ups happen.

So I suspect that when the CX23418 firmware has only 1 MDL and we give
it another MDL, the CX23418 might use the one we just gave it first,
violating the assumption and causing errant sweep-ups.

The fix is simple: don't sweep up a skipped buffer that meets the
current skipped >= (q_busy.depth - 1) criterion in the case of

    q_full.depth > q_free.depth + q_busy.depth

which says that if we've got a lot of MDLs tied up waiting for the
application to read them, don't bother sweeping up potentially lost
buffers until q_busy.depth increases. Since "lost" MDLs stay on q_busy
and are counted in q_busy.depth, this will always end up returning MDLs
to the firmware as the application reads data and returns MDLs.

Of course, that's all speculation about the problem. If you could
reproduce the condition and then

    # echo 271 > /sys/module/cx18/parameters/debug

to record the sequence of CX18_DE_SET_MDLs and DMA_DONE sequence numbers
for the YUV stream, it might provide confirmation of what is going on.

271 enables high volume messages for the info, warning, mailbox, and dma
debug messages. It will write a lot to your logs.

Regards,
Andy
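
For anyone who wants to reason about the suspected failure without the
hardware, the sweep-up rule and the proposed guard can be sketched as a
small Python model. This is only an illustration under stated
assumptions, not the driver code: the names Mdl, take_done_mdl and demo
are hypothetical, the queues are reduced to a list plus two depth
counters, and the depth of 60 for q_full is an arbitrary value chosen to
represent "many MDLs waiting on the application".

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Mdl:
    """Hypothetical stand-in for an MDL sitting on q_busy."""
    mdl_id: int
    skipped: int = 0


def take_done_mdl(q_busy: List[Mdl], done_id: int, q_free_depth: int,
                  q_full_depth: int, guard: bool
                  ) -> Tuple[Optional[Mdl], List[Mdl]]:
    """Model of handling one DMA_DONE notification.

    Walks q_busy from the head looking for done_id. Every MDL passed
    over has its skip count bumped and is swept up once it has been
    skipped >= (q_busy depth - 1) times. With guard=True, the sweep is
    suppressed while q_full.depth > q_free.depth + q_busy.depth.
    Returns (the done MDL or None, list of swept-up MDLs).
    """
    swept: List[Mdl] = []
    done: Optional[Mdl] = None
    for mdl in list(q_busy):  # iterate over a copy, q_busy is mutated
        if mdl.mdl_id == done_id:
            q_busy.remove(mdl)
            done = mdl
            break
        mdl.skipped += 1
        # Proposed guard: many MDLs are tied up waiting for the reader.
        starved = guard and q_full_depth > q_free_depth + len(q_busy)
        if mdl.skipped >= len(q_busy) - 1 and not starved:
            q_busy.remove(mdl)
            swept.append(mdl)
    return done, swept


def demo(guard: bool):
    """Firmware held MDL 0, was handed MDL 1, and finishes MDL 1 first."""
    q_busy = [Mdl(0), Mdl(1)]
    done, swept = take_done_mdl(q_busy, done_id=1, q_free_depth=0,
                                q_full_depth=60, guard=guard)
    return done.mdl_id, [m.mdl_id for m in swept], [m.mdl_id for m in q_busy]


if __name__ == "__main__":
    print("no guard:  ", demo(False))  # MDL 0 is swept up errantly
    print("with guard:", demo(True))   # MDL 0 stays on q_busy
```

With q_busy at depth 2, a single out-of-order completion is enough to
trip the skipped >= (q_busy.depth - 1) test in this model, which is the
errant sweep-up suspected above. With the guard enabled, MDL 0 simply
stays on q_busy until the depth comparison no longer holds.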