Hi Klaus,

I've tested your patch on x86_64 and ARM (I picked the weakest HW I have).
I've gathered some statistics: http://pastebin.com/kXEP9Bp7

Notes: a "good" tuner there means one that returns bigger chunks on average
and generally consumes fewer CPU cycles with a vanilla (not patched) VDR.
CPU usage means the CPU usage of the TS buffer thread as reported by
top -d2. Bit rates are averaged and coarse. I didn't include dumps for the
unconditional delay; they are "smooth", like the 8 and 15 Mbit/s ARM
figures. CPU usage on x86_64 is typically 1.5-2 times lower.

Some observations:

1. Small reads are really expensive and must be avoided.
2. "Good" tuners are not that "good" at higher bit rates.
3. A "non-delayed" read() (the last read() in the sequence
   poll()->read()->sleep()->poll()->read()->poll()->read()) is very
   likely to return a small chunk.

With bit rates higher than 30 Mbit/s I got device buffer overflows on ARM -
the ring buffer processing couldn't catch up.

So, generally, your patch solves the issue, but it is not as efficient as
the unconditional delay. Compared to the unconditional patch:

1. CPU usage is higher and not as steady.
2. The read() sizes are not as smooth either.

The CPU usage percentages without any patch may look small, but they amount
to 20-35% of the whole VDR CPU usage, and they drop several times with the
patch, conditional or not.

I understand that an unconditional delay here looks a bit scary. How about
increasing the threshold value to, say, 100000, or 500 * TS_SIZE, or even
higher? In other words, treat a delayed read() as the normal operation and
a non-delayed one as the emergency case (a rough sketch of what I mean is
appended below the quoted thread). I can test it on x86_64 with 50-60
Mbit/s.

What do you think of it?

Best,
glenvt18


2016-03-10 12:41 GMT+03:00 Klaus Schmidinger <Klaus.Schmidinger@xxxxxxx>:
> On 10.03.2016 02:54, glenvt18 wrote:
>>
>> Hi folks,
>>
>> I've found that with some DVB tuner drivers poll() returns when there
>> is only a small number (1-3) of TS packets available to read(). This
>> results in high CPU usage of the TS buffer thread, which is busy
>> reading small chunks of data in cTSBuffer::Action(). 2 out of 3 tuners
>> I tested were affected by this issue. Even with a "good" tuner the TS
>> buffer thread's CPU usage is up to 10% per HD stream on ARM Cortex-A7.
>> With the proposed patch it is below 1-2% on all ARM and x86_64
>> platforms I've tested. The delay value of 10 ms can be considered
>> safe:
>>
>> media/dvb-core/dmxdev.h:109
>> #define DVR_BUFFER_SIZE (10*188*1024)
>>
>> A tuner would have to receive (10*188*1024)*8*(1000/10) / 1000000 =
>> 1540 Mbit/s to overflow the device buffer within the 10 ms interval.
>> A smaller delay is not enough for ARM. cDvbDevice has a ring buffer
>> of 5MB, which is larger.
>>
>> This patch was made against VDR 2.3.1, but it can be applied to VDR
>> 2.2.0 as well.
>>
>> Please review.
>> glenvt18
>> ---
>> Index: b/device.c
>> ===================================================================
>> --- a/device.c 2015-09-18 01:04:12.000000000 +0300
>> +++ b/device.c 2016-03-10 03:38:50.078400715 +0300
>> @@ -1768,6 +1768,8 @@
>>                     break;
>>                     }
>>                  }
>> +              else
>> +                 cCondWait::SleepMs(10);
>>               }
>>            }
>>      }
>
>
> I'm not too fond of the idea of introducing an additional, unconditional
> wait here. The actual problem should be fixed within the drivers.
> However, maybe waiting in case there is only a small number of TS packets
> available is acceptable:
>
> --- device.c 2015/09/05 11:42:17 4.2
> +++ device.c 2016/03/10 09:34:11
> @@ -1768,6 +1768,8 @@
>                     break;
>                     }
>                  }
> +              else if (r < MIN_TS_PACKETS_FOR_FRAME_DETECTOR * TS_SIZE)
> +                 cCondWait::SleepMs(10);
>               }
>            }
>      }
>
> The number MIN_TS_PACKETS_FOR_FRAME_DETECTOR * TS_SIZE is just a random
> pick to avoid using a concrete literal number here.
>
> Can you please test if this still solves your problem?
>
> Klaus
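
Here is the kind of change I have in mind, as a minimal, self-contained
sketch against plain POSIX poll()/read() rather than the actual
cTSBuffer::Action() code. The function name ReadLoop, the fd handling and
the buffer handling are simplified assumptions; only the large threshold
(500 * TS_SIZE) and the 10 ms sleep are the point:

// A rough sketch only, not VDR code: a poll()/read() loop that pauses 10 ms
// whenever the last read() returned less than THRESHOLD bytes, so the driver
// can accumulate a larger chunk before the next cycle.
#include <poll.h>
#include <unistd.h>

#define TS_SIZE   188                      // one TS packet in bytes
#define THRESHOLD (500 * TS_SIZE)          // assumed "treat delay as normal" threshold

void ReadLoop(int fd)                      // fd: hypothetical DVR device descriptor
{
  unsigned char buf[THRESHOLD];
  struct pollfd pfd = { fd, POLLIN, 0 };
  for (;;) {
      if (poll(&pfd, 1, 100) > 0 && (pfd.revents & POLLIN)) {
         ssize_t r = read(fd, buf, sizeof(buf));
         if (r <= 0)
            break;                         // real code would also check errno/EOVERFLOW
         // ... hand buf[0..r) over to the ring buffer here ...
         if (r < THRESHOLD)
            usleep(10 * 1000);             // small chunk: wait 10 ms before polling again
         }
      }
}

With MIN_TS_PACKETS_FOR_FRAME_DETECTOR * TS_SIZE the sleep only triggers on
very small chunks (the emergency case); with something like 500 * TS_SIZE
nearly every read() is followed by the 10 ms pause, which is close to the
unconditional behaviour in practice, yet still skips the delay when the
device buffer is really filling up.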
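
And just to double-check the headroom figure from my original mail: with
the DVR_BUFFER_SIZE value cited above and a 10 ms delay, the break-even bit
rate works out as follows (plain arithmetic, nothing VDR-specific):

// Verifies the ~1540 Mbit/s figure: the bit rate needed to fill the driver's
// DVR buffer during a single 10 ms pause.
#include <cstdio>

int main()
{
  const double dvrBufferBytes = 10.0 * 188 * 1024;   // DVR_BUFFER_SIZE (dmxdev.h)
  const double delaySeconds   = 0.010;               // cCondWait::SleepMs(10)
  printf("%.0f Mbit/s\n", dvrBufferBytes * 8 / delaySeconds / 1e6);   // ~1540
  return 0;
}

So even the 50-60 Mbit/s streams mentioned above leave a margin of well
over an order of magnitude.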