Re: [PATCH] Fix TS buffer thread high CPU usage

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



UPDATE.

Here are 60 Mbit/s (entire transponder) test results:

http://pastebin.com/nHDN8nsW

- the same behaviour as for 30- Mbit/s, no huge chunks on the second
read. Threshold value of 70000 was chosen for demonstration only. It's
just a bit smaller than the mean which is about 75000. Notice that the
max value is 1.6x higher in the case of vanilla VDR. So, the threshold
value of about 100000 is OK.

Best,
glenvt18.


2016-03-14 19:05 GMT+03:00 glenvt18 <glenvt18@xxxxxxxxx>:
> Hi Klaus,
>
> I've tested your patch on x86_64 and ARM (I picked the weakest HW I
> have). I've gathered some statistics:
>
> http://pastebin.com/kXEP9Bp7
>
> Notes.
> A "good" tuner there means the one that returns bigger chunks on
> average and generally consumes less CPU cycles with a vanilla (not
> patched) VDR.
> CPU usage means the CPU usage of the TS buffer thread reported by top -d2.
> Bit rates are average and coarse.
> I didn't include unconditional delay dumps. They are "smooth" like
> those 8 and 15 Mbit/s ARM figures. CPU usage on x86_64 is typically
> 1.5-2 times less.
>
> Some observations:
> 1. Small reads are really expensive and must be avoided.
> 2. "Good" tuners are not that "good" at higher bit rates.
> 3. An "non-delayed" read() (the last read() in the sequence
> poll()->read()->sleep()->poll()->read()->poll()->read()) is very
> likely to return a small chunk.
>
> With bit rates higher that 30 Mbit/s I got device buffer overflows on
> ARM - ring buffer processing couldn't catch up.
>
> So, generally, it solves the issue, but is not as efficient as with an
> unconditional delay. Compared to the unconditional patch:
> 1. CPU usage is higher and not as steady.
> 2. The read() sizes are not smooth too.
>
> Those CPU usage percents (without the patch) may look small, but it's
> 20-35% of the whole VDR CPU usage and are several times less with the
> patch, conditional or not.
>
> I understand, an unconditional delay here looks a bit scaring. How
> about increasing the threshold value to, say, 100000, or 500 * TS_SIZE
> or even higher? In other words, treat delayed read() as a normal
> operation, and not delayed one as an emergency case. I can test it on
> x86_64 with 50-60 Mbit/s. What do you think of it?
>
> Best,
> glenvt18.
>
> 2016-03-10 12:41 GMT+03:00 Klaus Schmidinger <Klaus.Schmidinger@xxxxxxx>:
>> On 10.03.2016 02:54, glenvt18 wrote:
>>>
>>> Hi folks,
>>>
>>> I've found that with some DVB tuner drivers poll() returns when there
>>> are only small (1-3) number of TS packets available to read(). This
>>> results in high CPU usage of the TS buffer thread which is busy
>>> reading small chunks of data in cTSBuffer::Action(). 2 out of 3 tuners
>>> I tested were affected with this issue. Even with a "good" tuner TS
>>> buffer thread CPU usage is up to 10% per HD stream on ARM Cortex-A7.
>>> With the proposed patch it is below 1-2% on all ARM and x86_64
>>> platforms I've tested. The delay value of 10 ms can be considered
>>> safe:
>>>
>>> media/dvb-core/dmxdev.h:109
>>> #define DVR_BUFFER_SIZE (10*188*1024)
>>>
>>> It will take a tuner to receive (10*188*1024)*8*(1000/10) / 1000000 =
>>> 1540 Mbit/s to overflow the device buffer within 10 ms interval. A
>>> smaller delay is not enough for ARM. cDvbDevice has a ring buffer of
>>> 5MB which is larger.
>>>
>>> This patch was made against VDR 2.3.1, but it can be applied to VDR
>>> 2.2.0 as well.
>>>
>>> Please review.
>>> glenvt18
>>> ---
>>> Index: b/device.c
>>> ===================================================================
>>> --- a/device.c    2015-09-18 01:04:12.000000000 +0300
>>> +++ b/device.c    2016-03-10 03:38:50.078400715 +0300
>>> @@ -1768,6 +1768,8 @@
>>>                       break;
>>>                       }
>>>                    }
>>> +              else
>>> +                 cCondWait::SleepMs(10);
>>>                 }
>>>              }
>>>        }
>>
>>
>> I'm not too fond of the idea of introducing an additional, unconditional
>> wait here. The actual problem should be fixed within the drivers.
>> However, maybe waiting in case there is only a small number of TS packets
>> available is acceptable:
>>
>> --- device.c    2015/09/05 11:42:17     4.2
>> +++ device.c    2016/03/10 09:34:11
>> @@ -1768,6 +1768,8 @@
>>                      break;
>>                      }
>>                   }
>> +              else if (r < MIN_TS_PACKETS_FOR_FRAME_DETECTOR * TS_SIZE)
>> +                 cCondWait::SleepMs(10);
>>                }
>>             }
>>       }
>>
>> The number MIN_TS_PACKETS_FOR_FRAME_DETECTOR * TS_SIZE is just a random pick
>> to avoid using a concrete literal number here.
>> Can you please test if this still solves your problem?
>>
>> Klaus
>>
>> _______________________________________________
>> vdr mailing list
>> vdr@xxxxxxxxxxx
>> http://www.linuxtv.org/cgi-bin/mailman/listinfo/vdr

_______________________________________________
vdr mailing list
vdr@xxxxxxxxxxx
http://www.linuxtv.org/cgi-bin/mailman/listinfo/vdr




[Index of Archives]     [Linux Media]     [Asterisk]     [DCCP]     [Netdev]     [Xorg]     [Util Linux NG]     [Xfree86]     [Big List of Linux Books]     [Fedora Users]     [Fedora Women]     [ALSA Devel]     [Linux USB]

  Powered by Linux