Re: [PATCH] Fix TS buffer thread high CPU usage

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 18.03.2016 22:14, glenvt18 wrote:
UPDATE.

Here are 60 Mbit/s (entire transponder) test results:

http://pastebin.com/nHDN8nsW

- the same behaviour as for 30- Mbit/s, no huge chunks on the second
read. Threshold value of 70000 was chosen for demonstration only. It's
just a bit smaller than the mean which is about 75000. Notice that the
max value is 1.6x higher in the case of vanilla VDR. So, the threshold
value of about 100000 is OK.

Well, I guess then we can just as well make this unconditional after all.
Because any concrete threshold is just guesswork and may only apply to
a particular environment.

I have added your original patch to my VDR now and will observe how it
behaves with my devices. If all goes well, I'll add it to the official
source.

Please send me your full name for the CONTRIBUTORS file (somebody has to
take the blame if this ever causes trouble ;-).

Klaus


2016-03-14 19:05 GMT+03:00 glenvt18 <glenvt18@xxxxxxxxx>:
Hi Klaus,

I've tested your patch on x86_64 and ARM (I picked the weakest HW I
have). I've gathered some statistics:

http://pastebin.com/kXEP9Bp7

Notes.
A "good" tuner there means the one that returns bigger chunks on
average and generally consumes less CPU cycles with a vanilla (not
patched) VDR.
CPU usage means the CPU usage of the TS buffer thread reported by top -d2.
Bit rates are average and coarse.
I didn't include unconditional delay dumps. They are "smooth" like
those 8 and 15 Mbit/s ARM figures. CPU usage on x86_64 is typically
1.5-2 times less.

Some observations:
1. Small reads are really expensive and must be avoided.
2. "Good" tuners are not that "good" at higher bit rates.
3. An "non-delayed" read() (the last read() in the sequence
poll()->read()->sleep()->poll()->read()->poll()->read()) is very
likely to return a small chunk.

With bit rates higher that 30 Mbit/s I got device buffer overflows on
ARM - ring buffer processing couldn't catch up.

So, generally, it solves the issue, but is not as efficient as with an
unconditional delay. Compared to the unconditional patch:
1. CPU usage is higher and not as steady.
2. The read() sizes are not smooth too.

Those CPU usage percents (without the patch) may look small, but it's
20-35% of the whole VDR CPU usage and are several times less with the
patch, conditional or not.

I understand, an unconditional delay here looks a bit scaring. How
about increasing the threshold value to, say, 100000, or 500 * TS_SIZE
or even higher? In other words, treat delayed read() as a normal
operation, and not delayed one as an emergency case. I can test it on
x86_64 with 50-60 Mbit/s. What do you think of it?

Best,
glenvt18.

2016-03-10 12:41 GMT+03:00 Klaus Schmidinger <Klaus.Schmidinger@xxxxxxx>:
On 10.03.2016 02:54, glenvt18 wrote:

Hi folks,

I've found that with some DVB tuner drivers poll() returns when there
are only small (1-3) number of TS packets available to read(). This
results in high CPU usage of the TS buffer thread which is busy
reading small chunks of data in cTSBuffer::Action(). 2 out of 3 tuners
I tested were affected with this issue. Even with a "good" tuner TS
buffer thread CPU usage is up to 10% per HD stream on ARM Cortex-A7.
With the proposed patch it is below 1-2% on all ARM and x86_64
platforms I've tested. The delay value of 10 ms can be considered
safe:

media/dvb-core/dmxdev.h:109
#define DVR_BUFFER_SIZE (10*188*1024)

It will take a tuner to receive (10*188*1024)*8*(1000/10) / 1000000 =
1540 Mbit/s to overflow the device buffer within 10 ms interval. A
smaller delay is not enough for ARM. cDvbDevice has a ring buffer of
5MB which is larger.

This patch was made against VDR 2.3.1, but it can be applied to VDR
2.2.0 as well.

Please review.
glenvt18
---
Index: b/device.c
===================================================================
--- a/device.c    2015-09-18 01:04:12.000000000 +0300
+++ b/device.c    2016-03-10 03:38:50.078400715 +0300
@@ -1768,6 +1768,8 @@
                       break;
                       }
                    }
+              else
+                 cCondWait::SleepMs(10);
                 }
              }
        }


I'm not too fond of the idea of introducing an additional, unconditional
wait here. The actual problem should be fixed within the drivers.
However, maybe waiting in case there is only a small number of TS packets
available is acceptable:

--- device.c    2015/09/05 11:42:17     4.2
+++ device.c    2016/03/10 09:34:11
@@ -1768,6 +1768,8 @@
                      break;
                      }
                   }
+              else if (r < MIN_TS_PACKETS_FOR_FRAME_DETECTOR * TS_SIZE)
+                 cCondWait::SleepMs(10);
                }
             }
       }

The number MIN_TS_PACKETS_FOR_FRAME_DETECTOR * TS_SIZE is just a random pick
to avoid using a concrete literal number here.
Can you please test if this still solves your problem?

Klaus

_______________________________________________
vdr mailing list
vdr@xxxxxxxxxxx
http://www.linuxtv.org/cgi-bin/mailman/listinfo/vdr




[Index of Archives]     [Linux Media]     [Asterisk]     [DCCP]     [Netdev]     [Xorg]     [Util Linux NG]     [Xfree86]     [Big List of Linux Books]     [Fedora Users]     [Fedora Women]     [ALSA Devel]     [Linux USB]

  Powered by Linux