On Tue, Feb 21, 2023 at 10:47:28AM +0100, Klaus Schmidinger wrote:
> On 19.02.23 18:29, Patrick Lerda wrote:
>> ...
>> I had definitely a few crashes related to this class. Thread safety
>> issues are often not easily reproducible. Is your environment 100%
>> reliable?
>
> My VDR runs for weeks, even months 24/7 without problems.
> I only restart it when I have a new version.
How many threads would be created or destroyed per day in your typical
usage? If we assume a couple of thousand such events per day, that is
roughly a million events per year. If the crash probability is something
like one in a billion or one in a trillion per event, it could take a
thousand or a million years before the crash is reproduced in this way.
And even if it did occur, would you be in a position to debug it
thoroughly, with the next scheduled recording approaching in a few
minutes?
I was thinking that it could be helpful to implement some automated
testing of restarts. I made a simple experiment with a tuner stick
plugged into the USB port of an AMD64 laptop (ARM would be much better
for reproducing many race conditions, thanks to its weaker memory
ordering) and no aerial cable:
mkdir /dev/shm/v
touch /dev/shm/v/sources.conf /dev/shm/v/channels.conf
i=0
while ./vdr --no-kbd -L. -Pskincurses -c /dev/shm/v -v /dev/shm/v
do
    echo -n "$i"
    i=$((i+1))
done
First, I thought of using an unpatched VDR. The easiest way to trigger a
shutdown would seem to be SIGHUP, but I did not figure out how to
automate the sending of that signal (a rough sketch of how that could be
scripted follows below, after the patch). Instead, I thought I would
apply a crude patch to the code, like this:
diff --git a/vdr.c b/vdr.c
index 1bdc51ab..b35c4aeb 100644
--- a/vdr.c
+++ b/vdr.c
@@ -1024,6 +1024,7 @@ int main(int argc, char *argv[])
dsyslog("SD_WATCHDOG ping");
}
#endif
+ EXIT(0);
// Handle channel and timer modifications:
{
// Channels and timers need to be stored in a consistent manner,
I did not check if this would actually exercise the thread creation and
shutdown. Maybe not sufficiently, since I do not see any skincurses
output on the screen.
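For completeness, the SIGHUP variant could probably be automated along
these lines. This is only an untested sketch; it assumes that SIGHUP
makes VDR exit with status 1 (the "restart" status that runvdr checks
for) and that a 5-second warm-up is enough to get the threads started:

i=0
while :
do
    ./vdr --no-kbd -L. -Pskincurses -c /dev/shm/v -v /dev/shm/v &
    pid=$!
    sleep 5                  # give the threads some time to start up
    kill -HUP "$pid"         # ask VDR to restart itself
    wait "$pid"
    [ $? -eq 1 ] || break    # anything but the "restart" status ends the test
    echo -n "$i"
    i=$((i+1))
done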
Several such test loops against a vanilla VDR code base could be run
concurrently, using different DVB tuners, configuration directories, and
SVDRP ports. The test harness could issue HITK commands to randomly
switch channels, start and stop recordings, and finally restart VDR. As
long as the process keeps returning the expected exit status on restart,
the harness would restart it.
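To make that concrete, here is a rough, untested sketch of one harness
step. The port number and key names are only examples; it assumes the
instance was started with --port=6419 and that the svdrpsend script from
the VDR source tree is used:

PORT=6419
for n in $(seq 1 50)
do
    ./svdrpsend -p $PORT HITK "Channel+"   # hop to the next channel
    ./svdrpsend -p $PORT HITK Record       # start an instant recording
    sleep 2
    ./svdrpsend -p $PORT HITK Stop         # and stop it again
done
./svdrpsend -p $PORT HITK Power            # finally request a shutdown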
It should be possible to cover tens or hundreds of thousands of VDR
restarts per day, and many more if the startup and shutdown logic were
streamlined to shorten any timeouts. In my environment, each iteration
with the above patch took about 3 seconds, which I find somewhat
excessive.
Should a problem be caught in this way, we should be able to get a core
dump of the crash, or attach GDB to a hung process to examine what is
going on.
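For reference, with the classic core file setup (i.e. no systemd-coredump
intercepting the dumps), the preparation would be roughly:

ulimit -c unlimited        # in the shell that runs the test loop
# and if an iteration hangs instead of crashing, attach to the live
# process (assuming only one vdr instance is running):
gdb -p "$(pidof vdr)"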
Patrick, did you try reproducing any VDR problems under "rr record"
(https://rr-project.org/)? Debugging in "rr replay" would give access to
the exact sequence of events. For those race conditions that can be
reproduced in that way, debugging becomes almost trivial. (Just set some
data watchpoints and reverse-continue from the final state.)
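In case it helps, the workflow I have in mind is roughly this, assuming
rr is installed and the CPU is supported:

rr record ./vdr --no-kbd -L. -Pskincurses -c /dev/shm/v -v /dev/shm/v
rr replay      # replays exactly the same execution under GDB
# inside GDB: run to the crash, set a watchpoint on the corrupted data,
# and reverse-continue to land on the write that caused it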
Marko