Re: Using vdr-dpg package for bug hunting?

schorpp <thomas.schorpp@xxxxxxxxx> · Sun, 1 Dec 2024 18:22:30 +0100

On 01.12.24 at 16:56, Marko Mäkelä wrote:
On Sun, Dec 01, 2024 at 02:18:02PM +0100, schorpp wrote:
HA! I've got this bitch of intermittent bug finally:

Program received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0xad0ffb40 (LWP 27522)]
0x08136f6a in cFrameDetector::Analyze (this=0x9ade480, Data=<optimized out>, Length=296852) at remux.c:1567
1567                           uint32_t Delta = ptsValues[0] / (framesPerPayloadUnit +  parser->IFrameTemporalReferenceOffset());

It would have been useful to include the disassembly of the function.
Maybe also the output of the following, if the values are known to the 
debugger:

print ptsValues[0]
print framesPerPayloadUnit
print parser->iFrameTemporalReferenceOffset

AAAh sorry I forgot so many things, the last time I used assembler was 
tracing a kernel driver bug years ago.

Now we must wait for days again until the still unidentified use case 
that raises the exception occurs again :/

I was curious about this. I am able to reproduce SIGFPE on both x86-64 
and i386 when compiling the following C program without optimization:

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>
int main()
{
   int a = 0, b = -1;
   uint32_t pts = 1U << 31;
   int32_t delta = ((int32_t)pts) / (a + b);
   printf("%" PRIi32 "\n", delta);
   return 0;
}

Initially I had "uint32_t delta" and no type cast, and PRIu32, to 
exactly match the data types that are involved in VDR. That variant 
would produce the incorrect result 0. This of course is a bad 
approximation for the code in VDR, because above it is possible to 
perform all the arithmetic at compilation time. In VDR, the values 
would be determined at runtime.

yes.
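
To illustrate why the uint32_t variant described above yields 0 instead of 
trapping, here is a minimal sketch. It assumes the operand types suggested by 
the quoted line at remux.c:1567 (an unsigned 32-bit dividend, a divisor that 
is the sum of two ints) and a 32-bit int. The function name unsigned_delta 
and its parameter names are hypothetical, not VDR code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the expression at remux.c:1567:
 * an unsigned 32-bit dividend divided by the sum of two ints. */
uint32_t unsigned_delta(uint32_t pts, int frames, int offset)
{
    /* frames + offset is computed as int. For the division it is
     * converted to unsigned (assuming 32-bit int), so a sum of -1
     * becomes 4294967295 and the quotient is simply 0. */
    return pts / (frames + offset);
}
```

With pts = 1U << 31, frames = 0 and offset = -1 this returns 0, as described. 
If the types really are as assumed, the division is unsigned and cannot 
overflow, which would leave a zero divisor as the only way for it to trap.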

Can we just catch and handle this C++ exception somehow?

Curiously, if I compile the above program with GCC 14.2.0 -O2, then it 
will return the incorrect result -2147483648 instead of an approximation 
like 2147483647 (which would be one less than the correct result, which 
cannot be represented in int32_t). If I look at the disassembly, the 
compiler would have performed an incorrect constant folding for "delta".

If I compile the program with -fsanitize=undefined, it will flag an error:

runtime error: division of -2147483648 by -1 cannot be represented in 
type 'int'

For the non-optimized case, for both i386 and x86-64, I see that the 
SIGFPE is being raised by an idiv instruction that is preceded by cltd 
a.k.a. cdq: https://www.felixcloutier.com/x86/cwd:cdq:cqo

Interesting :)

Is this a case for the gcc-dev mailing list? If so, CC them if you are on the list.

I assume it is a divide-by-zero exception?

I don't know if your case involves the idiv instruction, but 
https://www.felixcloutier.com/x86/idiv mentions that #DE may be raised 
both on a division by zero and on overflow.

Can you post the output of "disassemble" and "info registers" for the 
innermost stack frame?

yes, if it occurs again

     Marko

y
tom