The program where I ran into this problem is too large to justify
posting it, so I don't expect answers that identify precisely what is
going wrong - I'm just looking for advice as to what to look for. If I
don't find the problem soon, I'll try shrinking the program down to a
point where it can be posted.
SD_start_time is a double object with automatic storage duration defined
in process_a_granule() whose address is passed to process_a_scan(),
where that pointer parameter is called scan_time, and then to
compute_SD_start_time(), where the pointer is called SD_start_time.
At optimization level 2, as indicated by the following output from gdb,
scan_time is optimized out of the function call interface for
process_a_scan(), and the address of SD_start_time is simply
passed directly to compute_SD_start_time(). The problem is that the
address passed is sometimes incorrect. In this particular case, 33 scans
were processed without a hitch, and then the 34th failed:
Program received signal SIGSEGV, Segmentation fault.
0x00000000004129d4 in compute_SD_start_time (pkt_header=0x7fffffffd870,
SD_start_time=0x1fff88b30) at compute_SD_start_time.c:109
109 *SD_start_time = pkt_header->pkt_TAI_time -
global_time_offset_array[index];
(gdb) print SD_start_time
$1 = (PGSt_double *) 0x1fff88b30
(gdb) print *SD_start_time
Cannot access memory at address 0x1fff88b30
(gdb) up
#1 0x000000000041a01c in process_a_scan (scan_number=<value optimized
out>,
pkt=<value optimized out>, scan_rate=<value optimized out>,
scan_time=<value optimized out>, L1A_scan=<value optimized out>,
scan_meta=0x7ffffff88a10, eng_data=0x7ffffff88b80,
failed_pkts=0x7fffffffd908, pkt_header=0x7fffffffd870,
scan_pixel=0x7ffffff85b20, L0_file=0x7fffffce617c) at
process_a_scan.c:420
420 compute_SD_start_time (pkt_header, scan_time);
(gdb) up
#2 0x0000000000419627 in process_a_granule (L0_file=0,
gran_start_time=386024405, gran_end_time=386024705,
pcf_config=0x7fffffffd100, eng_data=0x7ffffff88b80,
pkt_header=<value optimized out>, pkt=<value optimized out>,
failed_pkts=0x7fffffffd908) at process_a_granule.c:276
276 L1A_status = process_a_scan(&prev_scan_number, pkt,
(gdb) print SD_start_time
$2 = 386024455.18252099
(gdb) print &SD_start_time
$3 = (PGSt_double *) 0x7ffffff88b30
(gdb) l 275
...
276 L1A_status = process_a_scan(&prev_scan_number, pkt,
277 &pcf_config->scan_rate, &SD_start_time,
278 &scan_data, &scan_metadata, eng_data, failed_pkts,
279 pkt_header, &pixel_quality_data, &L0_file);
(gdb) down
#1 0x000000000041a01c in process_a_scan (scan_number=<value optimized
out>,
pkt=<value optimized out>, scan_rate=<value optimized out>,
scan_time=<value optimized out>, L1A_scan=<value optimized out>,
scan_meta=0x7ffffff88a10, eng_data=0x7ffffff88b80,
failed_pkts=0x7fffffffd908, pkt_header=0x7fffffffd870,
scan_pixel=0x7ffffff85b20, L0_file=0x7fffffce617c) at
process_a_scan.c:420
So, &SD_start_time in process_a_granule() is 0x7ffffff88b30, but
SD_start_time in compute_SD_start_time() is 0x1fff88b30.
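Comparing the two values bit by bit may narrow the search. The low 32 bits
of both addresses are identical (0xfff88b30); only the high 32 bits differ
(0x00007fff versus 0x00000001). The sketch below just does that arithmetic.
The helper names are hypothetical and are not part of the program. That a
32-bit store of the value 1 overwrote the upper half of a saved copy of the
pointer is an assumption consistent with the numbers, not something the gdb
output proves.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: split a 64-bit address into its two halves. */
static uint32_t low32(uint64_t addr)  { return (uint32_t)(addr & 0xffffffffu); }
static uint32_t high32(uint64_t addr) { return (uint32_t)(addr >> 32); }

/* Model a 32-bit store landing on the upper half of a 64-bit slot:
 * the low half survives, the high half is replaced by `value`. */
static uint64_t clobber_high32(uint64_t slot, uint32_t value)
{
    return ((uint64_t)value << 32) | low32(slot);
}

/* Check the two addresses reported by gdb against that model. */
static void check_observed_addresses(void)
{
    const uint64_t good = 0x7ffffff88b30ULL; /* &SD_start_time in process_a_granule() */
    const uint64_t bad  = 0x1fff88b30ULL;    /* SD_start_time in compute_SD_start_time() */

    assert(low32(good) == low32(bad));       /* low halves match: 0xfff88b30 */
    assert(high32(good) == 0x7fffu);
    assert(high32(bad) == 0x1u);
    assert(clobber_high32(good, 1u) == bad); /* assumed overwrite reproduces the bad value */
}
```

If that model is right, the places to look would be 32-bit objects or arrays
that sit near wherever the pointer is held between the two calls, and that
receive the value 1 around the 34th scan.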
I'm pretty impressed by this optimization, since all three functions are
defined in different translation units, so it could only be performed at
link time. However, no matter how impressive it is, if it doesn't
produce the right results, it's no good. The overwhelming majority of
the time it works as intended, so the problem must be triggered by input
data that is unusual in some way. What kinds of things could I do to
figure out why the wrong address is sometimes (but not always) sent to
compute_SD_start_time()?
If the incorrect pointer were being transmitted by my own code, it would
be easy to figure this out; but the optimizer removed the code I wrote
to pass the pointer, and used some alternative method of its own
choosing, and I don't know how to track that down.