On Tue, 07 Jul 2009, Carlos O'Donell wrote:

> On Tue, Jul 7, 2009 at 12:21 PM, John David Anglin<dave@xxxxxxxxxxxxxxxxxx> wrote:
> >> So if I characterise the problem you think you're seeing: on mmap of a
> >> file at a memory location to be determined by the kernel, a sequential
> >> set of reads of the mapped location eventually turns up a zero where
> >> there should be data?  Yes, it does sound like a caching issue.
> >
> > Yes.  The loop is terminated by a null tag:
> >
> >   while (dyn->d_tag != DT_NULL)
> >     {
> >       ...
> >     }
> >
> > However, the core dump doesn't show a null tag before the STRTAB tag
> > that caused the segmentation fault.
>
> Do you mean "after" the STRTAB tag? I assume the library on-disk has a
> DT_NULL, otherwise it would always fail.

I'm sure that there is a null tag after the STRTAB.  The segmentation
fault occurred because the get operation failed after processing the
first NEEDED tag and before the STRTAB tag.

The loop goes sequentially through the array of DT objects in the
recently mmap'd data and inserts pointers to these objects into the
dynamic loader's link map for the file (in the l_info field).  There
were no null tags between the NEEDED entry and the STRTAB entry in the
mmap'd data in the core dump.  The DT objects are near the end of the
mmap'd data.

I would guess that the loop terminated early because the l_info array
is all zeros except for the first NEEDED entry.  It appears correct.

The loop might have terminated early because of a cache issue, or
possibly the value loaded from memory somehow got corrupted.  Another
possibility would be that the mmap operation wasn't complete when the
memory was examined by the dynamic loader.  When the core dump was
done, the operation was complete.

I think it's less likely that a cache issue affected the memory used
by the dynamic loader (l_info field) as the data before and after in
the map seemed reasonable.
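To make the suspected failure mode concrete, here is a minimal sketch of
the kind of loop described above.  It is not the actual glibc code: the
names dyn_entry, fill_info and INFO_SLOTS are made up for illustration,
and only the three tag values (0, 1, 5) are the standard ELF ones for
DT_NULL, DT_NEEDED and DT_STRTAB.  It assumes the array is always
terminated by a null tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tag values match the ELF spec; INFO_SLOTS is an arbitrary
   illustrative size, not the real l_info dimension. */
enum { TAG_NULL = 0, TAG_NEEDED = 1, TAG_STRTAB = 5, INFO_SLOTS = 16 };

/* Hypothetical stand-in for an ELF dynamic section entry. */
struct dyn_entry {
    int64_t  d_tag;
    uint64_t d_val;
};

/* Walk the dynamic array until the first null tag, storing a pointer
   to each entry in info[], indexed by tag.  Returns the number of
   entries visited before the null tag was seen. */
static size_t fill_info(const struct dyn_entry *dyn,
                        const struct dyn_entry *info[INFO_SLOTS])
{
    size_t seen = 0;

    assert(dyn != NULL);
    for (size_t i = 0; i < INFO_SLOTS; i++)
        info[i] = NULL;

    while (dyn->d_tag != TAG_NULL) {
        /* Tags outside the table are skipped in this sketch. */
        if (dyn->d_tag > 0 && dyn->d_tag < INFO_SLOTS)
            info[dyn->d_tag] = dyn;
        dyn++;
        seen++;
    }
    return seen;
}
```

If a single read of d_tag returns zero where the file has data, the walk
stops at that point: the slots for tags already processed stay filled in
and every later slot, including the one for STRTAB, is left null.  That
would match an l_info array holding only the first NEEDED entry.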

The fact that PA8700 processors are also experiencing similar problems
would seem to suggest that this isn't a PA8800 L2 issue, unless we have
multiple problems.  I think we need to try running a recent kernel on
gsyprf11 for a while to see if we can capture a similar event.

Dave
-- 
J. David Anglin                                  dave.anglin@xxxxxxxxxxxxxx
National Research Council of Canada              (613) 990-0752 (FAX: 952-6602)