On Mon, Mar 14, 2022 at 02:04:35PM -0600, Christopher Hodgkins wrote:
> NOTE: This question is about kernel 4.15. All line numbers and symbol
> names correspond to the Git source at tag v4.15.
>
> Hi all,
> I've been running some benchmarks using ext4 files on PMEM (first-gen
> Intel Optane) as "anonymous" memory, and I've run into a weird error.
> For reference, the way this works is that we have a runtime that at
> startup `fallocate`s a large PMEM-backed file and maps the whole thing
> R/W with MAP_SYNC, and then it interposes on calls to `mmap` in
> userspace to return page-sized chunks of PMEM when anonymous memory is
> requested.
>
> The error I have encountered is the nondeterministic delivery of
> SIGBUS on the first access to an untouched page of the mapped region
> (which, since the file is passed to the application sequentially, is
> also typically the first uninitialized extent in the file at the time
> of the crash). The accesses are aligned and within a mapped region
> according to smaps, which eliminates the only documented reasons for
> delivery of SIGBUS that I'm aware of.

First thing to check is whether it occurs with XFS+DAX on that kernel.
That will tell you if it's an infrastructure or ext4 problem.

Second thing to do is to test a current 5.17-rc8 kernel to see if the
problem reproduces on a current kernel, i.e. determine if the problem
has actually been fixed or not.

If it reproduces on a current kernel, then update the bug report with
all that information and post the code that reproduces the problem so
we can look at it in more detail.

> I did a bit of digging with FTrace, and the course of events at a
> crash seems to be as follows. Multiple (>2) threads start faulting in
> the page, and go through the "synchronous page fault" path. They all
> return error-free from the fdatasync() call at dax.c:1588 and call
> dax_insert_pfn_mkwrite.
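
A standalone reproducer for this pattern could be quite small. The sketch below is only a guess at its shape, not your runtime's code: it assumes a file on an ext4 (or XFS) filesystem mounted with -o dax, assumes the first access is a write, and the run_first_touch_race() helper, file path, size and thread count are all hypothetical. It fallocate()s the file, maps it with MAP_SHARED_VALIDATE | MAP_SYNC, and releases several threads through a barrier so that their first write faults on the same untouched page overlap.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallbacks for older C libraries; values match the Linux UAPI headers. */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

#define MAX_THREADS 64

struct toucher {
    pthread_barrier_t *start; /* releases all threads at once */
    volatile char *page;      /* first, never-written page of the mapping */
    int index;                /* distinct byte per thread, same page */
    int done;                 /* set after this thread's store completes */
};

static void *touch_page(void *arg)
{
    struct toucher *t = arg;

    pthread_barrier_wait(t->start);
    /* Concurrent first write faults on one page. If the reported bug
     * hits, some of these stores get SIGBUS and the process dies. */
    t->page[t->index] = 1;
    t->done = 1;
    return NULL;
}

/* Hypothetical helper: returns how many threads completed their store,
 * or -1 on a setup error. use_map_sync == 0 uses plain MAP_SHARED. */
static int run_first_touch_race(const char *path, off_t len, int nthreads,
                                int use_map_sync)
{
    pthread_t tids[MAX_THREADS];
    struct toucher ts[MAX_THREADS];
    pthread_barrier_t start;
    int flags = use_map_sync ? (MAP_SHARED_VALIDATE | MAP_SYNC) : MAP_SHARED;
    int completed = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS || len <= 0)
        return -1;

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    /* Allocate blocks up front without writing data to them. */
    if (fallocate(fd, 0, 0, len) != 0) {
        close(fd);
        return -1;
    }
    char *base = mmap(NULL, (size_t)len, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (base == MAP_FAILED) {
        perror("mmap"); /* EOPNOTSUPP usually means the file is not DAX */
        close(fd);
        return -1;
    }

    pthread_barrier_init(&start, NULL, (unsigned)nthreads);
    for (int i = 0; i < nthreads; i++) {
        ts[i] = (struct toucher){ &start, base, i, 0 };
        /* Sketch only: no recovery, the other threads would block forever. */
        if (pthread_create(&tids[i], NULL, touch_page, &ts[i]) != 0)
            abort();
    }
    for (int i = 0; i < nthreads; i++) {
        pthread_join(tids[i], NULL);
        completed += ts[i].done;
    }

    pthread_barrier_destroy(&start);
    munmap(base, (size_t)len);
    close(fd);
    return completed;
}
```

On a DAX file you would call it with use_map_sync = 1, probably many times in a loop given how intermittent the failure is; if the bug reproduces, the process dies with SIGBUS instead of returning. With use_map_sync = 0 it works on any filesystem, which only sanity-checks the harness itself.
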
> The first thread to exit that function returns NOPAGE (success) and
> the others all return SIGBUS, and each raises the userspace signal on
> the return path.
>
> My best guess for why this occurs is that the unsuccessful calls all
> bounce with EBUSY (because of the successful one?) in insert_pfn
> (which tails into the call to vm_insert_mixed_mkwrite at dax.c:1548),
> and then dax_fault_return maps that to SIGBUS. The signal is
> definitely spurious -- as mentioned, one of the threads returns
> success, and if I catch the signal with GDB, the faulting access can
> be successfully performed after the signal is caught. Also, as
> mentioned above, the error is nondeterministic -- it happens maybe one
> out of every five runs. To clarify some other things that could make a
> difference, the pages are normal-sized (not huge) and the SIGBUS isn't
> due to PMEM failure (ie HWPOISON).
>
> I'm on an old kernel (4.15) so if this is really an error in the
> kernel code it may be fixed on the current series. If that's the case,
> just point me to a patch or release number where it was fixed and I'll
> be happy.

git bisect is your friend, and it doesn't require any upstream
developer time for you to run the bisect and determine where it was
fixed...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx