NOTE: This question is about kernel 4.15. All line numbers and symbol names correspond to the Git source at tag v4.15.

Hi all,

I've been running some benchmarks that use ext4 files on PMEM (first-gen Intel Optane) as "anonymous" memory, and I've run into a weird error.

For reference, this works as follows. At startup, a runtime `fallocate`s a large PMEM-backed file and maps the whole thing R/W with MAP_SYNC. It then interposes on calls to `mmap` in userspace and returns page-sized chunks of PMEM whenever anonymous memory is requested.

The error is the nondeterministic delivery of SIGBUS on the first access to an untouched page of the mapped region. Because the file is handed out to the application sequentially, that page is also typically the first uninitialized extent in the file at the time of the crash. The accesses are aligned and, according to smaps, fall within a mapped region, which rules out the only documented causes of SIGBUS that I'm aware of.

I did a bit of digging with ftrace, and the sequence of events at a crash seems to be as follows. Multiple (>2) threads start faulting in the page and go through the "synchronous page fault" path. They all return error-free from the fdatasync() call at dax.c:1588 and call dax_insert_pfn_mkwrite. The first thread to exit that function returns NOPAGE (success); the others all return SIGBUS, and each raises the userspace signal on the return path.

My best guess is that the unsuccessful calls all bounce with EBUSY (because of the successful one?) in insert_pfn, which is reached through the call to vm_insert_mixed_mkwrite at dax.c:1548, and that dax_fault_return then maps that error to SIGBUS.

The signal is definitely spurious: as mentioned, one of the threads returns success, and if I catch the signal with GDB, the faulting access can be performed successfully after the signal is caught. Also, as mentioned above, the error is nondeterministic; it happens in roughly one out of every five runs.
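The runtime setup described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual runtime: `pmem_arena`, `pmem_arena_init` and `pmem_arena_alloc_page` are hypothetical names, the `mmap` interposition layer is omitted, and `map_flags` is a parameter so the same code can be tried on a non-DAX file (where `MAP_SHARED_VALIDATE | MAP_SYNC` is rejected with EOPNOTSUPP).

```c
#define _GNU_SOURCE           /* for fallocate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallbacks for older libc headers (values as in the asm-generic uapi headers). */
#ifndef MAP_SHARED_VALIDATE
#define MAP_SHARED_VALIDATE 0x03
#endif
#ifndef MAP_SYNC
#define MAP_SYNC 0x80000
#endif

/* Hypothetical arena that hands out page-sized chunks of one file mapping. */
struct pmem_arena {
    char  *base;  /* start of the mapping */
    size_t size;  /* mapping length in bytes */
    size_t next;  /* offset of the next unused page */
    size_t page;  /* system page size */
};

/* Preallocate `size` bytes of `path` and map them read/write.
 * On a DAX file, pass map_flags = MAP_SHARED_VALIDATE | MAP_SYNC
 * (MAP_SYNC is only honored with MAP_SHARED_VALIDATE); if the file
 * cannot support MAP_SYNC, mmap() fails with EOPNOTSUPP.
 * Returns 0 on success or a negative errno value. */
int pmem_arena_init(struct pmem_arena *a, const char *path,
                    size_t size, int map_flags)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -errno;
    if (fallocate(fd, 0, 0, (off_t)size) != 0) {
        int err = -errno;
        close(fd);
        return err;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, map_flags, fd, 0);
    int err = (p == MAP_FAILED) ? -errno : 0;
    close(fd);  /* the mapping keeps its own reference to the file */
    if (err)
        return err;
    a->base = p;
    a->size = size;
    a->next = 0;
    a->page = (size_t)sysconf(_SC_PAGESIZE);
    return 0;
}

/* Bump-allocate the next page, or return NULL when the arena is exhausted.
 * Pages are handed out in file order, so the first untouched page is
 * usually also the first still-uninitialized extent of the file. */
void *pmem_arena_alloc_page(struct pmem_arena *a)
{
    assert(a->next % a->page == 0);  /* offsets stay page-aligned */
    if (a->next + a->page > a->size)
        return NULL;
    void *p = a->base + a->next;
    a->next += a->page;
    return p;
}
```

A bump allocator keeps the sketch close to the description (sequential hand-out of pages); a real interposer would also need to handle `munmap` and non-page-sized requests.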
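To make the hypothesis concrete, here is a userspace model of the error mapping it relies on. It assumes that `dax_fault_return()` in fs/dax.c at v4.15 maps 0 to NOPAGE, -ENOMEM to OOM, and every other error to SIGBUS; that should be verified against the tagged source. The enum and function names are hypothetical stand-ins, not kernel symbols, and the enum values are not the kernel's VM_FAULT_* values.

```c
#include <assert.h>
#include <errno.h>

/* Labels standing in for the kernel's VM_FAULT_* results (not their values). */
enum model_fault { MODEL_FAULT_NOPAGE, MODEL_FAULT_OOM, MODEL_FAULT_SIGBUS };

/* Model of turning an errno from vm_insert_mixed_mkwrite() into a fault
 * result. Any error other than -ENOMEM, including the hypothesized -EBUSY
 * seen by threads that lost the race, becomes SIGBUS. */
enum model_fault model_dax_fault_return(int error)
{
    assert(error <= 0);  /* kernel helpers return 0 or a negative errno */
    if (error == 0)
        return MODEL_FAULT_NOPAGE;
    if (error == -ENOMEM)
        return MODEL_FAULT_OOM;
    return MODEL_FAULT_SIGBUS;
}
```

Under this model, N racing threads would produce one NOPAGE and N-1 SIGBUS results, which is the pattern in the trace.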
To clarify some other things that could make a difference: the pages are normal-sized (not huge), and the SIGBUS isn't due to PMEM failure (i.e., HWPOISON).

I'm on an old kernel (4.15), so if this really is a bug in the kernel code, it may already be fixed in the current series. If that's the case, just point me to the patch or release where it was fixed and I'll be happy. It may also be an error in my code. I'll be less happy in that case, but please still point it out, or ask for clarification if you think I'm doing something wrong to cause this.

Thanks,
George Hodgkins