Re: Fwd: Spurious SIGBUS when threads race to insert a DAX page

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 15 Mar 2022 08:00:51 +1100

On Mon, Mar 14, 2022 at 02:04:35PM -0600, Christopher Hodgkins wrote:
> NOTE: This question is about kernel 4.15. All line numbers and symbol
> names correspond to the Git source at tag v4.15.
> 
> Hi all,
> I've been running some benchmarks using ext4 files on PMEM (first-gen
> Intel Optane) as "anonymous" memory, and I've run into a weird error.
> For reference, the way this works is that we have a runtime that at
> startup `fallocate`s a large PMEM-backed file and maps the whole thing
> R/W with MAP_SYNC, and then it interposes on calls to `mmap` in
> userspace to return page-sized chunks of PMEM when anonymous memory is
> requested.
> 
> The error I have encountered is the nondeterministic delivery of
> SIGBUS on the first access to an untouched page of the mapped region
> (which, since the file is passed to the application sequentially, is
> also typically the first uninitialized extent in the file at time of
> crash). The accesses are aligned and within a mapped region according
> to smaps, which eliminates the only documented reasons for delivery of
> SIGBUS that I'm aware of.

First thing to check is whether it occurs with XFS+DAX on that
kernel. That will tell you if it's an infrastructure or ext4
problem.

Second thing to do is to test a current kernel (5.17-rc8) to see if
the problem still reproduces, i.e. determine whether it has already
been fixed or not.

If it reproduces on a current kernel, then update the bug report
with all that information and post the code that reproduces the
problem so we can look at it in more detail.
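
For reference, a reproducer along the following lines would be enough to
start with. This is only a sketch of the scenario described in the report,
not code known to trigger the bug: the thread count, the SIGBUS cap and the
names map_sync_file() and race_on_page() are made up for illustration. It
maps a file with MAP_SYNC, releases several threads at once to write the
same untouched page, and counts how many SIGBUS signals arrive. The handler
simply returns, which on Linux retries the faulting store.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define NTHREADS   8     /* hypothetical; the report only says ">2" */
#define SIGBUS_CAP 1000  /* give up if the fault never resolves */

static atomic_int sigbus_seen;
static pthread_barrier_t start_line;

/* Count the signal and return, so the faulting store is retried. */
static void on_sigbus(int sig)
{
    (void)sig;
    if (atomic_fetch_add(&sigbus_seen, 1) >= SIGBUS_CAP)
        _exit(2);  /* a persistent SIGBUS is not the spurious kind */
}

static void *toucher(void *page)
{
    pthread_barrier_wait(&start_line);  /* release all threads together */
    *(volatile char *)page = 1;         /* first write fault on the page */
    return NULL;
}

/* Map `len` bytes of a file shared and synchronous, as in the report. */
void *map_sync_file(const char *path, size_t len)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return MAP_FAILED;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED_VALIDATE | MAP_SYNC, fd, 0);
    close(fd);  /* the mapping stays valid after close */
    return p;
}

/* Race NTHREADS writers on one page; return how many SIGBUS arrived. */
int race_on_page(void *page)
{
    pthread_t tids[NTHREADS];
    struct sigaction sa = {0};
    int rc;

    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGBUS, &sa, NULL);
    assert(rc == 0);
    atomic_store(&sigbus_seen, 0);
    rc = pthread_barrier_init(&start_line, NULL, NTHREADS);
    assert(rc == 0);
    for (int i = 0; i < NTHREADS; i++) {
        rc = pthread_create(&tids[i], NULL, toucher, page);
        assert(rc == 0);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);
    pthread_barrier_destroy(&start_line);
    return atomic_load(&sigbus_seen);
}
```

Call map_sync_file() on a preallocated file on the DAX mount, then
race_on_page() on each page-sized offset in turn; any nonzero return is a
hit. On a filesystem without DAX the mmap() call is expected to fail
rather than silently drop MAP_SYNC.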

> I did a bit of digging with FTrace, and the course of events at a
> crash seems to be as follows. Multiple (>2) threads start faulting in
> the page, and go through the "synchronous page fault" path. They all
> return error-free from the fdatasync() call at dax.c:1588 and call
> dax_insert_pfn_mkwrite. The first thread to exit that function returns
> NOPAGE (success) and the others all return SIGBUS, and each raises the
> userspace signal on the return path.
> 
> My best guess for why this occurs is that the unsuccessful calls all
> bounce with EBUSY (because of the successful one?) in insert_pfn
> (which tails into the call to vm_insert_mixed_mkwrite at dax.c:1548),
> and then dax_fault_return maps that to SIGBUS. The signal is
> definitely spurious -- as mentioned, one of the threads returns
> success, and if I catch the signal with GDB, the faulting access can
> be successfully performed after the signal is caught. Also, as
> mentioned above, the error is nondeterministic -- it happens maybe one
> out of every five runs. To clarify some other things that could make a
> difference, the pages are normal-sized (not huge) and the SIGBUS isn't
> due to PMEM failure (i.e. HWPOISON).
> 
> I'm on an old kernel (4.15) so if this is really an error in the
> kernel code it may be fixed on the current series. If that's the case,
> just point me to a patch or release number where it was fixed and I'll
> be happy.
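
To make the hypothesis above concrete, here is a small userspace model of
it. It is not kernel code: the enum and both function names are invented
for illustration, and it only encodes what the report claims, namely that
the losers of the insertion race see -EBUSY and that a blanket
error-to-fault mapping turns that into SIGBUS. The second function shows
what treating -EBUSY as "another thread already installed the page" would
look like. Whether any particular release does that on this path is
something the sketch does not establish.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel's fault return codes. */
enum fault_result { FAULT_NOPAGE, FAULT_OOM, FAULT_SIGBUS };

/* Blanket mapping as described in the report: anything that is not
 * success or out-of-memory becomes SIGBUS. */
enum fault_result map_error_strict(int error)
{
    if (error == 0)
        return FAULT_NOPAGE;
    if (error == -ENOMEM)
        return FAULT_OOM;
    return FAULT_SIGBUS;
}

/* Variant that treats -EBUSY as benign: a racing thread won, the
 * page is mapped, and the access can simply be retried. */
enum fault_result map_error_tolerant(int error)
{
    if (error == -EBUSY)
        return FAULT_NOPAGE;
    return map_error_strict(error);
}
```

With the strict mapping a racing loser that gets -EBUSY is reported as
FAULT_SIGBUS even though the page is present, which matches the symptom of
a signal followed by a successful retry of the same access.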

git bisect is your friend, and it doesn't require any upstream
developer time for you to run the bisect and determine where it was
fixed...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx