Re: [PATCH] xfs: shutdown on failure to add page to log bio

Brian Foster <bfoster@xxxxxxxxxx> · Wed, 25 Mar 2020 07:24:17 -0400

On Wed, Mar 25, 2020 at 10:24:24AM +1100, Dave Chinner wrote:
> On Tue, Mar 24, 2020 at 12:57:00PM -0400, Brian Foster wrote:
> > If the bio_add_page() call fails, we proceed to write out a
> > partially constructed log buffer. This corrupts the physical log
> > such that log recovery is not possible. Worse, persistent
> > occurrences of this error eventually lead to a BUG_ON() failure in
> > bio_split() as iclogs wrap the end of the physical log, which
> > triggers log recovery on subsequent mount.
> 
> I'm a little unclear on how this can happen - the iclogbuf can only
> be 256kB - 64 pages - and we always allocate a bio with enough
> bvecs to hold 64 pages. And the ic_data buffer we are adding to the
> bio is also statically allocated so I'm left to wonder exactly how
> this is failing.
> 
> i.e. this looks like code that shouldn't ever fail, yet it
> apparently is, and I have no idea what is causing that failure...
> 

It shouldn't fail in current upstream. The problem occurred on a large
page (64k) system without commit 59bb47985c1d ("mm, sl[aou]b: guarantee
natural alignment for kmalloc(power-of-two)"). With 64k pages, default
sized log buffers (32k) are allocated from slab, and without that commit
slab allocations are not guaranteed to be naturally aligned once mm debug
options such as slub debug or kasan are enabled. IOW, the 32k slab looks
like this:

kmalloc-32k           75     75  33792   15    8 : tunables    0    0    0 : slabdata      5      5      0

Note the 33k object size. This means that 32k slab allocations can start
at a non-32k aligned physical offset in a page. So for example if we
allocate a 32k log buffer that lands at physical offset 48k of the
underlying page, xlog_map_iclog_data() will attempt to attach 2 physical
pages (16k from each) to the bio. Meanwhile the bio was originally
allocated and initialized based on a bvec count of
howmany(log->l_iclog_size, PAGE_SIZE), which assumes a 32k log buffer
only requires a single bvec.
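
To make the arithmetic concrete, here is a minimal userspace sketch of
the mismatch. It is not kernel code: howmany() is written to mirror the
rounding-up division the XFS macro of that name performs, and
pages_spanned() is a hypothetical helper introduced only for this
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Rounding-up division, assumed equivalent to the XFS howmany() macro. */
static size_t howmany(size_t x, size_t y)
{
	return (x + y - 1) / y;
}

/*
 * Hypothetical helper: number of pages touched by a buffer of len bytes
 * that starts at byte offset off within its first page.
 */
static size_t pages_spanned(size_t off, size_t len, size_t page_size)
{
	assert(len > 0 && page_size > 0 && off < page_size);
	return howmany(off + len, page_size);
}
```

With a 64k page, pages_spanned(48 * 1024, 32 * 1024, 64 * 1024) is 2,
while howmany(32 * 1024, 64 * 1024) is 1. The bio is sized from the
second number, but the mapping loop needs the first.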

The primary fix for this problem was to include the slab alignment
patch. That essentially changes the object size in the above example
from 33k to 64k for reasons described in its commit log. This error
handling patch was simply based on the observation that if the
bio_add_page() call from XFS fails, for whatever reason, we fall over
rather ungracefully.
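
As a rough illustration of the error handling idea only, here is a
userspace mock. Every name in it (mock_bio, mock_add_page,
mock_map_buffer, mock_write_buffer) is hypothetical and none of it is
the actual patch. It assumes an add-page call that returns the number
of bytes added, or 0 when no segment slot is left, and it shows the
caller shutting down instead of submitting a partially built buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio: a fixed number of segment slots. */
struct mock_bio {
	size_t max_vecs;
	size_t nr_vecs;
};

/* Returns len on success, 0 when the mock bio has no free slot. */
static size_t mock_add_page(struct mock_bio *bio, size_t len)
{
	if (bio->nr_vecs >= bio->max_vecs)
		return 0;
	bio->nr_vecs++;
	return len;
}

/* Attach the buffer one page at a time; stop at the first failure. */
static bool mock_map_buffer(struct mock_bio *bio, size_t off, size_t count,
			    size_t page_size)
{
	assert(page_size > 0 && off < page_size);
	while (count > 0) {
		size_t len = page_size - off;

		if (len > count)
			len = count;
		if (mock_add_page(bio, len) != len)
			return false;
		count -= len;
		off = 0;	/* later pages start at offset 0 */
	}
	return true;
}

/* Returns true if the write would be submitted, false after shutdown. */
static bool mock_write_buffer(struct mock_bio *bio, size_t off, size_t count,
			      size_t page_size, bool *shutdown)
{
	if (!mock_map_buffer(bio, off, count, page_size)) {
		*shutdown = true;	/* never submit a partial buffer */
		return false;
	}
	return true;
}
```

With max_vecs set to 1, a 32k buffer at offset 48k of a 64k page fails
on its second segment and sets the shutdown flag, while the same buffer
at offset 0 maps cleanly.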

Brian

> That said, shutting down on failure is the right thing to do, so the
> code looks good. I just want to know how the bio_add_page() failure
> is occurring.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
>