On Wed, Mar 25, 2020 at 10:24:24AM +1100, Dave Chinner wrote:
> On Tue, Mar 24, 2020 at 12:57:00PM -0400, Brian Foster wrote:
> > If the bio_add_page() call fails, we proceed to write out a
> > partially constructed log buffer. This corrupts the physical log
> > such that log recovery is not possible. Worse, persistent
> > occurrences of this error eventually lead to a BUG_ON() failure in
> > bio_split() as iclogs wrap the end of the physical log, which
> > triggers log recovery on subsequent mount.
>
> I'm a little unclear on how this can happen - the iclogbuf can only
> be 256kB - 64 pages - and we always allocation a bio with enough
> bvecs to hold 64 pages. And the ic_data buffer we are adding to the
> bio is also statically allocated so I'm left to wonder exactly how
> this is failing.
>
> i.e. this looks like code that shouldn't ever fail, yet it
> apparently is, and I have no idea what is causing that failure...
>

It shouldn't fail in current upstream. The problem occurred on a large
page (64k) system without commit 59bb47985c1d ("mm, sl[aou]b: guarantee
natural alignment for kmalloc(power-of-two)"). The large page config
means default sized log buffers (32k) allocate out of slab and slab
allocs are not naturally aligned due to the lack of the aforementioned
commit (plus additional mm debug options, such as slub debug, kasan).
IOW, the 32k slab looks like this:

kmalloc-32k           75     75  33792   15    8 : tunables    0    0    0 : slabdata      5      5      0

Note the 33k object size. This means that 32k slab allocations can start
at a non-32k aligned physical offset in a page. So for example if we
allocate a 32k log buffer that lands at physical offset 48k of the
underlying page, xlog_map_iclog_data() will attempt to attach 2 physical
pages (16k from each) to the bio. Meanwhile the bio was originally
allocated and initialized based on a bvec count of
howmany(log->l_iclog_size, PAGE_SIZE), which assumes a 32k log buffer
only requires a single bvec.

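To make the arithmetic in that example concrete, here is a minimal
userspace sketch. It is not kernel code: the helper names
bvecs_reserved() and pages_spanned() are hypothetical, and HOWMANY()
is only assumed to round up the same way as the howmany() call
mentioned above.

```c
#include <stddef.h>

/* Round-up division, assumed to match the howmany() used to size the bio. */
#define HOWMANY(x, y)	(((x) + ((y) - 1)) / (y))

/*
 * Hypothetical helper: bvec count reserved when the bio is set up.
 * This only looks at the buffer size, i.e. it implicitly assumes the
 * buffer does not straddle more pages than its size requires.
 */
static size_t bvecs_reserved(size_t buf_size, size_t page_size)
{
	return HOWMANY(buf_size, page_size);
}

/*
 * Hypothetical helper: number of physical pages a buffer touches when
 * it starts at offset_in_page bytes into its first page.
 */
static size_t pages_spanned(size_t offset_in_page, size_t buf_size,
			    size_t page_size)
{
	return HOWMANY(offset_in_page + buf_size, page_size);
}
```

With 64k pages, a 32k buffer at in-page offset 48k gives
bvecs_reserved(32k, 64k) == 1 but pages_spanned(48k, 32k, 64k) == 2,
so the second page has nowhere to go. At a naturally aligned offset
(0 or 32k) both values are 1.
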
The primary fix for this problem was to include the slab alignment
patch. That essentially changes the object size in the above example
from 33k to 64k for reasons described in its commit log. This error
handling patch was simply based on the observation that if the
bio_add_page() call from XFS fails, for whatever reason, we fall over
rather ungracefully.

Brian

> That said, shutting down on failure is the right thing to do, so the
> code looks good. I just want to know how the bio_add_page() failure
> is occurring.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
>