On Tue, Mar 24, 2020 at 01:29:49PM -0400, Brian Foster wrote:
> On Tue, Mar 24, 2020 at 10:18:59AM -0700, Darrick J. Wong wrote:
> > On Tue, Mar 24, 2020 at 12:57:00PM -0400, Brian Foster wrote:
> > > If the bio_add_page() call fails, we proceed to write out a
> > > partially constructed log buffer. This corrupts the physical log
> > > such that log recovery is not possible. Worse, persistent
> > > occurrences of this error eventually lead to a BUG_ON() failure in
> > > bio_split() as iclogs wrap the end of the physical log, which
> > > triggers log recovery on subsequent mount.
> > >
> > > Rather than warn about writing out a corrupted log buffer, shutdown
> > > the fs as is done for any log I/O related error. This preserves the
> > > consistency of the physical log such that log recovery succeeds on a
> > > subsequent mount. Note that this was observed on a 64k page debug
> > > kernel without upstream commit 59bb47985c1d ("mm, sl[aou]b:
> > > guarantee natural alignment for kmalloc(power-of-two)"), which
> > > demonstrated frequent iclog bio overflows due to unaligned (slab
> > > allocated) iclog data buffers.
> >
> > Fixes: tag?
> >
>
> I suppose you could argue it fixes commit 79b54d9bfcdcd ("xfs: use bios
> directly to write log buffers"), but I didn't include a tag because this
> is not really fixing a reproducible bug. It's fixing up the error
> handling based on a bad combination of patches in a distro kernel.
> Perhaps I'm just not clear on when we do or don't want a fixes tag..?

[Summarizing what I rambled about on IRC:]

From my perspective, this looks like you concluded that the WARN_ON_ONCE
wasn't sufficient to deal with the error (because the physical log got
corrupted), so you're adding branch code to shut down the log.  Granted,
it should only happen if bio_add_page fails, but as that's not part of
xfs, we have to code defensively enough to avoid breaking the filesystem.

Looks ok, will add fixes tag and send it to the testcloud...

Reviewed-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>

--D

> Brian
>
> > Otherwise, looks ok to me.
> >
> > --D
> >
> > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > ---
> > >  fs/xfs/xfs_log.c | 14 ++++++++++----
> > >  1 file changed, 10 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
> > > index 2a90a483c2d6..ebb6a5c95332 100644
> > > --- a/fs/xfs/xfs_log.c
> > > +++ b/fs/xfs/xfs_log.c
> > > @@ -1705,16 +1705,22 @@ xlog_bio_end_io(
> > >
> > >  static void
> > >  xlog_map_iclog_data(
> > > -	struct bio		*bio,
> > > -	void			*data,
> > > +	struct xlog_in_core	*iclog,
> > >  	size_t			count)
> > >  {
> > > +	struct xfs_mount	*mp = iclog->ic_log->l_mp;
> > > +	struct bio		*bio = &iclog->ic_bio;
> > > +	void			*data = iclog->ic_data;
> > > +
> > >  	do {
> > >  		struct page	*page = kmem_to_page(data);
> > >  		unsigned int	off = offset_in_page(data);
> > >  		size_t		len = min_t(size_t, count, PAGE_SIZE - off);
> > >
> > > -		WARN_ON_ONCE(bio_add_page(bio, page, len, off) != len);
> > > +		if (bio_add_page(bio, page, len, off) != len) {
> > > +			xfs_force_shutdown(mp, SHUTDOWN_LOG_IO_ERROR);
> > > +			break;
> > > +		}
> > >
> > >  		data += len;
> > >  		count -= len;
> > > @@ -1762,7 +1768,7 @@ xlog_write_iclog(
> > >  	if (need_flush)
> > >  		iclog->ic_bio.bi_opf |= REQ_PREFLUSH;
> > >
> > > -	xlog_map_iclog_data(&iclog->ic_bio, iclog->ic_data, count);
> > > +	xlog_map_iclog_data(iclog, count);
> > >  	if (is_vmalloc_addr(iclog->ic_data))
> > >  		flush_kernel_vmap_range(iclog->ic_data, count);
> > >
> > > --
> > > 2.21.1
> > >
> >
>
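For readers less familiar with the bio API, here is a minimal userspace model of the page-splitting loop in the patch and of its "shut down instead of warn" decision. It is a sketch under stated assumptions, not kernel code: every sketch_* name, SKETCH_PAGE_SIZE, and the four-segment cap are hypothetical. The cap stands in for the bio's vector capacity, and the merging of contiguous pages that the real bio_add_page() can do is not modeled. Unlike the patch, which breaks out of the loop inside xlog_map_iclog_data(), the model also returns a status so that a caller could decline to submit a partially mapped buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for the model; not kernel definitions. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_MAX_SEGS  4u   /* assumed stand-in for the bio's vector capacity */

/* Hypothetical stand-in for struct bio: only tracks what was added. */
struct sketch_bio {
	size_t nr_segs;   /* segments added so far */
	size_t bytes;     /* total bytes mapped so far */
};

/* Hypothetical stand-in for the log/mount state. */
struct sketch_log {
	bool shutdown;            /* models a forced filesystem shutdown */
	struct sketch_bio bio;
};

/* Models bio_add_page(): returns len on success, 0 once the bio is full. */
static size_t sketch_bio_add_page(struct sketch_bio *bio, size_t len)
{
	if (bio->nr_segs >= SKETCH_MAX_SEGS)
		return 0;
	bio->nr_segs++;
	bio->bytes += len;
	return len;
}

/*
 * Split [addr, addr + count) at page boundaries and add each piece, like
 * the loop in xlog_map_iclog_data(). On a short add, flag a shutdown and
 * stop instead of warning and carrying on with a partial mapping.
 * Returns true only if the whole buffer was mapped.
 */
static bool sketch_map_iclog_data(struct sketch_log *log, uintptr_t addr,
				  size_t count)
{
	assert(count > 0);  /* the do/while body needs at least one byte */
	do {
		size_t off = (size_t)(addr % SKETCH_PAGE_SIZE); /* like offset_in_page() */
		size_t len = SKETCH_PAGE_SIZE - off;            /* bytes left in this page */

		if (len > count)
			len = count;                            /* like min_t(size_t, ...) */
		if (sketch_bio_add_page(&log->bio, len) != len) {
			log->shutdown = true;  /* like xfs_force_shutdown(mp, SHUTDOWN_LOG_IO_ERROR) */
			return false;
		}
		addr += len;
		count -= len;
	} while (count);
	return true;
}
```

With 4 KiB pages, a page-aligned 16 KiB buffer maps into exactly four segments, while the same buffer starting 100 bytes into a page spans five pages, so the fifth add fails and the model sets the shutdown flag rather than leaving a partially mapped write behind. That mirrors the overflow the commit message attributes to unaligned slab-allocated iclog buffers.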