Re: generic/068 crash on 5.18-rc2?

[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]



On Tue, May 03, 2022 at 05:31:01AM +0100, Matthew Wilcox wrote:
> On Mon, May 02, 2022 at 08:25:34PM -0700, Darrick J. Wong wrote:
> > On Mon, May 02, 2022 at 08:20:00AM -0400, Brian Foster wrote:
> > > On Sat, Apr 30, 2022 at 10:40:31PM +0100, Matthew Wilcox wrote:
> > > > On Sat, Apr 30, 2022 at 04:44:07AM +0100, Matthew Wilcox wrote:
> > > > > (I do not love this, have not even compiled it; it's late.  We may be
> > > > > better off just storing next_folio inside the folio_iter).
> > > > 
> > > > Does anyone have a preference for fixing this between Option A:
> > > > 
> > > 
> > > After seeing the trace in my previous mail and several thousand
> > > successful iterations of the test hack, I had reworked it into this
> > > (which survived weekend testing until it ran into some other XFS problem
> > > that looks unrelated):
> > > 
> > > diff --git a/include/linux/bio.h b/include/linux/bio.h
> > > index 278cc81cc1e7..aa820e09978e 100644
> > > --- a/include/linux/bio.h
> > > +++ b/include/linux/bio.h
> > > @@ -269,6 +269,7 @@ struct folio_iter {
> > >  	size_t offset;
> > >  	size_t length;
> > >  	/* private: for use by the iterator */
> > > +	struct folio *_next;
> > >  	size_t _seg_count;
> > >  	int _i;
> > >  };
> > > @@ -279,6 +280,7 @@ static inline void bio_first_folio(struct folio_iter *fi, struct bio *bio,
> > >  	struct bio_vec *bvec = bio_first_bvec_all(bio) + i;
> > >  
> > >  	fi->folio = page_folio(bvec->bv_page);
> > > +	fi->_next = folio_next(fi->folio);
> > >  	fi->offset = bvec->bv_offset +
> > >  			PAGE_SIZE * (bvec->bv_page - &fi->folio->page);
> > >  	fi->_seg_count = bvec->bv_len;
> > > @@ -290,13 +292,15 @@ static inline void bio_next_folio(struct folio_iter *fi, struct bio *bio)
> > >  {
> > >  	fi->_seg_count -= fi->length;
> > >  	if (fi->_seg_count) {
> > > -		fi->folio = folio_next(fi->folio);
> > > +		fi->folio = fi->_next;
> > > +		fi->_next = folio_next(fi->folio);
> > >  		fi->offset = 0;
> > >  		fi->length = min(folio_size(fi->folio), fi->_seg_count);
> > >  	} else if (fi->_i + 1 < bio->bi_vcnt) {
> > >  		bio_first_folio(fi, bio, fi->_i + 1);
> > >  	} else {
> > >  		fi->folio = NULL;
> > > +		fi->_next = NULL;
> > >  	}
> > >  }
> > > 
> > > So FWIW, that is just to say that I find option A to be cleaner and more
> > > readable.
> > 
> > Me too.  I'll queue up the usual nightly tests with that patch added and
> > we'll see how that does.
> 
> I've just pushed essentially that patch to my for-next tree in case
> anybody does any testing with that.  I'll give it a couple of days
> before creating a folio-5.18f tag and asking Linus to pull the first two
> commits on
> 
> git://git.infradead.org/users/willy/pagecache.git for-next
> 
> That is, commits
> 
> 1a4c97e2dd5b ("block: Do not call folio_next() on an unreferenced folio")
> 095099da208b ("mm/readahead: Fix readahead with large folios")

Hmm.  Well, I added 1a4c97 to my tree last night, and it seems to have
cleared up all but two of the problems I saw with the for-next branch.

generic/388 still fails (40 minutes in) with:

WARN_ON_ONCE(atomic_read(&iop->write_bytes_pending));
VM_BUG_ON_FOLIO(i_blocks_per_folio(inode, folio) > 1 && !iop, folio);

Which I think is the same problem where the fs goes down, XFS throws an
error back to iomap_do_writepages, and it tries to discard a folio that
already had writeback running on it.

There's also the same problem I reported a few days ago in xfs/501
on a 64k-page ARM64 VM where:

run fstests xfs/501 at 2022-05-02 21:17:31
XFS: Assertion failed: IS_ALIGNED((unsigned long)lv->lv_buf, sizeof(uint64_t)), file: fs/xfs/xfs_log_cil.c, line: 430
XFS: Assertion failed: IS_ALIGNED((unsigned long)buf, sizeof(uint64_t)), file: fs/xfs/xfs_log.c, line: 137
XFS: Assertion failed: IS_ALIGNED((unsigned long)buf, sizeof(uint64_t)), file: fs/xfs/xfs_log.c, line: 137

But I think that's a new bug that came in with all the log buffer
alignment changes in the 5.19 branch.

Oh.  My tree still had the "disable large folios" patch in it.  I guess
the "successful" results are mostly invalid then.

--D

> (more than happy to update anything about those patches)



[Index of Archives]     [Linux Filesystems Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux