On Sat, Apr 30, 2022 at 04:44:07AM +0100, Matthew Wilcox wrote: > On Thu, Apr 28, 2022 at 11:53:18AM -0400, Brian Foster wrote: > > The above is the variant of generic/068 failure I was reproducing and > > used to bisect [1]. With some additional tracing added to ioend > > completion, what I'm seeing is that the bio_for_each_folio_all() bvec > > iteration basically seems to go off the rails. What happens more > > specifically is that at some point during the loop, bio_next_folio() > > actually lands into the second page of the just processed folio instead > > of the actual next folio (i.e. as if it's walking to the next page from > > the head page of the folio instead of to the next 16k folio). I suspect > > completion is racing with some form of truncation/reclaim/invalidation > > here, what exactly I don't know, that perhaps breaks down the folio and > > renders the iteration (bio_next_folio() -> folio_next()) unsafe. To test > > that theory, I open coded and modified the loop to something like the > > following: > > > > for (bio_first_folio(&fi, bio, 0); fi.folio; ) { > > f = fi.folio; > > l = fi.length; > > bio_next_folio(&fi, bio); > > iomap_finish_folio_write(inode, f, l, error); > > folio_count++; > > } > > > > ... to avoid accessing folio metadata after writeback is cleared on it > > and this seems to make the problem disappear (so far, I'll need to let > > this spin for a while longer to be completely confident in that). > > _Oh_. > > It's not even a terribly weird race, then. It's just this: > > CPU 0 CPU 1 > truncate_inode_partial_folio() > folio_wait_writeback(); > bio_next_folio(&fi, bio) > iomap_finish_folio_write(fi.folio) > folio_end_writeback(folio) > split_huge_page() > bio_next_folio() > ... oops, now we only walked forward one page instead of the entire folio. > Yep, though once I noticed and turned on the mm_page_free tracepoint, it looked like it was actually the I/O completion path breaking down the compound folio: kworker/10:1-440 [010] ..... 355.369899: iomap_finish_ioend: 1090: bio 00000000bc8445c7 index 192 fi (00000000dc8c03bd 0 16384 32768 27) ... kworker/10:1-440 [010] ..... 355.369905: mm_page_free: page=00000000dc8c03bd pfn=0x182190 order=2 kworker/10:1-440 [010] ..... 355.369907: iomap_finish_ioend: 1090: bio 00000000bc8445c7 index 1 fi (00000000f8b5d9b3 0 4096 16384 27) I take that to mean the truncate path executes while the completion side holds a reference, folio_end_writeback() ends up dropping the last reference, falls into the free/split path and the iteration breaks from there. Same idea either way, I think. Brian > So ... I think we can fix this with: > > +++ b/include/linux/bio.h > @@ -290,7 +290,8 @@ static inline void bio_next_folio(struct folio_iter *fi, struct bio *bio) > { > fi->_seg_count -= fi->length; > if (fi->_seg_count) { > - fi->folio = folio_next(fi->folio); > + fi->folio = (struct folio *)folio_page(fi->folio, > + (fi->offset + fi->length) / PAGE_SIZE); > fi->offset = 0; > fi->length = min(folio_size(fi->folio), fi->_seg_count); > } else if (fi->_i + 1 < bio->bi_vcnt) { > > (I do not love this, have not even compiled it; it's late. We may be > better off just storing next_folio inside the folio_iter). >