On Thu, Apr 28, 2022 at 11:53:18AM -0400, Brian Foster wrote: > The above is the variant of generic/068 failure I was reproducing and > used to bisect [1]. With some additional tracing added to ioend > completion, what I'm seeing is that the bio_for_each_folio_all() bvec > iteration basically seems to go off the rails. What happens more > specifically is that at some point during the loop, bio_next_folio() > actually lands into the second page of the just processed folio instead > of the actual next folio (i.e. as if it's walking to the next page from > the head page of the folio instead of to the next 16k folio). I suspect > completion is racing with some form of truncation/reclaim/invalidation > here, what exactly I don't know, that perhaps breaks down the folio and > renders the iteration (bio_next_folio() -> folio_next()) unsafe. To test > that theory, I open coded and modified the loop to something like the > following: > > for (bio_first_folio(&fi, bio, 0); fi.folio; ) { > f = fi.folio; > l = fi.length; > bio_next_folio(&fi, bio); > iomap_finish_folio_write(inode, f, l, error); > folio_count++; > } > > ... to avoid accessing folio metadata after writeback is cleared on it > and this seems to make the problem disappear (so far, I'll need to let > this spin for a while longer to be completely confident in that). _Oh_. It's not even a terribly weird race, then. It's just this: CPU 0 CPU 1 truncate_inode_partial_folio() folio_wait_writeback(); bio_next_folio(&fi, bio) iomap_finish_folio_write(fi.folio) folio_end_writeback(folio) split_huge_page() bio_next_folio() ... oops, now we only walked forward one page instead of the entire folio. So ... I think we can fix this with: +++ b/include/linux/bio.h @@ -290,7 +290,8 @@ static inline void bio_next_folio(struct folio_iter *fi, struct bio *bio) { fi->_seg_count -= fi->length; if (fi->_seg_count) { - fi->folio = folio_next(fi->folio); + fi->folio = (struct folio *)folio_page(fi->folio, + (fi->offset + fi->length) / PAGE_SIZE); fi->offset = 0; fi->length = min(folio_size(fi->folio), fi->_seg_count); } else if (fi->_i + 1 < bio->bi_vcnt) { (I do not love this, have not even compiled it; it's late. We may be better off just storing next_folio inside the folio_iter).