Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:

> Yes, I saw David Howells resolution suggestion.  I think that one
> was buggy.  It would wait for a page under writeback, and then go on to
> the *next* one without writing it back.  I don't think that was right.

You're right.  Vishal's patch introduced it into afs and I copied it
across, not noticing it either then or on review of Vishal's patch.  He
inserted the extra for-loop as he's now extracting a batch, but kept the
continue that used to repeat the extraction - except now it continues the
wrong loop.  So afs will need fixing too.

The simplest fixes, I think, are either to decrement the loop counter
before continuing or to stick a goto in, back to the beginning of the loop
(which is what you did in cifs).  But I'm not sure that's the correct
thing to do.  The previous code dropped the found folio and then repeated
the search in case the folio got truncated, migrated or punched.  I
suspect that's probably what we should do.

Also, thinking about it again, I'm not sure whether fetching a batch with
filemap_get_folios_tag() like this in {afs,cifs}_writepages_region() is
necessarily the right thing to do.  There are three cases I'm thinking of:

 (1) A single folio is returned.  This is trivial.

 (2) A run of contiguous folios is returned - {afs,cifs}_extend_writeback()
     is likely to write them back, in which case the batch is probably not
     useful.  Note that *_extend_writeback() walks the xarray directly
     itself as it wants contiguous folios and doesn't want to extract any
     folio it's not going to use.

 (3) A list of scattered folios is returned.  Granted, this is more
     efficient if nothing else interferes - but there could be other
     writes in the gaps that we then skip over, other flushes that render
     some of our list clean, or page invalidations.  This is a change in
     behaviour, but I'm not sure that matters too much since a flush/sync
     can only be expected to write back what's modified at the time it is
     initiated.
Further, processing each entry in the list is potentially very slow
because we're doing a write across the network for each one (cifs might
bump this into the background, but it might also have to (re)open a file
handle on the server and wait for credits first to even begin the
transaction).  That means all of the folios in the batch may then get
pinned for a long period of time - up to 14x as long for the last folio in
the batch - which could prevent things like page migration.  Further, we
might not get to write out all the folios in the batch, as
*_extend_writeback() might hit the wbc limit first.

> That said, I'm not at all convinced my version is right either.  I
> can't test it, and that means I probably messed up.  It looked sane to
> me when I did it, and it builds cleanly, but I honestly doubt myself.

It doesn't seem to work.  A write seems to end in lots of:

	CIFS: VFS: No writable handle in writepages rc=-9

being emitted.  I'll poke further into it - there's always the possibility
that some other patch is interfering.

David