Adding Jens to the cc.  As you well know, he added this code, so I'm
mystified why you didn't cc him.  Also adding linux-fsdevel (I presume
this was a mistake and you inadvertently cc'd f2fs-devel instead.)

On Mon, Mar 03, 2025 at 10:34:26AM +1030, Qu Wenruo wrote:
> [SPINLOCK AND END WRITEBACK]
>
> Although folio_end_writeback() can be called in interrupt context
> (per the in_task() check), surprisingly it may not be suitable to be
> called inside a spinlock (in task context):

It's poor practice to do that; you're introducing a dependency between
your fs lock and the i_mapping lock, which is already pretty intertwined
with ... every other lock in the system.

> For example the following call chain can lead to sleep:
>
> spin_lock()
> folio_end_writeback()
> |- folio_end_dropbehind_write()
>    |- folio_launder()
>       Which can sleep.
>
> My question is, can and should we allow folio_end_writeback() inside a
> spinlock?
>
> [BTRFS' ASYNC EXTENT PROBLEM]
>
> This is again a btrfs-specific behavior: we have to call
> folio_end_writeback() inside a spinlock to make sure that really only
> one thread can clear the writeback flag of a folio.
>
> I know iomap does a pretty good job of preventing an early-finished
> writeback from clearing the folio writeback flag while we're still
> submitting other blocks inside the folio.
>
> iomap holds one extra writeback count before it starts marking blocks
> writeback and submitting them.
> After all blocks are submitted, it drops that count (and calls
> folio_end_writeback() if it's the last one holding the writeback flag).
>
> But the iomap solution requires that all blocks inside a folio be
> submitted at the same time.

I actually don't like the iomap solution as it's currently written; it
just hasn't risen high enough up my todo list to fix it.  What we should
do is initialise folio->ifs->write_bytes_pending to
bitmap_weight(ifs->state, blocks_per_folio) * block_size in
iomap_writepage_map() [this is oversimplified; we'd need to handle eof
and so on too].  That would address your problem as well as do fewer
atomic operations.
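
As a rough, untested sketch of that idea (this would sit next to the ifs
helpers in fs/iomap/buffered-io.c where struct iomap_folio_state is
visible; the helper name is made up, and it hand-waves the eof and
dirty-block accounting exactly as noted above):

	/*
	 * Seed write_bytes_pending with the bytes about to be submitted
	 * for this folio, instead of taking a +1 bias at submission time
	 * and doing an atomic add per block.  Completion can then only
	 * drop the count to zero (and call folio_end_writeback()) once
	 * every block has actually finished, no matter how early the
	 * individual bios complete.
	 */
	static void ifs_seed_write_bytes_pending(struct inode *inode,
						 struct folio *folio)
	{
		struct iomap_folio_state *ifs = folio->private;
		unsigned int blks_per_folio = i_blocks_per_folio(inode, folio);

		/*
		 * Oversimplified on purpose: no eof handling, and the
		 * real thing would count only the blocks that are going
		 * to be written rather than the whole state bitmap.
		 */
		atomic_set(&ifs->write_bytes_pending,
			   bitmap_weight(ifs->state, blks_per_folio) <<
				inode->i_blkbits);
	}

The per-block completion path could then stay a plain sub-and-test, with
no submission-time bias to pair it against.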

> This is not true in btrfs, due to the design of btrfs' async extents,
> which can mark blocks for writeback at really any time: we keep the
> folios locked and queue them into a workqueue to do compression, and
> only after the compression is done do we submit them, mark them
> writeback and unlock.
>
> If we do not call folio_end_writeback() while holding the spinlock,
> but only check whether we're the last holder and call
> folio_end_writeback() outside of the spinlock, we can hit the
> following race where folio_end_writeback() is called twice on the
> same folio:
>
>      0                     32K                    64K
>      |<-       AE 1       ->|<-       AE 2       ->|
>
>  Thread A (AE 1)                      | Thread B (AE 2)
> --------------------------------------+------------------------------
>  submit_one_async_extent()            |
>  |- process_one_folio()               |
>     |- subpage_set_writeback()        |
>                                       |
>  /* IO finished */                    |
>  end_compressed_writeback()           |
>  |- btrfs_folio_clear_writeback()     |
>     |- spin_lock()                    |
>     |  /* this is the last writeback  |
>     |     holder, should end the      |
>     |     folio writeback flag */     |
>     |- last = true                    |
>     |- spin_unlock()                  |
>     |                                 | submit_one_async_extent()
>     |                                 | |- process_one_folio()
>     |                                 |    |- subpage_set_writeback()

This seems weird.  Why are you setting the "subpage" writeback bit while
the folio writeback bit is still set?  That smells like a bug-in-waiting
if not an actual bug to me.  Surely it should wait on the folio
writeback bit to clear?

>     |                                 | /* IO finished */
>     |                                 | end_compressed_writeback()
>     |                                 | |- btrfs_folio_clear_writeback()
>     |                                 |    /* Again the last holder */
>     |                                 |    |- spin_lock()
>     |                                 |    |- last = true
>     |                                 |    |- spin_unlock()
>  |- folio_end_writeback()             | |- folio_end_writeback()
>
> I know the most proper solution would be to get rid of the delayed
> unlock and submission, mark blocks for writeback without the async
> extent mechanism entirely, and then follow iomap's solution.
>
> But that will be a huge change (although we will go that path
> eventually); meanwhile the folio_end_writeback() inside a spinlock
> needs to be fixed asap.

I'd suggest the asap solution is for btrfs to mark itself as not
supporting dropbehind.

> So my question is, can we allow folio_end_writeback() to be called
> inside a spinlock, at the cost of screwing up the folio reclaim for
> DONTCACHE?
>
> Thanks,
> Qu
>
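
To be explicit about the dropbehind opt-out: dropbehind folios only come
from RWF_DONTCACHE I/O, and the VFS already gates that per filesystem on
FOP_DONTCACHE in ->fop_flags.  Roughly (this is from memory of how
kiocb_set_rw_flags() is wired up, not a verbatim quote of it):

	/*
	 * Paraphrased gate, not the exact kernel code: unless the
	 * filesystem advertises FOP_DONTCACHE, RWF_DONTCACHE is
	 * rejected up front, no folio of that filesystem is ever
	 * marked dropbehind, and folio_end_writeback() cannot wander
	 * into folio_end_dropbehind_write()/folio_launder().
	 */
	if (flags & RWF_DONTCACHE) {
		if (!(ki->ki_filp->f_op->fop_flags & FOP_DONTCACHE))
			return -EOPNOTSUPP;
		kiocb_flags |= IOCB_DONTCACHE;
	}

So as long as btrfs doesn't set (or drops) FOP_DONTCACHE, the DONTCACHE
reclaim you're worried about screwing up can't happen for btrfs folios
in the first place.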