On 2024/8/2 8:05, Dave Chinner wrote: > On Wed, Jul 31, 2024 at 05:13:04PM +0800, Zhang Yi wrote: >> From: Zhang Yi <yi.zhang@xxxxxxxxxx> >> >> Commit '1cea335d1db1 ("iomap: fix sub-page uptodate handling")' fix a >> race issue when submitting multiple read bios for a page spans more than >> one file system block by adding a spinlock(which names state_lock now) >> to make the page uptodate synchronous. However, the race condition only >> happened between the read I/O submitting and completeing threads, > > when we do writeback on a folio that has multiple blocks on it we > can submit multiple bios for that, too. Hence the write completions > can race with each other and write submission, too. > > Yes, write bio submission and completion only need to update ifs > accounting using an atomic operation, but the same race condition > exists even though the folio is fully locked at the point of bio > submission. > > >> it's >> sufficient to use page lock to protect other paths, e.g. buffered write > ^^^^ folio >> path. >> >> After large folio is supported, the spinlock could affect more >> about the buffered write performance, so drop it could reduce some >> unnecessary locking overhead. > > From the point of view of simple to understand and maintain code, I > think this is a bad idea. The data structure is currently protected > by the state lock in all situations, but this change now makes it > protected by the state lock in one case and the folio lock in a > different case. Yeah, I agree that this is a side-effect of this change, after this patch, we have to be careful to distinguish between below two cases B1 and B2 as Willy mentioned. B. If ifs_set_range_uptodate() is called from iomap_set_range_uptodate(), either we know: B1. The caller of iomap_set_range_uptodate() holds the folio lock, and this is the only place that can call ifs_set_range_uptodate() for this folio B2. The caller of iomap_set_range_uptodate() holds the state lock > > Making this change also misses the elephant in the room: the > buffered write path still needs the ifs->state_lock to update the > dirty bitmap. Hence we're effectively changing the serialisation > mechanism for only one of the two ifs state bitmaps that the > buffered write path has to update. > > Indeed, we can't get rid of the ifs->state_lock from the dirty range > updates because iomap_dirty_folio() can be called without the folio > being locked through folio_mark_dirty() calling the ->dirty_folio() > aop. > Sorry, I don't understand, why folio_mark_dirty() could be called without folio lock (isn't this supposed to be a bug)? IIUC, all the file backed folios must be locked before marking dirty. Are there any exceptions or am I missing something? > IOWs, getting rid of the state lock out of the uptodate range > changes does not actually get rid of it from the buffered IO patch. > we still have to take it to update the dirty range, and so there's > an obvious way to optimise the state lock usage without changing any > of the bitmap access serialisation behaviour. i.e. We combine the > uptodate and dirty range updates in __iomap_write_end() into a > single lock context such as: > > iomap_set_range_dirty_uptodate() > { > struct iomap_folio_state *ifs = folio->private; > struct inode *inode: > unsigned int blks_per_folio; > unsigned int first_blk; > unsigned int last_blk; > unsigned int nr_blks; > unsigned long flags; > > if (!ifs) > return; > > inode = folio->mapping->host; > blks_per_folio = i_blocks_per_folio(inode, folio); > first_blk = (off >> inode->i_blkbits); > last_blk = (off + len - 1) >> inode->i_blkbits; > nr_blks = last_blk - first_blk + 1; > > spin_lock_irqsave(&ifs->state_lock, flags); > bitmap_set(ifs->state, first_blk, nr_blks); > bitmap_set(ifs->state, first_blk + blks_per_folio, nr_blks); > spin_unlock_irqrestore(&ifs->state_lock, flags); > } > > This means we calculate the bitmap offsets only once, we take the > state lock only once, and we don't do anything if there is no > sub-folio state. > > If we then fix the __iomap_write_begin() code as Willy pointed out > to elide the erroneous uptodate range update, then we end up only > taking the state lock once per buffered write instead of 3 times per > write. > > This patch only reduces it to twice per buffered write, so doing the > above should provide even better performance without needing to > change the underlying serialisation mechanism at all. > Thanks for the suggestion. I've thought about this solution too, but I didn't think we need the state_lock when setting ifs dirty bit since the folio lock should work, so I changed my mind and planed to drop all ifs state_lock in the write path (please see the patch 6). Please let me know if I'm wrong. Thanks, Yi.