Re: [PATCH 5/6] iomap: drop unnecessary state_lock when setting ifs uptodate bits

Zhang Yi <yi.zhang@xxxxxxxxxxxxxxx> · Fri, 2 Aug 2024 10:57:41 +0800

On 2024/8/2 8:05, Dave Chinner wrote:
> On Wed, Jul 31, 2024 at 05:13:04PM +0800, Zhang Yi wrote:
>> From: Zhang Yi <yi.zhang@xxxxxxxxxx>
>>
>> Commit '1cea335d1db1 ("iomap: fix sub-page uptodate handling")' fix a
>> race issue when submitting multiple read bios for a page spans more than
>> one file system block by adding a spinlock(which names state_lock now)
>> to make the page uptodate synchronous. However, the race condition only
>> happened between the read I/O submitting and completeing threads,
> 
> when we do writeback on a folio that has multiple blocks on it we
> can submit multiple bios for that, too. Hence the write completions
> can race with each other and write submission, too.
> 
> Yes, write bio submission and completion only need to update ifs
> accounting using an atomic operation, but the same race condition
> exists even though the folio is fully locked at the point of bio
> submission.
> 
> 
>> it's
>> sufficient to use page lock to protect other paths, e.g. buffered write
>                     ^^^^ folio
>> path.
>>
>> After large folio is supported, the spinlock could affect more
>> about the buffered write performance, so drop it could reduce some
>> unnecessary locking overhead.
> 
> From the point of view of simple to understand and maintain code, I
> think this is a bad idea. The data structure is currently protected
> by the state lock in all situations, but this change now makes it
> protected by the state lock in one case and the folio lock in a
> different case.

Yeah, I agree that this is a side-effect of this change, after this patch,
we have to be careful to distinguish between below two cases B1 and B2 as
Willy mentioned.

B. If ifs_set_range_uptodate() is called from iomap_set_range_uptodate(),
   either we know:
B1. The caller of iomap_set_range_uptodate() holds the folio lock, and this
    is the only place that can call ifs_set_range_uptodate() for this folio
B2. The caller of iomap_set_range_uptodate() holds the state lock

> 
> Making this change also misses the elephant in the room: the
> buffered write path still needs the ifs->state_lock to update the
> dirty bitmap. Hence we're effectively changing the serialisation
> mechanism for only one of the two ifs state bitmaps that the
> buffered write path has to update.
> 
> Indeed, we can't get rid of the ifs->state_lock from the dirty range
> updates because iomap_dirty_folio() can be called without the folio
> being locked through folio_mark_dirty() calling the ->dirty_folio()
> aop.
> 

Sorry, I don't understand, why folio_mark_dirty() could be called without
folio lock (isn't this supposed to be a bug)? IIUC, all the file backed
folios must be locked before marking dirty. Are there any exceptions or am
I missing something?

> IOWs, getting rid of the state lock out of the uptodate range
> changes does not actually get rid of it from the buffered IO patch.
> we still have to take it to update the dirty range, and so there's
> an obvious way to optimise the state lock usage without changing any
> of the bitmap access serialisation behaviour. i.e.  We combine the
> uptodate and dirty range updates in __iomap_write_end() into a
> single lock context such as:
> 
> iomap_set_range_dirty_uptodate()
> {
> 	struct iomap_folio_state *ifs = folio->private;
> 	struct inode *inode:
>         unsigned int blks_per_folio;
>         unsigned int first_blk;
>         unsigned int last_blk;
>         unsigned int nr_blks;
> 	unsigned long flags;
> 
> 	if (!ifs)
> 		return;
> 
> 	inode = folio->mapping->host;
> 	blks_per_folio = i_blocks_per_folio(inode, folio);
> 	first_blk = (off >> inode->i_blkbits);
> 	last_blk = (off + len - 1) >> inode->i_blkbits;
> 	nr_blks = last_blk - first_blk + 1;
> 
> 	spin_lock_irqsave(&ifs->state_lock, flags);
> 	bitmap_set(ifs->state, first_blk, nr_blks);
> 	bitmap_set(ifs->state, first_blk + blks_per_folio, nr_blks);
> 	spin_unlock_irqrestore(&ifs->state_lock, flags);
> }
> 
> This means we calculate the bitmap offsets only once, we take the
> state lock only once, and we don't do anything if there is no
> sub-folio state.
> 
> If we then fix the __iomap_write_begin() code as Willy pointed out
> to elide the erroneous uptodate range update, then we end up only
> taking the state lock once per buffered write instead of 3 times per
> write.
> 
> This patch only reduces it to twice per buffered write, so doing the
> above should provide even better performance without needing to
> change the underlying serialisation mechanism at all.
> 

Thanks for the suggestion. I've thought about this solution too, but I
didn't think we need the state_lock when setting ifs dirty bit since the
folio lock should work, so I changed my mind and planed to drop all ifs
state_lock in the write path (please see the patch 6). Please let me
know if I'm wrong.

Thanks,
Yi.