Re: [PATCH 3/3] ext4: Use generic_quota_read()

Jan Kara <jack@xxxxxxx> · Wed, 8 Jun 2022 16:21:38 +0200

On Wed 08-06-22 02:42:21, Matthew Wilcox wrote:
> On Mon, Jun 06, 2022 at 10:38:14AM +0200, Jan Kara wrote:
> > On Sun 05-06-22 15:38:15, Matthew Wilcox (Oracle) wrote:
> > > The comment about the page cache is rather stale; the buffer cache will
> > > read into the page cache if the buffer isn't present, and the page cache
> > > will not take any locks if the page is present.
> > > 
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
> > 
> > This will not work for couple of reasons, see below. BTW, I don't think the
> > comment about page cache was stale (but lacking details I admit ;). As far
> > as I remember (and it was really many years ago - definitely pre-git era)
> > the problem was (mainly on the write side) that before current state of the
> > code we were using calls like vfs_read() / vfs_write() to get quota
> > information and that was indeed prone to deadlocks.
> 
> Ah yes, vfs_write() might indeed be prone to deadlocks.  Particularly
> if we're doing it under the dq_mutex and any memory allocation might
> have recursed into reclaim ;-)
> 
> I actually found the commit in linux-fullhistory.  Changelog for
> context:
> 
> commit b72debd66a6ed
> Author: Jan Kara <jack@xxxxxxx>
> Date:   Mon Jan 3 04:12:24 2005 -0800
> 
>     [PATCH] Fix of quota deadlock on pagelock: quota core

\o/ to you history searching skills :)

> > > @@ -6924,20 +6882,21 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type,
> > >  		return -EIO;
> > >  	}
> > >  
> > > -	do {
> > > -		bh = ext4_bread(handle, inode, blk,
> > > -				EXT4_GET_BLOCKS_CREATE |
> > > -				EXT4_GET_BLOCKS_METADATA_NOFAIL);
> > > -	} while (PTR_ERR(bh) == -ENOSPC &&
> > > -		 ext4_should_retry_alloc(inode->i_sb, &retries));
> > > -	if (IS_ERR(bh))
> > > -		return PTR_ERR(bh);
> > > -	if (!bh)
> > > +	folio = read_mapping_folio(inode->i_mapping, off / PAGE_SIZE, NULL);
> > > +	if (IS_ERR(folio))
> > > +		return PTR_ERR(folio);
> > > +	head = folio_buffers(folio);
> > > +	if (!head)
> > > +		head = alloc_page_buffers(&folio->page, sb->s_blocksize, false);
> > > +	if (!head)
> > >  		goto out;
> > > +	bh = head;
> > > +	while ((bh_offset(bh) + sb->s_blocksize) <= (off % PAGE_SIZE))
> > > +		bh = bh->b_this_page;
> > 
> > We miss proper handling of blocks that are currently beyond i_size
> > (we are extending the quota file), plus we also miss any mapping of buffers
> > to appropriate disk blocks here...
> > 
> > It could be all fixed by replicating what we do in ext4_write_begin() but
> > I'm not quite convinced using inode's page cache is really worth it...
> 
> Ah, yes, write_begin.  Of course that's what I should have used.
> 
> I'm looking at this from the point of view of removing buffer_heads
> where possible.  Of course, it's not possible for ext4 while the journal
> relies on buffer_heads, but if we can steer filesystems away from using
> sb_bread() (or equivalents), I think that's a good thing.

Well, ext4 uses sb_bread() (sb_getblk()) for all its metadata so quota
code, which is rather well localized, is the least of your worries I'm
afraid ;).

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR