On Wed 08-06-22 02:42:21, Matthew Wilcox wrote: > On Mon, Jun 06, 2022 at 10:38:14AM +0200, Jan Kara wrote: > > On Sun 05-06-22 15:38:15, Matthew Wilcox (Oracle) wrote: > > > The comment about the page cache is rather stale; the buffer cache will > > > read into the page cache if the buffer isn't present, and the page cache > > > will not take any locks if the page is present. > > > > > > Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx> > > > > This will not work for couple of reasons, see below. BTW, I don't think the > > comment about page cache was stale (but lacking details I admit ;). As far > > as I remember (and it was really many years ago - definitely pre-git era) > > the problem was (mainly on the write side) that before current state of the > > code we were using calls like vfs_read() / vfs_write() to get quota > > information and that was indeed prone to deadlocks. > > Ah yes, vfs_write() might indeed be prone to deadlocks. Particularly > if we're doing it under the dq_mutex and any memory allocation might > have recursed into reclaim ;-) > > I actually found the commit in linux-fullhistory. Changelog for > context: > > commit b72debd66a6ed > Author: Jan Kara <jack@xxxxxxx> > Date: Mon Jan 3 04:12:24 2005 -0800 > > [PATCH] Fix of quota deadlock on pagelock: quota core \o/ to you history searching skills :) > > > @@ -6924,20 +6882,21 @@ static ssize_t ext4_quota_write(struct super_block *sb, int type, > > > return -EIO; > > > } > > > > > > - do { > > > - bh = ext4_bread(handle, inode, blk, > > > - EXT4_GET_BLOCKS_CREATE | > > > - EXT4_GET_BLOCKS_METADATA_NOFAIL); > > > - } while (PTR_ERR(bh) == -ENOSPC && > > > - ext4_should_retry_alloc(inode->i_sb, &retries)); > > > - if (IS_ERR(bh)) > > > - return PTR_ERR(bh); > > > - if (!bh) > > > + folio = read_mapping_folio(inode->i_mapping, off / PAGE_SIZE, NULL); > > > + if (IS_ERR(folio)) > > > + return PTR_ERR(folio); > > > + head = folio_buffers(folio); > > > + if (!head) > > > + head = alloc_page_buffers(&folio->page, sb->s_blocksize, false); > > > + if (!head) > > > goto out; > > > + bh = head; > > > + while ((bh_offset(bh) + sb->s_blocksize) <= (off % PAGE_SIZE)) > > > + bh = bh->b_this_page; > > > > We miss proper handling of blocks that are currently beyond i_size > > (we are extending the quota file), plus we also miss any mapping of buffers > > to appropriate disk blocks here... > > > > It could be all fixed by replicating what we do in ext4_write_begin() but > > I'm not quite convinced using inode's page cache is really worth it... > > Ah, yes, write_begin. Of course that's what I should have used. > > I'm looking at this from the point of view of removing buffer_heads > where possible. Of course, it's not possible for ext4 while the journal > relies on buffer_heads, but if we can steer filesystems away from using > sb_bread() (or equivalents), I think that's a good thing. Well, ext4 uses sb_bread() (sb_getblk()) for all its metadata so quota code, which is rather well localized, is the least of your worries I'm afraid ;). Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR