On Fri, Nov 04, 2022 at 10:48:08AM +0530, Chandan Babu R wrote: > From: "Darrick J. Wong" <darrick.wong@xxxxxxxxxx> > > commit 78bba5c812cc651cee51b64b786be926ab7fe2a9 upstream. > > While QAing the new xfs_repair quotacheck code, I uncovered a quota > corruption bug resulting from a bad interaction between dquot buffer > initialization and quotacheck. The bug can be reproduced with the > following sequence: > > # mkfs.xfs -f /dev/sdf > # mount /dev/sdf /opt -o usrquota > # su nobody -s /bin/bash -c 'touch /opt/barf' > # sync > # xfs_quota -x -c 'report -ahi' /opt > User quota on /opt (/dev/sdf) > Inodes > User ID Used Soft Hard Warn/Grace > ---------- --------------------------------- > root 3 0 0 00 [------] > nobody 1 0 0 00 [------] > > # xfs_io -x -c 'shutdown' /opt > # umount /opt > # mount /dev/sdf /opt -o usrquota > # touch /opt/man2 > # xfs_quota -x -c 'report -ahi' /opt > User quota on /opt (/dev/sdf) > Inodes > User ID Used Soft Hard Warn/Grace > ---------- --------------------------------- > root 1 0 0 00 [------] > nobody 1 0 0 00 [------] > > # umount /opt Looks good to me, now the series is complete: Acked-by: Darrick J. Wong <djwong@xxxxxxxxxx> --D > > Notice how the initial quotacheck set the root dquot icount to 3 > (rootino, rbmino, rsumino), but after shutdown -> remount -> recovery, > xfs_quota reports that the root dquot has only 1 icount. We haven't > deleted anything from the filesystem, which means that quota is now > under-counting. This behavior is not limited to icount or the root > dquot, but this is the shortest reproducer. > > I traced the cause of this discrepancy to the way that we handle ondisk > dquot updates during quotacheck vs. regular fs activity. Normally, when > we allocate a disk block for a dquot, we log the buffer as a regular > (dquot) buffer. Subsequent updates to the dquots backed by that block > are done via separate dquot log item updates, which means that they > depend on the logged buffer update being written to disk before the > dquot items. Because individual dquots have their own LSN fields, that > initial dquot buffer must always be recovered. > > However, the story changes for quotacheck, which can cause dquot block > allocations but persists the final dquot counter values via a delwri > list. Because recovery doesn't gate dquot buffer replay on an LSN, this > means that the initial dquot buffer can be replayed over the (newer) > contents that were delwritten at the end of quotacheck. In effect, this > re-initializes the dquot counters after they've been updated. If the > log does not contain any other dquot items to recover, the obsolete > dquot contents will not be corrected by log recovery. > > Because quotacheck uses a transaction to log the setting of the CHKD > flags in the superblock, we skip quotacheck during the second mount > call, which allows the incorrect icount to remain. > > Fix this by changing the ondisk dquot initialization function to use > ordered buffers to write out fresh dquot blocks if it detects that we're > running quotacheck. If the system goes down before quotacheck can > complete, the CHKD flags will not be set in the superblock and the next > mount will run quotacheck again, which can fix uninitialized dquot > buffers. This requires amending the defer code to maintaine ordered > buffer state across defer rolls for the sake of the dquot allocation > code. > > For regular operations we preserve the current behavior since the dquot > items require properly initialized ondisk dquot records. > > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx> > Reviewed-by: Christoph Hellwig <hch@xxxxxx> > Signed-off-by: Chandan Babu R <chandan.babu@xxxxxxxxxx> > --- > fs/xfs/libxfs/xfs_defer.c | 10 ++++++- > fs/xfs/xfs_dquot.c | 56 ++++++++++++++++++++++++++++++--------- > 2 files changed, 52 insertions(+), 14 deletions(-) > > diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c > index 22557527cfdb..8cc3faa62404 100644 > --- a/fs/xfs/libxfs/xfs_defer.c > +++ b/fs/xfs/libxfs/xfs_defer.c > @@ -234,10 +234,13 @@ xfs_defer_trans_roll( > struct xfs_log_item *lip; > struct xfs_buf *bplist[XFS_DEFER_OPS_NR_BUFS]; > struct xfs_inode *iplist[XFS_DEFER_OPS_NR_INODES]; > + unsigned int ordered = 0; /* bitmap */ > int bpcount = 0, ipcount = 0; > int i; > int error; > > + BUILD_BUG_ON(NBBY * sizeof(ordered) < XFS_DEFER_OPS_NR_BUFS); > + > list_for_each_entry(lip, &tp->t_items, li_trans) { > switch (lip->li_type) { > case XFS_LI_BUF: > @@ -248,7 +251,10 @@ xfs_defer_trans_roll( > ASSERT(0); > return -EFSCORRUPTED; > } > - xfs_trans_dirty_buf(tp, bli->bli_buf); > + if (bli->bli_flags & XFS_BLI_ORDERED) > + ordered |= (1U << bpcount); > + else > + xfs_trans_dirty_buf(tp, bli->bli_buf); > bplist[bpcount++] = bli->bli_buf; > } > break; > @@ -289,6 +295,8 @@ xfs_defer_trans_roll( > /* Rejoin the buffers and dirty them so the log moves forward. */ > for (i = 0; i < bpcount; i++) { > xfs_trans_bjoin(tp, bplist[i]); > + if (ordered & (1U << i)) > + xfs_trans_ordered_buf(tp, bplist[i]); > xfs_trans_bhold(tp, bplist[i]); > } > > diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c > index 9596b86e7de9..6231b155e7f3 100644 > --- a/fs/xfs/xfs_dquot.c > +++ b/fs/xfs/xfs_dquot.c > @@ -205,16 +205,18 @@ xfs_qm_adjust_dqtimers( > */ > STATIC void > xfs_qm_init_dquot_blk( > - xfs_trans_t *tp, > - xfs_mount_t *mp, > - xfs_dqid_t id, > - uint type, > - xfs_buf_t *bp) > + struct xfs_trans *tp, > + struct xfs_mount *mp, > + xfs_dqid_t id, > + uint type, > + struct xfs_buf *bp) > { > struct xfs_quotainfo *q = mp->m_quotainfo; > - xfs_dqblk_t *d; > - xfs_dqid_t curid; > - int i; > + struct xfs_dqblk *d; > + xfs_dqid_t curid; > + unsigned int qflag; > + unsigned int blftype; > + int i; > > ASSERT(tp); > ASSERT(xfs_buf_islocked(bp)); > @@ -238,11 +240,39 @@ xfs_qm_init_dquot_blk( > } > } > > - xfs_trans_dquot_buf(tp, bp, > - (type & XFS_DQ_USER ? XFS_BLF_UDQUOT_BUF : > - ((type & XFS_DQ_PROJ) ? XFS_BLF_PDQUOT_BUF : > - XFS_BLF_GDQUOT_BUF))); > - xfs_trans_log_buf(tp, bp, 0, BBTOB(q->qi_dqchunklen) - 1); > + if (type & XFS_DQ_USER) { > + qflag = XFS_UQUOTA_CHKD; > + blftype = XFS_BLF_UDQUOT_BUF; > + } else if (type & XFS_DQ_PROJ) { > + qflag = XFS_PQUOTA_CHKD; > + blftype = XFS_BLF_PDQUOT_BUF; > + } else { > + qflag = XFS_GQUOTA_CHKD; > + blftype = XFS_BLF_GDQUOT_BUF; > + } > + > + xfs_trans_dquot_buf(tp, bp, blftype); > + > + /* > + * quotacheck uses delayed writes to update all the dquots on disk in an > + * efficient manner instead of logging the individual dquot changes as > + * they are made. However if we log the buffer allocated here and crash > + * after quotacheck while the logged initialisation is still in the > + * active region of the log, log recovery can replay the dquot buffer > + * initialisation over the top of the checked dquots and corrupt quota > + * accounting. > + * > + * To avoid this problem, quotacheck cannot log the initialised buffer. > + * We must still dirty the buffer and write it back before the > + * allocation transaction clears the log. Therefore, mark the buffer as > + * ordered instead of logging it directly. This is safe for quotacheck > + * because it detects and repairs allocated but initialized dquot blocks > + * in the quota inodes. > + */ > + if (!(mp->m_qflags & qflag)) > + xfs_trans_ordered_buf(tp, bp); > + else > + xfs_trans_log_buf(tp, bp, 0, BBTOB(q->qi_dqchunklen) - 1); > } > > /* > -- > 2.35.1 >