Re: [PATCH 1/2] xfs: log new intent items created as part of finishing recovered intent items

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Thu, 17 Sep 2020 10:45:24 -0700

On Thu, Sep 17, 2020 at 10:07:42AM +0100, Christoph Hellwig wrote:
> On Wed, Sep 16, 2020 at 08:28:49PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > 
> > During a code inspection, I found a serious bug in the log intent item
> > recovery code when an intent item cannot complete all the work and
> > decides to requeue itself to get that done.  When this happens, the
> > item recovery creates a new incore deferred op representing the
> > remaining work and attaches it to the transaction that it allocated.  At
> > the end of _item_recover, it moves the entire chain of deferred ops to
> > the dummy parent_tp that xlog_recover_process_intents passed to it, but
> > fail to log a new intent item for the remaining work before committing
> > the transaction for the single unit of work.
> > 
> > xlog_finish_defer_ops logs those new intent items once recovery has
> > finished dealing with the intent items that it recovered, but this isn't
> > sufficient.  If the log is forced to disk after a recovered log item
> > decides to requeue itself and the system goes down before we call
> > xlog_finish_defer_ops, the second log recovery will never see the new
> > intent item and therefore has no idea that there was more work to do.
> > It will finish recovery leaving the filesystem in a corrupted state.
> > 
> > The same logic applies to /any/ deferred ops added during intent item
> > recovery, not just the one handling the remaining work.
> 
> Looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@xxxxxx>
> 
> I wonder how we could come up with a reliable reproducer for this,
> though..

Yeah, I've never actually seen this trip in practice.  I suppose we
could add an error injection point to force the log and bail out midway
through recovery, but that won't help much on unfixed kernels.

--D