On Wed, Sep 16, 2020 at 08:28:49PM -0700, Darrick J. Wong wrote: > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > During a code inspection, I found a serious bug in the log intent item > recovery code when an intent item cannot complete all the work and > decides to requeue itself to get that done. When this happens, the > item recovery creates a new incore deferred op representing the > remaining work and attaches it to the transaction that it allocated. At > the end of _item_recover, it moves the entire chain of deferred ops to > the dummy parent_tp that xlog_recover_process_intents passed to it, but > fail to log a new intent item for the remaining work before committing > the transaction for the single unit of work. > > xlog_finish_defer_ops logs those new intent items once recovery has > finished dealing with the intent items that it recovered, but this isn't > sufficient. If the log is forced to disk after a recovered log item > decides to requeue itself and the system goes down before we call > xlog_finish_defer_ops, the second log recovery will never see the new > intent item and therefore has no idea that there was more work to do. > It will finish recovery leaving the filesystem in a corrupted state. > > The same logic applies to /any/ deferred ops added during intent item > recovery, not just the one handling the remaining work. Looks good: Reviewed-by: Christoph Hellwig <hch@xxxxxx> I wonder how we could come up with a reliable reproducer for this, though..