On Thu, Sep 17, 2020 at 10:07:42AM +0100, Christoph Hellwig wrote: > On Wed, Sep 16, 2020 at 08:28:49PM -0700, Darrick J. Wong wrote: > > From: Darrick J. Wong <darrick.wong@xxxxxxxxxx> > > > > During a code inspection, I found a serious bug in the log intent item > > recovery code when an intent item cannot complete all the work and > > decides to requeue itself to get that done. When this happens, the > > item recovery creates a new incore deferred op representing the > > remaining work and attaches it to the transaction that it allocated. At > > the end of _item_recover, it moves the entire chain of deferred ops to > > the dummy parent_tp that xlog_recover_process_intents passed to it, but > > fail to log a new intent item for the remaining work before committing > > the transaction for the single unit of work. > > > > xlog_finish_defer_ops logs those new intent items once recovery has > > finished dealing with the intent items that it recovered, but this isn't > > sufficient. If the log is forced to disk after a recovered log item > > decides to requeue itself and the system goes down before we call > > xlog_finish_defer_ops, the second log recovery will never see the new > > intent item and therefore has no idea that there was more work to do. > > It will finish recovery leaving the filesystem in a corrupted state. > > > > The same logic applies to /any/ deferred ops added during intent item > > recovery, not just the one handling the remaining work. > > Looks good: > > Reviewed-by: Christoph Hellwig <hch@xxxxxx> > > I wonder how we could come up with a reliable reproducer for this, > though.. Yeah, I've never actually seen this trip in practice. I suppose we could add an error injection point to force the log and bail out midway through recovery, but that won't help much on unfixed kernels. --D