On Tue, Feb 21, 2017 at 08:25:30AM -0500, Brian Foster wrote:
> On Tue, Feb 21, 2017 at 10:30:57AM +0100, Carlos Maiolino wrote:
> > > > +	if (lip->li_flags & XFS_LI_FAILED) {
> > > > +		printk("#### ITEM BUFFER FAILED PREVIOUSLY, inode: %llu\n",
> > > > +		       ip->i_ino);
> > > > +		error = xfs_imap_to_bp(ip->i_mount, NULL, &ip->i_imap,
> > > > +				       &dip, &bp, XBF_TRYLOCK, 0);
> > > > +
> > > > +		if (error) {
> > > > +			rval = XFS_ITEM_FLUSHING;
> > > > +			goto out_unlock;
> > > > +		}
> > > > +
> > > > +		if (!(bp->b_flags & XBF_WRITE_FAIL)) {
> > > > +			rval = XFS_ITEM_FLUSHING;
> > > > +			xfs_buf_relse(bp);
> > > > +			goto out_unlock;
> > > > +		}
> > > > +
> > > > +		if (!xfs_buf_delwri_queue(bp, buffer_list)) {
> > > > +			printk("#### QUEUEING AGAIN\n");
> > > > +			rval = XFS_ITEM_FLUSHING;
> > > > +		}
> > > > +
> > >
> > > We need to clear the LI_FAILED state once we've queued it for retry.
> >
> > I tried to do this to clear the flag:
> >
> > 	if (!xfs_buf_delwri_queue(bp, buffer_list)) {
> > 		lip->li_flags &= ~XFS_LI_FAILED;
> > 		printk("#### QUEUEING AGAIN\n");
> > 		rval = XFS_ITEM_FLUSHING;
> > 	}
> >
> > There is something wrong here that I'm trying to figure out. When I
> > clear the flag, my reproducer ends up in a stack overflow, see below
> > (I removed the linked-in modules from the stack to save some space).
> > I just started to investigate why such an overflow happens, but I
> > thought it might be worth posting, given that I'm still learning how
> > buffers and log items work together; maybe you or somebody else might
> > have an idea.
> >
>
> Nothing stands out to me on a quick look, but it's been a while since
> I've looked at this. It does look like we should clear the failed flag
> regardless of whether the queue succeeds or not, though (here you clear
> it only when the queue fails).

FWIW, I think this is the wrong way to handle a buffer error, both from
a philosophical POV and from an XFS IO model POV. That is: IO submission
should not need to handle a previous write failure that didn't clean up
buffer and object state correctly. The IO completion handler of the
failed IO should have done all the required cleanup, so the submission
path should not require any modifications at all.

Keep in mind the way inode cluster flushing works:

	1. flush the first inode in the cluster
	2. call xfs_iflush_cluster()
	3. flush lock and flush all inodes on the cluster buffer

This means the other 31 dirty inodes in the cluster that were clustered
and flush locked will now hit your new code when the AIL tries to push
them and finds them flush locked. It will do buffer lookups for 31
inodes that it didn't need to. i.e. failing to get an inode flush lock
during AIL pushing is a /very common occurrence/, whilst buffer
writeback failure is extremely rare. i.e. we need to unlock and mark
the inodes as failed from the buffer IO completion callbacks, not poll
inode buffers for failure when we can't get a flush lock on an inode.
i.e. we need a "soft-error" equivalent of xfs_iflush_abort() that
doesn't shut down the fs or remove objects from the AIL unless the
filesystem has already been shut down....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
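
[For illustration, a minimal sketch of the "soft-error"
xfs_iflush_abort() variant described above, assuming it would be
invoked from the inode cluster buffer's IO completion error path (near
xfs_buf_iodone_callbacks()). The function name xfs_iflush_soft_abort()
and the details below are assumptions for this sketch, not code from
the thread:

	/*
	 * Sketch: a soft counterpart to xfs_iflush_abort() for write
	 * errors that should be retried rather than shut the
	 * filesystem down.  Walks the log items attached to the failed
	 * cluster buffer, marks each one failed and drops its flush
	 * lock, leaving the items in the AIL so a later push can
	 * resubmit the buffer.  Serialisation of li_flags updates
	 * (e.g. against the AIL lock) is omitted for brevity.
	 */
	STATIC void
	xfs_iflush_soft_abort(
		struct xfs_buf		*bp)
	{
		struct xfs_log_item	*lip;

		for (lip = bp->b_fspriv; lip; lip = lip->li_bio_list) {
			struct xfs_inode_log_item *iip = INODE_ITEM(lip);

			/*
			 * Remember the failure on the item itself; the
			 * AIL push can then retry only the items that
			 * actually failed, instead of doing a buffer
			 * lookup for every flush-locked inode it
			 * encounters.
			 */
			lip->li_flags |= XFS_LI_FAILED;

			/* Unlock so the inode isn't stuck flush locked. */
			xfs_ifunlock(iip->ili_inode);
		}
	}

With completion marking the items failed, the xfs_inode_item_push()
side would only need to test XFS_LI_FAILED (and clear it once the
buffer is requeued) rather than probing the backing buffer of every
flush-locked inode.]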