On Tue, Mar 07, 2017 at 10:56:05AM -0500, Josef Bacik wrote: > On Tue, 2017-03-07 at 12:22 +1100, Dave Chinner wrote: > > On Mon, Mar 06, 2017 at 02:00:13PM -0500, Josef Bacik wrote: > > > > > > so you retry once, and the second time through we will return false > > > if > > > we are unmounting, which means that xfs_buf_do_callbacks gets > > > called. > > > > > > Now you say that xfs_buf_do_callbacks() being called when we > > > errored > > > out is dangerous, but I don't understand why. We have this code > > > > > > if (need_ail) { > > > struct xfs_log_item *log_items[need_ail]; > > > int i = 0; > > > spin_lock(&ailp->xa_lock); > > > for (blip = lip; blip; blip = blip->li_bio_list) { > > > iip = INODE_ITEM(blip); > > > if (iip->ili_logged && > > > blip->li_lsn == iip->ili_flush_lsn) { > > > log_items[i++] = blip; > > > } > > > ASSERT(i <= need_ail); > > > } > > > /* xfs_trans_ail_delete_bulk() drops the AIL lock. > > > */ > > > xfs_trans_ail_delete_bulk(ailp, log_items, i, > > > SHUTDOWN_CORRUPT_INCORE); > > > } > > [ Your cut-n-paste is horribly mangled in something non-ascii. ] > > > > > > > > and need_ail is set if we didn't actually end up on disk properly. > > > So > > > we delete the thing and mark the fs as fucked. Am I mis-reading > > > what > > > is going on here? > > Yes. That won't shut down the filesystem on an inode write error. It > > will simply remove the inode from the AIL, the error will do dropped > > on the floor, and the filesystem will not be shut down even though > > it's now in a corrupt state. xfs_trans_ail_delete_bulk() only shuts > > down the filesytem if it doesn't find the item being removed form > > the AIL in the AIL. > > > > As I keep telling people, what needs to happen is this: > > > > 1. buffer write error occurs > > 2. retries exhausted, mark buffer as errored, run callbacks > > 3. xfs_iflush_done() (and other iodone callbacks) need to > > look at the buffer error state to determine what to do. > > 4. if the buffer is errored and recovery is not possible, > > the callback needs to shut down the filesytem. > > 5. the callback then needs to call flush abort function for > > item (e.g. xfs_iflush_abort()) rather than the normal > > IO completion processing path. > > > > > > > > This is what I did to our 4.6 tree (in testing > > > still, not actually pushed to production) > > > > > > if (bp->b_flags & XBF_ASYNC) { > > > ASSERT(bp->b_iodone != NULL); > > > > > > trace_xfs_buf_item_iodone_async(bp, _RET_IP_); > > > > > > xfs_buf_ioerror(bp, 0); /* errno of 0 unsets the > > > flag > > > */ > > > > > > if (!(bp->b_flags & (XBF_STALE|XBF_WRITE_FAIL))) { > > > bp->b_flags |= XBF_WRITE | XBF_ASYNC > > > XBF_DONE | XBF_WRITE_FAIL; > > > xfs_buf_submit(bp); > > > } else { > > > + xfs_buf_do_callbacks(bp); > > > + bp->b_fspriv = NULL; > > > + bp->b_iodone = NULL; > > > xfs_buf_relse(bp); > > > } > > > > > > return; > > > } > > It's likely you'll end up with silent on disk corruption if you do > > that because it's just tossing the error state away. > > I think I'm getting a handle on this, but I wonder if we don't have > this problem currently? We get an error on a buffer, we retry it and > it fails again. If we don't have XFS_ERR_RETRY_FOREVER set then we > error out and do the normal callbacks as if nothing ever happened, and > we get the silent corruption without shutting down the file system. > How is the code as it stands without XFS_ERR_RETRY_FOREVER safe? > Thanks, > We shouldn't get into a situation where we run the callbacks when the I/O has failed and the fs isn't shut down. IOWs, we do run the callbacks only when the I/O succeeds, or it fails and the fs is shutdown as a result (never when the I/O fails but we haven't shut down, which is the problematic case the patch above introduces). That tunable error handling logic (XFS_ERR_RETRY_FOREVER) basically determines when an I/O error is considered permanent, which means retries are futile and we have no choice but to shutdown the fs. E.g., if an error is deemed permanent, we jump to permanent_error: /* * Permanent error - we need to trigger a shutdown if we haven't already * to indicate that inconsistency will result from this action. */ permanent_error: xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR); out_stale: xfs_buf_stale(bp); bp->b_flags |= XBF_DONE; trace_xfs_buf_error_relse(bp, _RET_IP_); return false; } ... which shuts down the filesystem before we go any farther. Brian > Josef -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html