On Fri, Jun 08, 2018 at 08:07:09AM -0400, Brian Foster wrote: > On Fri, Jun 08, 2018 at 09:27:13AM +1000, Dave Chinner wrote: > > On Thu, Jun 07, 2018 at 08:41:25AM -0400, Brian Foster wrote: > > > If a delwri queue occurs of a buffer that was previously submitted > > > from a delwri queue but has not yet been removed from a wait list, > > > the queue sets _XBF_DELWRI_Q without changing the state of ->b_list. > > > This occurs, for example, if another thread beats the submitter > > > thread to the buffer lock after I/O completion. Once the submitter > > > acquires the lock, it removes the buffer from the wait list and > > > leaves a buffer with _XBF_DELWRI_Q set but not populated on a list. > > > This results in a lost buffer submission and in turn can result in > > > assert failures due to _XBF_DELWRI_Q being set on buffer reclaim or > > > filesystem lockups if the buffer happens to cover an item in the > > > AIL. > > > > I just so happened to have this ASSERT happen over night on > > generic/232 testing some code I wrote yesterday. It never ceases to > > amaze me how bugs that have been around for ages always seem to be > > hit at the same time by different people in completely different > > contexts.... > > > > Interesting, out of curiosity was this in a memory limited environment? No. had GBs of free RAM, but it's on a fakenuma setup so a node could have been low on free memory and hence running the quota shrinker.... .... > > This seems all a bit broken. > > Yes, the premise of this was to do something that didn't break it > further. ;) I figured using sync I/O would also address the problem, but > would introduce terrible submission->completion serialization... *nod* > So essentially take apart sync buffer I/O so we can do batched > submission/completion. That sounds like a nice idea to me. *nod* > Feedback on the code to follow. That aside, are you planning to turn > this into a real patch submission or would you like me to do it? I've got plenty of other things to do - if you want to take it an run I'm fine with that :P > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c > > index a9678c2f3810..40f441e96dc5 100644 > > --- a/fs/xfs/xfs_buf.c > > +++ b/fs/xfs/xfs_buf.c > > @@ -1453,27 +1453,20 @@ _xfs_buf_ioapply( > > } > > > > /* > > - * Asynchronous IO submission path. This transfers the buffer lock ownership and > > - * the current reference to the IO. It is not safe to reference the buffer after > > - * a call to this function unless the caller holds an additional reference > > - * itself. > > + * Internal I/O submission helpers for split submission and waiting actions. > > */ > > -void > > -xfs_buf_submit( > > +static int > > +__xfs_buf_submit( > > It looks like the buffer submission refactoring could be a separate > patch from the delwri queue race fix. Yup, it probably should be. > > @@ -1483,12 +1476,8 @@ xfs_buf_submit( > > bp->b_io_error = 0; > > > > /* > > - * The caller's reference is released during I/O completion. > > - * This occurs some time after the last b_io_remaining reference is > > - * released, so after we drop our Io reference we have to have some > > - * other reference to ensure the buffer doesn't go away from underneath > > - * us. Take a direct reference to ensure we have safe access to the > > - * buffer until we are finished with it. > > + * I/O needs a reference, because the caller may go away before we are > > + * done with the IO. Completion will deal with it. > > */ > > xfs_buf_hold(bp); > > I think this should be lifted in the callers. For one, it's confusing to > follow. Both paths take a reference here, both take it because the IO itself needs a reference. The only difference is when it is dropped. > Second, it looks like xfs_buf_submit() unconditionally drops a > reference whereas __xfs_buf_submit() may not acquire one (i.e. when we > return an error). > > ISTM that the buffer reference calls could be balanced in the top-level > submit functions rather than split between the common submission path > and unique sync completion path. Yes, that may end up being cleaner, because the delwri path has to take a reference across submit/complete anyway. I just chopped quickl away at the code without thinking about this stuff too deeply. I did say "untested", right? :P > > void * > > @@ -2045,14 +2035,21 @@ xfs_buf_delwri_submit_buffers( > > * at this point so the caller can still access it. > > */ > > bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_WRITE_FAIL); > > - bp->b_flags |= XBF_WRITE | XBF_ASYNC; > > + bp->b_flags |= XBF_WRITE; > > We set XBF_ASYNC below in the specific case, but this doesn't tell us > anything about whether it might have already been set on the buffer. Is > it not the responsibility of this function to set/clear XBF_ASYNC > appropriately? XBF_ASYNC should be clear (as the buffer is not under IO) buti, yes, I agree that it would be a good idea to clear it anyway. > > if (wait_list) { > > + /* > > + * Split synchronous IO - we wait later, so we need ai > > + * reference until we run completion processing and drop > > + * the buffer lock ourselves > > + */ > > Might as well merge this with the comment above, which needs fixing > anyways since we no longer "do all IO submission async." > > > xfs_buf_hold(bp); > > I think the buffer reference counting is now broken here. More than likely. :P .... > > list_move_tail(&bp->b_list, wait_list); > > - } else > > + __xfs_buf_submit(bp); > > I suspect we need to handle submission errors here, otherwise we wait on > a buffer that was never submitted. Yup. > One final thought.. ISTM that the nature of batched sync buffer > submission means that once we wait on one or two of those buffers, > there's a good chance many of the remaining buffer physical I/Os will > have completed by the time we get to the associated iowait. That means > that the current behavior of large (sync) delwri buffer completions > running in the async completion workqueue most likely changes to running > from __xfs_buf_iowait()->xfs_buf_ioend() context. I'm not sure that > really matters, but just something to note. Yup, that's something that crossed my mind. But we only wait for buffers to complete in log recovery, quota check and the /quota shrinker/ (now the generic/232 failure makes sense, doesn't it? :). Hence I don't think there's a performance issue we need to worry about - correctness is more important... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-xfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html