Re: io_submit() blocks for writes for substantial amount of time

Christoph Hellwig <hch@xxxxxxxxxxxxx> · Tue, 19 Sep 2017 07:58:27 -0700

On Tue, Sep 19, 2017 at 08:27:05AM -0400, Brian Foster wrote:
> > Please advise, is this a known bug? When can it happen? Is there a way
> > to work it around to avoid blocking?
> > 
> 
> I'm not sure how either could be considered a bug based on the stack
> trace information alone. Allocations may require reading metadata and
> reads are synchronous. This all seems like pretty basic filesystem
> behavior.
> 
> I suppose performance may be a separate question. For the latter issue,
> I'd be curious whether leaving more free space available in the
> filesystem would help avoid running into busy extents. Perhaps having
> more memory and thus a larger buffer cache for btree blocks could help
> mitigate the former issue..? The deterministic workaround for both is to
> preallocate the associated file. If the file would be too large, another
> option may be to set an extent size hint to allocate the file in larger
> chunks and amortize the cost of the allocations over multiple writes.
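
For illustration, the two workarounds suggested above (preallocating the
file, or setting an extent size hint) might look roughly like the sketch
below. The helper names preallocate_file() and set_extsize_hint() are
made up for this example. It assumes a kernel and headers new enough to
expose FS_IOC_FSSETXATTR in <linux/fs.h>, and the hint generally has to be
set while the file is still empty.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>       /* posix_fallocate */
#include <linux/fs.h>    /* struct fsxattr, FS_IOC_FSGETXATTR, FS_XFLAG_EXTSIZE */
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>

/*
 * Hypothetical helper: reserve space for bytes [0, size) up front so that
 * later writes into that range do not have to allocate new extents.
 * Returns 0 on success, -1 with errno set on failure.
 */
int preallocate_file(int fd, off_t size)
{
    /* posix_fallocate() returns the error number instead of setting errno */
    int err = posix_fallocate(fd, 0, size);
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}

/*
 * Hypothetical helper: ask the filesystem to allocate this file in chunks
 * of extsize_bytes. Filesystems that do not support extent size hints
 * reject the ioctl, so callers should treat failure as non-fatal.
 * Returns 0 on success, -1 with errno set on failure.
 */
int set_extsize_hint(int fd, uint32_t extsize_bytes)
{
    struct fsxattr fsx;

    /* Read the current attributes first so other flags are preserved */
    if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
        return -1;

    fsx.fsx_xflags |= FS_XFLAG_EXTSIZE;
    fsx.fsx_extsize = extsize_bytes;

    return ioctl(fd, FS_IOC_FSSETXATTR, &fsx);
}
```

Preallocation is the simpler choice when the final file size is known in
advance; the extent size hint is the fallback when it is not, or when the
file would be too large to reserve in one go.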

Note that Linux 4.13 and later support an RWF_NOWAIT flag that will
return -EAGAIN from io_submit for these conditions, so they can be
handled by a thread pool.
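
A rough sketch of how a submitter might use this with the raw AIO
syscalls follows. The helper names (prep_nowait_pwrite, submit_nowait,
completion_needs_retry) are invented for this example, and it assumes
headers where struct iocb has the aio_rw_flags field and a file opened
with O_DIRECT. Since -EAGAIN may show up either as the io_submit error or
in the res field of the completion event, the sketch checks both places.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/aio_abi.h>   /* struct iocb, struct io_event, aio_context_t */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT 0x00000008   /* value from the Linux UAPI headers */
#endif

/* Hypothetical helper: describe a write that must not block in the kernel. */
void prep_nowait_pwrite(struct iocb *cb, int fd, const void *buf,
                        size_t len, off_t off)
{
    memset(cb, 0, sizeof(*cb));
    cb->aio_lio_opcode = IOCB_CMD_PWRITE;
    cb->aio_fildes = fd;
    cb->aio_buf = (uint64_t)(uintptr_t)buf;
    cb->aio_nbytes = len;
    cb->aio_offset = off;
    cb->aio_rw_flags = RWF_NOWAIT;   /* fail with EAGAIN instead of blocking */
}

/*
 * Hypothetical helper: submit a single iocb.
 * Returns  1 if the request was queued,
 *          0 if the kernel reported EAGAIN (hand it to a blocking worker),
 *         -1 on any other failure.
 */
int submit_nowait(aio_context_t ctx, struct iocb *cb)
{
    struct iocb *list[1] = { cb };
    long ret = syscall(SYS_io_submit, ctx, 1L, list);

    if (ret == 1)
        return 1;
    if (ret < 0 && errno == EAGAIN)
        return 0;
    return -1;
}

/* Hypothetical helper: did a reaped completion fail because it would block? */
int completion_needs_retry(const struct io_event *ev)
{
    return ev->res == -EAGAIN;
}
```

Whenever submit_nowait() returns 0 or completion_needs_retry() is true,
the application would resubmit the same write without RWF_NOWAIT from a
thread that is allowed to block, leaving the main submission thread free.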

Note that until a few years ago we performed all allocations from
a workqueue; this was changed by:

commit cf11da9c5d374962913ca5ba0ce0886b58286224
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date:   Tue Jul 15 07:08:24 2014 +1000

    xfs: refine the allocation stack switch

to only defer btree splits to a workqueue.  With that previous scheme
there might have been an option to defer AIO allocations to a workqueue,
but the main issue with that is that the worker thread which is then
going to do the actual data transfer would have to "borrow" the
mm_struct from the submitter.  That's the primary reason why something
like that was never implemented in mainline Linux.