On 09/19/2017 05:58 PM, Christoph Hellwig wrote:
> On Tue, Sep 19, 2017 at 08:27:05AM -0400, Brian Foster wrote:
> > > Please advise, is this a known bug? When can it happen? Is there a way
> > > to work around it to avoid blocking?
> > I'm not sure how either could be considered a bug based on the stack
> > trace information alone. Allocations may require reading metadata, and
> > those reads are synchronous. This all seems like pretty basic filesystem
> > behavior.
> >
> > I suppose performance may be a separate question. For the latter issue,
> > I'd be curious whether leaving more free space available in the
> > filesystem would help avoid running into busy extents. Perhaps having
> > more memory, and thus a larger buffer cache for btree blocks, could help
> > mitigate the former issue? The deterministic workaround for both is to
> > preallocate the associated file. If the file would be too large for
> > that, another option is to set an extent size hint to allocate the file
> > in larger chunks and amortize the cost of the allocations over multiple
> > writes.
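
If I follow the preallocation / extent size hint suggestion, it would look
something like this on our side (untested sketch; FS_IOC_FSSETXATTR and
FS_XFLAG_EXTSIZE assume reasonably recent kernel headers, the 16 MiB hint
and 1 GiB size are arbitrary example values, and the hint has to be set
while the file is still empty):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/fs.h>		/* struct fsxattr, FS_IOC_FSGETXATTR */

int prepare_file(const char *path)
{
	int fd = open(path, O_CREAT | O_RDWR, 0644);
	if (fd < 0)
		return -1;

	/* Extent size hint: must be set while the file is still empty,
	 * so do it before any preallocation or writes. */
	struct fsxattr fsx;
	if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) == 0) {
		fsx.fsx_xflags |= FS_XFLAG_EXTSIZE;
		fsx.fsx_extsize = 16 << 20;	/* 16 MiB per allocation */
		if (ioctl(fd, FS_IOC_FSSETXATTR, &fsx) < 0)
			perror("FS_IOC_FSSETXATTR");
	}

	/* Or, when the final size is known and not too large, just
	 * preallocate the whole file so writes never allocate at all. */
	if (fallocate(fd, 0, 0, 1ULL << 30) < 0)	/* 1 GiB example */
		perror("fallocate");

	return fd;
}
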
> Note that Linux 4.13 and later support an RWF_NOWAIT flag, which will
> return -EAGAIN from io_submit for these conditions so they can be
> handled by a thread pool.
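
The RWF_NOWAIT path would then be roughly the following (untested sketch;
raw syscalls are used so it does not depend on a libaio version that
exposes aio_rw_flags, and depending on where the kernel detects that it
would block, the EAGAIN may also show up in the completion event rather
than from io_submit itself):

#define _GNU_SOURCE
#include <linux/aio_abi.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <string.h>
#include <stdint.h>
#include <errno.h>

#ifndef RWF_NOWAIT
#define RWF_NOWAIT	0x00000008	/* uapi value since 4.13 */
#endif

static long raw_io_submit(aio_context_t ctx, long nr, struct iocb **iocbs)
{
	return syscall(SYS_io_submit, ctx, nr, iocbs);
}

/* ctx must have been created with the io_setup() syscall.  Returns 0 if
 * queued, 1 if the caller should punt the write to a thread pool, -1 on
 * any other error. */
int submit_write_nowait(aio_context_t ctx, int fd, void *buf,
			size_t len, off_t off)
{
	struct iocb cb;
	struct iocb *list[1] = { &cb };

	memset(&cb, 0, sizeof(cb));
	cb.aio_lio_opcode = IOCB_CMD_PWRITE;
	cb.aio_fildes = fd;
	cb.aio_buf = (uintptr_t)buf;
	cb.aio_nbytes = len;
	cb.aio_offset = off;
	cb.aio_rw_flags = RWF_NOWAIT;	/* fail with EAGAIN instead of blocking */

	if (raw_io_submit(ctx, 1, list) == 1)
		return 0;	/* EAGAIN may still arrive via the completion event */
	if (errno == EAGAIN)
		return 1;	/* would block on allocation etc.: use a worker thread */
	return -1;
}
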
> Note that until a few years ago we performed all allocations from
> a workqueue; this was changed by:
>
> commit cf11da9c5d374962913ca5ba0ce0886b58286224
> Author: Dave Chinner <dchinner@xxxxxxxxxx>
> Date:   Tue Jul 15 07:08:24 2014 +1000
>
>     xfs: refine the allocation stack switch
>
> to only defer btree splits to a workqueue. With that previous scheme
> there might have been an option to defer AIO allocations to a workqueue,
> but the main issue is that the worker thread doing the actual data
> transfer would have to "borrow" the mm_struct from the submitter. That
> is the primary reason why something like that was never implemented in
> mainline Linux.
For DIO, does it really need the mm_struct? It can just pin the pages
and pass them to the workqueue function.
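
To make that concrete, here is purely a sketch of what I mean, against
roughly 4.13-era kernel interfaces (dio_work, dio_worker and
dio_defer_write are made-up names, not mainline code):

#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct dio_work {
	struct work_struct	work;
	struct page		**pages;
	int			nr_pages;
	/* plus inode, offset, length, completion, ... */
};

static void dio_worker(struct work_struct *work)
{
	struct dio_work *dw = container_of(work, struct dio_work, work);
	int i;

	/*
	 * Do the allocation and the actual data transfer against
	 * dw->pages here; the submitter's mm_struct is not needed
	 * because the pages are already pinned.
	 */

	for (i = 0; i < dw->nr_pages; i++)
		put_page(dw->pages[i]);
	kfree(dw->pages);
	kfree(dw);
}

/* Called from the io_submit() path, i.e. still in the submitter's
 * process context, so current->mm is the right address space. */
static int dio_defer_write(struct workqueue_struct *wq,
			   unsigned long ubuf, int nr_pages)
{
	struct dio_work *dw;
	int pinned;

	dw = kzalloc(sizeof(*dw), GFP_KERNEL);
	if (!dw)
		return -ENOMEM;
	dw->pages = kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL);
	if (!dw->pages) {
		kfree(dw);
		return -ENOMEM;
	}

	/* Pin the user buffer now.  ~4.13-era signature: the third
	 * argument is 0 because a write to disk only reads the pages. */
	pinned = get_user_pages_fast(ubuf, nr_pages, 0, dw->pages);
	if (pinned < nr_pages) {
		while (pinned > 0)
			put_page(dw->pages[--pinned]);
		kfree(dw->pages);
		kfree(dw);
		return -EFAULT;
	}
	dw->nr_pages = pinned;

	INIT_WORK(&dw->work, dio_worker);
	queue_work(wq, &dw->work);
	return 0;
}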