Re: agcount for 2TB, 4TB and 8TB drives

Avi Kivity <avi@xxxxxxxxxxxx> · Sun, 15 Oct 2017 12:36:03 +0300

On 10/15/2017 01:42 AM, Dave Chinner wrote:
On Fri, Oct 13, 2017 at 11:13:24AM +0300, Avi Kivity wrote:
On 10/11/2017 01:55 AM, Dave Chinner wrote:
On Tue, Oct 10, 2017 at 12:07:42PM +0300, Avi Kivity wrote:
On 10/10/2017 01:03 AM, Dave Chinner wrote:
On 10/09/2017 02:23 PM, Dave Chinner wrote:
On Mon, Oct 09, 2017 at 11:05:56AM +0300, Avi Kivity wrote:
Sure, that might be the IO concurrency the SSD sees and handles, but
you very rarely require that much allocation parallelism in the
workload. Only a small amount of the IO submission path is actually
allocation work, so a single AG can provide plenty of async IO
parallelism before an AG is the limiting factor.
Sure. Can a single AG issue multiple I/Os, or is it single-threaded?
AGs don't issue IO. Applications issue IO, the filesystem allocates
space from AGs according to the write IO that passes through it.
What I meant was I/O in order to satisfy an allocation (read from
the free extent btree or whatever), not the application's I/O.
Once you're in the per-AG allocator context, it is single threaded
until the allocation is complete. We do things like btree block
readahead to minimise IO wait times, but we can't completely hide
things like metadata read IO wait time when it is required to make
progress.
I see, thanks. Will RWF_NOWAIT detect the need to do I/O for the
free space btree, or just contention? (I expect the latter from the
patches I've seen, but perhaps I missed something).
No, it checks at a high level whether allocation is needed (i.e. IO
into a hole) and if allocation is needed, it punts the IO
immediately to the background thread and returns to userspace. i.e.
it never gets near the allocator to begin with....

Interesting, that's both good and bad. Good, because we avoided a 
potential stall. Bad, because if the stall would not actually have 
happened (lock not contended, btree nodes cached) then we got punted to 
the helper thread which is a more expensive path.
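
A minimal sketch of the trade-off described above, assuming Linux and Python 3.7+ (where `os.pwritev` accepts `os.RWF_NOWAIT`). The function name `write_nowait_or_punt` and the single-worker helper pool are hypothetical and only stand in for an application's own helper thread. The fast path is attempted first; if the kernel reports that it would have to block (or does not support the flag for this file), the write is redone as a blocking call on the helper thread, which is the more expensive path.

```python
import errno
import os
from concurrent.futures import ThreadPoolExecutor

# os.RWF_NOWAIT exists on Linux with Python 3.7+; 0 means "flag unavailable".
RWF_NOWAIT = getattr(os, "RWF_NOWAIT", 0)

# Hypothetical stand-in for the application's helper thread.
_helper = ThreadPoolExecutor(max_workers=1)


def write_nowait_or_punt(fd, data, offset):
    """Try a non-blocking write; punt to the helper thread if it would block.

    Returns ("fast", nbytes) or ("punted", nbytes). Short writes are not
    retried here; a real caller would have to handle them.
    """
    if RWF_NOWAIT:
        try:
            return "fast", os.pwritev(fd, [data], offset, RWF_NOWAIT)
        except OSError as exc:
            # EAGAIN: the kernel would have had to block (e.g. to allocate).
            # EOPNOTSUPP / EINVAL: this kernel or file rejects the flag.
            if exc.errno not in (errno.EAGAIN, errno.EOPNOTSUPP, errno.EINVAL):
                raise
    # Slow path: an ordinary blocking write, run on the helper thread.
    return "punted", _helper.submit(os.pwrite, fd, data, offset).result()
```

Which path a given write takes depends on the kernel version, the filesystem and whether the file was opened with O_DIRECT, so the caller should treat both outcomes as normal.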

In fact we don't even need to try the write, we know that every 
32MB/128k = 256 writes we will hit an allocation. Perhaps we can 
fallocate() the next 32MB chunk while writing to the previous one. If 
fallocate() is fast enough, writes will never block or fail. If it's 
not, then we'll block/fail, but the likelihood is reduced. We can even 
increase the chunk size if we see we're getting blocked.
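
The preallocate-ahead idea can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's `os.posix_fallocate` rather than a direct `fallocate()` call, the names `writes_per_chunk` and `write_with_preallocation` are hypothetical, and every block length is assumed to divide the chunk size evenly so that writes land exactly on chunk boundaries. While the writer fills one chunk, a single worker thread preallocates the next.

```python
import os
from concurrent.futures import ThreadPoolExecutor

WRITE_SIZE = 128 * 1024          # 128 KiB per write, as in the message
CHUNK_SIZE = 32 * 1024 * 1024    # 32 MiB allocation chunk, as in the message


def writes_per_chunk(chunk_size=CHUNK_SIZE, write_size=WRITE_SIZE):
    """Number of writes that fit in one chunk before the next allocation."""
    return chunk_size // write_size


def write_with_preallocation(fd, blocks, chunk_size=CHUNK_SIZE):
    """Write blocks sequentially, preallocating one chunk ahead.

    Assumes each block length divides chunk_size evenly.
    Returns the (offset, length) ranges that were preallocated.
    """
    preallocated = []
    offset = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        def preallocate(start):
            preallocated.append((start, chunk_size))
            return pool.submit(os.posix_fallocate, fd, start, chunk_size)

        pending = preallocate(0)              # covers the first writes
        for block in blocks:
            if offset % chunk_size == 0:
                # Entering a new chunk: its allocation must have finished.
                # This wait is where a slow fallocate would still stall us.
                pending.result()
                pending = preallocate(offset + chunk_size)
            os.pwrite(fd, block, offset)
            offset += len(block)
        pending.result()
    return preallocated
```

Note that the sketch always allocates one chunk beyond the last write, so the file ends up larger than the data written; a real writer would trim or reuse that tail. Growing `chunk_size` when the wait at a chunk boundary is observed to take long would correspond to the adaptive variant mentioned above.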

Even better would be if XFS would detect the sequential write and start 
allocating ahead of it.

Like I said before, RWF_NOWAIT prevents entire classes of
AIO submission blocking issues from occurring. Use it and almost all
filesystem blocking concerns go away....

I will indeed.
