Re: agcount for 2TB, 4TB and 8TB drives

Avi Kivity <avi@xxxxxxxxxxxx> · Mon, 9 Oct 2017 18:46:41 +0300

On 10/09/2017 02:23 PM, Dave Chinner wrote:
On Mon, Oct 09, 2017 at 11:05:56AM +0300, Avi Kivity wrote:
On 10/07/2017 01:21 AM, Eric Sandeen wrote:
On 10/6/17 5:20 PM, Dave Chinner wrote:
On Fri, Oct 06, 2017 at 11:18:39AM -0500, Eric Sandeen wrote:
On 10/6/17 10:38 AM, Darrick J. Wong wrote:
On Fri, Oct 06, 2017 at 10:46:20AM +0200, Gandalf Corvotempesta wrote:
Semi-related question: for a solid state disk on a machine with high CPU
counts do we prefer agcount == cpucount to take advantage of the
high(er) iops and lack of seek time to increase parallelism?

(Not that I've studied that in depth.)
Interesting question.  :)  Maybe harder to answer for SSD black boxes?
Easy: switch to multidisk mode if /sys/block/<dev>/queue/rotational
is zero after doing all the other checks. Then SSDs will get larger
AG counts automatically.
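
The check described above can be sketched in Python as follows. This only illustrates reading the sysfs flag and is not how mkfs.xfs is implemented. The helper names `parse_rotational` and `is_rotational`, and the `sysfs_root` parameter, are hypothetical and exist only for this example.

```python
from pathlib import Path


def parse_rotational(text: str) -> bool:
    """Return True if the sysfs 'rotational' value marks a spinning disk."""
    # The kernel reports "0" for non-rotational devices (e.g. SSDs), "1" otherwise.
    return text.strip() != "0"


def is_rotational(dev: str, sysfs_root: str = "/sys/block") -> bool:
    """Read <sysfs_root>/<dev>/queue/rotational for a device name such as 'sda'."""
    path = Path(sysfs_root) / dev / "queue" / "rotational"
    return parse_rotational(path.read_text())
```

A caller would treat `is_rotational("sda") == False` as the signal to consider multidisk mode, after the other checks mentioned above have passed.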
The "hard part" was knowing just how much parallelism is actually inside
the black box.
It's often > 100.
Sure, that might be the IO concurrency the SSD sees and handles, but
you very rarely require that much allocation parallelism in the
workload. Only a small amount of the IO submission path is actually
allocation work, so a single AG can provide plenty of async IO
parallelism before an AG is the limiting factor.

Sure. Can a single AG issue multiple I/Os, or is it single-threaded?

I understand that XFS_XFLAG_EXTSIZE and XFS_IOC_FSSETXATTR can reduce
the AG's load. Is there a downside? For example, when I truncate + close
the file, will the preallocated data still remain allocated? Do I need
to return it with an fallocate()?

i.e. A single AG can typically support tens of thousands of free
space manipulations per second before the AG locks become the
bottleneck. Hence by the time you get to 16 AGs there's concurrency
available for (runs a concurrent workload and measures) at least
350,000 allocation transactions per second on relatively slow 5 year
old 8-core server CPUs. And that's CPU bound (16 CPUs all at >95%),
so faster, more recent CPUs will run much higher numbers.
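
As a quick sanity check on the figures above, the per-AG rate can be worked out directly. This sketch assumes the measured load was spread evenly across all 16 AGs, which the message does not state; the variable names are illustrative only.

```python
# Figures quoted above: 350,000 allocation transactions/sec across 16 AGs.
total_transactions_per_sec = 350_000
ag_count = 16

# Assumption: the load is spread evenly across the AGs.
per_ag = total_transactions_per_sec / ag_count
print(per_ag)  # 21875.0
```

Roughly 22,000 transactions per second per AG is consistent with the "tens of thousands" per AG figure quoted above.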

IOWs, don't confuse allocation concurrency with IO concurrency or
application concurrency. They are not the same thing, and allocation
concurrency is rarely a limiting factor for most workloads, even the
most IO intensive ones...

In my load, the allocation load is not very high, but the impact of 
iowait is. So if I can reduce the chance of io_submit() blocking because 
of AG contention, then I'm happy to increase the number of AGs even if 
it hurts other things.

But "multidisk mode" doesn't go too overboard, so yeah
that's probably fine.
Is there a penalty associated with having too many allocation groups?
Yes. You break up the large contiguous free spaces into many smaller
free spaces and so can induce premature onset of filesystem aging
related performance degradations. And for spinning disks, more than
4-8 AGs per spindle causes excessive seeks in mixed workloads and
degrades performance that way....

For an SSD, would an AG per 10GB be reasonable? per 100GB?

Machines with 60-100 logical cores and low-tens of terabytes of SSD are 
becoming common.  How many AGs would work for such a machine? Again the 
allocation load is not very high (allocating a few GB/s with 32MB hints, 
so < 100 allocs/sec), but the penalty for contention is pretty high.
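
The numbers in these questions can be made concrete with a small calculation. This is only a sketch: the 3 GiB/s throughput and the 20 TB filesystem size are assumed stand-ins for "a few GB/s" and "low-tens of terabytes", and the function names are hypothetical.

```python
def allocs_per_sec(bytes_per_sec: int, hint_bytes: int) -> float:
    """Allocations per second if each allocation covers one extent size hint."""
    return bytes_per_sec / hint_bytes


def ag_count_for(fs_bytes: int, ag_bytes: int) -> int:
    """Number of AGs needed to cover the filesystem, rounding up."""
    return -(-fs_bytes // ag_bytes)


MiB = 2**20
GiB = 2**30
GB = 10**9
TB = 10**12

# Assumed: 3 GiB/s of allocation with 32 MiB hints.
print(allocs_per_sec(3 * GiB, 32 * MiB))  # 96.0

# Assumed: a 20 TB filesystem, at one AG per 10 GB and one AG per 100 GB.
print(ag_count_for(20 * TB, 10 * GB))   # 2000
print(ag_count_for(20 * TB, 100 * GB))  # 200
```

Under these assumptions the allocation rate stays below 100 per second, while one AG per 10 GB would mean thousands of AGs on such a machine and one AG per 100 GB would mean hundreds.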

Thanks for the info!

Cheers,

Dave.
