Re: sleeps and waits during io_submit

On 12/01/2015 10:45 PM, Dave Chinner wrote:
> On Tue, Dec 01, 2015 at 09:01:13AM -0500, Glauber Costa wrote:
>> On Tue, Dec 1, 2015 at 8:58 AM, Avi Kivity <avi@xxxxxxxxxxxx> wrote:
>>> On 12/01/2015 03:11 PM, Brian Foster wrote:
>>>> It sounds to me like, first and foremost, you want to make sure that
>>>> however many parallel operations you typically have running are not
>>>> contending on the same inodes or AGs. Hint: creating files under
>>>> separate subdirectories is a quick and easy way to allocate inodes under
>>>> separate AGs (the agno is encoded into the upper bits of the inode
>>>> number).
>>>
>>> Unfortunately our directory layout cannot be changed.  And doesn't this
>>> require having agcount == O(number of active files)?  That is easily in
>>> the thousands.
>>
>> Actually, wouldn't agcount == O(nr_cpus) be good enough?
>
> Not quite. What you need is agcount ~= O(nr_active_allocations).

Yes, this is what I mean by "active files".
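
(Aside, for anyone who wants to verify the AG spread Brian describes: the
agno really is just the top bits of the inode number. A minimal sketch --
the agblklog/inopblog values are per-filesystem, read them from xfs_db
("sb 0", then "p agblklog" and "p inopblog"); the numbers below are made
up:)

#include <stdio.h>
#include <stdint.h>

/* XFS inode numbers are agno | agbno | offset-in-block, so shifting
 * away the low (sb_agblklog + sb_inopblog) bits leaves the AG number. */
static uint64_t ino_to_agno(uint64_t ino, int agblklog, int inopblog)
{
	return ino >> (agblklog + inopblog);
}

int main(void)
{
	/* st_ino as returned by stat(2) on a file of interest;
	 * 23 and 4 are sample log values, not universal. */
	printf("agno = %llu\n",
	       (unsigned long long)ino_to_agno(1090522624ULL, 23, 4));
	return 0;
}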


> The difference is that an allocation can block waiting on IO, and the
> CPU can then go off and run another process, which then tries to do
> an allocation. So you might only have 4 CPUs, but a workload that
> can have a hundred active allocations at once (not uncommon in
> file server workloads).

But for us, probably not much more. We try to restrict active I/Os to the effective disk queue depth (more than that and they just turn sour waiting in the disk queue).
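
(Roughly, as a sketch -- not our actual code -- with libaio and a simple
in-flight counter capped at the device's effective queue depth; the
QUEUE_DEPTH value and the submit_capped() helper are made up for
illustration. Link with -laio:)

#include <libaio.h>

#define QUEUE_DEPTH 128	/* tune to the device's effective depth */

/* Submit one iocb, but only when we're under the in-flight cap;
 * otherwise reap completions first.  Keeps the device queue full
 * without piling requests up behind it. */
static int submit_capped(io_context_t ctx, struct iocb *cb, int *inflight)
{
	struct io_event ev;

	while (*inflight >= QUEUE_DEPTH) {
		/* blocks until at least one I/O completes */
		if (io_getevents(ctx, 1, 1, &ev, NULL) == 1)
			(*inflight)--;
	}
	if (io_submit(ctx, 1, &cb) != 1)
		return -1;
	(*inflight)++;
	return 0;
}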


> On workloads that are roughly 1 process per CPU, it's typical that
> agcount = 2 * N cpus gives pretty good results on large filesystems.

This is probably using sync calls. Using async calls you can have many more I/Os in progress (but still limited by effective disk queue depth).

> If you've got 400GB filesystems or you are using spinning disks,
> then you probably don't want to go above 16 AGs, because then you
> have problems with maintaining contiguous free space and you'll
> seek the spinning disks to death....

We're concentrating on SSDs for now.
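
(For the record, the AG count is set at mkfs time, e.g.

	mkfs.xfs -d agcount=32 /dev/nvme0n1

where agcount=32 and the device name are just examples for a many-core
SSD box, not a recommendation.)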


>>> 'mount -o ikeep,'
>>
>> Interesting.  Our files are large so we could try this.
>
> Keep in mind that ikeep means that inode allocation permanently
> fragments free space, which can affect how large files are allocated
> once you truncate/rm the original files.



We can try to prime this by allocating a lot of inodes up front, then removing them, so that this doesn't happen.
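
Something like this, say (an untested sketch -- the prime/ directory and
the 100000 count are arbitrary), run once on a filesystem mounted with
-o ikeep:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* With ikeep, inode clusters stay allocated after the files are
 * unlinked, so creating and removing a pile of empty files up front
 * reserves inode space before large files fragment it -- and before
 * it can fragment them. */
int main(void)
{
	char name[64];
	int i;

	for (i = 0; i < 100000; i++) {
		snprintf(name, sizeof(name), "prime/f%d", i);
		close(open(name, O_CREAT | O_WRONLY, 0600));
	}
	for (i = 0; i < 100000; i++) {
		snprintf(name, sizeof(name), "prime/f%d", i);
		unlink(name);
	}
	return 0;
}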

Hurray ext2.
