Re: sleeps and waits during io_submit

On Tue, Dec 1, 2015 at 8:58 AM, Avi Kivity <avi@xxxxxxxxxxxx> wrote:
>
>
> On 12/01/2015 03:11 PM, Brian Foster wrote:
>>
>> On Tue, Dec 01, 2015 at 11:08:47AM +0200, Avi Kivity wrote:
>>>
>>> On 11/30/2015 06:14 PM, Brian Foster wrote:
>>>>
>>>> On Mon, Nov 30, 2015 at 04:29:13PM +0200, Avi Kivity wrote:
>>>>>
>>>>> On 11/30/2015 04:10 PM, Brian Foster wrote:
>>
>> ...
>>>>
>>>> The agsize/agcount mkfs-time heuristics change depending on the type of
>>>> storage. A single AG can be up to 1TB and if the fs is not considered
>>>> "multidisk" (e.g., no stripe unit/width is defined), 4 AGs is the
>>>> default up to 4TB. If a stripe unit is set, the agsize/agcount is
>>>> adjusted depending on the size of the overall volume (see
>>>> xfsprogs-dev/mkfs/xfs_mkfs.c:calc_default_ag_geometry() for details).
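A rough C sketch of the single-device rule Brian describes: AGs of at most 1TB, and 4 AGs by default up to 4TB. `sketch_default_agcount()` is a hypothetical name. This is a simplified model, not the real calc_default_ag_geometry(). It ignores the stripe-unit ("multidisk") path, very small devices, and block-size rounding.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TIB (1ULL << 40)

/* Simplified model of the non-"multidisk" default: 4 AGs up to 4TB,
 * then maximum-size (1TB) AGs. Not the real mkfs code. */
static uint64_t sketch_default_agcount(uint64_t fs_bytes)
{
    const uint64_t max_ag_bytes = SKETCH_TIB;   /* a single AG can be up to 1TB */

    assert(fs_bytes > 0);
    if (fs_bytes <= 4 * SKETCH_TIB)
        return 4;
    /* Round up so the tail of the volume still gets an AG. */
    return (fs_bytes + max_ag_bytes - 1) / max_ag_bytes;
}
```

For the 400G filesystem mentioned later in the thread, this model gives 4 AGs. It only starts adding AGs once the volume grows past 4TB.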
>>>
>>> We'll experiment with this.  Surely it depends on more than the amount of
>>> storage?  If you have a high op rate you'll be more likely to excite
>>> contention, no?
>>>
>> Sure. The absolute optimal configuration for your workload probably
>> depends on more than storage size, but mkfs doesn't have that
>> information. In general, it tries to use the most reasonable
>> configuration based on the storage and expected workload. If you want to
>> tweak it beyond that, indeed, the best bet is to experiment with what
>> works.
>
>
> We will do that.
>
>>>>> Are those locks held around I/O, or just CPU operations, or a mix?
>>>>
>>>> I believe it's a mix of modifications and I/O, though it looks like some
>>>> of the I/O cases don't necessarily wait on the lock. E.g., the AIL
>>>> pushing case will trylock and defer to the next list iteration if the
>>>> buffer is busy.
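The trylock-and-defer pattern described for AIL pushing can be sketched as below. This is not XFS code. `struct sketch_buf` and `sketch_push_pass()` are hypothetical, and a C11 `atomic_flag` stands in for the buffer lock. A busy buffer costs one skipped iteration rather than a sleep.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical metadata buffer: an atomic_flag stands in for its lock. */
struct sketch_buf {
    atomic_flag lock;   /* set while someone holds the buffer */
    bool dirty;         /* needs writing back */
};

/* One pushing pass: write back every dirty buffer whose lock is free and
 * skip busy ones instead of waiting. Returns how many were deferred. */
static size_t sketch_push_pass(struct sketch_buf *bufs, size_t n)
{
    size_t deferred = 0;

    assert(bufs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        struct sketch_buf *b = &bufs[i];

        if (!b->dirty)
            continue;
        /* Trylock: test_and_set returns the old value, so true means busy. */
        if (atomic_flag_test_and_set(&b->lock)) {
            deferred++;         /* retry on a later pass; never sleep here */
            continue;
        }
        b->dirty = false;       /* stand-in for issuing the write-back */
        atomic_flag_clear(&b->lock);
    }
    return deferred;
}
```

A caller just runs another pass later. Buffers that were busy the first time get picked up once their lock is free.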
>>>>
>>> Ok.  For us, sleeping in io_submit() is death because we have no other
>>> thread on that core to take its place.
>>>
>> The above is with regard to metadata I/O, whereas io_submit() is
>> obviously for user I/O.
>
>
> Won't io_submit() also trigger metadata I/O?  Or is that all deferred to
> async tasks?  I don't mind them blocking each other as long as they let my
> io_submit alone.
>
>>   io_submit() can probably block in a variety of
>> places afaict... it might have to read in the inode extent map, allocate
>> blocks, take inode/ag locks, reserve log space for transactions, etc.
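io_submit() can block in any of those places. One way to see whether, and how often, it actually sleeps is to time each call with a monotonic clock and flag calls that exceed a budget. This is a minimal sketch. The per-call budget is left to the caller, and the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock in nanoseconds; unaffected by wall-clock adjustments. */
static uint64_t sketch_now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Hypothetical budget check: a call that took longer than budget_ns is
 * treated as having slept/blocked rather than just using CPU. */
static bool sketch_is_stall(uint64_t elapsed_ns, uint64_t budget_ns)
{
    assert(budget_ns > 0);
    return elapsed_ns > budget_ns;
}
```

In the reactor, bracket the call as `t0 = sketch_now_ns(); r = io_submit(ctx, nr, iocbs);` and then check `sketch_is_stall(sketch_now_ns() - t0, budget_ns)`, where `ctx`, `nr` and `iocbs` are the usual libaio arguments. This only shows that a call stalled. Tracepoints such as the xfs_log_grant_* ones show where it stalled.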
>
>
> Any chance of changing all that to be asynchronous?  Doesn't sound too hard,
> if somebody else has to do it.
>
>>
>> It sounds to me that first and foremost you want to make sure you don't
>> have however many parallel operations you typically have running
>> contending on the same inodes or AGs. Hint: creating files under
>> separate subdirectories is a quick and easy way to allocate inodes under
>> separate AGs (the agno is encoded into the upper bits of the inode
>> number).
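To check which AG a given file's inode landed in, split the inode number as sketched below. The helper names are hypothetical. `agblklog` and `inopblog` correspond to the superblock fields of the same names, which `xfs_db` can print. `agblklog` is log2 of blocks per AG, rounded up. `inopblog` is log2 of inodes per block.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bits of an XFS inode number hold the AG number; the low
 * (agblklog + inopblog) bits hold the inode's position inside that AG. */
static uint32_t sketch_ino_to_agno(uint64_t ino, unsigned agblklog, unsigned inopblog)
{
    assert(agblklog + inopblog < 64);
    return (uint32_t)(ino >> (agblklog + inopblog));
}

static uint64_t sketch_ino_to_agino(uint64_t ino, unsigned agblklog, unsigned inopblog)
{
    assert(agblklog + inopblog < 64);
    return ino & ((UINT64_C(1) << (agblklog + inopblog)) - 1);
}
```

The inode number itself comes from `st_ino` after stat(2). Files whose inodes map to the same AG share that AG's allocation structures, which is the kind of contention Brian suggests avoiding.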
>
>
> Unfortunately our directory layout cannot be changed.  And doesn't this
> require having agcount == O(number of active files)?  That is easily in the
> thousands.

Actually, wouldn't agcount == O(nr_cpus) be good enough?
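If one wanted to try that, a hypothetical helper could pick roughly one AG per online CPU, bounded by what the volume size allows. The 1TB maximum AG size comes from earlier in the thread. The 16MiB minimum AG size is an assumption to check against your mkfs.xfs. Nothing in this thread establishes that tying agcount to the CPU count actually removes the contention.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

#define SKETCH_MIB (1ULL << 20)
#define SKETCH_TIB (1ULL << 40)

/* Number of online CPUs, falling back to 1 if sysconf() can't tell. */
static uint64_t sketch_online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? (uint64_t)n : 1;
}

/* Hypothetical policy: about one AG per CPU, but at least as many AGs as
 * the 1TB AG size limit requires and no more than the assumed 16MiB
 * minimum AG size allows. */
static uint64_t sketch_agcount_for_cpus(uint64_t nr_cpus, uint64_t fs_bytes)
{
    const uint64_t max_ag_bytes = SKETCH_TIB;        /* from the thread */
    const uint64_t min_ag_bytes = 16 * SKETCH_MIB;   /* assumption */
    uint64_t needed = (fs_bytes + max_ag_bytes - 1) / max_ag_bytes;
    uint64_t allowed = fs_bytes / min_ag_bytes;
    uint64_t n = nr_cpus > needed ? nr_cpus : needed;

    assert(nr_cpus > 0 && fs_bytes > 0);
    if (n > allowed)
        n = allowed;
    return n > 0 ? n : 1;   /* tiny volumes: fall back to a single AG */
}
```

The result would be passed to mkfs as `mkfs.xfs -d agcount=N`.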

>
>>   Reducing the frequency of block allocation/frees might also be
>> another help (e.g., preallocate and reuse files,
>
>
> Isn't that discouraged for SSDs?
>
> We can do that for a subset of our files.
>
> We do use XFS_IOC_FSSETXATTR though.
>
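This is a minimal sketch of both ideas in this exchange. The first is setting an extent size hint with the fsxattr ioctl, XFS_IOC_FSSETXATTR. Newer kernel headers expose the same request as FS_IOC_FSSETXATTR in <linux/fs.h>. The second is preallocating a file so it can be reused without fresh block allocation. The function names are hypothetical. The sketch assumes kernel headers from Linux 4.5 or later, which provide FS_IOC_FSSETXATTR.

```c
#define _GNU_SOURCE             /* for fallocate(2) */
#include <assert.h>
#include <fcntl.h>
#include <linux/fs.h>           /* struct fsxattr, FS_IOC_FS{GET,SET}XATTR, FS_XFLAG_EXTSIZE */
#include <sys/ioctl.h>
#include <sys/types.h>

/* Set an extent size hint (fsx_extsize is in bytes) on an open file.
 * XFS typically refuses to change the hint once the file already has
 * data extents, so this is meant for freshly created files. */
static int sketch_set_extsize_hint(int fd, unsigned int bytes)
{
    struct fsxattr fsx;

    if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) < 0)
        return -1;
    fsx.fsx_xflags |= FS_XFLAG_EXTSIZE;
    fsx.fsx_extsize = bytes;
    return ioctl(fd, FS_IOC_FSSETXATTR, &fsx);
}

/* Preallocate [0, len): mode 0 allocates any holes in the range and
 * extends the file size if needed, leaving already-allocated blocks
 * alone, so repeating it on a reused file is harmless. */
static int sketch_preallocate(int fd, off_t len)
{
    assert(len > 0);
    return fallocate(fd, 0, 0, len);
}
```

On XFS, preallocated space is tracked as unwritten extents. The first write to each range still needs a conversion transaction at I/O completion. Preallocation reduces allocation work but does not make the write path free of metadata updates.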
>> 'mount -o ikeep,'
>
>
> Interesting.  Our files are large so we could try this.
>
>> etc.). Beyond that, you probably want to make sure the log is large
>> enough to support all concurrent operations. See the xfs_log_grant_*
>> tracepoints for a window into if/how long transaction reservations might
>> be waiting on the log.
>
>
> I see that on a 400G fs, the log is 180MB.  That seems plenty large for write
> operations that are mostly large and sequential, though I've no real feel for
> the numbers.  Will keep an eye on this.
>
> Thanks for all the info.
>
>
>> Brian
>>
>>> _______________________________________________
>>> xfs mailing list
>>> xfs@xxxxxxxxxxx
>>> http://oss.sgi.com/mailman/listinfo/xfs
>
>
