Re: sleeps and waits during io_submit

Dave Chinner <david@xxxxxxxxxxxxx> · Wed, 2 Dec 2015 08:04:17 +1100

On Tue, Dec 01, 2015 at 05:22:38PM +0200, Avi Kivity wrote:
> On 12/01/2015 04:56 PM, Brian Foster wrote:
> >On Tue, Dec 01, 2015 at 03:58:28PM +0200, Avi Kivity wrote:
> >>>  io_submit() can probably block in a variety of
> >>>places afaict... it might have to read in the inode extent map, allocate
> >>>blocks, take inode/ag locks, reserve log space for transactions, etc.
> >>Any chance of changing all that to be asynchronous?  Doesn't sound too hard,
> >>if somebody else has to do it.
> >>
> >I'm not following... if the fs needs to read in the inode extent map to
> >prepare for an allocation, what else can the thread do but wait? Are you
> >suggesting the request kick off whatever the blocking action happens to
> >be asynchronously and return with an error such that the request can be
> >retried later?
> 
> Not quite, it should be invisible to the caller.

I have a pony I can sell you.

> That is, the code called by io_submit()
> (file_operations::write_iter, it seems to be called today) can kick
> off this operation and have it continue from where it left off.

This is a problem that people have tried to solve in the past (e.g.
syslets), where the thread executes until it has to block, and
then it's handed off to a worker thread/syslet to block and the
main process returns with EIOCBQUEUED.
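As a rough illustration of that execute-until-block-then-hand-off
pattern, here is a userspace model built on pthreads. It is a sketch
under stated assumptions, not kernel code: EIOCBQUEUED is a
kernel-internal return value that userspace headers don't export, so
MODEL_EIOCBQUEUED, struct model_req, model_submit() and model_wait()
are all hypothetical names, and the "would block" decision is just a
flag set by the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical: the kernel's EIOCBQUEUED is not exported to userspace,
 * so this model defines its own value. */
#define MODEL_EIOCBQUEUED 529

/* Hypothetical request object; all names here are illustrative. */
struct model_req {
    bool would_block;   /* e.g. extent map not yet in memory */
    bool queued;        /* handed off to a worker thread */
    bool done;
    int result;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    pthread_t worker;
};

/* The part that may sleep; nanosleep stands in for an extent-map read. */
static int slow_path(void)
{
    struct timespec ts = { 0, 1000000 };  /* 1 ms */
    nanosleep(&ts, NULL);
    return 4096;                          /* pretend bytes written */
}

static void complete_req(struct model_req *r, int res)
{
    pthread_mutex_lock(&r->lock);
    r->result = res;
    r->done = true;
    pthread_cond_signal(&r->cond);
    pthread_mutex_unlock(&r->lock);
}

static void *worker_fn(void *arg)
{
    complete_req(arg, slow_path());
    return NULL;
}

/* Run inline while nothing blocks; otherwise hand off to a worker and
 * return -MODEL_EIOCBQUEUED so the submitter is never put to sleep. */
static int model_submit(struct model_req *r)
{
    assert(r != NULL);
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->cond, NULL);
    r->done = false;
    r->queued = false;
    if (!r->would_block) {
        complete_req(r, 4096);
        return 4096;                      /* completed synchronously */
    }
    r->queued = true;
    if (pthread_create(&r->worker, NULL, worker_fn, r) != 0) {
        r->queued = false;
        pthread_mutex_destroy(&r->lock);
        pthread_cond_destroy(&r->cond);
        return -EAGAIN;
    }
    return -MODEL_EIOCBQUEUED;
}

/* Reap a submitted request (loosely what io_getevents() does for real
 * AIO) and release its resources. */
static int model_wait(struct model_req *r)
{
    pthread_mutex_lock(&r->lock);
    while (!r->done)
        pthread_cond_wait(&r->cond, &r->lock);
    int res = r->result;
    pthread_mutex_unlock(&r->lock);
    if (r->queued)
        pthread_join(r->worker, NULL);
    pthread_mutex_destroy(&r->lock);
    pthread_cond_destroy(&r->cond);
    return res;
}
```

Note the model cheats: the worker runs slow_path() from the start
rather than resuming a partially executed call, and capturing that
mid-call state is exactly what a real implementation would have to
solve.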

Basically, you're asking for a real AIO infrastructure to be
introduced into the kernel, and I think that's beyond what us XFS
guys can do...

> >>>  Reducing the frequency of block allocation/frees might also be
> >>>another help (e.g., preallocate and reuse files,
> >>Isn't that discouraged for SSDs?
> >>
> >Perhaps, if you're referring to the fact that the blocks are never freed
> >and thus never discarded..? Are you running fstrim?
> 
> mount -o discard.  And yes, overwrites are supposedly more expensive
> than trim old data + allocate new data, but maybe if you compare it
> with the work XFS has to do, perhaps the tradeoff is bad.

Oh, you do realise that using "-o discard" causes significant delays
in journal commit processing? i.e. the journal commit completion
blocks until all the discards have been submitted and waited on
*synchronously*. This is a problem with the Linux block layer, in
that blkdev_issue_discard() is a synchronous operation.

Hence if you are seeing delays in transactions (e.g. timestamp updates)
it's entirely possible that things will get much better if you
remove the discard mount option. From a performance perspective it's
much better to run the fstrim command every so often: fstrim issues
discard operations in the context of the fstrim process and does
not interact with the transaction subsystem at all.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs