On Tue, Dec 01, 2015 at 05:22:38PM +0200, Avi Kivity wrote:
> On 12/01/2015 04:56 PM, Brian Foster wrote:
> >On Tue, Dec 01, 2015 at 03:58:28PM +0200, Avi Kivity wrote:
> >>> io_submit() can probably block in a variety of
> >>>places afaict... it might have to read in the inode extent map, allocate
> >>>blocks, take inode/ag locks, reserve log space for transactions, etc.
> >>Any chance of changing all that to be asynchronous? Doesn't sound too hard,
> >>if somebody else has to do it.
> >>
> >I'm not following... if the fs needs to read in the inode extent map to
> >prepare for an allocation, what else can the thread do but wait? Are you
> >suggesting the request kick off whatever the blocking action happens to
> >be asynchronously and return with an error such that the request can be
> >retried later?
>
> Not quite, it should be invisible to the caller.

I have a pony I can sell you.

> That is, the code called by io_submit()
> (file_operations::write_iter, it seems to be called today) can kick
> off this operation and have it continue from where it left off.

This is a problem that people have tried to solve in the past (e.g. syslets), where the thread executes until it has to block, then the work is handed off to a worker thread/syslet to do the blocking and the main process returns with EIOCBQUEUED.

Basically, you're asking for a real AIO infrastructure to be introduced into the kernel, and I think that's beyond what us XFS guys can do...

> >>> Reducing the frequency of block allocation/frees might also be
> >>>another help (e.g., preallocate and reuse files,
> >>Isn't that discouraged for SSDs?
> >>
> >Perhaps, if you're referring to the fact that the blocks are never freed
> >and thus never discarded..? Are you running fstrim?
>
> mount -o discard. And yes, overwrites are supposedly more expensive
> than trim old data + allocate new data, but maybe if you compare it
> with the work XFS has to do, perhaps the tradeoff is bad.
Oh, you do realise that using "-o discard" causes significant delays in journal commit processing? That is, journal commit completion blocks until all the discards have been submitted and waited on *synchronously*. This is a problem with the Linux block layer, in that blkdev_issue_discard() is a synchronous operation.

Hence if you are seeing delays in transactions (e.g. timestamp updates), it's entirely possible that things will get much better if you remove the discard mount option. It's much better from a performance perspective to run the fstrim command every so often: fstrim issues discard operations in the context of the fstrim process, so it does not interact with the transaction subsystem at all.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
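The syslet-style approach Dave mentions (execute until the thread would block, hand the rest to a worker, return EIOCBQUEUED) can be illustrated with a user-space analogy. This is a minimal Python sketch under stated assumptions, not kernel code: `HandoffSubmitter`, the `QUEUED` sentinel and the use of "key already cached" as the test for "would not block" are all hypothetical stand-ins, and in the kernel EIOCBQUEUED is an internal return code rather than anything user space sees.

```python
import queue
import threading

# Hypothetical sentinel standing in for the kernel-internal EIOCBQUEUED code.
QUEUED = object()


class HandoffSubmitter:
    """User-space analogy of 'execute until you would block, then hand off'.

    A request whose data is already cached completes inline in submit().
    Otherwise submit() queues it for a worker thread and returns QUEUED
    immediately; the result arrives later on the completions queue.
    """

    def __init__(self, cache, slow_fetch):
        self._lock = threading.Lock()
        self._cache = dict(cache)          # key -> value already in memory
        self._slow_fetch = slow_fetch      # blocking call: key -> value
        self._pending = queue.Queue()      # work handed off to the worker
        self.completions = queue.Queue()   # (req_id, value) for queued requests
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, req_id, key):
        with self._lock:
            if key in self._cache:         # fast path: nothing to wait for
                return self._cache[key]
        # Slow path: rather than block the caller, hand the request off.
        # The worker restarts it from scratch; resuming partially executed
        # work (what the thread asks for) is the hard part this skips.
        self._pending.put((req_id, key))
        return QUEUED

    def _run(self):
        while True:
            item = self._pending.get()
            if item is None:               # shutdown sentinel from close()
                return
            req_id, key = item
            value = self._slow_fetch(key)  # the blocking work happens here
            with self._lock:
                self._cache[key] = value
            self.completions.put((req_id, value))

    def close(self):
        self._pending.put(None)
        self._worker.join()
```

The gap between this sketch and what Avi asks for is the difficulty Dave points at: here the worker simply redoes the whole request, whereas inside the filesystem the blocked operation would have to continue from where it left off, with any inode/ag locks, log reservations and partially built state carried across the hand-off.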
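Dave's advice to run fstrim periodically instead of mounting with "-o discard" works because fstrim(8) asks the filesystem to trim its free space through the FITRIM ioctl on the mount point, so the discards are issued by the fstrim process rather than inside journal commit. The Python sketch below shows that call. It assumes the `struct fstrim_range` layout from <linux/fs.h> and the generic ioctl number encoding used on x86 and arm; the `fitrim` helper is hypothetical, and in practice running `fstrim` itself from cron or a similar scheduler is the usual approach.

```python
import fcntl
import os
import struct

# struct fstrim_range { __u64 start; __u64 len; __u64 minlen; } (<linux/fs.h>)
FSTRIM_RANGE = struct.Struct("=QQQ")

# Generic Linux ioctl encoding (x86, arm, ...); some architectures, e.g.
# powerpc and mips, encode the direction bits differently.
_IOC_WRITE, _IOC_READ = 1, 2


def _iowr(type_char, nr, size):
    """Equivalent of the kernel's _IOWR(type, nr, size) macro."""
    return ((_IOC_READ | _IOC_WRITE) << 30) | (size << 16) | (ord(type_char) << 8) | nr


FITRIM = _iowr("X", 121, FSTRIM_RANGE.size)  # _IOWR('X', 121, struct fstrim_range)


def fitrim(mountpoint, start=0, length=2**64 - 1, minlen=0):
    """Ask the filesystem mounted at `mountpoint` to discard free space.

    The discards are issued in the context of the calling process, as with
    fstrim(8), not from journal commit. Usually requires root. Returns the
    byte count the kernel writes back into fstrim_range.len.
    """
    buf = bytearray(FSTRIM_RANGE.pack(start, length, minlen))
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FITRIM, buf)  # mutable buffer: kernel result copied back
    finally:
        os.close(fd)
    _start, trimmed, _minlen = FSTRIM_RANGE.unpack(buf)
    return trimmed
```

Run as root, `fitrim("/data")` (a hypothetical XFS mount point) requests a whole-filesystem trim and returns the number of bytes the kernel reports as trimmed; a nonzero `minlen` skips free extents shorter than that many bytes.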