Re: [fuse-devel] [PATCH 0/6] fuse: process direct IO asynchronously

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Brian,

Existing fuse implementation always processes direct IO synchronously: it submits next request to userspace fuse only when previous is completed. This is suboptimal because: 1) libaio DIO works in blocking way; 2) userspace fuse can't achieve parallelism processing several requests simultaneously (e.g.
in case of distributed network storage); 3) userspace fuse can't merge
requests before passing it to actual storage.

The idea of the patch-set is to submit fuse requests in non-blocking way
(where it's possible) and either return -EIOCBQUEUED or wait for their
completion synchronously. The patch-set to be applied on top of for-next of
Miklos' git repo.

To estimate performance improvement I used slightly modified fusexmp over tmpfs (clearing O_DIRECT bit from fi->flags in xmp_open). For synchronous
operations I used 'dd' like this:

dd of=/dev/null if=/fuse/mnt/file bs=2M count=256 iflag=direct
dd if=/dev/zero of=/fuse/mnt/file bs=2M count=256 oflag=direct conv=notrunc

For AIO I used 'aio-stress' like this:

aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 1 /fuse/mnt/file
aio-stress -s 512 -a 4 -b 1 -c 1 -O -o 0 /fuse/mnt/file

The throughput on some commodity (rather feeble) server was (in MB/sec):

              original / patched

dd reads:    ~322     / ~382
dd writes:   ~277     / ~288

aio reads:   ~380     / ~459
aio writes:  ~319     / ~353


Thanks for posting this, first of all. I was reading through the code
and found some of the logic a bit hard to follow, particularly due to
controlling the behavior of underlying read/write functions based on the
'async' kiocb. For example: conditionally updating inode size in a
couple different places; submitting sync requests async and waiting on
them (causing the need to duplicate the update size call); and the
existence of a kiocb called 'async' (when a kiocb can be either sync or
async).

Anyways, I couldn't quite put my finger on how to potentially clean that
up without messing around with the code, so I've inlined the diff that I
came up with that (IMO) cleans things up a bit. This should apply on top
of your set and makes the following tweaks:

- Always pass a kiocb down through __fuse_direct_write()/read() (instead
of struct file).
- Trigger the "async" behavior in fuse_direct_io() and
fuse_send_read/write() based on the sync/async nature of the kiocb. This
means that sync requests and async requests are sent as such based on
the nature of the kiocb (as opposed to whether a kiocb exists).
- Update the various other callers of __fuse_direct_write()/read and
fuse_direct_io() to create and pass a sync kiocb where necessary.
- Use the same approach in fuse_direct_IO() to turn an extending, async
write into a sync write.

The tradeoff with this approach is that we slightly uglify the callers
that have to now create a sync kiocb. That said, I think some of those
codepaths (i.e., fuse_file_aio_write()->fuse_perform_write()->...) could
now just pass the kiocb from the vfs straight down. Thoughts?


Thanks a lot for review and further efforts. Your clean-up patch looks fine, but it constrains the scope of patch-set too much: ordinary synchronous dio-s (generated by read(2) and write(2)) cease to be optimized. I think such a loss doesn't justify the clean-up. To be sure that we are on the same page, I'll try to explain this optimization in more details:

Imagine direct read or write of 1MB size. Having FUSE_MAX_PAGES_PER_REQ==32, this can be fulfilled by 8 fuse requests. Generic linux-kernel (e.g. generic_file_aio_read) passes synchronous iocb to fuse_direct_IO(). From fuse perspective, it only means that the caller expects to get control back when that 1MB request is completed. It's up to fuse how to process that 1MB request.

Existing fuse implementation (w/o my patch-set) processes it like this:
- allocate, fill and send 1st fuse request, wait for its completion (ACK-ing by userspace); - allocate, fill and send 2nd fuse request, wait for its completion (ACK-ing by userspace);
 - ...
- allocate, fill and send 8th fuse request, wait for its completion (ACK-ing by userspace);

With my patch-set applied, it becomes like this:
 - allocate, fill and send 1st fuse request;
 - allocate, fill and send 2nd fuse request;
 - ...
 - allocate, fill and send 8th fuse request;
 - wait till all of them are completed (ACK-ed by userspace)

This is significant optimization for non-trivial fuse userspace implementations because: 1) fuse requests might be processed concurrently; 2) fuse requests might be merged before processing.

Now, let's proceed to technical details... Your idea to use iocb everywhere instead of file and rely on its type (sync vs. async) is cool. I like it! Unfortunately, to make it usable in both libaio and sync dio cases we'll need to put our hands on generic linux-kernel code as well. Let me explain why:

Currently, iocb has only one binary attribute - it's either sync or async. aio_complete() derives its ancestry (libaio vs. read(2)/write(2)) from its sync/async type:

    if (is_sync_kiocb(iocb)) {
        BUG_ON(iocb->ki_users != 1);
        iocb->ki_user_data = res;
        iocb->ki_users = 0;
        wake_up_process(iocb->ki_obj.tsk);
        return 1;
    }

In the other words, if it's sync, it's enough to wake up someone in kernel (who called wait_on_sync_kiocb()) and return. Otherwise, aio_complete() performs some steps to wake up user (who called io_getevents()). What we need for fuse is another binary attribute: iocb was generated by libaio vs. read(2)/write(2). Then we could use aio_complete() and wait_on_sync_kiocb() for any iocb which was not generated by libaio. E.g. to process sync dio in async way, we'd allocate iocb in fuse_direct_IO on stack (as you suggested), but initialize it as async iocb and pass it to __fuse_direct_read/write, then call wait_on_sync_kiocb(), not to return -EIOCBQUEUED.

The problem is that FUSE tends to be the only user of this 'feature'. I'm considering one-line change of aio_complete():

-    if (is_sync_kiocb(iocb)) {
+    if (!iocb->ki_ctx) {

Though, not sure whether Miklos, Al Viro, Benjamin and others will accept it... Let's try if you're OK about this approach. At least this will cover all fuse dio use-cases (including libaio extending file) and makes fuse code significantly cleaner.

Thanks,
Maxim
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux