Re: Options for improving f_fs.c performance?

On 04/21/2017 09:14 AM, Felipe Balbi wrote:

Hi Jerry,

Jerry Zhang <zhangjerry@xxxxxxxxxx> writes:
(reposting this question in its own subject, instead of buried in a patchset)

Android has migrated to using f_fs for the mtp function, but in the
process we ran into some performance hiccups in comparison to f_mtp
during bulk transfer of large files.

Allocating memory at boot allowed f_mtp to use large buffers,
around 250kB. Compared to using a 16 or 32 kB buffer, the larger buffer

you don't really need to allocate it at boot time :-) but fair
enough. Also, several (say 200) small buffers (PAGE_SIZE -> 4096 bytes)
would probably be better, but read on.

is faster by almost 50%. f_fs can't use io sizes that large due
to memory fragmentation, since there is a kmalloc on each call.

right

One option here would be to add O_DIRECT as a flag to open
(similar to what is discussed here:
http://www.makelinux.net/ldd3/chp-15-sect-3).
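
In userspace the idea would look roughly like this (purely illustrative: f_fs endpoint files do not accept O_DIRECT today, and the mount path, alignment and transfer size below are made up):

#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define XFER_SIZE (256 * 1024)  /* large transfer, as f_mtp used */

int main(void)
{
        void *buf;
        int ep;

        /* page-aligned buffer so the kernel could map it directly */
        if (posix_memalign(&buf, 4096, XFER_SIZE))
                return 1;

        /* "/dev/usb-ffs/mtp/ep1" is a made-up mount point */
        ep = open("/dev/usb-ffs/mtp/ep1", O_RDWR | O_DIRECT);
        if (ep < 0)
                return 1;

        /* each write would be queued straight from the pinned user
         * pages, with no kmalloc'd bounce buffer in the kernel */
        if (write(ep, buf, XFER_SIZE) < 0)
                return 1;

        close(ep);
        free(buf);
        return 0;
}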

We *could* use O_DIRECT, but I'm not sure that's what's causing
issues. I've been considering how to improve f_fs.c myself for the past
few months, but no time to do it yet.

Then on each io, we could get_user_pages, vmap them together,

get_user_pages_fast() + sg_alloc_table_from_pages() would've been the
best. That way, there would be no copy_to/from_user needed, but that's
something that should happen later. There are lower hanging fruits in
f_fs.c IMO.

and pass that address into ep_queue (for sync io only, non direct io
does not change). This would allow larger io sizes
to be used and would eliminate a user-kernel copy. Having the flag
allows users to decide which endpoints they choose to use this way.

Correct :-)
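
As a rough illustration of the zero-copy path being discussed (a sketch only: the helper name is made up, the get_user_pages_fast()/sg_alloc_table_from_pages() signatures vary between kernel versions, and the UDC has to support scatter/gather for req->sg to be usable):

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

/* hypothetical helper, nothing like this exists in f_fs.c today */
static int ffs_pin_user_buffer(void __user *ubuf, size_t len, int write,
                               struct sg_table *sgt)
{
        int nr_pages = DIV_ROUND_UP(offset_in_page(ubuf) + len, PAGE_SIZE);
        struct page **pages;
        int pinned, ret;

        pages = kcalloc(nr_pages, sizeof(*pages), GFP_KERNEL);
        if (!pages)
                return -ENOMEM;

        /* pin the user pages for the duration of the transfer */
        pinned = get_user_pages_fast((unsigned long)ubuf, nr_pages,
                                     write, pages);
        if (pinned != nr_pages) {
                ret = pinned < 0 ? pinned : -EFAULT;
                goto err_put;
        }

        /* build an sg table the controller can DMA from/to directly:
         * no copy_to/from_user, no large contiguous kmalloc */
        ret = sg_alloc_table_from_pages(sgt, pages, nr_pages,
                                        offset_in_page(ubuf), len,
                                        GFP_KERNEL);
        if (ret)
                goto err_put;

        /* the page pointers now live in the sg table; drop them with
         * put_page() from the request's completion handler */
        kfree(pages);
        return 0;

err_put:
        while (pinned > 0)
                put_page(pages[--pinned]);
        kfree(pages);
        return ret;
}

The table would then be handed to the controller via req->sg and req->num_sgs before usb_ep_queue(); vmap()ing the same pinned pages into a single req->buf, as proposed above, avoids the copy just as well at the cost of an extra kernel virtual mapping.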

Does this sound like something that people want or find acceptable?

yes, it _is_ definitely acceptable. But f_fs.c needs a cleanup
first. Also, IMHO a better flag to implement would be O_NONBLOCK. Here's
how f_fs.c works today:

write(ep2, buf, length);
 ffs_epfile_write_iter()
  ffs_epfile_io()
   usb_ep_queue()
   wait_for_completion_interruptible()

That wait_for_completion_interruptible() is what's killing
performance. Each and every read/write waits for the USB side to
complete. It would've been much better to have something like:

if (!(flags & O_NONBLOCK))
	wait_for_completion_interruptible()

This would help the write() side of things by a wide margin. For reads,
what we could do is have a kernel ring buffer with pre-allocated and
pre-queued usb_requests pointing to different pages in this ring
buffer. When a read() comes, instead of queueing the request right
there, we check if there's data in the internal ring buffer, if there
is, we just copy_to_user(), otherwise we either return 0 or return
-EAGAIN (depending on O_NONBLOCK).
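
Roughly, the read side would then become something like this (a sketch with made-up names; nothing like ffs_ring exists in f_fs.c today, and wrap-around at the end of the buffer is ignored for brevity):

#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/spinlock.h>
#include <linux/uaccess.h>

/* hypothetical ring fed by pre-allocated, pre-queued OUT requests */
struct ffs_ring {
        spinlock_t      lock;
        char            *data;          /* pre-allocated buffer pages */
        size_t          size, head, tail;
};

static ssize_t ffs_epfile_read_ring(struct file *file, char __user *buf,
                                    size_t len)
{
        struct ffs_ring *r = file->private_data;
        unsigned long flags;
        size_t avail, n;

        spin_lock_irqsave(&r->lock, flags);
        avail = (r->head - r->tail) % r->size;  /* data already completed */
        spin_unlock_irqrestore(&r->lock, flags);

        /* nothing completed yet: don't queue-and-wait here */
        if (!avail)
                return (file->f_flags & O_NONBLOCK) ? -EAGAIN : 0;

        /* hand over whatever the pre-queued requests already filled in */
        n = min(avail, len);
        if (copy_to_user(buf, r->data + r->tail, n))
                return -EFAULT;

        spin_lock_irqsave(&r->lock, flags);
        r->tail = (r->tail + n) % r->size;
        spin_unlock_irqrestore(&r->lock, flags);

        return n;
}

The completion handler of each pre-queued request would advance head, and the request would be re-queued once its data has been consumed.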

After that, it becomes easy to implement poll() for endpoint files.


Please correct me if I'm wrong, but isn't this functionality already implemented a long time ago using aio and eventfd?

Take a look at the example[1] in the kernel source.

Footnotes:
1 - https://github.com/torvalds/linux/blob/master/tools/usb/ffs-aio-example/multibuff/device_app/aio_multibuff.c
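
For reference, the core of that pattern, condensed from the linked example (libaio-based; error handling and the FunctionFS descriptor/endpoint setup are omitted, and ep1 is assumed to be an already-configured IN endpoint fd):

#include <libaio.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define BUF_LEN (16 * 1024)

static char buf[BUF_LEN];

static int submit_async_write(int ep1)
{
        io_context_t ctx = 0;
        struct iocb iocb, *iocbs[] = { &iocb };
        struct io_event ev;
        struct pollfd pfd;
        uint64_t count;
        int evfd;

        io_setup(1, &ctx);
        evfd = eventfd(0, 0);

        /* queue the write without blocking the submitting thread */
        io_prep_pwrite(&iocb, ep1, buf, sizeof(buf), 0);
        io_set_eventfd(&iocb, evfd);
        io_submit(ctx, 1, iocbs);

        /* ... do other work; the eventfd becomes readable on completion */
        pfd.fd = evfd;
        pfd.events = POLLIN;
        poll(&pfd, 1, -1);
        read(evfd, &count, sizeof(count));

        /* reap the completed request */
        io_getevents(ctx, 1, 1, &ev, NULL);

        close(evfd);
        io_destroy(ctx);
        return (int)ev.res;
}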


Best regards,
--
Krzysztof Opasiak
Samsung R&D Institute Poland
Samsung Electronics
--


