Re: slow eMMC write speed


On 09/29/2011 01:17 AM, Per Förlin wrote:
On 09/29/2011 09:24 AM, Linus Walleij wrote:
On Wed, Sep 28, 2011 at 11:34 PM, J Freyensee
<james_p_freyensee@xxxxxxxxxxxxxxx>  wrote:

Now in the 3.0 kernel I know mmc_wait_for_req() has changed, and the goal was
to try to make that function a bit more non-blocking,

What has been done by Per Förlin is to add pre_req/post_req hooks
for the datapath. This will improve data transfers in general if and
only if the driver can do some meaningful work in these hooks, so
your driver needs to be patched to use these.

Per patched a few select drivers to prepare the DMA buffers
at this time. In our case (mmci.c) dma_map_sg() can be done in
parallel with an ongoing transfer.

In our case (ux500, mmci, dma40) we don't have bounce buffers
so the only thing that will happen in parallel with ongoing transfers
is L2 and L1 cache flush. *still* we see a noticeable improvement in
throughput, most in L2, but even on the U300 which only does L1
cache I see some small improvements.

I *guess* if you're using bounce buffers, the gain will be even
more pronounced.

(Per, correct me if I'm wrong on any of this...)

Summary:
* The mmc block driver runs mmc_blk_rw_rq_prep(), mmc_queue_bounce_post() and __blk_end_request() in parallel with an ongoing mmc transfer.
* The driver may use the hooks to schedule low level work such as preparing dma and caches in parallel with ongoing mmc transfer.
* The big benefit of this is when using DMA and running the CPU at a lower speed. Here's an example of that: https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req#Block_device_tests_with_governor


with it too much because my current focus is on existing products, and no
handheld product uses a 3.0 kernel yet (that I am aware of, at least).
However, I still see the fundamental problem: the MMC stack, which was
probably written with the intent of being independent of the OS block
subsystem (struct request and related structures), really isn't independent
of it. The two cause holdups for one another, thereby dragging down
read/write performance of the MMC.

There are two issues IIRC:

- The block layer does not provide enough buffers at a time for
   the out-of-order buffer pre/post preps to take effect, I think this
   was during writes only (Per, can you elaborate?)

As I've been playing around with buffering/caching, it seems to me one opportunity to simplify things in the MMC space is to eliminate the need for the mmc_blk_request struct or the mmc_request struct. Looking through mmc_blk_issue_rw_rq(), there is a lot of work to initialize the struct mmc_blk_request brq, only to pass a struct mmc_queue variable to the actual mmc_wait_for_req() instead. In fact, some of the members of brq.mrq (of type mmc_request) wind up just pointing back to members of struct mmc_blk_request brq itself. Granted, I certainly don't understand everything going on here and I haven't studied this code nearly as long as others, but when I see something like this, the first thing that comes to mind is "elimination/simplification opportunity".


Writes are buffered and pushed down many in one go. This means they can easily be scheduled to be prepared while another is being transferred.
Large continuous reads are pushed down to MMC synchronously, one request per read-ahead size. The next large continuous read will wait in the block layer and not start until the current one is complete. Read more about the details here: https://wiki.linaro.org/WorkingGroups/Kernel/Specs/StoragePerfMMC-async-req#Analysis_of_how_block_layer_adds_read_request_to_the_mmc_block_queue

- Anything related to card geometries and special sectors and
   sector sizes etc, i.e. the stuff that Arnd has analyzed in detail,
   also Tixy looked into that for some cards IIRC.

Each needs to be addressed and is currently "to be done".

The other fundamental problem is the writes themselves.  Way, WAY more
writes occur on a handheld system in an end-user's hands than reads.
A fundamental computer-architecture principle states "make the common case
fast". So the focus should be on how to complete a write operation in the
fastest way possible.

First case above I think, yep it needs looking into...

The mmc non-blocking patches only try to move overhead so that it runs in parallel with the transfer. The actual transfer speed of MMC reads and writes is unaffected. I am hoping that the eMMC v4.5 packed commands support (the ability to group a series of commands into a single data transaction) will help boost performance in the future.

Regards,
Per


--
J (James/Jay) Freyensee
Storage Technology Group
Intel Corporation
--

