On Sep 28, 2011, at 4:16 PM, Praveen G K wrote:

> On Wed, Sep 28, 2011 at 3:59 PM, J Freyensee <james_p_freyensee@xxxxxxxxxxxxxxx> wrote:
>> On 09/28/2011 03:24 PM, Praveen G K wrote:
>>> On Wed, Sep 28, 2011 at 2:34 PM, J Freyensee <james_p_freyensee@xxxxxxxxxxxxxxx> wrote:
>>>> On 09/28/2011 02:03 PM, Praveen G K wrote:
>>>>> On Wed, Sep 28, 2011 at 2:01 PM, J Freyensee <james_p_freyensee@xxxxxxxxxxxxxxx> wrote:
>>>>>> On 09/28/2011 01:34 PM, Praveen G K wrote:
>>>>>>> On Wed, Sep 28, 2011 at 12:59 PM, J Freyensee <james_p_freyensee@xxxxxxxxxxxxxxx> wrote:
>>>>>>>> On 09/28/2011 12:06 PM, Praveen G K wrote:
>>>>>>>>> On Tue, Sep 27, 2011 at 10:42 PM, Linus Walleij <linus.walleij@xxxxxxxxxx> wrote:
>>>>>>>>>> On Fri, Sep 23, 2011 at 7:05 AM, Praveen G K <praveen.gk@xxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>> I am working on the block driver module of the eMMC driver (SDIO 3.0 controller). I am seeing very low write speed for eMMC transfers. On further debugging, I observed that every 63rd and 64th transfer takes a long time.
>>>>>>>>>>
>>>>>>>>>> Are you not just seeing the card-internal garbage collection?
>>>>>>>>>> http://lwn.net/Articles/428584/
>>>>>>>>>
>>>>>>>>> Does this mean that, theoretically, I should be able to achieve higher speeds if I am not using Linux?
>>>>>>>>
>>>>>>>> In theory, in a fairy-tale world, maybe; in reality, not really. In R/W performance measurements we have done, eMMC performance in products users would buy falls well, well short of any theoretical numbers.
>>>>>>>> We believe that, in theory, the eMMC interface should be able to support up to 100MB/s, but in reality, on real customer platforms, write bandwidths (for example) barely approach 20MB/s, regardless of whether it is a Microsoft Windows environment or Android (the Linux OS environment we care about). So maybe it is software implementation issues across multiple OSes preventing higher eMMC performance numbers (hence the reason I sometimes ask basic coding questions about the MMC subsystem on the list: the code isn't the easiest to follow). However, one need look no further than what Apple has done with the iPad 2 to see that eMMC probably just is not a good solution to use in the first place. We have measured Apple's iPad 2 write performance, *as a user would actually see it*, at double what we see with products using eMMC solutions. The big difference? Apple doesn't use eMMC at all in the iPad 2.
>>>>>>>
>>>>>>> Thanks for all the clarification. The problem is that I am seeing write speeds of about 5MB/s on a SanDisk eMMC product, and I can clearly see the time lost when measured between sending a command and receiving a data IRQ. I am not sure what kind of issue this is. 5MB/s feels really slow, but can the internal housekeeping of the card take so much time?
>>>>>>
>>>>>> Have you tried to trace through all the structs used for an MMC operation?!
>>>>>> Good gravy, there are request, mmc_queue, mmc_card, mmc_host, mmc_blk_request, mmc_request, multiple mmc_command structs, and the multiple scatterlists that these other structs use... I've been playing around with caching some things to try to improve performance, and it blows me away how many variables and pointers I have to keep track of for one operation going to an LBA on an MMC. I keep wondering whether more of the 'struct request' could have been used and a third of these structures eliminated. Another thing I wonder is how much of this infrastructure is really needed; when I do ask a "what is this for?" question on the list and no one responds, I wonder whether anyone else understands if it's needed either.
>>>>>
>>>>> I know I am not using the scatterlists, since the scatterlists are aggregated into a 64k bounce buffer. Regarding the different structs, I am just taking them at face value, assuming everything works "well". But my concern is why it takes such a long time (250 ms) to return a transfer-complete interrupt in occasional cases. During this time, the kernel is just waiting for the txfer_complete interrupt. That's it.
>>>>
>>>> I think one fundamental problem with the execution of MMC commands is that even though the MMC subsystem has its own structures and its own DMA/host controller, the OS's block subsystem and the MMC subsystem do not really run independently of each other; each is still tied to the other's fate, holding up performance of the kernel in general.
>>>>
>>>> In particular, I have found in 2.6.36+ kernels that the sooner you can retire the 'struct request *req' (i.e., using __blk_end_request()) relative to when the mmc_wait_for_req() call is made, the higher the performance you are going to get out of the OS in terms of reads/writes using an MMC. mmc_wait_for_req() is a blocking call, so that OS 'struct request req' will just sit around and do nothing until mmc_wait_for_req() is done. I have been able to do some caching of some commands, calling __blk_end_request() before mmc_wait_for_req(), and have gotten much higher performance in a few experiments (but the work certainly is not ready for prime time).
>>>>
>>>> Now, in the 3.0 kernel I know mmc_wait_for_req() has changed, and the goal was to make that function a bit more non-blocking, but I have not played with it much because my current focus is on existing products, and no handheld product uses a 3.0 kernel yet (that I am aware of, at least). However, I still see the fundamental problem: the MMC stack, which was probably written with the intent of being independent of the OS block subsystem (struct request and other stuff), really isn't independent of it, and the two will cause holdups between one another, thereby dragging down read/write performance of the MMC.
>>>>
>>>> The other fundamental problem is the writes themselves. Way, WAY more writes occur on a handheld system in an end user's hands than reads. A fundamental principle of computing says to make the common case fast, so the focus should be on how to complete a write operation the fastest way possible.
>>>
>>> Thanks for the detailed explanation.
>>> Please let me know if there is a fundamental issue with the way I am inserting the high-resolution timers.
>>> In the block.c file, I am timing the transfers as follows:
>>>
>>> Start timer
>>> mmc_queue_bounce_pre()
>>> mmc_wait_for_req()
>>> mmc_queue_bounce_post()
>>> End timer
>>>
>>> So I don't really have to worry about blk_end_request, right? Like you said, wait_for_req is a blocking wait. I don't see what is wrong with it being a blocking wait, because until you get the data-transfer-complete IRQ, there is no point in going ahead. blk_end_request comes into the picture only later, when all the data has been transferred to the card.
>>
>> Yes, that is correct.
>>
>> But if you can do some cache trickery or queue tricks, you can delay when you actually have to write to the MMC, so then __blk_end_request() and retiring the 'struct request *req' becomes the time sink. That is a reason mmc_wait_for_req() got some work done on it in the 3.0 kernel. The OS does not have to wait for the host controller to complete the operation (i.e., block on mmc_wait_for_req()) if there is no immediate dependency on that data; making it wait in that case is kind of dumb. This is why this can be a problem and a time sink. It's no different from out-of-order execution in CPUs.
>
> Thanks, I'll look into the 3.0 code to see what the changes are and whether they can improve the speed. Thanks for your suggestions.
>
>>> My line of thought is that the card is taking a lot of time for its internal housekeeping.
>>
>> Each 'write' to solid-state/NAND/flash requires an erase operation first, so yes, there is more housekeeping going on than a simple 'write'.
>>
>>> But I want to be absolutely sure of my analysis before I pass that judgement.
>>>
>>> I have also used another Toshiba card that gives me about 12MB/s write speed for the same code, but I am worried about whether I am masking some issue by blaming it on the card. What if the Toshiba card can ideally give a throughput of more than 12MB/s?
>>
>> No clue... you'd have to talk to Toshiba.
>>
>>> Or could there be an issue where the IRQ handler (sdhci_irq) is called with some kind of delay, and is there a possibility that we are not capturing the transfer-complete interrupt immediately?
>>>
>>>>>>> I mean, for the usual transfers it takes about 3ms to transfer 64kB of data, but for the 63rd and 64th transfers it takes 250 ms. The thing is, this is not on a file system. I am measuring the speed using a basic "dd" command to write directly to the block device.
>>>>>>>
>>>>>>>>> So, is this a software issue? Or is there a way to increase the size of the bounce buffers to 4MB?
>>>>>>>>>>
>>>>>>>>>> Yours,
>>>>>>>>>> Linus Walleij

some questions: does using a bounce buffer make things faster? I think you are using SDMA. I am wondering if there is a way to increase the transfer size.
Is there some magic number inside the mmc code that can be increased?

Philip

>>
>> --
>> J (James/Jay) Freyensee
>> Storage Technology Group
>> Intel Corporation
>>

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html