Hi Chris,

The amount of improvement from packed commands, as from any other eMMC4.5 feature, depends on several parameters:

1. The card's support for the feature. If the card supports only the feature's interface, you will see no improvement when using it.
2. The benchmark tool used. Since packed command preparation stops when a FLUSH request arrives, a benchmark that issues many FLUSH requests results in very little packing, and you will see no improvement.

You can use the following patch to get the packed commands statistics:
http://marc.info/?l=linux-mmc&m=134374508625826&w=2
With this patch you will be able to see the amount of packing and what caused the packed preparation to stop.

We tested the packed commands feature with SanDisk cards and got an improvement of 30% when using lmdd and tiotest. We don't use iozone for sequential tests, but if you send me the exact command that you use we can try it as well.

It is true that packed commands can degrade read performance in read-write collisions. However, it is only natural that with longer write requests a read request has to wait longer and its latency increases. I believe that it is not our duty to decide whether this is a reason to exclude the feature. Everyone should decide for themselves whether they want to benefit from the write improvement while risking the read-write collision scenarios.

eMMC4.5 introduces HPI and stop transmission to overcome the degradation of read latency due to writes (regardless of packed commands). The packing control is our own enhancement that we believe can also be used to overcome this degradation. It is tunable and must be explicitly enabled, so it can be the developer's decision whether to use it or not. Since it is not a standard feature, we can discuss separately whether it should be accepted and what the best way to use it is.

Packed commands is not the only eMMC4.5 feature that can cause degradation in specific scenarios.
If we look at the cache feature, it degrades random operations by almost half when FLUSH is used. With cache enabled, the following iozone command shows the degradation:
./data/iozone -i0 -i2 -r4k -s50m -O -o -I -f /data/mmc0/file3
However, cache support was accepted regardless of this degradation, and it is the developer's responsibility to decide whether to use the feature.

To summarize, all eMMC4.5 features that were added are tunable and disabled by default. I believe that anyone who enables a certain feature will do all the testing required to determine whether it is beneficial in their own environment.

Thanks,
Maya

On Tue, November 13, 2012 6:54 pm, Chris Ball wrote:
> Hi Maya,
>
> On Sun, Nov 04 2012, merez@xxxxxxxxxxxxxx wrote:
>> Packed commands is a mandatory eMMC4.5 feature and is supported by all
>> the card vendors.
>
> We're still only talking about using packed writes, though, right?
>
>> It was proven to be beneficial for eMMC4.5 cards and harmless for non
>> eMMC4.5 cards.
>
> My understanding is that write packing causes a regression in read
> performance that can be tuned/fixed by your num_wr_reqs_to_start_packing
> tunable (and read packing causes a read regression with current eMMC 4.5
> cards). Is that wrong?
>
>> I don't see a point to hold it back while it can be enabled or
>> disabled by a flag and most of the code it adds is guarded in specific
>> functions and is not active when packed commands is disabled.
>
> Earlier in the thread I wrote:
>
>>> * I still don't have a good set of representative benchmarks showing
>>> what kind of performance changes come with this patchset.
>>> It seems like we've had a small amount of testing on one
>>> controller/eMMC part combo from Seungwon, and an entirely different
>>> test from Maya, and the results aren't documented fully anywhere to
>>> the level of describing what the hardware was, what the test was, and
>>> what the results were before and after the patchset.
>
> I still feel this way. I'm worried that we might be merging code that
> works well on your controller/card but causes large regressions for
> everyone else. I don't want to handle this by making a tunable that
> everyone has to tune for their system, because I don't think anyone
> will tune it. I don't think that shipping a capability that will
> probably lead to performance regressions if you turn it on is a good
> idea.
>
> I'm in a better position to help now, though -- I have some
> motherboards with Marvell SoCs and a socketed eMMC slot, and I have
> eMMC 4.5 parts from Sandisk and Toshiba. So I can try to help work out
> how generalizable your results are across other controllers and cards.
>
> So far I've only tried the Sandisk part, but it didn't show any write
> improvement with write packing. I've verified that the switch command
> to turn on packed_event_en happens and succeeds, and that the caps are
> set correctly, so I'm not sure what's wrong yet. With iozone I get:
>
>                       KB  reclen   write  rewrite
> Unpacked writes:   10240    8192   17250    16794
> Packed writes:     10240    8192   16930    17353
>
> I'll try the Toshiba part next, and I'll start using lmdd as well as
> iozone. Any ideas on why I might not be seeing improvements with
> Sandisk?
>
> I'm not opposed to merging packed write support in principle, I just
> want to be convinced that we're not causing regressions for most users
> who turn it on. (And more than that, I want to see that it leads to
> improvements that make it worth adding the code complexity for.)
>
> Thanks,
>
> - Chris.
> --
> Chris Ball <cjb@xxxxxxxxxx> <http://printf.net/>
> One Laptop Per Child
>

--
QUALCOMM ISRAEL, on behalf of Qualcomm Innovation Center, Inc.
is a member of Code Aurora Forum, hosted by The Linux Foundation
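
As a quick sanity check on the iozone figures quoted in this thread, the relative change between the unpacked and packed runs can be computed directly. This is only a sketch: the numbers are copied from the quoted table, while the names `UNPACKED`, `PACKED`, `percent_change` and `compare` are made up for illustration and are not part of any patch or tool discussed here.

```python
# Throughput figures copied from the iozone table quoted in the thread
# (10240 KB file, 8192 KB record length). The names below are hypothetical.
UNPACKED = {"write": 17250, "rewrite": 16794}
PACKED = {"write": 16930, "rewrite": 17353}


def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def compare(unpacked, packed):
    """Map each test name to the percent change of packed vs. unpacked."""
    return {name: percent_change(unpacked[name], packed[name]) for name in unpacked}


if __name__ == "__main__":
    for name, delta in compare(UNPACKED, PACKED).items():
        # Prints the signed change for each test, e.g. "write    -1.86%".
        print(f"{name:8s} {delta:+.2f}%")
```

With these inputs the write result changes by about -1.86% and the rewrite result by about +3.33%. The two deltas are small and have opposite signs, which is far from the 30% reported with lmdd and tiotest; a single run per configuration cannot say whether the difference is anything more than run-to-run variation.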