On Mon, Jul 23, 2012 at 5:13 PM, <merez@xxxxxxxxxxxxxx> wrote:
> On Wed, July 18, 2012 12:26 am, Chris Ball wrote:
>> Hi,
>>
>> [removing Jens and the documentation list, since now we're talking
>> about the MMC side only]
>>
>> On Wed, Jul 18 2012, merez@xxxxxxxxxxxxxx wrote:
>>> Is there anything else that holds this patch from being pushed to
>>> mmc-next?
>>
>> Yes, I'm still uncomfortable with the write packing patchsets for a
>> couple of reasons, and I suspect that the sum of those reasons means
>> that we should probably plan on holding off merging it until after 3.6.
>>
>> Here are the open issues; please correct any misunderstandings:
>>
>> With Seungwon's patchset ("Support packed write command"):
>>
>> * I still don't have a good set of representative benchmarks showing
>>   what kind of performance changes come with this patchset.  It seems
>>   like we've had a small amount of testing on one controller/eMMC part
>>   combo from Seungwon, and an entirely different test from Maya, and
>>   the results aren't documented fully anywhere to the level of
>>   describing what the hardware was, what the test was, and what the
>>   results were before and after the patchset.
>
> Currently, there is only one card vendor that supports packed commands.
> Following are our sequential write (lmdd) test results on two of our
> targets (in MB/s):
>
>                          No packing    Packing
> Target 1 (SDR 50MHz)         15           25
> Target 2 (DDR 50MHz)         20           30
>
>>
>> With the reads-during-writes regression:
>>
>> * Venkat still has open questions about the nature of the read
>>   regression, and thinks we should understand it with blktrace before
>>   trying to fix it.  Maya has a theory about writes overwhelming reads,
>>   but Venkat doesn't understand why this would explain the observed
>>   bandwidth drop.
>
> The degradation of read due to writes is not a new behavior and exists
> also without the write packing feature (which only increases the
> degradation). Our investigation of this phenomenon led us to the
> conclusion that a new scheduling policy should be used for mobile
> devices, but this is not related to the current discussion of the
> write packing feature.
>
> The write packing feature increases the degradation of read due to
> write, since it allows the MMC to fetch many write requests in a row
> instead of fetching only one at a time. Therefore some of the read
> requests will have to wait for the completion of more write requests
> before they can be issued.

I am a bit puzzled by this claim. One thing I checked carefully when
reviewing the write packing patches from SJeon was that the code didn't
plough through a mixed list of reads and writes and select only the
writes. This section of the code in mmc_blk_prep_packed_list(), from the
v8 patchset:

<Quote>
+               if (rq_data_dir(cur) != rq_data_dir(next)) {
+                       put_back = 1;
+                       break;
+               }
</Quote>

means that once a read is encountered in the middle of write packing,
the packing is stopped at that point and the pack is executed. Then the
next blk_fetch_request() should get that read and continue as before.
IOW, the ordering of reads and writes is _not_ altered when using packed
commands.

For example, if there were 5 write requests, followed by 1 read,
followed by 5 more write requests in the request_queue, the first 5
writes will be executed as one "packed command", then the read will be
executed, and then the remaining 5 writes will be executed as one
"packed command". So the read does not have to wait any longer than it
waited before the packing feature. And I requested blktrace data to
confirm that this is indeed the behaviour.
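As a sanity check of my reading of the code, here is a toy userspace
model of that behaviour. This is only a minimal sketch under my own
assumptions; it is not the actual mmc_blk_prep_packed_list() code, and
the names in it are made up for illustration. It walks the queue in
order and cuts a packed group as soon as the data direction changes, so
the read in the middle is never reordered past the writes:

/*
 * Toy model of the packing cut described above. Not driver code; the
 * enum and variable names are hypothetical.
 */
#include <stdio.h>

enum req_dir { READ_REQ, WRITE_REQ };

int main(void)
{
    /* 5 writes, 1 read, 5 more writes: the example from above */
    enum req_dir queue[] = {
        WRITE_REQ, WRITE_REQ, WRITE_REQ, WRITE_REQ, WRITE_REQ,
        READ_REQ,
        WRITE_REQ, WRITE_REQ, WRITE_REQ, WRITE_REQ, WRITE_REQ,
    };
    int n = sizeof(queue) / sizeof(queue[0]);
    int i = 0;

    while (i < n) {
        enum req_dir dir = queue[i];
        int packed = 0;

        /*
         * Keep packing while the direction stays the same, i.e. the
         * analogue of the rq_data_dir(cur) != rq_data_dir(next) check
         * quoted above.
         */
        while (i < n && queue[i] == dir) {
            packed++;
            i++;
        }
        printf("issue %d %s request(s) as one %s\n", packed,
               dir == WRITE_REQ ? "write" : "read",
               packed > 1 ? "packed command" : "command");
    }
    return 0;
}

Built with a plain cc, this prints the first 5-write pack, then the
single read, then the second 5-write pack, in that order, which is the
ordering I expect the driver to preserve.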
The rest of your arguments depend on this assertion anyway, so can you
please clarify this?

> To overcome this behavior, the solution would be to stop the write
> packing when a read request is fetched, and this is the algorithm
> suggested by the write packing control.
>
> Let's also keep in mind that lmdd benchmarking doesn't fully reflect
> real life, in which there are not many scenarios that cause massive
> read and write operations. In our user-common-scenarios tests we saw
> that in many cases the write packing actually decreases the read
> latency. This can happen when the same number of write requests is
> fetched with and without packing: in such a case the write packing
> decreases the transfer time of the write requests and causes the read
> request to wait for a shorter time.
>
>>
>> With Maya's patchset ("write packing control"):
>>
>> * Venkat thinks that HPI should be used, and the number-of-requests
>>   metric is too coarse, and it doesn't let you disable packing at the
>>   right time, and you're essentially implementing a new I/O scheduler
>>   inside the MMC subsystem without understanding the root cause for
>>   why that's necessary.
>
> According to our measurements, stop transmission (CMD12) + HPI is a
> heavy operation that may take up to several milliseconds. Therefore,
> massive usage of HPI can cause a degradation of performance.
> In addition, it doesn't provide a complete solution for read during
> write, since it doesn't solve the problem of "what to do with the
> interrupted write request remainder?". That is, a common interrupting
> read request will usually be followed by another one. If we just
> continue to write the interrupted write request remainder, we will
> probably get another HPI due to the second read request, so eventually
> we may end up with lots of HPIs and write retries. A complete solution
> would be: stop the current write, change the packing mode to
> non-packing, serve the read request, push the write remainders back to
> the block I/O scheduler and let it schedule them again, probably after
> the read burst ends (this requires block layer support, of course).
>
> Regarding the packing control, there seems to be some confusion: the
> number-of-requests is the trigger for *enabling* the packing (after it
> was disabled), while a single read request disables packing.
> Therefore, the packing is stopped at the right time.
>
> The packing control doesn't add any scheduling policy to the MMC
> layer. The write packing feature is the one changing the scheduling
> policy, by fetching many write requests in a row without a delay that
> allows read requests to come in the middle.
> By disabling the write packing, the write packing control restores the
> old scheduling policy. It causes the MMC to fetch the requests one by
> one, so read requests are served as before.
>
> It is correct that the trigger for enabling the write packing control
> should be adjusted per platform and doesn't give a complete solution.
> As I mentioned above, the complete solution will include the usage of
> the write packing control, re-insertion of the packed writes into the
> scheduler when a read request is fetched, and usage of HPI to stop a
> packed write that is already being transferred.
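If I read the above correctly, the enable/disable policy being proposed
is roughly the following. This is only a minimal userspace sketch of my
understanding; the field names and the threshold value are made up and
are not the actual driver fields:

/*
 * Sketch of the write packing control policy as described in this
 * thread: a single read disables packing immediately, and packing is
 * re-enabled only after a run of consecutive write requests crosses a
 * per-platform threshold. All names here are hypothetical.
 */
#include <stdbool.h>
#include <stdio.h>

#define NUM_WR_REQS_TO_START_PACKING 17   /* illustrative tunable */

struct packing_ctrl {
    bool packing_enabled;
    int consecutive_writes;
};

static void on_request_fetched(struct packing_ctrl *ctrl, bool is_write)
{
    if (!is_write) {
        /* a single read request disables packing at once */
        ctrl->packing_enabled = false;
        ctrl->consecutive_writes = 0;
        return;
    }

    /* writes re-enable packing only after the threshold is crossed */
    if (++ctrl->consecutive_writes >= NUM_WR_REQS_TO_START_PACKING)
        ctrl->packing_enabled = true;
}

int main(void)
{
    struct packing_ctrl ctrl = { false, 0 };
    int i;

    for (i = 0; i < 20; i++)            /* a long write burst */
        on_request_fetched(&ctrl, true);
    printf("after 20 writes: packing %s\n",
           ctrl.packing_enabled ? "enabled" : "disabled");

    on_request_fetched(&ctrl, false);   /* one read arrives */
    printf("after a read:    packing %s\n",
           ctrl.packing_enabled ? "enabled" : "disabled");
    return 0;
}

In other words, packing only kicks in during long write bursts and a
single read turns it off; whether that is the right trigger is exactly
the per-platform tuning question raised above.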
> To summarize - we recommend including the write packing in 3.6 for the
> following reasons:
> 1. It significantly improves the write throughput.
> 2. In some of the cases it even decreases the read latency.
> 3. The read degradation in simultaneous read-write flows already
>    exists, even without this feature.
>
> As for the write packing control, it can be included in 3.6 to provide
> a partial solution for the read degradation, or we can postpone it to
> 3.7 and integrate it as part of the complete solution.
>
> Thanks,
> Maya
>
>>
>> My sense is that there's no way we can solve all of these to
>> satisfaction in the next week (which is when the merge window will
>> open), but that by waiting a cycle we might come up with some good
>> answers.
>>
>> What do other people think?  If you're excited about these patchsets,
>> now would be a fine time to come forward with your benchmarking
>> results and to help understand the reads-during-writes regression.