On Tuesday 01 March 2011 19:48:17 Jens Axboe wrote:
> On 2011-02-25 07:21, Arnd Bergmann wrote:
> > On Friday 25 February 2011, Andrei Warkentin wrote:
> >> Yup. I understand :-). That's the strategy I'm going to follow. For
> >> page_size-alignment/splitting I'm looking at the block layer now. Is
> >> that the right approach or should I still submit a (cleaned up) patch
> >> to mmc/card/block.c for that performance improvement.
> >
> > I guess it should live in block/cfq-iosched in the long run, but I don't
> > know how easy it is to implement it there for test purposes.
>
> I don't think I saw the original patch(es) for this?

Nobody has posted one yet, only discussions. Andrei made a patch for the MMC block driver to split requests in some cases, but I think the concept has changed enough that it's probably not useful to look at that patch.

I think what needs to be done here is to split requests in these cases:

* Small requests should be split on flash page boundaries, where a page is typically 8 to 32 KB. Sending one hardware request that spans two partial pages can be slower than sending two requests with the same data, but on page boundaries.

* If a hardware transfer is limited to a few sectors, the transfers should be aligned to page boundaries. E.g. assuming a 16-sector page and 32-sector maximum transfers, a request that spans from sector 7 to 62 should be split into three transfers: 7-15, 16-47 and 48-62, not 7-38 and 39-62. This reduces the number of page read-modify-write cycles that the drive does.

* No request should ever span multiple erase blocks. Most flash drives today have 4 MB erase blocks (sometimes 1, 2 or 8 MB), and the I/O scheduler should treat the erase block boundary like a seek on a hard drive. The I/O scheduler should try to send all sector writes of an erase block in sequence, but after that it can choose any other erase block to write to next.

I think if we get this logic, we can deal well with all cheap flash drives.
The two parameters we need are the page size and the erase block size, which the kernel can sometimes guess, but which should also be tunable in sysfs for devices that don't report them or that lie to the kernel about them.

I'm not sure if we want to do this for all nonrotational media, or add another flag to enable these optimizations. On proper SSDs that have an intelligent controller and enough RAM, these optimizations probably would not help all that much, and might even make things slightly slower due to a higher number of separate write requests.

Arnd