On Tue, Feb 15, 2011 at 11:16 AM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> On Monday 14 February 2011, Andrei Warkentin wrote:
>> > There are multiple ways how this could be implemented:
>> >
>> > 1. Have one exception cache for all "special" blocks. This would normally
>> >    be for FAT32 subdirectory updates, which always write to the same
>> >    few blocks. This means you can do small writes efficiently anywhere
>> >    on the card, but only up to a (small) fixed number of block addresses.
>> >    If you overflow the table, the card still needs to go through an
>> >    extra PE for each new entry you write, in order to free up an entry.
>> >
>> > 2. Have a small number of AUs that can be in a special mode with efficient
>> >    small writes but inefficient large writes. This means that when you
>> >    alternate between small and large writes in the same AU, it has to go
>> >    through a PE on every switch. Similarly, if you do small writes to
>> >    more than the maximum number of AUs that can be held in this mode, you
>> >    get the same effect. This number can be as small as one, because that
>> >    is what FAT32 requires.
>> >
>> > In both cases, you don't actually have a solution for the problem, you just
>> > make it less likely for specific workloads.
>>
>> Aha, ok. By the way, I did find out that either suggestion works. So
>> I'll pull out the reversing portion of the patch. No need to
>> overcomplicate :).
>
> BTW, what file system are you using? I could imagine that each of ext4, btrfs
> and nilfs2 give you very different results here. It could be that if your
> patch is optimizing for one file system, it is actually pessimising for
> another one.
>

Ext4. I've actually been rewriting the patch a lot, and it's taking time
because there is a lot in it that is wrong (so I feel kind of bad for
forwarding it to this list in the first place...). I've already mentioned
that there is no need to reorder, so that part is going away, which
simplifies everything greatly.
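For concreteness, here is roughly how I imagine driving those knobs from a
script. This is only a sketch: the attribute names match what I describe
below, but the sysfs path and the threshold values are illustrative, and I
point at a fake directory so it can be tried without hardware (on a real
system DEV would be something like /sys/block/mmcblk0/device):

```shell
#!/bin/sh
# Sketch only: attribute names come from the patch description; the path
# and the actual values are assumptions, not the real defaults.
DEV=${DEV:-/tmp/fake-mmc-sysfs}
mkdir -p "$DEV"

# Device flash page size, in 512-byte sectors (8 sectors = one 4 KiB page).
echo 8 > "$DEV/page_size_secs"

# Split/align small writes of 1..8 sectors so they don't cross a page
# boundary (first workaround).
echo 1 > "$DEV/split_tlow"
echo 8 > "$DEV/split_thigh"

# Split larger requests of 9..64 sectors and issue them as reliable
# writes so they don't get coalesced (second workaround).
echo 9  > "$DEV/split_relw_tlow"
echo 64 > "$DEV/split_relw_thigh"

cat "$DEV/page_size_secs"   # prints 8
```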
I agree, which is why all of this is now controlled through sysfs, and
there are no more hard-coded checks for manfid, MMC versus SD, or any
other magic. There is a page_size_secs attribute, through which you can
specify the device's page size. The workaround for small writes crossing
a page boundary (and winding up in Buffer B instead of A) is enabled by
setting split_tlow and split_thigh, which provide a threshold range, in
sectors, over which writes will be split/aligned. The second workaround,
which splits larger requests and issues them as reliable writes (to keep
them from being coalesced and winding up in Buffer B again), is
controlled through split_relw_tlow and split_relw_thigh. Do you think
there is a better way? Or is this good enough?

So, as I mentioned before, T had done some tests given data provided by
M, and then T verified that this fix was good. I need to do my own tests
on the patch after I rewrite it. Is iozone the best tool I can use? So
far I have an MMC logging facility built on connector that I use to
collect stats (useful for seeing how fs traffic translates to actual MMC
commands... once I clean it up I'll post it here as an RFC). What about
the tool you're writing? Any way I can use it?

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html