On 2012-06-15, at 4:04 PM, Ted Ts'o wrote:
> On Thu, Jun 14, 2012 at 09:55:31PM +0000, Arnd Bergmann wrote:
>> There is one more option we have to give the best possible performance,
>> although that would be a huge amount of work to implement:
>>
>> Any large file gets put into its own context, and we mark that
>> context "write-only" "unreliable" and "large-unit". This means the
>> file system has to write the file sequentially, filling one erase
>> block at a time, writing only "superpage" units (e.g. 16KB) or
>> multiples of that at once. We can neither overwrite nor read back
>> any of the data in that context until it is closed, and there is
>> no guarantee that any of the data has made it to the physical medium
>> before the context is closed. We are allowed to do read and write
>> accesses to any other context between superpage writes though.
>> After closing the context, the data will be just like any other
>> block again.
>
> Oh, that's cool. And I don't think that's hard to do. We could just
> keep a flag in the in-core inode indicating whether it is in "large
> unit" mode. If it is in large unit mode, we can make the fs writeback
> function make sure that we adhere to the restrictions of the large
> unit mode, and if at any point we need to do something that might
> violate the constraints, the file system would simply close the
> context.

This is very similar to what was implemented in mballoc preallocation.
Large files will get their own preallocation context, while small files
would share a context (i.e. an 8MB extent) and be packed densely into
this extent to avoid seeking.

It wouldn't be unreasonable to just give each mballoc context a
different eMMC context.

> The only reason I can think of why this might be problematic is if
> there is a substantial performance cost involved with opening and
> closing contexts on eMMC devices. Is that an issue we need to be
> worried about?
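
For what it's worth, the flag-based scheme described above could look
roughly like the sketch below. It is plain userspace C that only
illustrates the state machine; struct lu_inode, lu_open_context(),
lu_write(), lu_read() and lu_close_context() are made-up names, not
existing ext4 or MMC interfaces, and the 16KB superpage size is just the
example figure quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Example superpage size taken from the thread; real devices differ. */
#define SUPERPAGE_SIZE (16u * 1024u)

/* Hypothetical per-inode state, NOT an actual ext4 structure. */
struct lu_inode {
	bool     large_unit; /* inode currently owns a large-unit context */
	uint64_t next_off;   /* offset the next sequential write must start at */
	unsigned closes;     /* number of context closes, for illustration */
};

/* Stand-in for asking the device to close the context. */
static void lu_close_context(struct lu_inode *in)
{
	if (in->large_unit) {
		in->large_unit = false;
		in->closes++;
	}
}

/* Enter large-unit mode; the start offset is assumed superpage-aligned. */
static void lu_open_context(struct lu_inode *in, uint64_t start_off)
{
	assert(start_off % SUPERPAGE_SIZE == 0);
	in->large_unit = true;
	in->next_off = start_off;
}

/*
 * Writeback hook: returns true if the write may stay in the large-unit
 * context. Anything that is not a sequential write of whole superpages
 * violates the constraints, so the context is simply closed.
 */
static bool lu_write(struct lu_inode *in, uint64_t off, size_t len)
{
	if (!in->large_unit)
		return false;
	if (off != in->next_off || len == 0 || len % SUPERPAGE_SIZE != 0) {
		lu_close_context(in);
		return false;
	}
	in->next_off += len;
	return true;
}

/* Data cannot be read back while the context is open, so close it first. */
static void lu_read(struct lu_inode *in)
{
	lu_close_context(in);
}
```

A caller would fall back to the ordinary write path whenever lu_write()
returns false. How often that happens is exactly where the cost of
opening and closing contexts would start to matter.
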
>
>> Right now, there is no support for large-unit context and also not for
>> read-only or write-only contexts, which means we don't have to
>> enforce strict policies and can basically treat the context ID
>> as a hint. Using the advanced features would require that we
>> keep track of the context IDs across partitions and have to flush
>> write-only contexts before reading the data again. If we want to
>> do that, we can probably discard the patch series and start over.
>
> Well, I'm interested in getting something upstream, which is useful
> not just for the consumer-grade eMMC devices in handsets, but which
> might also be extensible to SSDs, and all the way up to PCIe-attached
> flash devices that might be used in large data centers.
>
> I think if we do things right, it should be possible to do something
> which would accommodate a large range of devices (which is why I
> brought up the concept of exposing virtualized contexts to the file
> system layer).
>
> Regards,
>
> - Ted
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

Cheers, Andreas