On Thu, May 9, 2019 at 1:15 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
>
> I think you have made the right choice for you and for the product you are
> working on to use an isolated module to provide this functionality.
>
> But I assume the purpose of your posting was to request upstream inclusion,
> community code review, etc. This is not likely to happen when the
> implementation and design choices are derived from Employer needs vs.
> the community needs. Sure, you can get high level design review, which is
> what *this* is, but I reckon not much more.
>
> This discussion has several references to community projects that can benefit
> from this functionality, but not in its current form.
>
> This development model has worked well in the past for Android and the Android
> user base leverage could help to get you a ticket to staging, but eventually,
> those modules (e.g. ashmem) often do get replaced with more community oriented
> APIs.
>

Hi fsdevel,

I'm Yurii, and I work with Eugene on the same team and the same project.
I want to explain how we ended up with a custom filesystem instead of
trying to improve FUSE for everyone, and why we think (maybe incorrectly)
that it may still be pretty useful for the community.

As the project goal was to allow instant(-ish) deployment of apps from the
dev environment to an Android phone, we were hoping to stick with a plain
FUSE filesystem, and that's what we did at first. But it turned out that
even with the best tuning it was still really slow and battery-hungry
(phones spent energy faster than they were charging over the cord).

By that point we had already collected profiles of the filesystem usage,
and had also figured out which features are essential to make it usable
for streaming:

1. Random reads are the most common
   -> a 4 KB read is the size we have to support, and we can't go to
      user mode on each of those
2. Android tends to list the app directory and stat files in it often
   -> these operations need to be cached in the kernel as well
3. Because of *random* reads, streaming files sequentially isn't optimal
   -> need to be able to collect read logs from the first deployment and
      stream in that order next time, on incremental builds
4. Devices have small flash cards, but game images need to be deployed
   uncompressed for speed and mmap access
   -> support storing 4 KB blocks compressed
4.1. The host computer is much better at compression
   -> support streaming compressed blocks directly into the filesystem
      storage, without recompression on the phone
5. Android has to verify the app signature for installation
   -> need to support per-block signing and lazy verification
5.1. For big games even per-block signature data can be huge
   -> need to stream even the signatures
6. The development cycle is usually edit-build-try-edit-...
   -> need to support delta patches from existing files
7. File names for installed apps are standard and different from what
   they were on the host
   -> must be able to store a user-supplied 'key' next to each file to
      identify it
8. Files never change
   -> no need for complex code for mutable data in the filesystem

In the end, we saw only two ways to make all of this work: either take
sdcardfs as a base and extend it, or change FUSE to support a cache in
the kernel. As you can imagine, the sdcardfs route got thrown out of the
window immediately after looking at the code.

But after learning some FUSE internals and reading its code, we found out
that to make it do all the listed things we'd basically have to implement
a totally new filesystem inside of it. The only real use of FUSE that
remained was to send FUSE_INIT and occasional read requests. Everything
else required, first of all, making a cache object inside FUSE intercept
every message before it goes to user mode, and also adding new
specialized commands initiated by user mode (e.g. prefetching data that
hasn't been requested yet, or streaming hashes in). Some things didn't
even make sense for a generic use case (e.g. having a limited circular
buffer of read blocks in the kernel that the user can ask for and flush).

In the end, after several tries, we came to the conclusion that the
original set of requirements is so specific that, funnily enough, anyone
who wants to create a lazy-loading experience would hit most of them,
while anyone who's doing something else would miss most of them. That's
the main reason to go with a separate specialized driver module, and the
reason to share it with the community: we have a feeling that people will
benefit from a high-quality implementation of lazy loading in the kernel,
and we will benefit from the community's support and guidance.

Again, we are all human and can be wrong at any step when drawing
conclusions. E.g. we didn't know about the fscache subsystem, and were
only planning to create a cache object inside FUSE instead. But for now I
still feel that our original research stands, and that in the long run a
specialized filesystem serves its users much better than several
scattered changes in other places that all pretty much look like the same
filesystem split into three parts and adapted to the interfaces those
places force onto it. Even more, those changes and interfaces look quite
strange on their own, when not used together.

Please tell me what you think about this whole thing. We do care about
the feature in general, not about making it look the way we've coded it
right now. If you feel that an fscache interface that covers all of the
FUSE user-mode messages and allows for those requirements would be useful
beyond streaming, we'll investigate that route further.

Thank you, and sorry for the long email

--
Thanks, Yurii