> With the proposed FUSE solution the following sequences would occur:
>
> kernel: if index for given block is missing, send MAP message
> userspace: if data/hash is missing for given block then download data/hash
> userspace: send MAP reply
> kernel: decompress data and verify hash based on index
>
> The kernel would not be involved in either streaming data or hash, it
> would only work with data/hash that has already been downloaded.
> Right?
>
> Or is your implementation doing streamed decompress/hash or partial blocks?
> ...
> Why does the kernel have to know the on-disk format to be able to load
> and discard parts of the index on-demand? It only needs to know which
> blocks were accessed recently and which not so recently.
>

(1) You're correct, only userspace deals with streaming. The kernel
then sees full blocks of data (usually LZ4-compressed) and blocks of
hashes. We'd need to give the location of the hash tree instead of the
individual hash here though - verification has to go all the way to the
top and even check the signature there. And the same 5 GB file would
have over 40 MB of hashes (32 bytes of SHA2 for each 4K block), so
those have to be read from disk as well.

Overall, let's just imagine a phone with 100 apps, 100 MB each,
installed this way. That ends up being ~10 GB of data, so we'd need
_at least_ 40 MB for the index and 80 MB for hashes *in kernel*.
Android now fights for each megabyte of RAM used in the system
services, so FUSE won't be able to cache that, and almost all reads
would go back to user mode again.

(1 and 2) ... If FUSE were to know the on-disk format, it would be able
to simply parse and read it when needed, with as small a memory
footprint as possible. Requesting this data from user mode every time,
with little caching, defeats the whole purpose of the change.

> BTW, which interface does your fuse filesystem use? Libfuse? Raw device?

Yes, our code interacts with the raw FUSE fd via poll/read/write calls.
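
A minimal sketch of that poll/read pattern, for illustration only and
not the actual implementation. The helper names serve_once and demo are
hypothetical, a pipe stands in for the FUSE fd so the snippet runs
without a mount, and the header layout assumed here is that of
struct fuse_in_header in <linux/fuse.h>.

```python
import os
import select
import struct

# Layout of struct fuse_in_header from <linux/fuse.h> (40 bytes):
# len, opcode, unique, nodeid, uid, gid, pid, then 4 bytes treated as padding.
FUSE_IN_HEADER = struct.Struct("=IIQQIIII")
FUSE_READ = 15  # opcode value from <linux/fuse.h>


def serve_once(fd, timeout_ms=1000):
    """Hypothetical helper: poll fd, read one request, decode its header.

    Returns None if nothing arrives within timeout_ms.
    """
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    if not poller.poll(timeout_ms):
        return None
    # Buffer size is an arbitrary choice for this sketch.
    buf = os.read(fd, 128 * 1024)
    length, opcode, unique, nodeid, _uid, _gid, pid, _pad = (
        FUSE_IN_HEADER.unpack_from(buf)
    )
    return {"len": length, "opcode": opcode, "unique": unique,
            "nodeid": nodeid, "pid": pid}


def demo():
    """Feed one fake FUSE_READ header through a pipe standing in for the FUSE fd."""
    rfd, wfd = os.pipe()
    try:
        fake = FUSE_IN_HEADER.pack(FUSE_IN_HEADER.size, FUSE_READ, 7, 42, 0, 0, 1234, 0)
        os.write(wfd, fake)
        return serve_once(rfd)
    finally:
        os.close(rfd)
        os.close(wfd)


if __name__ == "__main__":
    print(demo())
```

A real daemon would follow each decoded request with a write() of a
reply whose header echoes the same unique value; that half is left out
of the sketch.
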
We have tried the multithreaded approach via duping the control fd and
FUSE_DEV_IOC_CLONE, but it didn't give much improvement - Android apps
usually aren't multithreaded, so there are at most two pending reads at
once. I've seen 10 once, but that was some kind of miracle.

And again, we have not even looked at the directory structure and stat
caching yet, neither the interface nor the memory usage. For the general
case we have to make direct disk reads from the kernel, and this forces
an even bigger part of the disk format to be defined there. The end
result is what we got when researching FUSE - a huge chunk of FUSE gets
overspecialized to handle our own way of using it end to end, with no
real configurability (because making it configurable makes that code
even bigger and more complex).

--
Thanks, Yurii