On Wed, May 29, 2019 at 11:06 PM Yurii Zubrytskyi <zyy@xxxxxxxxxx> wrote:

> Yes, and this was _exactly_ our first plan, and it mitigates the read
> performance issue. The reasons why we didn't move forward with it are
> that we figured out all the other requirements, and fixing each of
> those needs another change in FUSE, up to the point where the FUSE
> interface becomes 50% dedicated to our specific goal:
>
> 1. The MAP message would have to support data compression (with
>    different algorithms) and hash verification (same thing) with hash
>    streaming (because even the Merkle tree for a 5GB file is huge and
>    can't be preloaded at once).

With the proposed FUSE solution the following sequence would occur:

  kernel:    if the index for the given block is missing, send a MAP message
  userspace: if data/hash is missing for the given block, download data/hash
  userspace: send the MAP reply
  kernel:    decompress the data and verify the hash based on the index

(A sketch of this round trip follows after this mail.)

The kernel would not be involved in streaming either data or hashes; it
would only work with data/hashes that have already been downloaded.
Right?  Or does your implementation do streamed decompression/hashing
of partial blocks?

> 1.1. Mapping memory usage can get out of hand pretty quickly: it has
>    to be at least (offset + size + compression type + hash location +
>    hash size + hash kind) per block. I'm not even thinking about
>    multiple storage files here. For that 5GB file (a debug APK for an
>    Android game we're targeting) we have 1.3M blocks, so
>    ~16 bytes * 1.3M = ~20MB of index alone, without the actual
>    overhead of the lookup table.
>    If the kernel code owns and manages its own on-disk data store and
>    format, this index can be loaded and discarded on demand there.

Why does the kernel have to know the on-disk format to be able to load
and discard parts of the index on demand?  It only needs to know which
blocks were accessed recently and which not so recently.

> > There's also work currently ongoing on optimizing the overhead of
> > the userspace round trip. The most promising thing appears to be
> > matching up the CPU of the userspace server with that of the task
> > doing the request. This can apparently result in a 60-500% speed
> > improvement.
>
> That sounds almost too good to be true, and it will be really cool.
> Do you have any patches or a git remote available in any compilable
> state so we can try the optimization out?  Android has quite a
> complicated hardware config and I want to see how this works,
> especially with our model where several processes may send requests
> into the same filesystem FD.

Currently it's only a bunch of hacks, no proper interfaces yet.  I'll
let you know once there's something useful for testing with a real
filesystem.

BTW, which interface does your fuse filesystem use?  Libfuse?  Raw
device?

Thanks,
Miklos
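
For illustration, a minimal sketch of the userspace side of the MAP
round trip described in the mail above.  FUSE has no MAP request today;
every struct and function name here is hypothetical, invented for this
example rather than taken from FUSE, libfuse, or the incremental-fs
patches, and the reply fields only roughly mirror the per-block index
fields listed in point 1.1.

/*
 * Hypothetical sketch only: none of these names exist in FUSE or in the
 * Android incremental-fs code.  It illustrates the sequence discussed
 * above: the kernel sends a MAP request for a block it has no index
 * entry for, userspace downloads the data and hash if they are missing,
 * and the reply gives the kernel enough of an index entry to decompress
 * and verify the block on its own.
 */
#include <stdbool.h>
#include <stdint.h>

/* Per-block index entry, roughly the fields mentioned in point 1.1. */
struct map_reply {
	uint64_t data_offset;   /* offset of the compressed block in storage */
	uint32_t data_size;     /* compressed size of the block */
	uint8_t  compression;   /* compression algorithm identifier */
	uint8_t  hash_kind;     /* hash algorithm, e.g. a Merkle-tree leaf */
	uint64_t hash_offset;   /* offset of the block's hash in storage */
};

/* Stubs standing in for whatever storage/download layer a server uses. */
bool block_present(uint64_t block);               /* data already on disk? */
bool hash_present(uint64_t block);                /* hash already on disk? */
int  download_block(uint64_t block);              /* fetch data, 0 on success */
int  download_hash(uint64_t block);               /* fetch hash, 0 on success */
void lookup_block(uint64_t block, struct map_reply *out); /* fill index entry */

/*
 * Userspace side of the MAP round trip: make sure the data and hash are
 * local, then hand the kernel the index entry.  Decompression and
 * verification then happen entirely in the kernel, with no further
 * userspace involvement and no streaming through the daemon.
 */
int handle_map_request(uint64_t block, struct map_reply *reply)
{
	if (!block_present(block)) {
		int err = download_block(block);
		if (err)
			return err;
	}
	if (!hash_present(block)) {
		int err = download_hash(block);
		if (err)
			return err;
	}
	lookup_block(block, reply);
	return 0;
}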