Re: Initial patches for Incremental FS

Yurii Zubrytskyi <zyy@xxxxxxxxxx> · Thu, 30 May 2019 15:45:42 -0700

> With the proposed FUSE solution the following sequences would occur:
>
> kernel: if index for given block is missing, send MAP message
>   userspace: if data/hash is missing for given block then download data/hash
>   userspace: send MAP reply
> kernel: decompress data and verify hash based on index
>
> The kernel would not be involved in either streaming data or hash, it
> would only work with data/hash that has already been downloaded.
> Right?
>
> Or is your implementation doing streamed decompress/hash or partial blocks?
> ...
> Why does the kernel have to know the on-disk format to be able to load
> and discard parts of the index on-demand?  It only needs to know which
> blocks were accessed recently and which not so recently.
>
(1) You're correct: only userspace deals with streaming. The kernel
then sees full blocks of data (usually LZ4-compressed) and blocks of
hashes. We'd need to give the location of the hash tree instead of an
individual hash here, though: verification has to go all the way to
the top of the tree and even check the signature there. And the same
5 GB file would have over 40 MB of hashes (32 bytes of SHA-256 for
each 4K block), so those have to be read from disk as well.
Overall, imagine a phone with 100 apps of 100 MB each installed this
way. That comes to ~10 GB of data, so we'd need _at least_ 40 MB for
the index and 80 MB for hashes *in the kernel*. Android now fights for
every megabyte of RAM used by system services, so FUSE won't be able
to cache that and will go back to user mode for almost all reads
again.
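The arithmetic behind those numbers can be checked with a few lines of Python. The 32 bytes per 4K block for hashes comes from the text above. The 16 bytes per index entry is an assumption, inferred only from the 40 MB figure rather than from any stated format, and `resident_bytes` is a hypothetical helper.

```python
MIB = 1024 * 1024
BLOCK = 4096
DIGEST = 32        # SHA-256 leaf hash per 4K block
INDEX_ENTRY = 16   # assumption: bytes per index entry implied by "40 MB"


def resident_bytes(apps: int, app_bytes: int) -> tuple:
    """Return (index_bytes, leaf_hash_bytes) needed to keep both in RAM."""
    blocks_per_app = -(-app_bytes // BLOCK)   # ceiling division
    blocks = apps * blocks_per_app
    return blocks * INDEX_ENTRY, blocks * DIGEST


index_b, hash_b = resident_bytes(100, 100 * MIB)
print(f"index: {index_b / 1e6:.1f} MB, hashes: {hash_b / 1e6:.1f} MB")
# prints: index: 41.0 MB, hashes: 81.9 MB
```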
(1 and 2) ... If FUSE knew the on-disk format, it could simply parse
and read it when needed, with as small a memory footprint as
possible. Requesting this data from userspace every time, with little
caching, defeats the whole purpose of the change.
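As an illustration of "parse and read it when needed", a minimal sketch assuming a hypothetical index of fixed-size 16-byte entries (u64 data offset, u32 compressed size, u32 flags). It reads a single entry per lookup with `os.pread` instead of holding the whole index in memory. The layout and names are invented for this example and are not the Incremental FS format.

```python
import os
import struct
from typing import NamedTuple, Optional

# Hypothetical fixed-size index entry: u64 offset, u32 size, u32 flags.
ENTRY = struct.Struct("<QII")   # 16 bytes, little-endian


class IndexEntry(NamedTuple):
    offset: int   # where the (compressed) block starts in the data area
    size: int     # compressed length in bytes
    flags: int


def pack_entry(e: IndexEntry) -> bytes:
    return ENTRY.pack(*e)


def read_index_entry(fd: int, index_start: int,
                     block: int) -> Optional[IndexEntry]:
    """Read the index entry for one data block directly from disk.

    Only ENTRY.size bytes are read per lookup and nothing is cached here,
    so resident memory does not grow with the size of the index.
    """
    raw = os.pread(fd, ENTRY.size, index_start + block * ENTRY.size)
    if len(raw) < ENTRY.size:
        return None   # block is past the end of the index
    return IndexEntry(*ENTRY.unpack(raw))
```

A kernel implementation would put a small LRU cache of recently used entries in front of such a lookup. The point is that it can drop entries at will because it knows how to re-read them.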

> BTW, which interface does your fuse filesystem use?  Libfuse?  Raw device?
Yes, our code interacts with the raw FUSE fd via poll/read/write
calls. We tried the multithreaded approach by duplicating the control
fd and using FUSE_DEV_IOC_CLONE, but it didn't give much improvement:
Android apps usually aren't multithreaded, so there are at most two
pending reads at once. I've seen 10 once, but that was some kind of
miracle.
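For readers unfamiliar with the raw-fd approach, below is a minimal single-threaded sketch of such a poll/read/write loop, written in Python for brevity. The header layouts follow <linux/fuse.h>. The handler names, the `read_block` callback and the buffer size are assumptions. The FUSE_INIT handshake and every opcode other than READ are left out, so this is not a working filesystem on its own.

```python
import errno
import os
import select
import struct
from typing import Callable, Optional

# Layouts from <linux/fuse.h>, native byte order, no implicit padding.
IN_HEADER = struct.Struct("=IIQQIIII")  # len, opcode, unique, nodeid, uid, gid, pid, pad
OUT_HEADER = struct.Struct("=IiQ")      # len, error, unique
READ_IN = struct.Struct("=QQIIQII")     # fh, offset, size, read_flags, lock_owner, flags, pad

FUSE_FORGET = 2
FUSE_READ = 15
FUSE_BATCH_FORGET = 42
BUF_SIZE = 1024 * 1024 + 4096           # assumption: >= negotiated max_write + headers

ReadBlock = Callable[[int, int, int], bytes]   # hypothetical: (nodeid, offset, size) -> data


def make_reply(unique: int, error: int, payload: bytes = b"") -> bytes:
    """Build a reply: fuse_out_header followed by the payload."""
    return OUT_HEADER.pack(OUT_HEADER.size + len(payload), error, unique) + payload


def handle_request(buf: bytes, read_block: ReadBlock) -> Optional[bytes]:
    _len, opcode, unique, nodeid, *_ = IN_HEADER.unpack_from(buf)
    if opcode in (FUSE_FORGET, FUSE_BATCH_FORGET):
        return None                      # the kernel expects no reply to these
    if opcode == FUSE_READ:
        _fh, offset, size, *_ = READ_IN.unpack_from(buf, IN_HEADER.size)
        return make_reply(unique, 0, read_block(nodeid, offset, size))
    # INIT, LOOKUP, GETATTR, ... are omitted from this sketch.
    return make_reply(unique, -errno.ENOSYS)


def serve(fd: int, read_block: ReadBlock) -> None:
    """Single-threaded loop over an already-mounted /dev/fuse descriptor."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    while True:
        poller.poll()
        buf = os.read(fd, BUF_SIZE)      # one read() returns one request
        reply = handle_request(buf, read_block)
        if reply is not None:
            os.write(fd, reply)          # one write() delivers one reply
```

Each read() on the fd yields exactly one request, so with at most two requests in flight, a second reader thread on a cloned fd would mostly sit idle.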

And again, we have not even looked at directory structure and stat
caching yet, neither the interface nor the memory usage. In the
general case we have to do direct disk reads from the kernel, which
forces an even bigger part of the disk format to be defined there. The
end result is what we got when researching FUSE: a huge chunk of FUSE
becomes overspecialized to handle our own end-to-end way of using it,
with no real configurability (because making it configurable makes
that code even bigger and more complex).

--
Thanks, Yurii