Re: Initial patches for Incremental FS

On Fri, May 3, 2019 at 12:23 AM Eugene Zemtsov <ezemtsov@xxxxxxxxxx> wrote:
>
> On Thu, May 2, 2019 at 6:26 AM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > Why not CODA, though, with local fs as cache?
>
> On Thu, May 2, 2019 at 4:20 AM Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >
> > This sounds very useful.
> >
> > Why does it have to be a new special-purpose Linux virtual filesystem?
> > Why not FUSE, which is meant for this purpose?
> > Those are things that you should explain when you are proposing a new
> > filesystem, but I will answer for you - because a FUSE page fault will
> > incur high latency even after blocks are locally available in your
> > backend store. Right?
> >
> > How about fscache support for FUSE then?
> > You can even write your own fscache backend if the existing ones don't
> > fit your needs for some reason.
> >
> > Piling logic into the kernel is not the answer.
> > Adding the missing interfaces to the kernel is the answer.
> >
>
> Thanks for the interest and feedback. What I dreaded most was silence.
>
> I probably should have given a bit more detail in the introductory email.
> The important features we’re aiming for are:
>
> 1. An attempt to read a missing data block gives a userspace data loader a
> chance to fetch it. Once a block is loaded (in advance or after a page fault),
> it is saved into local backing storage, and subsequent reads of the same block
> are served directly by the kernel. (A rough userspace-side sketch follows
> this list.) [Implemented]
>
> 2. Block level compression. It saves space on the device while still allowing
> very granular loading and mapping. Less granular compression would trigger
> loading of more data than absolutely necessary, which is exactly what we
> want to avoid. (A zlib illustration also follows the list.) [Implemented]
>
> 3. Block level integrity verification. The signature scheme is similar to
> dm-verity or fs-verity. In other words, each file has a Merkle tree with
> crypto-digests of its 4KB blocks, and the root digest is signed with RSASSA
> or ECDSA. Each time a data block is read, its digest is calculated and
> checked against the Merkle tree; if the check fails, the read operation
> fails as well. (The per-block check is sketched after the list.) Ideally
> I’d like to use the fs-verity API for that. [Not implemented yet.]
>
> 4. New files can be pushed into incremental-fs “externally” when an app needs
> a new resource or a binary. This is needed for situations where a new resource
> or a new version of the code becomes available, e.g. a user just changed the
> system language to Spanish, or a developer rolled out an app update.
> Things change over time, which means we can’t just incrementally
> load a precooked ext4 image and mount it via a loop device. [Implemented]
>
> 5. No need to support writes or file resizing. This eliminates a lot of
> complexity.
>
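> To make the read path in item 1 concrete, here is a rough userspace-side
> sketch of a data loader. The control file name and the record layout are
> placeholders made up for illustration; they are not the actual incfs
> interface:
>
>   /* Hypothetical loader: wait for the kernel to report a read of a
>    * missing block, then fetch that block and hand it back. */
>   #include <fcntl.h>
>   #include <poll.h>
>   #include <stdint.h>
>   #include <stdio.h>
>   #include <unistd.h>
>
>   struct pending_read {            /* placeholder record format */
>           uint64_t file_id;
>           uint32_t block_index;
>   };
>
>   static void fetch_and_write_block(uint64_t file_id, uint32_t block)
>   {
>           /* Stub: a real loader would download the 4KB block and feed
>            * it to the kernel via a control fd, after which all later
>            * reads of that block are served locally. */
>           fprintf(stderr, "fetch file %llu block %u\n",
>                   (unsigned long long)file_id, block);
>   }
>
>   int main(void)
>   {
>           int ctl = open("/mnt/incfs/.pending_reads", O_RDONLY); /* made up */
>           struct pollfd pfd = { .fd = ctl, .events = POLLIN };
>           struct pending_read req;
>
>           while (poll(&pfd, 1, -1) >= 0)      /* wake up on a page fault */
>                   if (read(ctl, &req, sizeof(req)) == sizeof(req))
>                           fetch_and_write_block(req.file_id,
>                                                 req.block_index);
>           return 0;
>   }
>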
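> The granularity argument in item 2 can be shown with plain zlib: each 4KB
> block is compressed as an independent stream, so any single block can be
> decompressed without touching its neighbours. (Illustration only; this says
> nothing about our actual on-disk format.)
>
>   #include <stdio.h>
>   #include <zlib.h>
>
>   #define BLOCK_SIZE 4096
>
>   int main(int argc, char **argv)
>   {
>           unsigned char block[BLOCK_SIZE];
>           unsigned char out[compressBound(BLOCK_SIZE)]; /* worst case */
>           FILE *in;
>           size_t n;
>
>           if (argc < 2 || !(in = fopen(argv[1], "rb")))
>                   return 1;
>
>           while ((n = fread(block, 1, BLOCK_SIZE, in)) > 0) {
>                   uLongf out_len = sizeof(out);
>
>                   /* A fresh stream per block: no shared dictionary, so
>                    * each block stays individually loadable. */
>                   if (compress2(out, &out_len, block, n,
>                                 Z_BEST_SPEED) != Z_OK)
>                           return 1;
>                   printf("%zu -> %lu bytes\n", n, (unsigned long)out_len);
>           }
>           fclose(in);
>           return 0;
>   }
>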
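> And the per-block check from item 3, sketched with OpenSSL's SHA-256. For
> brevity this walks a binary Merkle tree; the actual dm-verity/fs-verity
> layout packs many digests per 4KB tree block, and verifying the
> RSASSA/ECDSA signature over the root is omitted here:
>
>   #include <stdint.h>
>   #include <string.h>
>   #include <openssl/sha.h>
>
>   #define DIGEST_LEN SHA256_DIGEST_LENGTH
>
>   /* Verify one 4KB block against a trusted root digest. 'siblings'
>    * holds one digest per tree level; 'is_right[i]' says whether that
>    * sibling sits to the right of our running digest at level i. */
>   int verify_block(const uint8_t *block, size_t block_len,
>                    const uint8_t siblings[][DIGEST_LEN],
>                    const int *is_right, size_t depth,
>                    const uint8_t signed_root[DIGEST_LEN])
>   {
>           uint8_t cur[DIGEST_LEN], pair[2 * DIGEST_LEN];
>           size_t i;
>
>           SHA256(block, block_len, cur);            /* leaf digest */
>
>           for (i = 0; i < depth; i++) {
>                   if (is_right[i]) {
>                           memcpy(pair, cur, DIGEST_LEN);
>                           memcpy(pair + DIGEST_LEN, siblings[i], DIGEST_LEN);
>                   } else {
>                           memcpy(pair, siblings[i], DIGEST_LEN);
>                           memcpy(pair + DIGEST_LEN, cur, DIGEST_LEN);
>                   }
>                   SHA256(pair, sizeof(pair), cur);  /* parent digest */
>           }
>           return memcmp(cur, signed_root, DIGEST_LEN) == 0;
>   }
>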
> Not all of these features are implemented yet, but all of them will be
> needed to achieve our goals:
>  - Apps can be delivered incrementally, without having to wait for extra data.
>    At the same time, given enough time, an app can be downloaded fully,
>    without having to keep a connection open after that.
>  - An app’s integrity should be verifiable without having to read all of
>    its blocks.
>  - Local storage and battery need to be conserved.
>  - App binaries and resources can change over time.
>    Such changes are triggered by external events.
>

This really sounds to me like the properties of a network filesystem
with a local cache. It seems that you did thorough research, but
I am not sure that you examined the fscache option properly.
Remember: if an existing module does not meet your needs,
that does not mean that creating a new module is the right answer.
It may be that extending an existing module is something that
everyone, including yourself, will benefit from.

> I’d like to comment on the proposed alternative solutions:
>
> FUSE
> We have a FUSE-based prototype, and though functional, it turned out to be
> battery hungry, with read performance leaving much to be desired.
> Our measurements roughly correspond to the results in the article
> I link to in PATCH 1 (incrementalfs.rst).
>
> In this thread Amir Goldstein absolutely correctly pointed out that FUSE’s
> constant overhead keeps hurting an app’s performance even when all blocks are
> available locally. But not only that: FUSE needs to be involved in each
> readdir() and stat() call. And to our surprise we learned that many apps do
> directory traversals and stat()-s much more often than seems reasonable.
>

That is a real problem. The readdir cache, recently added to FUSE, probably
solves your problem, since your directory changes are infrequent.
A stat cache also exists, but whether it is used depends on the mount options.
I am sure you can come up with a caching policy that will meet your needs,
and AFAIK the FUSE protocol supports invalidating cache entries from the
server (i.e. on "external" changes).
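
For example, a minimal sketch against libfuse 3 (whatever notices the
external change and calls on_external_update() is assumed to exist elsewhere
in the daemon; error handling is omitted):

        /* When the daemon learns of an external change, tell the kernel to
         * drop the cached pages and the cached dentry, so the next read()
         * or lookup() comes back to userspace. */
        #define FUSE_USE_VERSION 31
        #include <fuse_lowlevel.h>
        #include <string.h>

        static void on_external_update(struct fuse_session *se,
                                       fuse_ino_t parent, const char *name,
                                       fuse_ino_t ino)
        {
                /* Offset 0, length 0 invalidates the file's whole data cache. */
                fuse_lowlevel_notify_inval_inode(se, ino, 0, 0);

                /* Drop the dentry so lookup()/readdir() see the new version. */
                fuse_lowlevel_notify_inval_entry(se, parent, name,
                                                 strlen(name));
        }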

> Moreover, Android has a bit of recent history with FUSE. A big chunk of the
> Android directory tree (“external storage”) used to be mounted via FUSE.
> It didn’t turn out to be a great approach, and it was eventually replaced by
> a kernel module.
>

I am aware of that history.
I suspect the decision to write sdcardfs followed logic similar to the one
that has led you to write incfs.

> I reckon the amount of changes we’d need to introduce to FUSE in order
> to make it support the things mentioned above would be, to put it mildly,
> very substantial. And having to be as generic as FUSE (i.e. support writes,
> etc.) would make the task much more complicated than it is now.
>

Maybe. We won't know until you explore this option. Will we?

Thanks,
Amir.



