Re: [LSF/MM/BPF TOPIC] Dropping page cache of individual fs

On Wed, Jan 17, 2024 at 02:19:43PM +0100, Christian Brauner wrote:
> On Wed, Jan 17, 2024 at 07:56:01AM +1100, Dave Chinner wrote:
> > On Tue, Jan 16, 2024 at 11:50:32AM +0100, Christian Brauner wrote:
> > > Hey,
> > > 
> > > I'm not sure this even needs a full LSFMM discussion but since I
> > > currently don't have time to work on the patch I may as well submit it.
> > > 
> > > Gnome recently got awarded 1M Euro by the Sovereign Tech Fund (STF). The
> > > STF was created by the German government to fund public infrastructure:
> > > 
> > > "The Sovereign Tech Fund supports the development, improvement and
> > >  maintenance of open digital infrastructure. Our goal is to sustainably
> > >  strengthen the open source ecosystem. We focus on security, resilience,
> > >  technological diversity, and the people behind the code." (cf. [1])
> > > 
> > > Gnome has proposed various specific projects including integrating
> > > systemd-homed with Gnome. Systemd-homed provides various features and if
> > > you're interested in details then you might find it useful to read [2].
> > > It makes use of various new VFS and fs specific developments over the
> > > last years.
> > > 
> > > One feature is encrypting the home directory via LUKS. An appropriate
> > > image or device must contain a GPT partition table. Currently there's
> > > only one partition which is a LUKS2 volume. Inside that LUKS2 volume is
> > > a Linux filesystem. Currently supported are btrfs (see [4] though),
> > > ext4, and xfs.
> > > 
> > > The following issue isn't specific to systemd-homed. Gnome wants to be
> > > able to support locking encrypted home directories. For example, when
> > > the laptop is suspended. To do this the luksSuspend command can be used.
> > > 
> > > The luksSuspend call is nothing else than a device mapper ioctl to
> > > suspend the block device and its owning superblock/filesystem. Which in
> > > turn is nothing but a freeze initiated from the block layer:
> > > 
> > > dm_suspend()
> > > -> __dm_suspend()
> > >    -> lock_fs()
> > >       -> bdev_freeze()
> > > 
> > > So when we say luksSuspend we really mean block layer initiated freeze.
> > > The overall goal or expectation of userspace is that after a luksSuspend
> > > call all sensitive material has been evicted from relevant caches to
> > > harden against various attacks. And luksSuspend does wipe the encryption
> > > key and suspend the block device. However, the encryption key can still
> > > be available clear-text in the page cache.
> > 
> > The wiping of secrets is completely orthogonal to the freezing of
> > the device and filesystem - the freeze does not need to occur to
> > allow the encryption keys and decrypted data to be purged. They
> > should not be conflated; purging needs to be a completely separate
> > operation that can be run regardless of device/fs freeze status.
> 
> Yes, I'm aware. I didn't mean to imply that these things are in any way
> necessarily connected. Just that there are use-cases where they are. And
> the encrypted home directory case is one. Once one has frozen the block
> device and filesystem, one would also like to drop the page cache, which
> has most of the interesting data.
> 
> The fact that after a block layer initiated freeze - again mostly a
> device mapper problem - one may or may not be able to successfully read
> from the filesystem is annoying. Of course one can't write, that will
> hang one immediately. But if one still has some data in the page cache
> one can still dump the contents of that file. That's at least odd
> behavior from a user's POV even if for us it's clear why that's the
> case.

A frozen filesystem doesn't prevent read operations from occurring.

> And a freeze does do a sync_filesystem() and a sync_blockdev() to flush
> out any dirty data for that specific filesystem.

Yes, it's required to do that - the whole point of freezing a
filesystem is to bring the filesystem into a *consistent physical
state on persistent storage* and to hold it in that state until it
is thawed.

> So it would be fitting
> to give users an api that allows them to also drop the page cache
> contents.

Not as part of a freeze operation.

Read operations have *always* been allowed from frozen filesystems;
they are intended to be allowed because one of the use cases for
freezing is to create a consistent filesystem state for backup of
the filesystem. That requires that everything in the filesystem can
be read whilst it is frozen, and that means the page cache needs to
remain operational.

What the underlying device allows when it has been *suspended* is a
different issue altogether. The key observation here is that storage
device suspend != filesystem freeze and they can have very different
semantics depending on the operation being performed on the block
device while it is suspended.

IOWs, a device suspend implementation might freeze the filesystem to
bring the contents of the storage device into a consistent,
up-to-date state while the device is suspended (e.g. for device level
backups), but block device level suspend does not *require* that the
filesystem is frozen whilst the device IO operations are suspended.

> For some use-cases like the Gnome use-case one wants to do a freeze and
> drop everything that one can from the page cache for that specific
> filesystem.

So they have to do an extra system call between FS_IOC_FREEZE and
FS_IOC_THAW. What's the problem with that? What are you trying to
optimise by combining cache purging with FS_IOC_FREEZE?
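
For illustration, that sequence could look roughly like this from
userspace. This is a minimal sketch, assuming Python on Linux. The
generic freeze/thaw ioctls are spelled FIFREEZE and FITHAW in
<linux/fs.h>, and the numeric values below are their encodings on x86
and arm64 (other architectures may differ). `purge` is a hypothetical
callback standing in for whatever cache-dropping call is used, and the
`ioctl` parameter is only there so the ordering can be exercised
without CAP_SYS_ADMIN.

```python
import fcntl
import os

# _IOWR('X', 119, int) and _IOWR('X', 120, int) from <linux/fs.h>,
# as encoded on x86 and arm64 (assumption: other architectures may differ).
FIFREEZE = 0xC0045877
FITHAW = 0xC0045878


def freeze_purge_thaw(mountpoint, purge, ioctl=fcntl.ioctl):
    """Freeze the filesystem at mountpoint, run purge(mountpoint), then thaw.

    purge is a hypothetical callback; ioctl is injectable for dry runs.
    """
    fd = os.open(mountpoint, os.O_RDONLY | os.O_DIRECTORY)
    try:
        ioctl(fd, FIFREEZE, 0)  # the real call needs CAP_SYS_ADMIN
        try:
            return purge(mountpoint)
        finally:
            # Always thaw, even if purge() raised, so writers are not
            # left blocked on a frozen filesystem.
            ioctl(fd, FITHAW, 0)
    finally:
        os.close(fd)
```

The purge step is just one more call inside the freeze/thaw window,
and the same callback can be run with no freeze at all.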

If the user/application/infrastructure already has to iterate all
the mounted filesystems to freeze them, then it's trivial for them
to add a cache purging step to that infrastructure for the storage
configurations that might need it. I just don't see why this needs
to be part of a block device freeze operation, especially as the
"purge caches on this filesystem" operation has potential use cases
outside of the luksSuspend context....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



