Re: [LSF/MM/BPF TOPIC] Dropping page cache of individual fs

On Tue, Jan 16, 2024 at 11:50:32AM +0100, Christian Brauner wrote:
> Hey,
> 
> I'm not sure this even needs a full LSFMM discussion but since I
> currently don't have time to work on the patch I may as well submit it.
> 
> Gnome recently got awarded 1M Euro by the Sovereign Tech Fund (STF). The
> STF was created by the German government to fund public infrastructure:
> 
> "The Sovereign Tech Fund supports the development, improvement and
>  maintenance of open digital infrastructure. Our goal is to sustainably
>  strengthen the open source ecosystem. We focus on security, resilience,
>  technological diversity, and the people behind the code." (cf. [1])
> 
> Gnome has proposed various specific projects including integrating
> systemd-homed with Gnome. Systemd-homed provides various features and if
> you're interested in details then you might find it useful to read [2].
> It makes use of various new VFS and fs specific developments over the
> last years.
> 
> One feature is encrypting the home directory via LUKS. An appropriate
> image or device must contain a GPT partition table. Currently there's
> only one partition which is a LUKS2 volume. Inside that LUKS2 volume is
> a Linux filesystem. Currently supported are btrfs (see [4] though),
> ext4, and xfs.
> 
> The following issue isn't specific to systemd-homed. Gnome wants to be
> able to support locking encrypted home directories. For example, when
> the laptop is suspended. To do this the luksSuspend command can be used.
> 
> The luksSuspend call is nothing more than a device mapper ioctl to
> suspend the block device and its owning superblock/filesystem, which in
> turn is nothing but a freeze initiated from the block layer:
> 
> dm_suspend()
> -> __dm_suspend()
>    -> lock_fs()
>       -> bdev_freeze()
> 
> So when we say luksSuspend we really mean block layer initiated freeze.
> The overall goal or expectation of userspace is that after a luksSuspend
> call all sensitive material has been evicted from relevant caches to
> harden against various attacks. And luksSuspend does wipe the encryption
> key and suspend the block device. However, the encryption key can still
> be available in cleartext in the page cache. To illustrate this problem
> more simply:
> 
> truncate -s 500M /tmp/img
> echo password | cryptsetup luksFormat /tmp/img --force-password
> echo password | cryptsetup open /tmp/img test
> mkfs.xfs /dev/mapper/test
> mount /dev/mapper/test /mnt
> echo "secrets" > /mnt/data
> cryptsetup luksSuspend test
> cat /mnt/data
> 
> This will still happily print the contents of /mnt/data even though the
> block device and the owning filesystem are frozen because the data is
> still in the page cache.
> 
> To my knowledge, the only current way to get the contents of /mnt/data
> or the encryption key out of the page cache is via
> /proc/sys/vm/drop_caches which is a big hammer.
> 
> My initial reaction is to give userspace an API to drop the page cache
> of a specific filesystem which may have additional uses. I initially had
> started drafting an ioctl() and then got swayed towards a
> posix_fadvise() flag. I found out that this was already proposed a few
> years ago but got rejected as it was suspected this might just be
> someone toying around without a real world use-case. I think this here
> might qualify as a real-world use-case.
> 
> This may at least help secure users with a regular dm-crypt setup
> where dm-crypt is the top layer. Users that stack additional layers on
> top of dm-crypt may, of course, still leak plaintext if they introduce
> additional caching. But that's on them.
> 
> Of course, other ideas are welcome.

This isn't entirely unlike snapshot deletion, where we also need to
shoot down the pagecache.

Technically, the code I have now for snapshot deletion isn't quite what
I want; snapshot deletion probably wants something closer to revoke()
instead of waiting for files to be closed. But maybe the code I have is
close to what you need - maybe we could turn this into a common shared
API?

https://evilpiepirate.org/git/bcachefs.git/tree/fs/bcachefs/fs.c#n1569

The need for page zeroing is pretty orthogonal; if you want page zeroing
you want that enabled for all page cache folios at all times.
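
For comparison with the per-filesystem interface discussed in the quoted
mail, here is a minimal sketch of what is possible today on a per-file
basis: posix_fadvise() with POSIX_FADV_DONTNEED, preceded by fsync() so
that dirty pages are written back and become droppable. The helper name
drop_file_cache() is hypothetical and used only for illustration. The call
is advisory, so pages that are mapped or otherwise in use may stay
resident.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an existing kernel or libc API): ask the
 * kernel to drop the cached pages of a single file.
 * Returns 0 on success, or an errno-style error number on failure.
 */
int drop_file_cache(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;

    /* Dirty pages cannot be discarded, so write them back first. */
    if (fsync(fd) != 0) {
        int err = errno;
        close(fd);
        return err;
    }

    /* offset 0, len 0 means "from the start to the end of the file". */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    close(fd);
    return rc; /* posix_fadvise() returns the error number directly */
}
```

In the reproducer from the quoted mail, such a helper would have to be
called on /mnt/data (and on every other file of interest) before
luksSuspend, which is why a single per-filesystem operation is attractive.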



