On Tue, Jan 16, 2024 at 11:50:32AM +0100, Christian Brauner wrote:
> Hey,
>
> I'm not sure this even needs a full LSFMM discussion but since I
> currently don't have time to work on the patch I may as well submit it.
>
> Gnome recently got awarded 1M Euro by the Sovereign Tech Fund (STF). The
> STF was created by the German government to fund public infrastructure:
>
> "The Sovereign Tech Fund supports the development, improvement and
> maintenance of open digital infrastructure. Our goal is to sustainably
> strengthen the open source ecosystem. We focus on security, resilience,
> technological diversity, and the people behind the code." (cf. [1])
>
> Gnome has proposed various specific projects including integrating
> systemd-homed with Gnome. Systemd-homed provides various features and if
> you're interested in details then you might find it useful to read [2].
> It makes use of various new VFS and fs specific developments over the
> last years.
>
> One feature is encrypting the home directory via LUKS. An appropriate
> image or device must contain a GPT partition table. Currently there's
> only one partition which is a LUKS2 volume. Inside that LUKS2 volume is
> a Linux filesystem. Currently supported are btrfs (see [4] though),
> ext4, and xfs.
>
> The following issue isn't specific to systemd-homed. Gnome wants to be
> able to support locking encrypted home directories. For example, when
> the laptop is suspended. To do this the luksSuspend command can be used.
>
> The luksSuspend call is nothing else than a device mapper ioctl to
> suspend the block device and its owning superblock/filesystem. Which in
> turn is nothing but a freeze initiated from the block layer:
>
> dm_suspend()
> -> __dm_suspend()
>    -> lock_fs()
>       -> bdev_freeze()
>
> So when we say luksSuspend we really mean block layer initiated freeze.
>
> The overall goal or expectation of userspace is that after a luksSuspend
> call all sensitive material has been evicted from relevant caches to
> harden against various attacks. And luksSuspend does wipe the encryption
> key and suspend the block device. However, the encryption key can still
> be available clear-text in the page cache. To illustrate this problem
> more simply:
>
> truncate -s 500M /tmp/img
> echo password | cryptsetup luksFormat /tmp/img --force-password
> echo password | cryptsetup open /tmp/img test
> mkfs.xfs /dev/mapper/test
> mount /dev/mapper/test /mnt
> echo "secrets" > /mnt/data
> cryptsetup luksSuspend test
> cat /mnt/data
>
> This will still happily print the contents of /mnt/data even though the
> block device and the owning filesystem are frozen because the data is
> still in the page cache.
>
> To my knowledge, the only current way to get the contents of /mnt/data
> or the encryption key out of the page cache is via
> /proc/sys/vm/drop_caches which is a big hammer.
>
> My initial reaction is to give userspace an API to drop the page cache
> of a specific filesystem which may have additional uses. I initially had
> started drafting an ioctl() and then got swayed towards a
> posix_fadvise() flag. I found out that this was already proposed a few
> years ago but got rejected as it was suspected this might just be
> someone toying around without a real world use-case. I think this here
> might qualify as a real-world use-case.
>
> This may at least help securing users with a regular dm-crypt setup
> where dm-crypt is the top layer. Users that stack additional layers on
> top of dm-crypt may still leak plaintext of course if they introduce
> additional caching. But that's on them.
>
> Of course other ideas welcome.

This isn't entirely unlike snapshot deletion, where we also need to shoot
down the pagecache.
Technically, the code I have now for snapshot deletion isn't quite what I
want; snapshot deletion probably wants something closer to revoke()
instead of waiting for files to be closed. But maybe the code I have is
close to what you need - maybe we could turn this into a common shared
API?

https://evilpiepirate.org/git/bcachefs.git/tree/fs/bcachefs/fs.c#n1569

The need for page zeroing is pretty orthogonal; if you want page zeroing
you want that enabled for all page cache folios at all times.
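
For context on the posix_fadvise() direction quoted above: the closest
thing userspace has today is the per-file POSIX_FADV_DONTNEED hint. The
following is only a minimal sketch of that existing call, not the proposed
per-filesystem API. The helper name drop_file_cache() is hypothetical, and
the hint is advisory, so dirty or mapped pages may stay cached:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to drop the cached pages of one file.
 * Returns 0 on success or a positive errno value on failure.
 */
int drop_file_cache(const char *path)
{
	int fd, ret;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return errno;

	/* DONTNEED cannot drop dirty pages, so write them back first. */
	if (fsync(fd) < 0) {
		ret = errno;
		close(fd);
		return ret;
	}

	/* offset 0 with len 0 covers the whole file. */
	ret = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	close(fd);

	/* posix_fadvise() returns the error number directly, not via errno. */
	return ret;
}
```

This only touches the data pages of the one file it is pointed at, so it
says nothing about other files or filesystem metadata on the same
superblock; covering those is what a per-filesystem interface would have
to add.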