Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Removing GFP_NOFS

Luis Henriques <lhenriques@xxxxxxx> · Tue, 09 Jan 2024 15:47:32 +0000

Johannes Thumshirn <Johannes.Thumshirn@xxxxxxx> writes:

> On 05.01.24 11:57, Jan Kara wrote:
>> Hello,
>> 
>> On Thu 04-01-24 21:17:16, Matthew Wilcox wrote:
>>> This is primarily a _FILESYSTEM_ track topic.  All the work has already
>>> been done on the MM side; the FS people need to do their part.  It could
>>> be a joint session, but I'm not sure there's much for the MM people
>>> to say.
>>>
>>> There are situations where we need to allocate memory, but cannot call
>>> into the filesystem to free memory.  Generally this is because we're
>>> holding a lock or we've started a transaction, and attempting to write
>>> out dirty folios to reclaim memory would result in a deadlock.
>>>
>>> The old way to solve this problem is to specify GFP_NOFS when allocating
>>> memory.  This conveys little information about what is being protected
>>> against, and so it is hard to know when it might be safe to remove.
>>> It's also a reflex -- many filesystem authors use GFP_NOFS by default
>>> even when they could use GFP_KERNEL because there's no risk of deadlock.
>>>
>>> The new way is to use the scoped APIs -- memalloc_nofs_save() and
>>> memalloc_nofs_restore().  These should be called when we start a
>>> transaction or take a lock that would cause a GFP_KERNEL allocation to
>>> deadlock.  Then just use GFP_KERNEL as normal.  The memory allocators
>>> can see the nofs situation is in effect and will not call back into
>>> the filesystem.
>>>
>>> This results in better code within your filesystem as you don't need to
>>> pass around gfp flags as much, and can lead to better performance from
>>> the memory allocators as GFP_NOFS will not be used unnecessarily.
>>>
>>> The memalloc_nofs APIs were introduced in May 2017, but we still have
>>> over 1000 uses of GFP_NOFS in fs/ today (and 200 outside fs/, which is
>>> really sad).  This session is for filesystem developers to talk about
>>> what they need to do to fix up their own filesystem, or share stories
>>> about how they made their filesystem better by adopting the new APIs.
>> 
>> I agree this is a worthy goal and the scoped API helped us a lot in the
>> ext4/jbd2 land. Still we have some legacy to deal with:
>> 
>> ~> git grep "NOFS" fs/jbd2/ | wc -l
>> 15
>> ~> git grep "NOFS" fs/ext4/ | wc -l
>> 71
>>
>
> For everyone following along who is curious:
> 1 - affs
> 1 - cachefiles
> 1 - ecryptfs
> 1 - fscache
> 1 - notify
> 1 - squashfs
> 1 - vboxsf
> 1 - zonefs
> 2 - hfsplus
> 2 - tracefs
> 3 - 9p
> 3 - ext2
> 3 - iomap
> 5 - befs
> 5 - exfat
> 5 - fat
> 5 - udf
> 5 - ufs
> 7 - erofs
> 10 - fuse
> 11 - smb
> 14 - hpfs
> 15 - jbd2
> 17 - crypto
> 17 - jfs
> 17 - quota
> 17 - reiserfs
> 18 - nfs
> 18 - nilfs2
> 21 - ntfs
> 30 - xfs
> 37 - bcachefs
> 46 - gfs2
> 47 - afs
> 55 - dlm
> 61 - f2fs
> 63 - ceph
> 66 - ext4
> 71 - ocfs2
> 74 - ntfs3
> 84 - ubifs
> 199 - btrfs
>
> As I feared, we (as in btrfs) are the worst here.

It probably won't make you feel any better, but the value for ceph is
incomplete, as it only counts the code in 'fs/ceph/'.  If you also count
'net/ceph/', ceph gets much closer to btrfs: 63 + 48 = 111
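
For anyone who wants to redo the sum for a subsystem that spans several
trees, here is a minimal sketch.  It assumes plain grep and the pattern
GFP_NOFS; the counts quoted above may have been produced with a different
pattern or with git grep, so the results need not match exactly.  The
helper name count_pattern is made up for illustration.

```shell
# Count lines matching a pattern in each given directory tree and print
# the per-tree counts followed by the combined total.
# Hypothetical helper, not part of any kernel tooling.
count_pattern() {
    pattern=$1
    shift
    total=0
    for dir in "$@"; do
        # grep prints one line per match; wc -l counts them. A tree with
        # no matches (or a missing directory) simply counts as 0.
        n=$(grep -r -- "$pattern" "$dir" 2>/dev/null | wc -l | tr -d ' ')
        echo "$dir: $n"
        total=$((total + n))
    done
    echo "total: $total"
}
```

Run from the top of a kernel tree as 'count_pattern GFP_NOFS fs/ceph
net/ceph'.  Inside a git checkout, git grep would be faster and would
skip untracked files.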

Cheers,
-- 
Luís