Re: [PATCH] [DRAFT RFC]: file: reclaim 24 bytes from f_owner

Mateusz Guzik <mjguzik@xxxxxxxxx> · Sat, 10 Aug 2024 11:36:32 +0200

On Fri, Aug 09, 2024 at 10:10:40PM +0200, Christian Brauner wrote:
> This is in rough shape. I just drafted it quickly to get the idea
> across. Compiled with all the SANS we have, booted and ran parts of LTP
> and so far nothing murderd my vms. /me goes to get some sleep
> 
> The struct fown_struct wastes 32 bytes of precious memory in struct
> file. It is often unused in a lot of workloads. We should put the burden
> on the use-cases that care about this and make them allocate the struct
> on demand. This will allow us to free up 24 bytes in struct file.
> 

Here is an alternative which smaller saving, but also less work, noting
it here just in case. Commentary on the patch is after that.

On my kernel pahole says the following about struct file:
[snip]
        /* --- cacheline 1 boundary (64 bytes) --- */
        loff_t                     f_pos;                /*    64     8 */
        unsigned int               f_flags;              /*    72     4 */

        /* XXX 4 bytes hole, try to pack */

        struct fown_struct         f_owner;              /*    80    32 */
        const struct cred  *       f_cred;               /*   112     8 */
        struct file_ra_state       f_ra;                 /*   120    32 */
        /* --- cacheline 2 boundary (128 bytes) was 24 bytes ago --- */
[snip]

And of course:
struct fown_struct {
        rwlock_t                   lock;                 /*     0     8 */
        struct pid *               pid;                  /*     8     8 */
        enum pid_type              pid_type;             /*    16     4 */
        kuid_t                     uid;                  /*    20     4 */
        kuid_t                     euid;                 /*    24     4 */
        int                        signum;               /*    28     4 */

        /* size: 32, cachelines: 1, members: 6 */
        /* last cacheline: 32 bytes */
};

So *some* saving can be achieved without moving stuff out.

1. there is no way using a rwlock is warranted, a spinlock is 4 bytes

Moving the lock to the end in this struct and placing f_flags *after*
f_owner should result in plugging the alignment gap.

2. enum pid_type would fit in one byte no problem, there is very opts
defined there. compilers support specifying enum size, but there is a
lot of bikeshedding possible around implementing it and I have no
interest in fighting that, fwiw in FreeBSD this landed as follows:

/* declare */
__enum_uint8_decl(vstate) {
        VSTATE_UNINITIALIZED,
        VSTATE_CONSTRUCTED,
        VSTATE_DESTROYING,
        VSTATE_DEAD,
        VLASTSTATE = VSTATE_DEAD,
};

/* use */
__enum_uint8(vstate) v_state;

/* implementation */
#define __enum_uint8_decl(name) enum enum_ ## name ## _uint8 : uint8_t
#define __enum_uint8(name)      enum enum_ ## name ## _uint8

as a hack here you could merely enum pid_type:uint8_t pid_type; or
similar

3. signum does not have to be int either, sounds like another one-byter

then this results in 6 bytes of padding which may or may not be possible
to fill in in struct file later.

> +struct fown_struct nop_fown_struct = {
> +	.lock		= __RW_LOCK_UNLOCKED(nop_fown_struct.lock),
> +	.pid		= NULL,
> +	.pid_type	= PIDTYPE_MAX,
> +	.uid		= INVALID_UID,
> +	.euid		= INVALID_UID,
> +	.signum		= 0,
> +};

why this instead of NULL checking?

For funcs which are not guaranteed to see the thing already allocated
you can still:
	f_owner = file_f_owner_allocate(filp);
	if (IS_ERR(f_owner))
		return PTR_ERR(f_owner);

while for funcs where it is optionally present you just
	if (filp->f_owner != NULL) ....

am I missing something here?

that aside error handling as implemented is weirdly inconsistent -- you
got ERR_CAST(fowner), PTR_ERR(fowner) and -ENOMEM.

> +
> +/*
> + * Allocate an file->f_owner struct if it doesn't exist, handling racing
> + * allocations correctly.
> + */
> +struct fown_struct *file_f_owner_allocate(struct file *file)
> +{
> +	struct fown_struct *f_owner;
> +
> +	f_owner = smp_load_acquire(&file->f_owner);

For all spots of the sort you don't need an acquire fence, a consume
fence is sufficient which I believe in Linux is guaranteed to be
provided with READ_ONCE. I failed to find smp_load_consume, presumably
for that reason.

> +struct fown_struct *file_f_owner_allocate(struct file *file)
> +{
> +	struct fown_struct *f_owner;
> +
> +	f_owner = smp_load_acquire(&file->f_owner);
> +	if (f_owner != &nop_fown_struct)
> +		return NULL;
> +
> +	f_owner = kzalloc(sizeof(struct fown_struct), GFP_KERNEL);
> +	if (!f_owner)
> +		return ERR_PTR(-ENOMEM);
> +
> +	rwlock_init(&f_owner->lock);
> +	f_owner->file = file;
> +	/* If someone else raced us, drop our allocation. */
> +	if (unlikely(cmpxchg(&file->f_owner, &nop_fown_struct, f_owner) !=
> +		     &nop_fown_struct)) {
> +		kfree(f_owner);
> +		return NULL;

this wants to return the found pointer, not NULL

> +	}
> +
> +	return f_owner;
> +}