On Fri, Aug 09, 2024 at 10:10:40PM +0200, Christian Brauner wrote: > This is in rough shape. I just drafted it quickly to get the idea > across. Compiled with all the SANS we have, booted and ran parts of LTP > and so far nothing murderd my vms. /me goes to get some sleep > > The struct fown_struct wastes 32 bytes of precious memory in struct > file. It is often unused in a lot of workloads. We should put the burden > on the use-cases that care about this and make them allocate the struct > on demand. This will allow us to free up 24 bytes in struct file. > Here is an alternative which smaller saving, but also less work, noting it here just in case. Commentary on the patch is after that. On my kernel pahole says the following about struct file: [snip] /* --- cacheline 1 boundary (64 bytes) --- */ loff_t f_pos; /* 64 8 */ unsigned int f_flags; /* 72 4 */ /* XXX 4 bytes hole, try to pack */ struct fown_struct f_owner; /* 80 32 */ const struct cred * f_cred; /* 112 8 */ struct file_ra_state f_ra; /* 120 32 */ /* --- cacheline 2 boundary (128 bytes) was 24 bytes ago --- */ [snip] And of course: struct fown_struct { rwlock_t lock; /* 0 8 */ struct pid * pid; /* 8 8 */ enum pid_type pid_type; /* 16 4 */ kuid_t uid; /* 20 4 */ kuid_t euid; /* 24 4 */ int signum; /* 28 4 */ /* size: 32, cachelines: 1, members: 6 */ /* last cacheline: 32 bytes */ }; So *some* saving can be achieved without moving stuff out. 1. there is no way using a rwlock is warranted, a spinlock is 4 bytes Moving the lock to the end in this struct and placing f_flags *after* f_owner should result in plugging the alignment gap. 2. enum pid_type would fit in one byte no problem, there is very opts defined there. compilers support specifying enum size, but there is a lot of bikeshedding possible around implementing it and I have no interest in fighting that, fwiw in FreeBSD this landed as follows: /* declare */ __enum_uint8_decl(vstate) { VSTATE_UNINITIALIZED, VSTATE_CONSTRUCTED, VSTATE_DESTROYING, VSTATE_DEAD, VLASTSTATE = VSTATE_DEAD, }; /* use */ __enum_uint8(vstate) v_state; /* implementation */ #define __enum_uint8_decl(name) enum enum_ ## name ## _uint8 : uint8_t #define __enum_uint8(name) enum enum_ ## name ## _uint8 as a hack here you could merely enum pid_type:uint8_t pid_type; or similar 3. signum does not have to be int either, sounds like another one-byter then this results in 6 bytes of padding which may or may not be possible to fill in in struct file later. > +struct fown_struct nop_fown_struct = { > + .lock = __RW_LOCK_UNLOCKED(nop_fown_struct.lock), > + .pid = NULL, > + .pid_type = PIDTYPE_MAX, > + .uid = INVALID_UID, > + .euid = INVALID_UID, > + .signum = 0, > +}; why this instead of NULL checking? For funcs which are not guaranteed to see the thing already allocated you can still: f_owner = file_f_owner_allocate(filp); if (IS_ERR(f_owner)) return PTR_ERR(f_owner); while for funcs where it is optionally present you just if (filp->f_owner != NULL) .... am I missing something here? that aside error handling as implemented is weirdly inconsistent -- you got ERR_CAST(fowner), PTR_ERR(fowner) and -ENOMEM. > + > +/* > + * Allocate an file->f_owner struct if it doesn't exist, handling racing > + * allocations correctly. > + */ > +struct fown_struct *file_f_owner_allocate(struct file *file) > +{ > + struct fown_struct *f_owner; > + > + f_owner = smp_load_acquire(&file->f_owner); For all spots of the sort you don't need an acquire fence, a consume fence is sufficient which I believe in Linux is guaranteed to be provided with READ_ONCE. I failed to find smp_load_consume, presumably for that reason. > +struct fown_struct *file_f_owner_allocate(struct file *file) > +{ > + struct fown_struct *f_owner; > + > + f_owner = smp_load_acquire(&file->f_owner); > + if (f_owner != &nop_fown_struct) > + return NULL; > + > + f_owner = kzalloc(sizeof(struct fown_struct), GFP_KERNEL); > + if (!f_owner) > + return ERR_PTR(-ENOMEM); > + > + rwlock_init(&f_owner->lock); > + f_owner->file = file; > + /* If someone else raced us, drop our allocation. */ > + if (unlikely(cmpxchg(&file->f_owner, &nop_fown_struct, f_owner) != > + &nop_fown_struct)) { > + kfree(f_owner); > + return NULL; this wants to return the found pointer, not NULL > + } > + > + return f_owner; > +}