On Thu, Nov 7, 2024 at 1:57 AM Vinicius Costa Gomes <vinicius.gomes@xxxxxxxxx> wrote: > > Hi, > > Changes from v3: > - Another reorganization of the series: separate the pure mechanical > changes into their own (Amir Goldstein) > > The series now reads: > > Patch 1: Introduce the _light() version of the override/revert cred operations; > Patch 2: Convert backing-file.c to use those; > Patch 3: Mechanical change to introduce the ovl_revert_creds() helper; > Patch 4: Make the ovl_{override,convert}_creds() use the _light() > creds helpers, and fix the reference counting issue that would happen; > For the record, this series depends on backing_file API cleanup patch by Miklos: https://lore.kernel.org/linux-fsdevel/20241021103340.260731-1-mszeredi@xxxxxxxxxx/ > Changes from v2: > - Removed the "convert to guard()/scoped_guard()" patches (Miklos Szeredi); > - In the overlayfs code, convert all users of override_creds()/revert_creds() to the _light() versions by: > 1. making ovl_override_creds() use override_creds_light(); > 2. introduce ovl_revert_creds() which calls revert_creds_light(); > 3. convert revert_creds() to ovl_revert_creds() > (Amir Goldstein); > - Fix an potential reference counting issue, as the lifetime > expectations of the mounter credentials are different (Christian > Brauner); > I pushed these patches to: https://github.com/amir73il/linux/commits/ovl_creds rebased overlayfs-next on top of them and tested. Christian, Since this work is mostly based on your suggestions, I thought that you might want to author and handle this PR? Would you like to take the patches from ovl_creds (including the backing_file API cleanup) to a stable branch in your tree for me to base overlayfs-next on? Or would you rather I include them in the overlayfs PR for v6.13 myself? Thanks, Amir. P.S. some of the info below is relevant for the PR message and some of it is completely stale... > The series is now much simpler: > > Patch 1: Introduce the _light() version of the override/revert cred operations; > Patch 2: Convert backing-file.c to use those; > Patch 3: Do the conversion to use the _light() version internally; > Patch 4: Fix a potential refcounting issue > > Changes from v1: > - Re-organized the series to be easier to follow, more details below > (Miklos Szeredi and Amir Goldstein); > > The series now reads as follows: > > Patch 1: Introduce the _light() version of the override/revert cred operations; > Patch 2: Convert backing-file.c to use those; > Patch 3: Introduce the overlayfs specific _light() helper; > Patch 4: Document the cases that the helper cannot be used (critical > section may change the cred->usage counter); > Patch 5: Convert the "rest" of overlayfs to the _light() helpers (mostly mechanical); > Patch 6: Introduce the GUARD() helpers; > Patch 7: Convert backing-file.c to the GUARD() helpers; > Patch 8-15: Convert each overlayfs/ file to use the GUARD() helpers, > also explain the cases in which the scoped_guard() helper is > used. Note that a 'goto' jump that crosses the guard() should > fail to compile, gcc has a bug that fails to detect the > error[1]. > Patch 16: Remove the helper introduced in Patch 3 to close the series, > as it is no longer used, everything was converted to use the > safer/shorter GUARD() helpers. this info is stale and confusing in the context of this series. > > [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91951 > > This bug was also noticed here: > > https://lore.kernel.org/all/20240730050927.GC5334@ZenIV/ > > Link to v1: > > https://lore.kernel.org/r/20240403021808.309900-1-vinicius.gomes@xxxxxxxxx/ > > Changes from RFC v3: > - Removed the warning "fixes" patches, as they could hide potencial > bugs (Christian Brauner); > - Added "cred-specific" macros (Christian Brauner), from my side, > added a few '_' to the guards to signify that the newly introduced > helper macros are preferred. > - Changed a few guard() to scoped_guard() to fix the clang (17.0.6) > compilation error about 'goto' bypassing variable initialization; > > Link to RFC v3: > > https://lore.kernel.org/r/20240216051640.197378-1-vinicius.gomes@xxxxxxxxx/ > > Changes from RFC v2: > - Added separate patches for the warnings for the discarded const > when using the cleanup macros: one for DEFINE_GUARD() and one for > DEFINE_LOCK_GUARD_1() (I am uncertain if it's better to squash them > together); > - Reordered the series so the backing file patch is the first user of > the introduced helpers (Amir Goldstein); > - Change the definition of the cleanup "class" from a GUARD to a > LOCK_GUARD_1, which defines an implicit container, that allows us > to remove some variable declarations to store the overriden > credentials (Amir Goldstein); > - Replaced most of the uses of scoped_guard() with guard(), to reduce > the code churn, the remaining ones I wasn't sure if I was changing > the behavior: either they were nested (overrides "inside" > overrides) or something calls current_cred() (Amir Goldstein). > > New questions: > - The backing file callbacks are now called with the "light" > overriden credentials, so they are kind of restricted in what they > can do with their credentials, is this acceptable in general? > - in ovl_rename() I had to manually call the "light" the overrides, > both using the guard() macro or using the non-light version causes > the workload to crash the kernel. I still have to investigate why > this is happening. Hints are appreciated. > > Link to the RFC v2: > > https://lore.kernel.org/r/20240125235723.39507-1-vinicius.gomes@xxxxxxxxx/ > > Original cover letter (lightly edited): > > It was noticed that some workloads suffer from contention on > increasing/decrementing the ->usage counter in their credentials, > those refcount operations are associated with overriding/reverting the > current task credentials. (the linked thread adds more context) > > In some specialized cases, overlayfs is one of them, the credentials > in question have a longer lifetime than the override/revert "critical > section". In the overlayfs case, the credentials are created when the > fs is mounted and destroyed when it's unmounted. In this case of long > lived credentials, the usage counter doesn't need to be > incremented/decremented. > > Add a lighter version of credentials override/revert to be used in > these specialized cases. To make sure that the override/revert calls > are paired, add a cleanup guard macro. This was suggested here: > > https://lore.kernel.org/all/20231219-marken-pochen-26d888fb9bb9@brauner/ > > With a small number of tweaks: > - Used inline functions instead of macros; > - A small change to store the credentials into the passed argument, > the guard is now defined as (note the added '_T ='): > > DEFINE_GUARD(cred, const struct cred *, _T = override_creds_light(_T), > revert_creds_light(_T)); > > - Allow "const" arguments to be used with these kind of guards; > > Some comments: > - If patch 1/5 and 2/5 are not a good idea (adding the cast), the > alternative I can see is using some kind of container for the > credentials; > - The only user for the backing file ops is overlayfs, so these > changes make sense, but may not make sense in the most general > case; > > For the numbers, some from 'perf c2c', before this series: > (edited to fit) > > # > # ----- HITM ----- Shared > # Num RmtHitm LclHitm Symbol Object Source:Line Node > # ..... ....... ....... .......................... ................ .................. .... > # > ------------------------- > 0 412 1028 > ------------------------- > 41.50% 42.22% [k] revert_creds [kernel.vmlinux] atomic64_64.h:39 0 1 > 15.05% 10.60% [k] override_creds [kernel.vmlinux] atomic64_64.h:25 0 1 > 0.73% 0.58% [k] init_file [kernel.vmlinux] atomic64_64.h:25 0 1 > 0.24% 0.10% [k] revert_creds [kernel.vmlinux] cred.h:266 0 1 > 32.28% 37.16% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1 > 9.47% 8.75% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1 > 0.49% 0.58% [k] inode_owner_or_capable [kernel.vmlinux] mnt_idmapping.h:81 0 1 > 0.24% 0.00% [k] generic_permission [kernel.vmlinux] namei.c:354 0 > > ------------------------- > 1 50 103 > ------------------------- > 100.00% 100.00% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1 > > ------------------------- > 2 50 98 > ------------------------- > 96.00% 96.94% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1 > 2.00% 1.02% [k] update_load_avg [kernel.vmlinux] atomic64_64.h:25 0 1 > 0.00% 2.04% [k] update_load_avg [kernel.vmlinux] fair.c:4118 0 > 2.00% 0.00% [k] update_cfs_group [kernel.vmlinux] fair.c:3932 0 1 > > after this series: > > # > # ----- HITM ----- Shared > # Num RmtHitm LclHitm Symbol Object Source:Line Node > # ..... ....... ....... .................... ................ ................ .... > # > ------------------------- > 0 54 88 > ------------------------- > 100.00% 100.00% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1 > > ------------------------- > 1 48 83 > ------------------------- > 97.92% 97.59% [k] update_cfs_group [kernel.vmlinux] atomic64_64.h:15 0 1 > 2.08% 1.20% [k] update_load_avg [kernel.vmlinux] atomic64_64.h:25 0 1 > 0.00% 1.20% [k] update_load_avg [kernel.vmlinux] fair.c:4118 0 1 > > ------------------------- > 2 28 44 > ------------------------- > 85.71% 79.55% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1 > 14.29% 20.45% [k] generic_permission [kernel.vmlinux] mnt_idmapping.h:81 0 1 > > The contention is practically gone. > > Link: https://lore.kernel.org/all/20231018074553.41333-1-hu1.chen@xxxxxxxxx/ > > Vinicius Costa Gomes (4): > cred: Add a light version of override/revert_creds() > fs/backing-file: Convert to revert/override_creds_light() > ovl: use wrapper ovl_revert_creds() > ovl: Optimize override/revert creds > > fs/backing-file.c | 20 ++++++++++---------- > fs/overlayfs/copy_up.c | 2 +- > fs/overlayfs/dir.c | 17 +++++++++++------ > fs/overlayfs/file.c | 14 +++++++------- > fs/overlayfs/inode.c | 20 ++++++++++---------- > fs/overlayfs/namei.c | 10 +++++----- > fs/overlayfs/overlayfs.h | 1 + > fs/overlayfs/readdir.c | 8 ++++---- > fs/overlayfs/util.c | 11 ++++++++--- > fs/overlayfs/xattrs.c | 9 ++++----- > include/linux/cred.h | 18 ++++++++++++++++++ > kernel/cred.c | 6 +++--- > 12 files changed, 82 insertions(+), 54 deletions(-) > > -- > 2.47.0 >