On Mon, Oct 2, 2023 at 10:23 AM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > On Mon, Oct 02, 2023 at 09:40:15AM +0300, Amir Goldstein wrote: > > On Mon, Oct 2, 2023 at 5:37 AM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > > > > > ovl_permission() accesses ->layers[...].mnt; we can't have ->layers > > > freed without an RCU delay on fs shutdown. Fortunately, kern_unmount_array() > > > used to drop those mounts does include an RCU delay, so freeing is > > > delayed; unfortunately, the array passed to kern_unmount_array() is > > > formed by mangling ->layers contents and that happens without any > > > delays. > > > > > > Use a separate array instead; local if we have a few layers, > > > kmalloc'ed if there's a lot of them. If allocation fails, > > > fall back to kern_unmount() for individual mounts; it's > > > not a fast path by any stretch of imagination. > > > > If that is the case, then having 3 different code paths seems > > quite an excessive over optimization... > > > > I think there is a better way - > > layout the mounts array linearly in ofs->mounts[] to begin with, > > remove .mnt out of ovl_layer and use ofs->mounts[layer->idx] to > > get to a layer's mount. > > > > I can try to write this patch to see how it ends up looking. > > Fine by me; about the only problem I see is the cache footprint... I've also considered just allocating the extra space for ofs->mounts[] at super creation time rather than on super destruction. I just cannot get myself to be bothered with this cleanup code because of saving memory of ovl_fs. However, looking closer, we have a wasfull layer->name pointer, which is only ever used for ovl_show_options() (to print the original requested layer path from mount options). So I am inclined to move these rarely accessed pointers to ofs->layer_names[], which can be used for the temp array for kern_unmount_array() because freeing name does not need RCU delay AFAICT (?). I will take a swing at it. > How many layers do you consider the normal use, actually? Define "normal". Currently, we have #define OVL_MAX_STACK 500 and being able to create an overlay with many layers was cited as one of the requirements to drive the move to new mount api: https://lore.kernel.org/linux-unionfs/20230605-fs-overlayfs-mount_api-v3-0-730d9646b27d@xxxxxxxxxx/ I am not really familiar with the user workloads that need that many layers and why. It's obviously not the common case. Thanks, Amir.