On Sat, Jun 9, 2018 at 9:57 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote: > On Fri, Jun 8, 2018 at 11:34 PM, Linus Torvalds > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: >> On Fri, Jun 8, 2018 at 6:27 AM Jan Kara <jack@xxxxxxx> wrote: >>> >>> The alignment could be fixed by having >>> >>> struct inode { >>> ... >>> struct fsnotify_obj i_fsnotify __attribute__ ((aligned(sizeof(void *)))); >>> ... >>> } >>> >>> but that's too ugly even for my twisted taste. And before someone asks, >>> adding aligned attribute to the definition of fsnotify_obj structure makes >>> it grow the padding at the end again. >> >> Yeah, I don't think there's any way to say "this can't be an array >> member, don't pad the size out to the alignment". >> >> I do wonder if perhaps "struct fsnotify_obj" could actually have a >> mask and a 32-bit object ID instead, which would (a) avoid the >> alignment issue and (b) actually shrink the structure onm 64-bit. >> >> Obviously you'd then have to look up the pointer from the object ID >> (presumably using a hash, but maybe it would use the radix tree of >> idr_alloc() or something). >> >> I haven't looked at how often those pointers actually get looked up. >> If the 'mask' makes it fairly rare, then the extra indirection might >> be ok. I suspect it's not. But I thought I'd mention it as a possible >> approach. > > As a matter of fact, the pointer is checked to be non zero before > accessing the mask: > > 7c49b8616460 fs/notify: optimize inotify/fsnotify code for unwatched files > > So the mask is not really needed to optimize away unwatched inodes. > I guess the mask is needed to optimize away use cases like frequent > fsnotify_access() calls on inodes that are watched for modification only, > but not sure that is worth optimizing. > >> >> The other approach is - as you say - to move 'mask' to be behind the >> pointer. But if that causes a lot of extra pointer chasing that mask >> would normally avoid, that could be really expensive too. We see that >> with the security objects: the cost of looking up whatever is behind >> i_security is really nasty, because every inode gets allocated a >> private security structure, so it's absolutely horrendous for cache >> effects (no sharing). > > Besides pointer chasing, there is also srcu_read_lock. > >> >> But I suspect that the behavior of i_fsnotify_marks is very different >> from i_security, in that most inodes probably don't have marks at all. >> No? Yes? So you might not have the nasty case of "almost all inode >> access ends up following that pointer and it's horrible in the cache". >> > > Yes and No :-) > Yes, most inodes don't have marks at all, > but, every fsnotify() call may need to follow mnt->mnt_fsnotify.marks > if mount is watched and there is no mnt->mnt_fsnotify.mask in fsnotify_obj. > > Sure, we don't need to worry about 32bits in struct mount, so we could > have a different attach method for inodes and mounts, but the whole > purpose of this cleanup is to make the code more generic in handing > and attaching of marks to inode/mounts. > > How about if we move mask into connector, but also leave a shadow mask, > in an "inherited" struct i.e.: > > /* struct to embed in objects, which marks can be attached to */ > struct fsnotify_obj { > struct fsnotify_mark_connector __rcu *marks; > }; > > /* struct to embed in objects, which marks can be attached to with > shadow mask */ > struct fsnotify_obj_mask { > struct fsnotify_obj obj; > /* all events this object cares about */ > __u32 mask; > }; > > We embed fsnotify_obj in struct inode and fsnotify_obj_mask in struct mount. > > The only code that needs to test the shadow mask before following into > connector is the optimization code in start of fsnotify() which was not > generalized anyway and where the mask optimization is more important > for mount watches. > No. That's not true, there is also fsnotify_inode_watches_children() and __fsnotify_parent(). But I think the only bits in the mask that really matter for optimizing inode watches are FS_ACCESS and FS_EVENT_ON_CHILD and those could be encoded in lower bits of connector pointer. connector is pointer is always dereferenced via fsnotify_grab_connector(), so it's easy to mask out those bits. > The only code that sets the mask is __fsnotify_recalc_mask(), needs to > set the shadow mask, if conn->type != FSNOTIFY_OBJ_TYPE_INODE. > We already special case FSNOTIFY_OBJ_TYPE_INODE for pinning inode > in several places, so there is nothing new here. > So instead, __fsnotify_recalc_mask() needs to set the bits on connector pointer in FSNOTIFY_OBJ_TYPE_INODE case. I'll try to sketch some patches. Thanks, Amir.