On Fri 08-06-18 13:34:04, Linus Torvalds wrote: > On Fri, Jun 8, 2018 at 6:27 AM Jan Kara <jack@xxxxxxx> wrote: > > > > The alignment could be fixed by having > > > > struct inode { > > ... > > struct fsnotify_obj i_fsnotify __attribute__ ((aligned(sizeof(void *)))); > > ... > > } > > > > but that's too ugly even for my twisted taste. And before someone asks, > > adding aligned attribute to the definition of fsnotify_obj structure makes > > it grow the padding at the end again. > > Yeah, I don't think there's any way to say "this can't be an array > member, don't pad the size out to the alignment". > > I do wonder if perhaps "struct fsnotify_obj" could actually have a > mask and a 32-bit object ID instead, which would (a) avoid the > alignment issue and (b) actually shrink the structure onm 64-bit. > > Obviously you'd then have to look up the pointer from the object ID > (presumably using a hash, but maybe it would use the radix tree of > idr_alloc() or something). > > I haven't looked at how often those pointers actually get looked up. > If the 'mask' makes it fairly rare, then the extra indirection might > be ok. I suspect it's not. But I thought I'd mention it as a possible > approach. The most critical is the mask check in fs/notify/fsnotify.c: fsnotify(). There the mask matches whenever an event should be delivered to someone watching inode / vfsmount. And we already know someone is watching due to previous checks in there, we just don't know whether they are interested in what is currently happening. How frequent is the mask match is difficult to tell - it heavily depends on the way fsnotify is used on the system - and there are definitely heavy fsnotify users where I suspect the additional idr / hash lookup for each event would be visible. So I suppose we are better off with a simple approach of grabbing srcu_read_lock and dereference to access the mask. That is going to visibly hit the cases where someone is watching the inode / vfsmount but not for event currently happening (Amir already referenced the commit 7c49b8616460e "fs/notify: optimize inotify/fsnotify code for unwatched files" which gives an idea of the impact). Not great but the least of evils I suppose. > The other approach is - as you say - to move 'mask' to be behind the > pointer. But if that causes a lot of extra pointer chasing that mask > would normally avoid, that could be really expensive too. We see that > with the security objects: the cost of looking up whatever is behind > i_security is really nasty, because every inode gets allocated a > private security structure, so it's absolutely horrendous for cache > effects (no sharing). > > But I suspect that the behavior of i_fsnotify_marks is very different > from i_security, in that most inodes probably don't have marks at all. > No? Yes? So you might not have the nasty case of "almost all inode > access ends up following that pointer and it's horrible in the cache". Yes, I don't expect most inodes to have any marks attached. Directories tend to be more likely to have them but still usually only a small fraction of them. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR