On Mon, 3 Jul 2023 10:13:22 +0000 Ajay Kaher <akaher@xxxxxxxxxx> wrote: > >> +/** > >> + * eventfs_down_write - acquire write lock function > >> + * @eventfs_rwsem: a pointer to rw_semaphore > >> + * > >> + * helper function to perform write lock on eventfs_rwsem > >> + */ > >> +static void eventfs_down_write(struct rw_semaphore *eventfs_rwsem) > >> +{ > >> + while (!down_write_trylock(eventfs_rwsem)) > >> + msleep(10); > > > > What's this loop for? Something like that needs a very good explanation > > in a comment. Loops like these are usually a sign of a workaround for a > > bug in the design, or worse, simply hides an existing bug. > > > > Yes correct, this logic is to solve deadlock: > > Thread 1 Thread 2 > down_read_nested() - read lock acquired > down_write() - waiting for write lock to acquire > down_read_nested() - deadlock > > Deadlock is because rwlock wouldn’t allow read lock to be acquired if write lock is waiting. > down_write_trylock() wouldn’t add the write lock in waiting queue, hence helps to prevent > deadlock scenario. > > I was stuck with this Deadlock, tried few methods and finally borrowed from cifs, as it’s > upstreamed, tested and working in cifs, please refer: > https://elixir.bootlin.com/linux/v6.3.1/source/fs/cifs/file.c#L438 I just looked at that code and the commit, and I honestly believe that is a horrible hack, and very fragile. It's in the smb code, so it was unlikely reviewed by anyone outside that subsystem. I really do not want to prolificate that solution around the kernel. We need to come up with something else. I also think it's buggy (yes the cifs code is buggy!) because in the comment above the down_read_nested() it says: /* * nested locking. NOTE: rwsems are not allowed to recurse * (which occurs if the same task tries to acquire the same * lock instance multiple times), but multiple locks of the * same lock class might be taken, if the order of the locks * is always the same. This ordering rule can be expressed * to lockdep via the _nested() APIs, but enumerating the * subclasses that are used. (If the nesting relationship is * static then another method for expressing nested locking is * the explicit definition of lock class keys and the use of * lockdep_set_class() at lock initialization time. * See Documentation/locking/lockdep-design.rst for more details.) */ So this is NOT a solution (and the cifs code should be fixed too!) Can you show me the exact backtrace where the reader lock gets taken again? We will have to come up with a way to not take the same lock twice. We can also look to see if we can implement this with RCU. What exactly is this rwsem protecting? > > Looking further for your input. I will add explanation in v4. > > > >> +} > >> + [..] > >> + * > >> + * This function creates the top of the trace event directory. > >> + */ > >> +struct dentry *eventfs_create_events_dir(const char *name, > >> + struct dentry *parent, > >> + struct rw_semaphore *eventfs_rwsem) > > > > OK, I'm going to have to really look at this. Passing in a lock to the > > API is just broken. We need to find a way to solve this another way. > > eventfs_rwsem is a member of struct trace_array, I guess we should > pass pointer to trace_array. No, it should not be part of the trace_array. If we can't do this with RCU, then we need to add a descriptor that contains the dentry that is returned above, and have the lock held there. The caller of the eventfs_create_events_dir() should not care about locking. That's an implementation detail that should *not* be part of the API. That is, if you need a lock: struct eventfs_dentry { struct dentry *dentry; struct rwsem *rwsem; }; And then get to that lock by using the container_of() macro. All created eventfs dentry's could have this structure, where the rwsem points to the top one. Again, that's only if we can't do this with RCU. -- Steve > > > > I'm about to board a plane to JFK shortly, I'm hoping to play with this > > while flying back. > > > > I have replied for major concerns. All other minor I will take care in v4. > > Thanks a lot for giving time to eventfs patches. > > - Ajay >