On Tue, Mar 15, 2022 at 10:27 AM Hao Luo <haoluo@xxxxxxxxxx> wrote: > > On Mon, Mar 14, 2022 at 4:12 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > > > > On Mon, Mar 14, 2022 at 10:07:31AM -0700, Hao Luo wrote: > > > Hello Al, > > > > > > In which contexts can those be called? > > > > > > > > > > In a sleepable context. The plan is to introduce a certain tracepoints > > > as sleepable, a program that attaches to sleepable tracepoints is > > > allowed to call these functions. In particular, the first sleepable > > > tracepoint introduced in this patchset is one at the end of > > > cgroup_mkdir(). Do you have any advices? > > > > Yes - don't do it, unless you really want a lot of user-triggerable > > deadlocks. > > > > Pathname resolution is not locking-agnostic. In particular, you can't > > do it if you are under any ->i_rwsem, whether it's shared or exclusive. > > That includes cgroup_mkdir() callchains. And if the pathname passed > > to these functions will have you walk through the parent directory, > > you would get screwed (e.g. if the next component happens to be > > inexistent, triggering a lookup, which takes ->i_rwsem shared). > > I'm thinking of two options, let's see if either can work out: > > Option 1: We can put restrictions on the pathname passed into this > helper. We can explicitly require the parameter dirfd to be in bpffs > (we can verify). In addition, we check pathname to be not containing > any dot or dotdot, so the resolved path will end up inside bpffs, > therefore won't take ->i_rwsem that is in the callchain of > cgroup_mkdir(). > > Option 2: We can avoid pathname resolution entirely. Like above, we > can adjust the semantics of this helper to be: making an immediate > directory under the dirfd passed in. In particular, like above, we can > enforce the dirfd to be in bpffs and pathname to consist of only > alphabet and numbers. With these restrictions, we call vfs_mkdir() to > create directories. > > Being able to mkdir from bpf has useful use cases, let's try to make > it happen even with many limitations. Option 3. delegate vfs_mkdir to a worker and wait in the helper.