On 12/10/20 7:35 PM, Al Viro wrote:
> On Thu, Dec 10, 2020 at 01:01:13PM -0700, Jens Axboe wrote:
>> io_uring always punts opens to async context, since there's no control
>> over whether the lookup blocks or not. Add LOOKUP_NONBLOCK to support
>> just doing the fast RCU based lookups, which we know will not block. If
>> we can do a cached path resolution of the filename, then we don't have
>> to always punt lookups for a worker.
>>
>> During path resolution, we always do LOOKUP_RCU first. If that fails and
>> we terminate LOOKUP_RCU, then fail a LOOKUP_NONBLOCK attempt as well.
> 
> In effect you are adding a mode where
> 	* unlazy would fail, except when done from complete_walk()
> 	* ->d_revalidate() wouldn't be attempted at all (not even with LOOKUP_RCU)
> 	* ... but ->get_link() in RCU mode would
> 	* ... and so would everything done after complete_walk() in
> do_open(), very much including the joys like mnt_want_write() (i.e. waiting for
> frozen fs to thaw), handling O_TRUNC, calling ->open() itself...
> 
> So this "not punting lookups for a worker" looks fishy as hell - if you care
> about blocking operations, you haven't really won anything.
> 
> And why exactly is the RCU case of ->d_revalidate() worth buggering off (it
> really can't block - it's called under rcu_read_lock() and it does *not*
> drop it)?
> 
> _IF_ for some theoretical exercise you want to do "lookup without dropping
> out of RCU", just add a flag that has unlazy_walk() fail. With -ECHILD.
> Strip it away in complete_walk() and have path_init() with that flag
> and without LOOKUP_RCU fail with -EAGAIN. All there is to it.

Thanks Al, that makes for an easier implementation. I like that
suggestion, it boils down to just three hunks (see below).

For io_uring, the concept is just to perform the fast path inline. The
RCU lookup serves that purpose nicely - if we fail that, then it's
expected to take the latency hit of going async.
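As a rough userspace sketch of that caller-side protocol (the flag
values and helper names below are made up for illustration, not the
actual io_uring or VFS internals): attempt the lookup inline with the
nonblocking flag set, and on -EAGAIN retry from a worker context that
is allowed to block.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag values, not taken from any uapi header. */
#define LOOKUP_RCU	0x0040
#define LOOKUP_NONBLOCK	0x0080

/*
 * Stand-in for the new path_init() gate: LOOKUP_NONBLOCK without
 * LOOKUP_RCU means the fast path is unavailable, so ask the caller
 * to retry from a context that may block.
 */
static int try_lookup(unsigned int flags)
{
	if ((flags & (LOOKUP_RCU | LOOKUP_NONBLOCK)) == LOOKUP_NONBLOCK)
		return -EAGAIN;
	return 0;
}

/*
 * Hypothetical caller pattern: fast path first, punt on -EAGAIN.
 */
static int open_inline_or_punt(unsigned int flags)
{
	int ret = try_lookup(flags | LOOKUP_RCU | LOOKUP_NONBLOCK);

	if (ret == -EAGAIN) {
		/* RCU walk refused; redo a blocking lookup in a worker. */
		ret = try_lookup(flags);
	}
	return ret;
}
```

The point of the sketch is just the flag test: LOOKUP_NONBLOCK is only
meaningful while LOOKUP_RCU is still set, which is exactly what the
path_init() hunk below enforces.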
> It still leaves you with fuckloads of blocking operations (and that's
> "blocking" with "until admin thaws the damn filesystem several hours
> down the road") after complete_walk(), though.

But that's true (and expected) for any open that isn't non-blocking.

diff --git a/fs/namei.c b/fs/namei.c
index d7952f863e79..d49c72e34c6e 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -686,6 +686,8 @@ static bool unlazy_walk(struct nameidata *nd)
 	BUG_ON(!(nd->flags & LOOKUP_RCU));
 
 	nd->flags &= ~LOOKUP_RCU;
+	if (nd->flags & LOOKUP_NONBLOCK)
+		goto out1;
 	if (unlikely(!legitimize_links(nd)))
 		goto out1;
 	if (unlikely(!legitimize_path(nd, &nd->path, nd->seq)))
@@ -792,6 +794,7 @@ static int complete_walk(struct nameidata *nd)
 		 */
 		if (!(nd->flags & (LOOKUP_ROOT | LOOKUP_IS_SCOPED)))
 			nd->root.mnt = NULL;
+		nd->flags &= ~LOOKUP_NONBLOCK;
 		if (unlikely(unlazy_walk(nd)))
 			return -ECHILD;
 	}
@@ -2209,6 +2212,9 @@ static const char *path_init(struct nameidata *nd, unsigned flags)
 	if (!*s)
 		flags &= ~LOOKUP_RCU;
+	/* LOOKUP_NONBLOCK requires RCU, ask caller to retry */
+	if ((flags & (LOOKUP_RCU | LOOKUP_NONBLOCK)) == LOOKUP_NONBLOCK)
+		return ERR_PTR(-EAGAIN);
 	if (flags & LOOKUP_RCU)
 		rcu_read_lock();

-- 
Jens Axboe