On Mon, 2022-10-31 at 13:13 -0600, Jens Axboe wrote: > On 10/31/22 10:44 AM, Dylan Yudaken wrote: > > On Mon, 2022-10-31 at 10:02 -0600, Jens Axboe wrote: > > > On 10/31/22 7:41 AM, Dylan Yudaken wrote: > > > > diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c > > > > index 8d0d40713a63..40b37899e943 100644 > > > > --- a/io_uring/rsrc.c > > > > +++ b/io_uring/rsrc.c > > > > @@ -248,12 +248,20 @@ static unsigned int > > > > io_rsrc_retarget_table(struct io_ring_ctx *ctx, > > > > return refs; > > > > } > > > > > > > > -static void io_rsrc_retarget_schedule(struct io_ring_ctx *ctx) > > > > +static void io_rsrc_retarget_schedule(struct io_ring_ctx *ctx, > > > > bool delay) > > > > __must_hold(&ctx->uring_lock) > > > > { > > > > - percpu_ref_get(&ctx->refs); > > > > - mod_delayed_work(system_wq, &ctx->rsrc_retarget_work, > > > > 60 * > > > > HZ); > > > > - ctx->rsrc_retarget_scheduled = true; > > > > + unsigned long del; > > > > + > > > > + if (delay) > > > > + del = 60 * HZ; > > > > + else > > > > + del = 0; > > > > + > > > > + if (likely(!mod_delayed_work(system_wq, &ctx- > > > > > rsrc_retarget_work, del))) { > > > > + percpu_ref_get(&ctx->refs); > > > > + ctx->rsrc_retarget_scheduled = true; > > > > + } > > > > } > > > > > > What happens for del == 0 and the work running ala: > > > > > > CPU 0 CPU 1 > > > mod_delayed_work(.., 0); > > > delayed_work runs > > > put ctx > > > percpu_ref_get(ctx) > > > > The work takes the lock before put(ctx), and CPU 0 only releases > > the > > lock after calling get(ctx) so it should be ok. > > But io_ring_ctx_ref_free() would've run at that point? Maybe I'm > missing something... > > In any case, would be saner to always grab that ref first. Or at > least have a proper comment as to why it's safe, because it looks > iffy. I think I misunderstood - assuming a ref was already taken higher up the stack. That is not the case, and in fact in the _exiting() calls it is not really valid to take the reference as it may have already hit zero. Instead we can use the cancel_delayed_work in exiting (no need to retarget rsrc nodes at this point) and it makes things a bit cleaner. I'll update in v2. > > > > Also I think that likely() needs to get dropped. > > > > > > > It's not a big thing, but the only time it will be enqueued is on > > ring > > shutdown if there is an outstanding enqueue. Other times it will > > not > > get double enqueued as it is protected by the _scheduled bool (this > > is > > important or else it will continually push back by 1 period and > > maybe > > never run) > > We've already called into this function, don't think it's worth a > likely. Same for most of the others added in this series, imho they > only really make sense for a very hot path where that branch is > inline. Will remove it