On Wed, Jul 15, 2020 at 07:03:49AM -0700, Palmer Dabbelt wrote:
> On Wed, 15 Jul 2020 03:42:46 PDT (-0700), Will Deacon wrote:
> > Hmm. Although I _think_ something like the diff below ought to work, are you
> > sure you want to be doing MMIO writes in preemptible context? Setting
> > '.disable_locking = true' in 'sifive_gpio_regmap_config' implies to me that
> > you should be handling the locking within the driver itself, and all the
> > other regmap writes are protected by '&gc->bgpio_lock'.
>
> I guess my goal here was to avoid fixing the drivers: it's one thing if it's
> just broken SiFive drivers, as they're all a bit crusty, but this is blowing
> up for me in the 8250 driver on QEMU as well. At that point I figured there'd
> be an endless stream of bugs around this and I'd rather just.

Right, and my patch should solve that.

> > Given that riscv is one of the few architectures needing an implementation
> > of mmiowb(), doing MMIO in a preemptible section seems especially dangerous
> > as you have no way to ensure completion of the writes without adding an
> > mmiowb() to the CPU migration path (i.e. context switch).
>
> I was going to just stick one in our context switching code unconditionally.
> While we could go track cumulative writes outside the locks, the mmiowb is
> essentially free for us because the one RISC-V implementation treats all
> fences the same way, so the subsequent store_release would hold all this up
> anyway.
> I think the right thing to do is to add some sort of arch hook right about
> here
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index cfd71d61aa3c..14b4f8b7433f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3212,6 +3212,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
>  	prev_state = prev->state;
>  	vtime_task_switch(prev);
>  	perf_event_task_sched_in(prev, current);
> +	finish_arch_pre_release(prev);
>  	finish_task(prev);
>  	finish_lock_switch(rq);
>  	finish_arch_post_lock_switch();
>
> but I was just going to stick it in switch_to for now... :).  I guess we
> could also roll the fence up into yet another one-off primitive for the
> scheduler, something like

What does the above get you over switch_to()?

> > diff --git a/include/asm-generic/mmiowb.h b/include/asm-generic/mmiowb.h
> > index 9439ff037b2d..5698fca3bf56 100644
> > --- a/include/asm-generic/mmiowb.h
> > +++ b/include/asm-generic/mmiowb.h
> > @@ -27,7 +27,7 @@
> >  #include <asm/smp.h>
> >
> >  DECLARE_PER_CPU(struct mmiowb_state, __mmiowb_state);
> > -#define __mmiowb_state()	this_cpu_ptr(&__mmiowb_state)
> > +#define __mmiowb_state()	raw_cpu_ptr(&__mmiowb_state)
> >  #else
> >  #define __mmiowb_state()	arch_mmiowb_state()
> >  #endif	/* arch_mmiowb_state */
> > @@ -35,7 +35,9 @@ DECLARE_PER_CPU(struct mmiowb_state, __mmiowb_state);
> >  static inline void mmiowb_set_pending(void)
> >  {
> >  	struct mmiowb_state *ms = __mmiowb_state();
> > -	ms->mmiowb_pending = ms->nesting_count;
> > +
> > +	if (likely(ms->nesting_count))
> > +		ms->mmiowb_pending = ms->nesting_count;
>
> Ya, that's one of the earlier ideas I had, but I decided it doesn't actually
> do anything: if we're scheduleable then we know that pending and count are
> zero, thus the check isn't necessary.  It made sense late last night and
> still does this morning, but I haven't had my coffee yet.
What it does is prevent preemptible writeX() from trashing the state on
another CPU, so I think it's a valid fix. I agree that it doesn't help you if
you need mmiowb(), but then that _really_ should only be needed if you're
holding a spinlock. If you're doing concurrent lockless MMIO, you deserve all
the pain you get.

I don't get why you think the patch does nothing, as it will operate as
expected if writeX() is called with preemption disabled, which is the common
case.

> I'm kind of tempted to just declare "mmiowb() is fast on RISC-V, so let's do
> it unconditionally everywhere it's necessary".  IIRC that's essentially true
> on the existing implementation, as it'll get rolled up to any upcoming fence
> anyway.  It seems like building any real machine that relies on the
> orderings provided by mmiowb is going to have an infinite rabbit hole of
> bugs anyway, so in that case we'd just rely on the hardware to elide the
> now-unnecessary fences; we'd just be throwing static code size at this wacky
> memory model and then forgetting about it.

If you can do that, that's obviously the best approach.

> I'm going to send out a patch set that does all the work I think is
> necessary to avoid fixing up the various drivers, with the accounting code
> to avoid mmiowbs all over our port.  I'm not sure I'm going to like it, but
> I guess we can argue as to exactly how ugly it is :)

Ok.

Will