On Wed, Jul 15, 2020 at 07:03:49AM -0700, Palmer Dabbelt wrote:
> On Wed, 15 Jul 2020 03:42:46 PDT (-0700), Will Deacon wrote:
> > Hmm. Although I _think_ something like the diff below ought to work, are you
> > sure you want to be doing MMIO writes in preemptible context? Setting
> > '.disable_locking = true' in 'sifive_gpio_regmap_config' implies to me that
> > you should be handling the locking within the driver itself, and all the
> > other regmap writes are protected by '&gc->bgpio_lock'.
>
> I guess my goal here was to avoid fixing the drivers: it's one thing if it's
> just broken SiFive drivers, as they're all a bit crusty, but this is blowing
> up for me in the 8250 driver on QEMU as well. At that point I figured there'd
> be an endless stream of bugs around this and I'd rather just.

Right, and my patch should solve that.

> > Given that riscv is one of the few architectures needing an implementation
> > of mmiowb(), doing MMIO in a preemptible section seems especially dangerous
> > as you have no way to ensure completion of the writes without adding an
> > mmiowb() to the CPU migration path (i.e. context switch).
>
> I was going to just stick one in our context switching code unconditionally.
> While we could go track cumulative writes outside the locks, the mmiowb is
> essentially free for us because the one RISC-V implementation treats all
> fences the same way, so the subsequent store_release would hold all this up
> anyway.
> I think the right thing to do is to add some sort of arch hook right about
> here
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index cfd71d61aa3c..14b4f8b7433f 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3212,6 +3212,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
>  	prev_state = prev->state;
>  	vtime_task_switch(prev);
>  	perf_event_task_sched_in(prev, current);
> +	finish_arch_pre_release(prev);
>  	finish_task(prev);
>  	finish_lock_switch(rq);
>  	finish_arch_post_lock_switch();
>
> but I was just going to stick it in switch_to for now... :).  I guess we
> could also roll the fence up into yet another one-off primitive for the
> scheduler, something like

What does the above get you over switch_to()?

> > diff --git a/include/asm-generic/mmiowb.h b/include/asm-generic/mmiowb.h
> > index 9439ff037b2d..5698fca3bf56 100644
> > --- a/include/asm-generic/mmiowb.h
> > +++ b/include/asm-generic/mmiowb.h
> > @@ -27,7 +27,7 @@
> >  #include <asm/smp.h>
> >
> >  DECLARE_PER_CPU(struct mmiowb_state, __mmiowb_state);
> > -#define __mmiowb_state()	this_cpu_ptr(&__mmiowb_state)
> > +#define __mmiowb_state()	raw_cpu_ptr(&__mmiowb_state)
> >  #else
> >  #define __mmiowb_state()	arch_mmiowb_state()
> >  #endif	/* arch_mmiowb_state */
> > @@ -35,7 +35,9 @@ DECLARE_PER_CPU(struct mmiowb_state, __mmiowb_state);
> >  static inline void mmiowb_set_pending(void)
> >  {
> >  	struct mmiowb_state *ms = __mmiowb_state();
> > -	ms->mmiowb_pending = ms->nesting_count;
> > +
> > +	if (likely(ms->nesting_count))
> > +		ms->mmiowb_pending = ms->nesting_count;
>
> Ya, that's one of the earlier ideas I had, but I decided it doesn't actually
> do anything: if we're scheduleable then we know that pending and count are
> zero, thus the check isn't necessary.  It made sense late last night and
> still does this morning, but I haven't had my coffee yet.
What it does is prevent preemptible writeX() from trashing the state on
another CPU, so I think it's a valid fix. I agree that it doesn't help you if
you need mmiowb(), but then that _really_ should only be needed if you're
holding a spinlock. If you're doing concurrent lockless MMIO, you deserve all
the pain you get.

I don't get why you think the patch does nothing, as it will operate as
expected if writeX() is called with preemption disabled, which is the common
case.

> I'm kind of tempted to just declare "mmiowb() is fast on RISC-V, so let's do
> it unconditionally everywhere it's necessary".  IIRC that's essentially true
> on the existing implementation, as it'll get rolled up to any upcoming fence
> anyway.  It seems like building any real machine that relies on the
> orderings provided by mmiowb is going to have an infinite rabbit hole of
> bugs anyway, so in that case we'd just rely on the hardware to elide the
> now-unnecessary fences; we'd just be throwing static code size at this wacky
> memory model and then forgetting about it.

If you can do that, that's obviously the best approach.

> I'm going to send out a patch set that does all the work I think is
> necessary to avoid fixing up the various drivers, with the accounting code
> to avoid mmiowbs all over our port.  I'm not sure I'm going to like it, but
> I guess we can argue as to exactly how ugly it is :)

Ok.

Will