Re: RFC on writel and writel_relaxed

Oliver <oohall@xxxxxxxxx> · Thu, 22 Mar 2018 21:15:04 +1100

On Thu, Mar 22, 2018 at 3:24 PM, Benjamin Herrenschmidt
<benh@xxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, 2018-03-21 at 08:53 -0500, Sinan Kaya wrote:
>> writel_relaxed() needs to have ordering guarantees with respect to the order
>> in which the device observes writes.
>
> Correct.
>
>> x86 has a compiler barrier inside the _relaxed() API so that code does not
>> get reordered. ARM64 architecturally guarantees device writes to be observed
>> in order.
>>
>> I was hoping that PPC could follow x86 and inject a compiler barrier into the
>> relaxed functions.
>>
>> BTW, I have no idea what a compiler barrier does on PPC, or whether
>>
>> writel() == compiler barrier + writel_relaxed()
>>
>> holds.
>
> No, it's not sufficient.
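To make the distinction concrete, here is a minimal userspace sketch of the composition asked about above next to one that uses a hardware barrier. `proposed_writel()`, `ordered_writel()` and `raw_writel()` are hypothetical names, not kernel functions, and the non-powerpc fallback exists only so the sketch builds elsewhere. An empty asm with a "memory" clobber stops the compiler from moving accesses across it but emits no instruction, so on a weakly ordered CPU such as powerpc the device could still observe the MMIO store before earlier stores to cacheable memory; the `sync` is what closes that gap.

```c
#include <assert.h>
#include <stdint.h>

/* Compiler-only barrier: constrains code generation, emits no instruction. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

#if defined(__powerpc__) || defined(__powerpc64__)
#define hw_full_barrier() __asm__ __volatile__("sync" ::: "memory")
#else
/* Fallback only so the sketch compiles off powerpc. */
#define hw_full_barrier() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/* Hypothetical bare MMIO store (no byte swapping, no barrier). */
static inline void raw_writel(uint32_t v, volatile uint32_t *reg)
{
	*reg = v;
}

/* Compiler barrier + relaxed write: program order survives in the emitted
 * code, but a weakly ordered CPU may still reorder the stores. */
static inline void proposed_writel(uint32_t v, volatile uint32_t *reg)
{
	compiler_barrier();
	raw_writel(v, reg);
}

/* Hardware barrier + relaxed write: earlier stores, cacheable ones
 * included, are ordered before the MMIO store. */
static inline void ordered_writel(uint32_t v, volatile uint32_t *reg)
{
	hw_full_barrier();
	raw_writel(v, reg);
}
```

Both versions store the same value; they differ only in what ordering the CPU is obliged to respect.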
>
> Replacing wmb() + writel() with wmb() + writel_relaxed() will work on
> PPC, it will just not give you a benefit today.
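The wmb() + writel_relaxed() pattern typically appears in a driver's descriptor-ring path. This is a minimal userspace sketch: the ring, `ring_post()`, `sim_wmb()` and `sim_writel_relaxed()` are hypothetical, the "doorbell" is a plain volatile variable rather than real MMIO, and a C11 fence stands in for wmb(). What matters is where the barrier sits: it orders the descriptor stores in cacheable memory before the doorbell write. Since writel() on powerpc already begins with a `sync`, swapping it for writel_relaxed() here is correct but gains nothing until the relaxed accessors actually become cheaper.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for wmb(); a full fence is stronger than needed but safe. */
#define sim_wmb() __atomic_thread_fence(__ATOMIC_SEQ_CST)

/* Stand-in for writel_relaxed(): a bare volatile store, no barrier. */
static inline void sim_writel_relaxed(uint32_t val, volatile uint32_t *reg)
{
	*reg = val;
}

struct fake_desc {
	uint64_t dma_addr;
	uint32_t len;
	uint32_t flags;
};

#define RING_SIZE  8u
#define DESC_VALID 0x1u

struct fake_ring {
	struct fake_desc desc[RING_SIZE];
	uint32_t tail;
	volatile uint32_t doorbell;	/* pretend MMIO doorbell register */
};

/* Fill a descriptor, order it before the doorbell, then ring the doorbell. */
static void ring_post(struct fake_ring *r, uint64_t addr, uint32_t len)
{
	struct fake_desc *d = &r->desc[r->tail % RING_SIZE];

	assert(len != 0);
	d->dma_addr = addr;
	d->len = len;
	d->flags = DESC_VALID;

	sim_wmb();		/* descriptor stores before the doorbell store */
	r->tail++;
	sim_writel_relaxed(r->tail, &r->doorbell);
}
```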
>
> The main problem is that the semantics of writel/writel_relaxed (and
> read versions) aren't very well defined in Linux, especially when it comes
> to different memory types (NC, WC, ...).
>
> I've been wanting to implement the relaxed accessors for a while but
> was battling with this to try to also better support WC, and due to
> other commitments, this somewhat fell through the cracks.
>
> Two options I can think of:
>
>  - Just make the _relaxed variants use an eieio instead of a sync, this
> will effectively lift the ordering guarantee vs. cacheable storage (and
> thus unlock) and might give a (small) performance improvement.

Wouldn't we still have the unlock ordering due to the io_sync hack or
are you thinking we should remove that too for the relaxed version?

> However,
> we still have the problem that on WC mappings, neither writel nor
> writel_relaxed will effectively allow combining to happen (only raw
> accesses will because on powerpc *all* barriers will break combining).

Hmm, eieio is only architected to affect CI+G (and WT) so it shouldn't
affect combining on non-guarded memory. Do most implementations apply it
to all CI accesses anyway?
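Option 1 and the io_sync question can be sketched roughly as follows. Everything here is a hypothetical userspace model: `sketch_io_sync` is a plain global standing in for powerpc's per-CPU flag that writel() sets so that the next spin_unlock() issues a full `sync`, and the non-powerpc fallback exists only so the code builds. The relaxed write swaps `sync` for `eieio`, which (per the discussion above) no longer orders it against earlier cacheable stores. Whether the relaxed variant should still set the flag is exactly the open question; this sketch assumes it does, which keeps the unlock ordering.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__powerpc__) || defined(__powerpc64__)
#define ppc_sync()  __asm__ __volatile__("sync" ::: "memory")
#define ppc_eieio() __asm__ __volatile__("eieio" ::: "memory")
#else
/* Fallbacks only so the sketch compiles off powerpc. */
#define ppc_sync()  __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define ppc_eieio() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/* Hypothetical stand-in for the per-CPU "MMIO since last unlock" flag. */
static int sketch_io_sync;

/* Fully ordered write: sync orders all earlier accesses first. */
static inline void sketch_writel(uint32_t v, volatile uint32_t *reg)
{
	ppc_sync();
	*reg = v;
	sketch_io_sync = 1;	/* make the next unlock issue a sync */
}

/* Option 1: eieio instead of sync. Assumption: the flag is still set,
 * so ordering against a later unlock is preserved. */
static inline void sketch_writel_relaxed(uint32_t v, volatile uint32_t *reg)
{
	ppc_eieio();
	*reg = v;
	sketch_io_sync = 1;
}

/* Unlock: if MMIO happened under the lock, fence it before releasing. */
static inline void sketch_spin_unlock(volatile int *lock)
{
	if (sketch_io_sync) {
		ppc_sync();
		sketch_io_sync = 0;
	}
	__atomic_store_n(lock, 0, __ATOMIC_RELEASE);
}
```

Dropping the flag update from `sketch_writel_relaxed()` would be the "remove that too" variant, trading the unlock guarantee for a cheaper unlock.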

>  - Make writel_relaxed() be a simple store without barriers, and
> readl_relaxed() be "eieio, read, eieio", thus allowing write combining
> to happen between successive writel_relaxed on WC space (no change on
> normal NC space) while maintaining the ordering between relaxed reads
> and writes. The flip side is a (slight) increased overhead of
> readl_relaxed.

Are there many drivers that actually do writeX() on WC space?
memory-barriers.txt pretty much says that all bets are off and no ordering
guarantees can be assumed when using readX/writeX on prefetchable IO memory.
It seems sketchy enough to give me some pause, but maybe it works fine
elsewhere.
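For comparison, option 2 moves the barriers to the read side. A minimal userspace sketch, with hypothetical `opt2_*` names and a non-powerpc fallback that exists only so it builds; it also ignores the byte swapping the real little-endian accessors do on big-endian powerpc. Relaxed writes emit no barrier at all, so successive writes to a WC mapping are at least not prevented from combining (whether they actually combine is up to the mapping and the implementation), while each relaxed read is bracketed by `eieio` so it stays ordered against the relaxed writes around it.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__powerpc__) || defined(__powerpc64__)
#define ppc_eieio() __asm__ __volatile__("eieio" ::: "memory")
#else
/* Fallback only so the sketch compiles off powerpc. */
#define ppc_eieio() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/* Writes: a bare store, nothing that would break write combining. */
static inline void opt2_writel_relaxed(uint32_t v, volatile uint32_t *reg)
{
	*reg = v;
}

/* Reads: eieio before and after, so the load can neither pass an earlier
 * relaxed write nor be passed by a later one. The barrier cost moves
 * from the write path to the read path. */
static inline uint32_t opt2_readl_relaxed(const volatile uint32_t *reg)
{
	uint32_t v;

	ppc_eieio();
	v = *reg;
	ppc_eieio();
	return v;
}
```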

Oliver