Re: [RFC PATCH v1 1/5] locking/atomic: Implement atomic_fetch_and_or

hev <r@xxxxxx> · Thu, 29 Jul 2021 18:18:28 +0800

Hi, Will,

On Thu, Jul 29, 2021 at 5:39 PM Will Deacon <will@xxxxxxxxxx> wrote:
>
> On Wed, Jul 28, 2021 at 07:48:22PM +0800, Rui Wang wrote:
> > From: wangrui <wangrui@xxxxxxxxxxx>
> >
> > This patch introduces a new atomic primitive 'and_or'. It may have three
> > types of implementations:
> >
> >  * The generic implementation is based on arch_cmpxchg.
> >  * The hardware supports atomic 'and_or' of single instruction.
>
> Do any architectures actually support this instruction?
I'm not sure; I don't know of one right now.

>
> On arm64, we can clear arbitrary bits and we can set arbitrary bits, but we
> can't combine the two in a fashion which provides atomicity and
> forward-progress guarantees.
>
> Please can you explain how this new primitive will be used, in case there's
> an alternative way of doing it which maps better to what CPUs can actually
> do?
I think atomic andnot_or/and_or makes it easy to exchange arbitrary bits of a
machine word. Without it, the only option is xchg8/16, which depends on
hardware support and, more importantly, requires the bits being exchanged to
sit in the same sub-word. qspinlock adjusted its memory layout for this reason
and wastes some bits as a result (_Q_PENDING_BITS == 8).
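
As a rough user-space illustration (not the patch's code), the bit exchange
described above can be sketched with C11 atomics. The names fetch_and_or and
xchg_bits are hypothetical, and the semantics assumed here are
"new = (old & and_mask) | or_mask, return old", implemented generically with
a compare-exchange loop:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the proposed primitive (assumed semantics):
 * atomically do *v = (*v & and_mask) | or_mask and return the old value.
 * Generic fallback built on compare-exchange.
 */
static inline uint32_t fetch_and_or(_Atomic uint32_t *v,
                                    uint32_t and_mask, uint32_t or_mask)
{
    uint32_t old = atomic_load_explicit(v, memory_order_relaxed);

    /* On failure, 'old' is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak_explicit(
               v, &old, (old & and_mask) | or_mask,
               memory_order_acq_rel, memory_order_relaxed))
        ;
    return old;
}

/*
 * Exchange an arbitrary bit field selected by 'mask' with 'bits'.
 * Returns the previous contents of the field; bits outside 'mask'
 * are left untouched.
 */
static inline uint32_t xchg_bits(_Atomic uint32_t *v,
                                 uint32_t mask, uint32_t bits)
{
    assert((bits & ~mask) == 0); /* new value must fit inside the field */
    return fetch_and_or(v, ~mask, bits) & mask;
}
```

The field does not have to be byte- or halfword-aligned, which is the
difference from an xchg8/16-based approach.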

In the case of qspinlock's xchg_tail, I think the generated assembly does not
change after switching to atomic andnot_or on architectures that support CAS
instructions. On LL/SC-style architectures, though, the new primitive lets us
implement sub-word xchg better and more clearly [1]. It also reduces the
number of retries when the two values loaded from memory are not equal.
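
For illustration only, an xchg_tail built on such a primitive might look like
the user-space sketch below. The layout constants (Q_TAIL_OFFSET,
Q_TAIL_MASK), the function names, and the relaxed ordering are assumptions
made for the example, not taken from the patch; see [1] for the actual change.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative layout: locked byte in bits 0-7, one pending bit, tail above. */
#define Q_TAIL_OFFSET 9u
#define Q_TAIL_MASK   (~0u << Q_TAIL_OFFSET)

/* Hypothetical primitive: *v = (*v & and_mask) | or_mask, returns old value. */
static inline uint32_t fetch_and_or(_Atomic uint32_t *v,
                                    uint32_t and_mask, uint32_t or_mask)
{
    uint32_t old = atomic_load_explicit(v, memory_order_relaxed);

    while (!atomic_compare_exchange_weak_explicit(
               v, &old, (old & and_mask) | or_mask,
               memory_order_relaxed, memory_order_relaxed))
        ;
    return old;
}

/*
 * Publish a new tail (already shifted into place) and return the previous
 * tail. The locked and pending bits are preserved, so the tail field does
 * not need to occupy its own 16-bit sub-word.
 */
static inline uint32_t xchg_tail_sketch(_Atomic uint32_t *val, uint32_t tail)
{
    assert((tail & ~Q_TAIL_MASK) == 0);
    return fetch_and_or(val, ~Q_TAIL_MASK, tail) & Q_TAIL_MASK;
}
```

On an LL/SC machine the loop body would be a single load-linked, and/or,
store-conditional sequence rather than a separate load followed by a
compare-exchange.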

If the hardware supports these atomic semantics, we get better performance
and flexibility. I think this would be easy for hardware to support.

[1] https://github.com/heiher/linux/commit/f77e1c6e4e579543177010bef2b394479c50b6cf

Regards
Rui

>
> Cheers,
>
> Will