Re: [RFC PATCH v1 1/5] locking/atomic: Implement atomic_fetch_and_or

hev <r@xxxxxx> · Sat, 31 Jul 2021 09:46:35 +0800

Hi,

On Sat, Jul 31, 2021 at 2:40 AM Waiman Long <llong@xxxxxxxxxx> wrote:
>
> On 7/29/21 6:18 AM, hev wrote:
> > Hi, Will,
> >
> > On Thu, Jul 29, 2021 at 5:39 PM Will Deacon <will@xxxxxxxxxx> wrote:
> >> On Wed, Jul 28, 2021 at 07:48:22PM +0800, Rui Wang wrote:
> >>> From: wangrui <wangrui@xxxxxxxxxxx>
> >>>
> >>> This patch introduces a new atomic primitive 'and_or'. It may have three
> >>> types of implementations:
> >>>
> >>>   * The generic implementation is based on arch_cmpxchg.
> >>>   * The hardware supports atomic 'and_or' in a single instruction.
> >> Do any architectures actually support this instruction?
> > No, I'm not sure now.
> >
> >> On arm64, we can clear arbitrary bits and we can set arbitrary bits, but we
> >> can't combine the two in a fashion which provides atomicity and
> >> forward-progress guarantees.
> >>
> >> Please can you explain how this new primitive will be used, in case there's
> >> an alternative way of doing it which maps better to what CPUs can actually
> >> do?
> > I think we can easily exchange arbitrary bits of a machine word with atomic
> > andnot_or/and_or. Otherwise, we can only use xchg8/16 to do it. It depends on
> > hardware support, and the key point is that the bits to be exchanged must be
> > in the same sub-word. qspinlock adjusted its memory layout for this reason,
> > and wastes some bits (_Q_PENDING_BITS == 8).
>
> It is not actually a waste of bits. With _Q_PENDING_BITS==8, more
> optimized code can be used for pending bit processing. It is only in the
> rare case that NR_CPUS >= 16k - 1 that we have to fall back to
> _Q_PENDING_BITS==1. In fact, that should be the only condition that will
> make _Q_PENDING_BITS=1.

Yes, you are right. The memory layout is adjusted so that
locked/pending and tail do not share bits, so normal store
instructions can be used for clear_pending and
clear_pending_set_locked. That is faster than an atomic operation.
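
To make the comparison concrete, here is a rough userspace sketch, not
the kernel code. It assumes a little-endian machine, the GCC/Clang
__atomic builtins, and that and_or means new = (old & and_mask) | or_mask
with the old value returned. The union, the Q_* constants and the
function names are hypothetical stand-ins for struct qspinlock and its
helpers.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the _Q_PENDING_BITS == 8 layout (little-endian):
 * locked = bits 0-7, pending = bits 8-15, tail = bits 16-31.
 */
#define Q_LOCKED_VAL   (1U << 0)
#define Q_PENDING_VAL  (1U << 8)

union qsl {
	uint32_t val;
	struct {
		uint16_t locked_pending;  /* low half-word on little-endian */
		uint16_t tail;
	};
};

static_assert(sizeof(union qsl) == sizeof(uint32_t), "unexpected padding");

/*
 * Generic and_or built on compare-exchange.
 * Assumed semantics: new = (old & and_mask) | or_mask, returns old.
 */
static inline uint32_t fetch_and_or(uint32_t *v, uint32_t and_mask,
				    uint32_t or_mask)
{
	uint32_t old = __atomic_load_n(v, __ATOMIC_RELAXED);

	/* On failure, 'old' is refreshed with the current value; retry. */
	while (!__atomic_compare_exchange_n(v, &old,
					    (old & and_mask) | or_mask,
					    1, __ATOMIC_ACQ_REL,
					    __ATOMIC_RELAXED))
		;
	return old;
}

/* If pending/locked shared a word with tail: one atomic RMW on the word. */
static inline void clear_pending_set_locked_rmw(union qsl *l)
{
	fetch_and_or(&l->val, ~Q_PENDING_VAL, Q_LOCKED_VAL);
}

/* With locked/pending in their own half-word: a single 16-bit store. */
static inline void clear_pending_set_locked_store(union qsl *l)
{
	__atomic_store_n(&l->locked_pending, (uint16_t)Q_LOCKED_VAL,
			 __ATOMIC_RELAXED);
}
```

Both variants leave the tail bits untouched; the second one avoids the
cmpxchg retry loop entirely, which is the point of the layout.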

Regards,
Rui