Hi,

On Sat, Jul 31, 2021 at 2:40 AM Waiman Long <llong@xxxxxxxxxx> wrote:
>
> On 7/29/21 6:18 AM, hev wrote:
> > Hi, Will,
> >
> > On Thu, Jul 29, 2021 at 5:39 PM Will Deacon <will@xxxxxxxxxx> wrote:
> >> On Wed, Jul 28, 2021 at 07:48:22PM +0800, Rui Wang wrote:
> >>> From: wangrui <wangrui@xxxxxxxxxxx>
> >>>
> >>> This patch introduces a new atomic primitive 'and_or'. It may have three
> >>> types of implementations:
> >>>
> >>>  * The generic implementation is based on arch_cmpxchg.
> >>>  * The hardware supports atomic 'and_or' of single instruction.
> >> Do any architectures actually support this instruction?
> > No, I'm not sure now.
> >
> >> On arm64, we can clear arbitrary bits and we can set arbitrary bits, but we
> >> can't combine the two in a fashion which provides atomicity and
> >> forward-progress guarantees.
> >>
> >> Please can you explain how this new primitive will be used, in case there's
> >> an alternative way of doing it which maps better to what CPUs can actually
> >> do?
> > I think we can easily exchange arbitrary bits of a machine word with atomic
> > andnot_or/and_or. Otherwise, we can only use xchg8/16 to do it. It depends on
> > hardware support, and the key point is that the bits to be exchanged must be
> > in the same sub-word. qspinlock adjusted its memory layout for this reason,
> > and wastes some bits (_Q_PENDING_BITS == 8).
>
> It is not actually a waste of bits. With _Q_PENDING_BITS==8, more
> optimized code can be used for pending bit processing. It is only in the
> rare case that NR_CPUS >= 16k - 1 that we have to fall back to
> _Q_PENDING_BITS==1. In fact, that should be the only condition that will
> make _Q_PENDING_BITS=1.

Yes, you are right. The memory layout is adjusted so that locked/pending and
tail do not share bits, so normal store instructions can be used for
clear_pending and clear_pending_set_locked. That is faster than an atomic
operation.

Regards,
Rui
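
P.S. To make the discussion concrete, here is a minimal user-space sketch of
what a cmpxchg-based generic 'and_or' could look like. It is an assumption-laden
illustration, not the patch itself: it uses C11 atomics instead of
arch_cmpxchg, it assumes the semantics are new = (old & and_mask) | or_mask,
and the names fetch_and_or, demo_clear_pending_set_locked and the DEMO_*
constants are hypothetical. The bit layout is only meant to resemble the
_Q_PENDING_BITS == 1 case, where pending and tail share a sub-word and a plain
store cannot be used.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Illustrative layout only (hypothetical names, not the kernel's macros):
 * bits 0-7 hold the locked byte, bit 8 is the pending bit, and the
 * remaining upper bits hold the tail.
 */
#define DEMO_LOCKED_VAL  (1U << 0)
#define DEMO_PENDING_VAL (1U << 8)

static_assert((DEMO_LOCKED_VAL & DEMO_PENDING_VAL) == 0,
              "locked and pending must not overlap");

/*
 * Atomically perform *v = (*v & and_mask) | or_mask and return the old
 * value. Assumed semantics; implemented as a compare-and-swap retry loop.
 */
static inline uint32_t fetch_and_or(_Atomic uint32_t *v,
                                    uint32_t and_mask, uint32_t or_mask)
{
    uint32_t old = atomic_load_explicit(v, memory_order_relaxed);

    /* On failure, 'old' is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak_explicit(
               v, &old, (old & and_mask) | or_mask,
               memory_order_acq_rel, memory_order_relaxed))
        ;

    return old;
}

/*
 * Clear the pending bit and set the locked byte in one atomic step,
 * leaving the tail bits untouched.
 */
static inline uint32_t demo_clear_pending_set_locked(_Atomic uint32_t *lock)
{
    return fetch_and_or(lock, ~DEMO_PENDING_VAL, DEMO_LOCKED_VAL);
}
```

With _Q_PENDING_BITS == 8 none of this is needed, since the locked and pending
fields can be written with an ordinary store, which is the point Waiman makes
above.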