Re: The rdp->gpwrap behavior

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Feb 13, 2025 at 01:30:33PM -0500, Joel Fernandes wrote:
> +rcu for archives
> 
> Hi,
> 
> Following up about our gpwrap discussions, I did some tracing of rdp->gpwrap
> with just boot testing.
> 
> I actually never see it set because rdp->gp_seq and rnp->gp_seq are usually
> very close.
> 
> Example trace:
> 
> # rnp->gp_seq starts with -1200 on boot and then right around the wrap:
> 
> 178.096329: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap before: 0
> 178.096330: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: -3 (0xfffffffffffffffd), rdp->gp_seq: -3 (0xfffffffffffffffd), wrap after: 0
> 178.100327: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap before: 0
> 178.100328: rcu_gpnum_ovf: CPU: 2, rnp->gp_seq: 1 (0x1), rdp->gp_seq: 1 (0x1), wrap after: 0
> 
> The wrap "before" and "after" are the value of gpwrap before and after
> calling rcu_gpnum_ovf().
> 
> Closer look at the _ovf() shows it should be only set if rdp->gp_seq and
> rnp->gp_seq are quite a distance apart (say if the CPU was idle for too long
> and many GPs have passed. This can happen both with/without wrap. So imminent
> wrapping seems not necessary for ->gpwrap to be even set AFAICS.
> 
> I think the problem is ULONG_CMP_LT is not really "<" so even after wrap, it
> can false. i.e if rdp->gpseq + ULONG/4 wraps, it could still return false.

Exactly, and all of that is in fact the intended behavior.

> >From a testing POV, the example shows it is not set even when around when a
> wrap actually happened.  So may be, we are not testing gpwrap much, or at
> all, with rcutorture?

True enough, and that would be well worth fixing.

One simple (and maybe too simple) fix is to add a Kconfig option to
kernel/rcu/Kconfig.debug that makes rcu_gpnum_ovf() use something like
(32 << RCU_SEQ_CTR_SHIFT).  This, or some similar value, should cause
rcutorture's stuttering and/or shuffling to trigger setting of ->gpwrap
within a few minutes.

And I would not be surprised if doing this located bugs.

> But before that, I am feeling it is not behaving correctly. I'd expect it to
> be set even if rnp and rdp values are close but we are actually about to
> wrap.  So most likely I am missing something.

If I understand correctly, you would like ->gpwrap to be set
unconditionally at various points in the cycle, for example, at zero,
ULONG_MAX/4, ULONG_MAX/2, and 3*ULONG_MAX/4.  But this could easily
have the effect of discarding each CPU's quiescent states that showed
up before the first call to rcu_watching_snap_recheck(), even if that
CPU was active.  Our users might not consider this to be a friendly act.

In contrast, the current setup only discards quiescent states for CPUs
that just came back from idle prior to call to rcu_watching_snap_save().

In addition, use of the aforementioned Kconfig option has the benefit of
making any ->gpwrap-related bugs appear during testing, not in production.
On the same grounds, one could argue that absence of this Kconfig option
should also cause gp_seq to zero rather than the current;

	.gp_seq = (0UL - 300UL) << RCU_SEQ_CTR_SHIFT,

One argument against is that production 32-bit systems can wrap ->gp_seq
in a relatively short time.  Maybe moreso for ->expedited_sequence, so
perhaps leave ->gp_seq and modify ->expedited_sequence.  Maybe similar
decisions for others as well.  ;-)

							Thanx, Paul




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux