Re: srcu_cleanup warning

On Sat, Jun 08, 2024 at 08:25:53PM -0700, Paul E. McKenney wrote:
> There is a grace period in progress ("read state: 1") and that grace
> period is the last one that has been requested ("gp state: 573/576").
> 
> Had there been callbacks pending, there would have been a warning from
> "if (WARN_ON(rcu_segcblist_n_cbs(&sdp->srcu_cblist)))", so srcu_barrier()
> having no effect is expected behavior.  Which also suggests that the
> unfinished grace period was started by start_poll_synchronize_srcu().

I'm surprised that srcu_barrier() has no effect; I would have expected
the underlying machinery to be the same for explicit callbacks/barriers
as well as polling.

So I think there's something I'm missing; it sounds like something's not
getting kicked, and if you say srcu_barrier() is expected to have no
effect then that seems to imply there's something else I should be
calling?

> Could you please try something like this just before the call to
> cleanup_srcu_struct()?
> 
> 	WARN_ON_ONCE(poll_state_synchronize_srcu(&c->btree_trans_barrier, ck->btree_trans_barrier_seq));

Added, I'll check the results in the morning but they'll be here:
https://evilpiepirate.org/~testdashboard/ci?branch=bcachefs-testing

> 
> If there is some chance that start_poll_synchronize_srcu() was never
> ever invoked, this check will of course need some additional help.

start_poll_synchronize_srcu() is the only thing that version of my code
uses.

> I am curious about your use of ULONG_CMP_GE() on return values from
> different calls to start_poll_synchronize_srcu(), but that is not urgent.

The freelists are intended to be in the order in which they can be
reclaimed - is that not actually a sequence number?
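
For readers following along, here is a minimal userspace sketch of what a
ULONG_CMP_GE()-style comparison does. The macro body matches the kernel's
definition as I understand it (include/linux/rcupdate.h); the wrapper
seq_at_or_after() is a hypothetical name used only for illustration. The
sketch shows only the wraparound-safe arithmetic. It does not settle
whether cookies returned by different start_poll_synchronize_srcu() calls
may be ordered this way, which is the question raised above.

```c
#include <limits.h>
#include <stdbool.h>

/* Wraparound-tolerant "a >= b" for unsigned long counters: true when
 * a is at most ULONG_MAX / 2 steps ahead of b in modular arithmetic. */
#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))

/* Hypothetical helper (not a kernel function): is cookie a at or
 * after cookie b, assuming both behave like sequence numbers? */
static bool seq_at_or_after(unsigned long a, unsigned long b)
{
	return ULONG_CMP_GE(a, b);
}
```

A plain `a >= b` would give the wrong answer once the counter wraps; the
subtraction form keeps working as long as the two values are less than
half the counter range apart.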

I'm actually in the process of redoing (and simplifying) that code.
Basically, the code is supposed to be tracking objects pending freeing
in exactly the same manner in which RCU tracks pending callbacks -
except that by doing it ourselves we can allocate from those pending lists
and not be hosed if reclaim is delayed because of an srcu lock held too
long.
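
A rough userspace sketch of that idea, under loud assumptions: this is not
the bcachefs code, every name in it (pending_obj, pending_list,
pending_add, pending_alloc, gp_elapsed, completed_seq) is hypothetical,
and gp_elapsed() merely stands in for poll_state_synchronize_srcu().
Objects are queued in the order their grace periods were requested, and
allocation may reuse the oldest one once its grace period has elapsed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for SRCU state: newest grace-period cookie known complete. */
static unsigned long completed_seq;

/* Stand-in for poll_state_synchronize_srcu(); ignores wraparound. */
static bool gp_elapsed(unsigned long cookie)
{
	return completed_seq >= cookie;
}

struct pending_obj {
	struct pending_obj *next;
	unsigned long seq;	/* cookie taken when the object was freed */
};

struct pending_list {
	struct pending_obj *head, *tail;	/* oldest at head */
};

/* Queue a freed object behind everything freed before it. */
static void pending_add(struct pending_list *l, struct pending_obj *obj,
			unsigned long cookie)
{
	obj->seq = cookie;
	obj->next = NULL;
	if (l->tail)
		l->tail->next = obj;
	else
		l->head = obj;
	l->tail = obj;
}

/* Reuse the oldest object if its grace period has elapsed, else NULL
 * (a real caller would fall back to a normal allocation). */
static struct pending_obj *pending_alloc(struct pending_list *l)
{
	struct pending_obj *obj = l->head;

	if (!obj || !gp_elapsed(obj->seq))
		return NULL;
	l->head = obj->next;
	if (!l->head)
		l->tail = NULL;
	return obj;
}
```

Because the list is ordered by cookie, checking only the head is enough:
if the oldest entry is not yet safe to reuse, nothing behind it is either.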

As an aside - I've been considering ripping that out and just freeing
objects via call_srcu(). It would definitely simplify things, but some
workloads cycle through a _lot_ of these objects and memory reclaim
stalling is a real concern. And after I redo it, it should be if
anything slightly more efficient than freeing objects via call_srcu()
like normal (elimination of indirect function calls), so perhaps a
technique we'll want to keep in mind.