Re: SRCU question

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Nov 15, 2020 at 12:20:17PM -0800, Paul E. McKenney wrote:
> On Sun, Nov 15, 2020 at 12:11:24PM -0800, Paul E. McKenney wrote:
> > On Thu, Nov 12, 2020 at 03:15:47PM -0500, Kent Overstreet wrote:
> > > Hi to Paul & the rcu mailing list,
> > > 
> > > I've got a problem I'm trying to figure out if I can adapt SRCU for.
> > > 
> > > Within bcachefs, currently struct btree obects, and now also btree_cached_key,
> > > aren't freed until the filesystem is torn down, because btree iterators contain
> > > pointers to them and will drop and retake locks (iff a sequence number hasn't
> > > changed) without holding a ref. With the btree key cache code, this is now
> > > something I need to fix.
> > > 
> > > What I plan on doing is having struct btree_trans (container for btree
> > > iterators) hold an srcu read lock while it's alive. But, I don't want to just
> > > use call_srcu to free the btree/btree_cached_key objects, because btree trans
> > > objects can at times be fairly long lived and the existing code can reuse these
> > > objects for other btree nodes/btree cached keys immediately. Freeing them with
> > > call_srcu() would break that; I could have my callback function check if the
> > > object has been reused before freeing, but I'd still have a problem when the
> > > object gets freed a second time before the first call_scru() has finished.
> > > 
> > > What I'm wondering is if the SRCU code has a well defined notion of a clock that
> > > I could make use of. What I would like to do is, instead of doing call_srcu() to
> > > free the object - just mark that object with the current SRCU context time, and
> > > then when my shrinkers run they can free objects that haven't been reused and
> > > are old enough according the the current SRCU time.
> > > 
> > > Thoughts?
> > 
> > An early prototype is available on -rcu [1].  The Tree SRCU version
> > seems to be reasonably reliable, but I do not yet trust the Tiny SRCU
> > implementation.  So please avoid !SMP&&!PREEMPT if you would like to
> > give it a try.  Unless you -really- like helping me find bugs, in which
> > case full speed ahead!!!
> > 
> > Here is the API:
> > 
> > unsigned long start_poll_synchronize_srcu(struct srcu_struct *ssp)
> > 
> > 	Returns a "cookie" that can be thought of as a snapshot of your
> > 	"clock" above.	(SRCU calls it a "grace-period sequence number".)
> > 	Also ensures that enough future grace periods happen to eventually
> > 	make the grace-period sequence number reach the cookie.
> > 
> > bool poll_state_synchronize_srcu(struct srcu_struct *ssp, unsigned long cookie)
> > 
> > 	Given a cookie from start_poll_synchronize_srcu(), returns true if
> > 	at least one full SRCU grace period has elapsed in the meantime.
> > 	Given finite SRCU readers in a well-behaved kernel, the following
> > 	code will complete in finite time:
> > 
> > 		cookie = start_poll_synchronize_srcu(&my_srcu);
> > 		while (!poll_state_synchronize_srcu(&my_srcu, cookie))
> > 			schedule_timeout_uninterruptible(1);
> > 
> > unsigned long get_state_synchronize_srcu(struct srcu_struct *ssp)
> > 
> > 	Like start_poll_synchronize_srcu(), except that it does not start
> > 	any grace periods.  This means that the following code is -not-
> > 	guaranteed to complete:
> > 
> > 		cookie = get_state_synchronize_srcu(&my_srcu);
> > 		while (!poll_state_synchronize_srcu(&my_srcu, cookie))
> > 			schedule_timeout_uninterruptible(1);
> > 
> > 	Use this if you know that something else will be starting the
> > 	needed SRCU grace periods.  This might also be useful if you
> > 	had items that were likely to be reused before the SRCU grace
> > 	period elapsed, so that you avoid burning CPU on SRCU grace
> > 	periods that prove to be unnecessary.  Or if you don't want
> > 	to have more than (say) 100 SRCU grace periods per seconds,
> > 	in which case you might use a timer to start the grace periods.
> > 	Or maybe you don't bother starting the SRCU grace period until
> > 	some sort of emergency situation has arisen.  Or...
> > 
> > 	OK, maybe you don't need it, but I do need it for rcutorture,
> > 	so here it is anyway.
> > 
> > All of these can be invoked anywhere that call_srcu() can be invoked.
> > 
> > Does this look like it will work for you?

Yeah. My one quibble is, instead of having to call poll_state_synchronize_srcu()
on every cookie - would it be possible to get function that returns a cookie we
can compare against (i.e. with time_after())? I'm going to be use this in the
shrinker where we have to walk and check potentially tens of thousands of
objects.

> Oh, and due to historical inertia, Tiny SRCU's grace-period sequence
> number is only 16 bits.  I can change this easily, but I need to know
> that it is a real problem for you before I can do so.
> 
> The potential problem for you is that if you let a given cookie lie
> dormant for 16384 grace periods, it will take another 16385 grace
> periods for get_state_synchronize_srcu() to say that a grace period
> has elapsed.
> 
> In contrast, Tree SRCU's grace-period sequence number is either 32 bits
> or 64 bits, depending on the size of unsized long.

It's not something I'd lose sleep over, but I think it could be. If there isn't
memory pressure, then the shrinker won't be running and we won't be freeing the
objects with the oldest cookies, but freeing them internally will be creating
new grace periods - but if I make sure we're reusing objects in LIFO order would
also work against the shrinker actually being able to free any objects, so not
sure I want to do that...



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux