Kent Overstreet <koverstreet@xxxxxxxxxx> writes: > On Wed, May 15, 2013 at 10:37:20AM -0700, Tejun Heo wrote: >> Can you please expand it on a bit and, more importantly, describe in >> what limits, it's safe? This should be safe as long as the actual sum >> of refcnts given out doesn't overflow the original type, right? > > Precisely. > >> It'd be great if that is explained clearly in more intuitive way. The >> only actual explanation above is "modular arithmatic is commutative" >> which is a very compact way to put it and I really think it deserves >> an easier explanation. > > I'm not sure I know of any good way of explaining it intuitively, but > here's this at least... > > * (More precisely: because moduler arithmatic is commutative the sum of all the > * pcpu_count vars will be equal to what it would have been if all the gets and > * puts were done to a single integer, even if some of the percpu integers > * overflow or underflow). This seems intuitively obvious, so I wouldn't sweat it too much. What goes up, has to come down somewhere. >> > > Are we sure this is enough? 1<<31 is a fairly large number but it's >> > > just easy enough to breach from time to time and it's gonna be hellish >> > > to reproduce / debug when it actually overflows. Maybe we want >> > > atomic64_t w/ 1LLU << 63 bias? Or is there something else which >> > > guarantees that the bias can't over/underflow? >> > >> > Well, it has the effect of halving the usable range of the refcount, >> > which I think is probably ok - the thing is, the range of an atomic_t >> > doesn't really correspond to anything useful on 64 bit machines so if >> > you're concerned about overflow you probably need to be using an >> > atomic_long_t. That is, if 32 bits is big enough 31 bits probably is >> > too. >> >> I'm not worrying about the total refcnt overflowing 31 bits, that's >> fine. What I'm worried about is the percpu refs having systmetic >> drift (got on certain cpus and put on others), and the total counter >> being overflowed while percpu draining is in progress. To me, the >> problem is that the bias which tags that draining in progress can be >> overflown by percpu refs. The summing can be the same but the tagging >> should be put where summing can't overflow it. It'd be great if you >> can explain in the comment in what range it's safe and why, because >> that'd make the limits clear to both you and other people reading the >> code and would help a lot in deciding whether it's safe enough. > > (This is why I initially didn't (don't) like the bias method, it makes > things harder to reason about). > > The fact that the counter is percpu is irrelevant w.r.t. the bias; we > sum all the percpu counters up before adding them to the atomic counter > and subtracting the bias, so when we go to add the percpu counters it's > no different from if the percpu counter was a single integer all along. > > So there's only two counters we're adding together; there's the percpu > counter (just think of it as a single integer) that we started out > using, but then at some point in time we start applying the gets and > puts to the atomic counter. > > Note that there's no systemic drift here; at time t all the gets and > puts were happening to one counter, and then at time t+1 they switch to > a different counter. > > We know the sum of the counters will be positive (again, because modular > arithmatic is still commutative; when we sum the counters it's as if > there was a single counter all along) but that doesn't mean either of > the individual counters can't be negative. > > (Actually, unless I'm mistaken in this version the percpu counter can > never go negative - it definitely could with dynamic percpu allocation, > as you need a atomic_t -> percpu transition when the atomic_t was > 0 > for the percpu counter to go negative; but in this version we start out > using the percpu counters and the atomic_t 0 (ignoring for the moment > the bias and the initial ref). > > So, the sum must be positive but the atomic_t could be negative. How > negative? > > We can't do a get() to the percpu counters after we've seen that the ref > is no longer in percpu mode - so after we've done one put to the > atomic_t we can do more puts to atomic_t (or gets to the atomic_t) but > we can't do a get to the percpu counter. > > And we can't do more puts than there have been gets - because the sum > can't be negative. So the most puts() we can do at any given time is the > real count, or sum of the percpu ref and atomic_t. > > Therefore, the amount the atomic_t can go negative is bounded by the > maximum value of the refcount. > > So if we say arbitrarily that the maximum legal value of the refcount is > - say - 1U << 31, then the atomic_t will always be greater than > -((int) (1U << 31)). > > So as long as the total number of outstanding refs never exceeds the > bias we're fine. Yes. We should note the 31 bit limit somewhere. We could WARN_ON() if count is >= BIAS in percpu_ref_kill(), perhaps. >> I probably should have made it clearer. Sorry about that. tryget() >> is fine. I was curious about count() as it's always a bit dangerous a >> query interface which is racy and can return something unexpected like >> false zero or underflowed refcnt. > > Yeah, it is, it was intended just for the module code where it's only > used for the value lsmod shows. Open code it there? >> Let's just have percpu_ref_kill(ref, release) which puts the base ref >> and invokes release whenever it's done. > > Release has to be stored in struct percpu_ref() so it can be invoked > after a call_rcu() (percpu_ref_kill -> call_rcu() -> > percpu_ref_kill_rcu() -> percpu_ref_put()) so I'm passing it to > percpu_ref_init(), but yeah. Or hand it to percpu_ref_put(), too, as per kref_put(). I hate indirect magic. Cheers, Rusty. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html