On Mon, Aug 28, 2023 at 10:00 AM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote: > > On Mon, Aug 28, 2023 at 9:15 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote: > > > [...] > > > > > > Well, I really have to say that I do not like the notion that reading > > > stats is unpredictable. This just makes it really hard to use. If > > > the precision is to be sarificed then this should be preferable over > > > potentially high global lock contention. We already have that model in > > > place of /proc/vmstat (configurable timeout for flusher and a way to > > > flush explicitly). I appreciate you would like to have a better > > > precision but as you have explored the locking is really hard to get rid > > > of here. > > > > Reading the stats *is* unpredictable today. In terms of > > accuracy/staleness and cost. Avoiding the flush entirely on the read > > path will surely make the cost very stable and cheap, but will make > > accuracy even less predictable. > > > > On average you would get the stats at most 2 second old, so I would > not say it is less predictable. > > > > > > > So from my POV I would prefer to avoid flushing from the stats reading > > > path and implement force flushing by writing to stat file. If the 2s > > > flushing interval is considered to coarse I would be OK to allow setting > > > it from userspace. This way this would be more in line with /proc/vmstat > > > which seems to be working quite well. > > > > > > If this is not accaptable or deemed a wrong approach long term then it > > > would be good to reonsider the current cgroup_rstat_lock at least. > > > Either by turning it into mutex or by dropping the yielding code which > > > can severly affect the worst case latency AFAIU. > > > > Honestly I think it's better if we do it the other way around. We make > > flushing on the stats reading path non-unified and deterministic. That > > model also exists and is used for cpu.stat. If we find a problem with > > the locking being held from userspace, we can then remove flushing > > from the read path and add interface(s) to configure the periodic > > flusher and do a force flush. > > > > Here I agree with you. Let's go with the approach which is easy to > undo for now. Though I prefer the new explicit interface for flushing, > that step would be very hard to undo. Let's reevaluate if the proposed > approach shows negative impact on production traffic and I think > Cloudflare folks can give us the results soon. Do you prefer we also switch to using a mutex (with preemption disabled) to avoid the scenario Michal described where flushers give up the lock and sleep resulting in an unbounded wait time in the worst case?