Hi :-)

Excuse the late response. Yesterday was a holiday in Poland.

I think we're worried about the same thing. Modifying a counter from multiple threads can be costly because of the cache coherence CPUs need to maintain. There is a very good text explaining this far, far better than I could: please take a look at page ~57 of the perf book [1].

The overhead can be minimized by partitioning a perf counter's memory, and thus basically unsharing it from the writer threads' perspective. TLS (thread local storage) is a useful tool for this kind of job. However, our environment puts extra restrictions on the implementation of TLS-augmented perf counters. One of them comes from e.g. the Throttle class. It's a commonly used component that keeps a per-instance set of counters. Unfortunately, the number of such instances is hard or impossible to determine at compile time, as it depends on e.g. the number of connections.

One more thing I forgot last time is the ::reset() part of PerfCounters' interface. In a pure SeaStar world it looks doable in a much more elegant way than it was done in perf_counter_t.

[1] https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook-1c.2017.01.02a.pdf

On Tue, Aug 14, 2018 at 7:37 PM, Liu, Chunmei <chunmei.liu@xxxxxxxxx> wrote:
> Hi Radek,
>
> Thanks for joining this topic, I am not very clear about the problem you solving now. Could you please describe in detail, why there is cache ping-pongs (TLS), how will you solve it?
>
> Thanks!
> -Chunmei
>
>> -----Original Message-----
>> From: Radoslaw Zarzynski [mailto:rzarzyns@xxxxxxxxxx]
>> Sent: Tuesday, August 14, 2018 4:39 AM
>> To: Liu, Chunmei <chunmei.liu@xxxxxxxxx>
>> Cc: Kefu Chai <kchai@xxxxxxxxxx>; The Esoteric Order of the Squid Cybernetic
>> <ceph-devel@xxxxxxxxxxxxxxx>
>> Subject: Re: ceph seastar perfcounters implementation
>>
>> Hello!
>>
>> I'm working on an implementation [1] of perf counters that tries to avoid cache
>> ping-pongs (TLS) and move as much as possible to compile-time while keeping
>> the current logic.
>>
>> The idea behind is to have the possibility of coexistence with PerfCounters/
>> PerfCountersCollection and migrate gradually. Although not directly targeting
>> SeaStar, I believe similar problems might pop-up also there:
>>
>> * number of threads is not exactly known at compile time,
>> * number of counter's instances is variable and depends on run-time events.
>>
>> Regards,
>> Radek
>>
>> [1] https://github.com/ceph/ceph/compare/master...rzarzynski:wip-common-perfctr-tls
>>
>> On Sat, Aug 11, 2018 at 1:09 AM, Liu, Chunmei <chunmei.liu@xxxxxxxxx> wrote:
>> > Hi kefu,
>> >
>> > I took a look on perfcounters implementations in Ceph, and found in
>> > Messenger each worker threads has its own perfcounters instance, and in
>> > bluestore, it is single thread in each part, and only one perfcounter instance, in
>> > OSD also no perfcoutners instance shared across threads.
>> >
>> > So in Ceph Seastar mode, how about each seastar thread has its own
>> > perfcounters? That means no need consider the perfcounters shared between
>> > threads?
>> >
>> > What is your idea?
>> >
>> > Thanks!
>> > -Chunmei
>> >
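
PS. To make the partitioning idea from the top of this mail a bit more concrete, here is a minimal sketch. It is not the code from the branch, and every name in it (ShardedCounter, my_slot, kSlots, kCacheLine, demo) is made up for illustration. It assumes a 64-byte cache line and a fixed number of slots per counter. Each thread keeps a slot index in TLS, so writers mostly touch different cache lines, while counter instances can still be created freely at run time (think of one per Throttle). If more threads than slots show up they share a slot, which stays correct because the increments are atomic.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical names throughout; this is not the PerfCounters API.
constexpr std::size_t kCacheLine = 64;  // assumed cache line size
constexpr std::size_t kSlots = 64;      // assumed number of slots per counter

// Each thread picks a slot index once and keeps it in thread local storage.
inline std::size_t my_slot() {
  static std::atomic<std::size_t> next{0};
  thread_local const std::size_t slot =
      next.fetch_add(1, std::memory_order_relaxed) % kSlots;
  return slot;
}

class ShardedCounter {
  // One slot per cache line, so two writers do not invalidate each other.
  struct alignas(kCacheLine) Slot {
    std::atomic<std::uint64_t> value{0};
  };
  std::array<Slot, kSlots> slots_;

public:
  // Writer path: touches only the calling thread's slot.
  void inc(std::uint64_t n = 1) {
    slots_[my_slot()].value.fetch_add(n, std::memory_order_relaxed);
  }

  // Reader path: sums all slots; the result is approximate while writers run.
  std::uint64_t read() const {
    std::uint64_t sum = 0;
    for (const auto& s : slots_) sum += s.value.load(std::memory_order_relaxed);
    return sum;
  }

  // Zeroes every slot; increments racing with this call may be lost.
  void reset() {
    for (auto& s : slots_) s.value.store(0, std::memory_order_relaxed);
  }
};

// Usage sketch: several writers bump one instance, then the total is read.
inline std::uint64_t demo(unsigned threads, unsigned per_thread) {
  ShardedCounter c;
  std::vector<std::thread> pool;
  for (unsigned t = 0; t < threads; ++t)
    pool.emplace_back([&] {
      for (unsigned i = 0; i < per_thread; ++i) c.inc();
    });
  for (auto& th : pool) th.join();
  return c.read();
}
```

The price is memory: with these assumed constants each counter takes 4 KiB, which matters when the number of instances grows with the number of connections. Reads and ::reset() also have to walk all slots, so they are the slow path here.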