Thanks Radek, understood.

Best,
-Chunmei

> -----Original Message-----
> From: Radoslaw Zarzynski [mailto:rzarzyns@xxxxxxxxxx]
> Sent: Thursday, August 16, 2018 7:11 AM
> To: Liu, Chunmei <chunmei.liu@xxxxxxxxx>
> Cc: Kefu Chai <kchai@xxxxxxxxxx>; The Esoteric Order of the Squid Cybernetic <ceph-devel@xxxxxxxxxxxxxxx>
> Subject: Re: ceph seastar perfcounters implementation
>
> Hi :-)
>
> Excuse the late response. Yesterday was a holiday in Poland.
>
> I think we're worried about the same thing. Modifying a counter from multiple
> threads can be costly because of the cache coherence CPUs need to maintain.
> There is a very good text explaining this far, far better than I could: please
> take a look at page ~57 of the perf book [1].
>
> The overhead can be minimized by partitioning the perf counter's memory, and
> thus basically unsharing it from the writer threads' perspective. TLS (thread
> local storage) is a useful tool for this kind of job.
> However, our environment puts extra restrictions on the implementation of
> TLS-augmented perf counters. One of them comes from, e.g., the Throttle class.
> It's a common bit that uses a per-instance set of counters. Unfortunately, the
> number of such instances is hard or impossible to determine at compile time, as
> it depends on, e.g., the number of connections.
>
> One more thing I forgot to mention last time is the ::reset() part of the
> PerfCounters interface. In a pure SeaStar world it looks doable in a much more
> elegant way than it was done in perf_counter_t.
>
> [1] https://mirrors.edge.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook-1c.2017.01.02a.pdf
>
> On Tue, Aug 14, 2018 at 7:37 PM, Liu, Chunmei <chunmei.liu@xxxxxxxxx> wrote:
> > Hi Radek,
> >
> > Thanks for joining this topic. I am not very clear about the problem you are
> > solving. Could you please describe in detail why there are cache ping-pongs
> > (TLS) and how you will solve them?
> >
> > Thanks!
> > -Chunmei
> >
> >> -----Original Message-----
> >> From: Radoslaw Zarzynski [mailto:rzarzyns@xxxxxxxxxx]
> >> Sent: Tuesday, August 14, 2018 4:39 AM
> >> To: Liu, Chunmei <chunmei.liu@xxxxxxxxx>
> >> Cc: Kefu Chai <kchai@xxxxxxxxxx>; The Esoteric Order of the Squid Cybernetic <ceph-devel@xxxxxxxxxxxxxxx>
> >> Subject: Re: ceph seastar perfcounters implementation
> >>
> >> Hello!
> >>
> >> I'm working on an implementation [1] of perf counters that tries to avoid
> >> cache ping-pongs (TLS) and to move as much as possible to compile time while
> >> keeping the current logic.
> >>
> >> The idea behind it is to allow coexistence with
> >> PerfCounters/PerfCountersCollection and to migrate gradually. Although it is
> >> not directly targeting SeaStar, I believe similar problems might pop up there
> >> as well:
> >>
> >> * the number of threads is not known exactly at compile time,
> >> * the number of counter instances is variable and depends on run-time events.
> >>
> >> Regards,
> >> Radek
> >>
> >> [1] https://github.com/ceph/ceph/compare/master...rzarzynski:wip-common-perfctr-tls
> >>
> >> On Sat, Aug 11, 2018 at 1:09 AM, Liu, Chunmei <chunmei.liu@xxxxxxxxx> wrote:
> >> > Hi Kefu,
> >> >
> >> > I took a look at the perfcounters implementations in Ceph and found that in
> >> > Messenger each worker thread has its own perfcounters instance; in BlueStore
> >> > each part is single-threaded and has only one perfcounters instance; and in
> >> > the OSD, too, no perfcounters instance is shared across threads.
> >> >
> >> > So in Ceph Seastar mode, how about giving each Seastar thread its own
> >> > perfcounters? Then we would not need to consider perfcounters shared
> >> > between threads, right?
> >> >
> >> > What do you think?
> >> >
> >> > Thanks!
> >> > -Chunmei
> >> >
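A minimal C++ sketch of the partitioning idea discussed in this thread: each writer thread increments only its own cache-line-sized slot, and a reader sums the slots. This is an illustration under stated assumptions. It is not the code from the wip-common-perfctr-tls branch and not Ceph's PerfCounters API. The class name `PartitionedCounter`, the 64-byte cache-line size, the per-thread id-to-slot map, and the baseline-based `reset()` are all hypothetical choices, and C++17 is assumed (inline static members, over-aligned `new`). Because slots are created lazily, neither the number of threads nor the number of counter instances has to be known at compile time.

```cpp
#include <atomic>
#include <cassert>  // used by the usage example
#include <cstdint>
#include <memory>
#include <mutex>
#include <thread>   // used by the usage example
#include <unordered_map>
#include <vector>

// Hypothetical sketch, not Ceph's PerfCounters: writes are partitioned into
// per-thread slots so concurrent writers never share a cache line.
class PartitionedCounter {
  // One slot per writer thread, aligned to an assumed 64-byte cache line.
  struct alignas(64) Slot {
    std::atomic<std::uint64_t> value{0};
  };

 public:
  PartitionedCounter() : id_(next_id_.fetch_add(1, std::memory_order_relaxed)) {}
  PartitionedCounter(const PartitionedCounter&) = delete;
  PartitionedCounter& operator=(const PartitionedCounter&) = delete;

  // Hot path: touches only the calling thread's slot. A relaxed load+store
  // (no locked RMW) is enough because each slot has exactly one writer; the
  // atomic type only keeps concurrent read() calls free of data races.
  void inc(std::uint64_t n = 1) {
    Slot& s = local_slot();
    s.value.store(s.value.load(std::memory_order_relaxed) + n,
                  std::memory_order_relaxed);
  }

  // Slow path: sum all slots and subtract the value captured by reset().
  std::uint64_t read() const {
    std::lock_guard<std::mutex> lock(mtx_);
    return sum_locked() - baseline_;
  }

  // reset() never writes to the writers' slots; it only moves the baseline,
  // so it cannot overwrite an increment made by a concurrent inc().
  void reset() {
    std::lock_guard<std::mutex> lock(mtx_);
    baseline_ = sum_locked();
  }

 private:
  std::uint64_t sum_locked() const {
    std::uint64_t sum = 0;
    for (const auto& s : slots_) sum += s->value.load(std::memory_order_relaxed);
    return sum;
  }

  // Instances are created at run time (e.g. one per connection), so each
  // thread keeps a lazily filled map from instance id to its own slot.
  // Ids are never reused, so entries left behind by destroyed instances are
  // never looked up again (they do cost one stale map entry per thread).
  Slot& local_slot() {
    thread_local std::unordered_map<std::uint64_t, Slot*> cache;
    auto it = cache.find(id_);
    if (it != cache.end()) return *it->second;
    std::lock_guard<std::mutex> lock(mtx_);
    slots_.push_back(std::make_unique<Slot>());  // pointer stays valid on vector growth
    Slot* s = slots_.back().get();
    cache.emplace(id_, s);
    return *s;
  }

  static inline std::atomic<std::uint64_t> next_id_{0};
  const std::uint64_t id_;
  mutable std::mutex mtx_;
  std::vector<std::unique_ptr<Slot>> slots_;  // each slot written by one thread
  std::uint64_t baseline_ = 0;
};
```

For example, if four threads each call `inc()` 1000 times on one instance, `read()` afterwards returns 4000, and it returns 0 right after `reset()`; instances can be created and destroyed at run time, as in the per-connection Throttle case. The price of this generality is a hash lookup on every `inc()`. If, as suggested for Seastar, each thread owns its own counter instance, that lookup and the mutex could likely be dropped from the hot path, since each instance would have a single writer.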