On Tue, Jan 08, 2019 at 11:58:26AM -0500, Waiman Long wrote:
> On 01/07/2019 09:04 PM, Dave Chinner wrote:
> > On Mon, Jan 07, 2019 at 05:41:39PM -0500, Waiman Long wrote:
> >> On 01/07/2019 05:32 PM, Dave Chinner wrote:
> >>> On Mon, Jan 07, 2019 at 10:12:56AM -0500, Waiman Long wrote:
> > What I was suggesting is that you change the per-cpu counter
> > implementation to the /generic infrastructure/ that solves this
> > problem, and then determine if the extra update overhead is at all
> > measurable. If you can't measure any difference in update overhead,
> > then slapping complexity on the existing counter to attempt to
> > mitigate the summing overhead is the wrong solution.
> >
> > Indeed, it may be that you need to use a custom batch scaling curve
> > for the generic per-cpu counter infrastructure to mitigate the
> > update overhead, but the fact is we already have generic
> > infrastructure that solves your problem and so the solution should
> > be "use the generic infrastructure" until it can be proven not to
> > work.
> >
> > i.e. prove the generic infrastructure is not fit for purpose and
> > cannot be improved sufficiently to work for this use case before
> > implementing a complex, one-off snowflake counter implementation...
>
> I see your point. I like the deferred summation approach that I am
> currently using. If I have to modify the current per-cpu counter
> implementation to support that

No! Stop that already. The "deferred counter summation" is exactly
the problem the current algorithm has and exactly the problem the
generic counters /don't have/. Changing the generic percpu counter
algorithm to match this specific hand rolled implementation is not
desirable as it will break implementations that rely on the bounded
maximum summation deviation of the existing algorithm (e.g. the
ext4 and XFS ENOSPC accounting algorithms).
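Why the bounded deviation matters for ENOSPC-style checks: because every per-CPU delta is kept below the batch size, the cheap shared value is always within batch * nr_cpus of the exact sum. A caller comparing against a threshold can trust the cheap value when it is further away than that, and only pays for a full per-CPU walk when it is close. What follows is a minimal user-space sketch of that idea, loosely modelled on the kernel's __percpu_counter_compare(); NR_CPUS, struct pcpu_counter, counter_add(), counter_sum() and counter_compare() are hypothetical names for this model, not kernel APIs:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CPU count for this user-space model. */
#define NR_CPUS 4

/* Hypothetical model of a batched per-cpu counter. */
struct pcpu_counter {
	long count;           /* shared value: cheap to read, approximate */
	long local[NR_CPUS];  /* per-CPU deltas, each kept in (-batch, batch) */
	long batch;           /* fold threshold for a per-CPU delta */
};

/* Update on one CPU; the shared count is only touched once the delta reaches batch. */
static void counter_add(struct pcpu_counter *c, int cpu, long amount)
{
	long v = c->local[cpu] + amount;

	if (labs(v) >= c->batch) {
		c->count += v;		/* fold the whole local delta */
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = v;
	}
	/* This invariant is what bounds the error of reading c->count alone. */
	assert(labs(c->local[cpu]) < c->batch);
}

/* Exact but expensive: walks every CPU's delta. */
static long counter_sum(const struct pcpu_counter *c)
{
	long sum = c->count;

	for (int i = 0; i < NR_CPUS; i++)
		sum += c->local[i];
	return sum;
}

/*
 * Compare the counter against rhs. Since |exact - count| < batch * NR_CPUS,
 * if the cheap value is further from rhs than that, its sign relative to
 * rhs is already correct and no per-CPU walk is needed.
 */
static int counter_compare(const struct pcpu_counter *c, long rhs)
{
	long exact;

	if (labs(c->count - rhs) > c->batch * NR_CPUS)
		return c->count > rhs ? 1 : -1;

	exact = counter_sum(c);
	if (exact > rhs)
		return 1;
	return exact < rhs ? -1 : 0;
}
```

An on-demand unlocked summation offers no such guarantee, so a caller cannot tell when the cheap answer is safe to use and when it must fall back to the exact one.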
> and I probably need to add counter
> grouping support to amortize the overhead, that can be a major

The per-cpu counters already have configurable update batching
to amortise the summation update cost across multiple individual
per-cpu updates. You don't need to change the implementation at
all, just tweak the amortisation curve appropriately for the
desired requirements for update scalability (high) vs read
accuracy (low).

Let's face it, on large systems where counters are frequently
updated, the resultant sum can be highly inaccurate by the time a
thousand CPU counters have been summed. The generic counters have a
predictable and bounded "fast sum" maximum deviation (batch size *
nr_cpus), so on large machines they are likely to be much more
accurate than "unlocked summing on demand" algorithms.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
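The batch size is the knob on that trade-off: a larger batch means each CPU folds its local delta into the shared count less often (fewer contended updates; in the kernel's implementation the fold is done under the counter's lock), at the cost of a larger worst-case error, batch * nr_cpus, for the cheap read. A minimal user-space sketch, assuming a hypothetical 64-CPU machine and hypothetical names (struct batched_counter, bc_add(), bc_sum(), bc_max_deviation()); the `folds` field stands in for the contended shared-count update:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical machine size for this model. */
#define NR_CPUS 64

struct batched_counter {
	long count;           /* shared value: cheap, approximate read */
	long local[NR_CPUS];  /* per-CPU deltas, |delta| < batch */
	long batch;           /* amortisation knob */
	long folds;           /* how many times the shared count was updated */
};

static void bc_add(struct batched_counter *c, int cpu, long amount)
{
	long v = c->local[cpu] + amount;

	if (labs(v) >= c->batch) {
		c->count += v;	/* the expensive, contended step */
		c->local[cpu] = 0;
		c->folds++;
	} else {
		c->local[cpu] = v;	/* cheap, CPU-local step */
	}
	assert(labs(c->local[cpu]) < c->batch);
}

/* Exact value: shared count plus every CPU's outstanding delta. */
static long bc_sum(const struct batched_counter *c)
{
	long sum = c->count;

	for (int i = 0; i < NR_CPUS; i++)
		sum += c->local[i];
	return sum;
}

/* Upper bound on |bc_sum(c) - c->count|, i.e. the error of a fast read. */
static long bc_max_deviation(const struct batched_counter *c)
{
	return c->batch * NR_CPUS;
}
```

Feeding the same stream of increments through a counter with batch 4 and one with batch 256 gives identical exact sums; the larger batch performs far fewer shared updates, and its fast read carries a proportionally larger (but still known) error bound.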