On Thu, 19 May 2022 14:14:16 +0200 Andrew Lunn <andrew@xxxxxxx> wrote: > > > Or otherwise make it unobtrusive (eg, with similar techniques to those > > > used by the networking API). > > > > I just tried this out by re-writing the statistics code using u64_stats_sync > > and per-cpu statistics, which get totaled on sysfs read access as Andrew Lunn > > suggested. > > The results are truly amazing! > > > > The overhead caused by statistics in my test dropped from 43us to just 1-2us. > > When you are potentially dealing with 10 million packets a second, you > cannot spend long on each individual packet incrementing a counter... I guess that is a different category of hardware you are talking about there... ;-) > > This was tested on a 64-bit machine though, so I don't know how it will affect > > 32-bit systems. Nor do I have an easy means of testing this. Any ideas? > > It does make a difference. On a 64 system, you can increment a counter > in a single instruction so you either see the old value, or the new > value. With 32 bit systems, which needs multiple instructions to > increment the counter, so the code takes are you cannot see anything > odd when it needs to overflow from the lower 32bits into the upper 32 > bits. So 32bit systems will be a little bit more expensive. However, > not by a lot. Ok, good to know. > > Also, I have converted all the struct spi_statistics members to u64_stats_t. > > It was easier to test this way. Some of the original types were unsigned long, > > which can have different sizes on 64bit or 32bit systems... is that > > intentional? > > You can keep with uint32, if you want to, and keep with the sequence > counter style locking. For networking, 32bit counters can wrap around > pretty fast, so the main counters are 64 bit. But the concept works > O.K. for smaller types. Since I am already moving the stats to per-cpu structs, wouldn't atomic_inc() be sufficient and even faster for uint32 stats on 32bit systems? I still would like to hear Mark's idea on the exact types. All bytes stats were ULL and transfer counter values were UL. Possibly the reason behind this was to make the transfer counters as efficient to increment as possible for the given platform, as should be the case for "unsigned long". If we move to per-cpu and seqcount, the kernel APIs (u64_stats_t) make us chose explicitly u64 or u32 though. On most 64bit platforms, all those stats are 64bit anyway, while on most 32bit platforms the UL stats will likely be 32bit while the ULL are 64bit. I am inclined to make them all u64, but might decide otherwise if I can detect a performance difference on 32bit systems. What types should I chose? Best regards, -- David Jander