Hello!

On 6/9/20 6:41 PM, Hou Tao wrote:
> Hi,
>
> For the following case, the under-counting is still possible if io2 wins cmpxchg():
>
> t          0123456
> io1        |-----|
> io2           |--|
> stamp      0     6
> io_ticks   0     3

I hadn't noticed that bug. It looks like it can produce an unbounded amount of undercounting.

> However, considering that patch 2 tries to improve the sampling rate to 1 us, the problem will be gone.

Now that you mention it, the case below is also poorly handled, and will be incorrect regardless of sampling frequency. It runs into problems both under this patch (labeled io_ticks) and under the current implementation (labeled io_ticks~):

t          0123456
io1        |-----|
io2           |-|
stamp      0    56
io_ticks        28
stamp~     0  3 56
io_ticks~     1 34

I am beginning to doubt whether it is even possible to produce an algorithm that is simultaneously unbiased and synchronization-lite.

At the same time, Ming's comment on patch 2 has led me to wonder about the value of being synchronization-lite in the first place. At the proposed sampling rate of 1M/s, two I/O events would rarely land in the same sample, so it is unlikely that we'd ever exercise the synchronization-free code path (which, as your case shows, is incorrect anyway). And for every block device that I'm aware of (even the ones that complete in 10 us), the cost of a disk access still exceeds the cost of a locked CPU operation by three orders of magnitude.

Josh
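To make the trade-off concrete, here is a small user-space model in C. It replays I/O start and completion events in tick order through (a) a simplified model of the current sampling scheme, assuming it charges 1 tick on a start and now - stamp on a completion to whichever event first moves stamp into a new tick, and (b) a hypothetical fully synchronized alternative that takes a lock on every event and charges elapsed time only while at least one I/O is in flight. All names (struct event, sampled_io_ticks(), struct busy_acct, exact_io_ticks()) are made up for illustration, the C11 atomic_flag spinlock stands in for a kernel spinlock, and this is a sketch, not the code of either patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical event record: one I/O start or completion at tick t. */
struct event {
	unsigned long t;
	bool end;
};

/*
 * Simplified single-threaded model of the current scheme (io_ticks~).
 * The first event in a new tick moves stamp forward and charges 1 tick
 * for a start, or the whole gap since the old stamp for a completion.
 * With one thread cmpxchg() always wins, so it is a plain store here.
 * stamp starts at 0 to match the diagram's baseline.
 */
static unsigned long sampled_io_ticks(const struct event *ev, int n)
{
	unsigned long stamp = 0, io_ticks = 0;

	for (int i = 0; i < n; i++) {
		unsigned long now = ev[i].t;

		if (stamp != now) {
			io_ticks += ev[i].end ? now - stamp : 1;
			stamp = now;
		}
	}
	return io_ticks;
}

/*
 * Hypothetical fully synchronized accounting: every start/completion
 * takes the lock and charges the time since the previous event only if
 * at least one I/O was in flight, so overlapping I/Os count once.
 */
struct busy_acct {
	atomic_flag lock;       /* spinlock built on a locked test-and-set */
	unsigned long last;     /* tick of the previous event */
	unsigned int inflight;  /* I/Os currently outstanding */
	unsigned long io_ticks; /* exact busy ticks */
};

static void busy_acct_event(struct busy_acct *a, unsigned long now, bool end)
{
	while (atomic_flag_test_and_set_explicit(&a->lock, memory_order_acquire))
		; /* spin until the lock is free */
	if (a->inflight > 0)
		a->io_ticks += now - a->last;
	a->last = now;
	if (end) {
		assert(a->inflight > 0); /* a completion needs a matching start */
		a->inflight--;
	} else {
		a->inflight++;
	}
	atomic_flag_clear_explicit(&a->lock, memory_order_release);
}

/* Replay an event list (sorted by tick) through the locked accounting. */
static unsigned long exact_io_ticks(const struct event *ev, int n)
{
	struct busy_acct a = { .last = 0, .inflight = 0, .io_ticks = 0 };

	atomic_flag_clear(&a.lock); /* put the lock in a known, unlocked state */
	for (int i = 0; i < n; i++)
		busy_acct_event(&a, ev[i].t, ev[i].end);
	return a.io_ticks;
}
```

Replaying the second timeline (io1 over ticks 0 to 6, io2 over ticks 3 to 5, which is where the stamp~ row places it) through sampled_io_ticks() gives 4, the final io_ticks~ value, while exact_io_ticks() gives the 6 ticks the device was actually busy; it also gives 6 for Hou Tao's timeline. The price is one locked test-and-set per event, the same class of operation as the cmpxchg() already issued on most events.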