Hello, Willy!

Continuing from LinkedIn:

> Maybe this doesn't work as well as expected because of the common L3 cache
> that runs at a single frequency and that imposes discrete timings. Also,
> I noticed that on modern CPUs, cache lines tend to "stick" at least a few
> cycles once they're in a cache, which helps the corresponding CPU chain
> a few atomic ops undisturbed. For example on an 8-core Ryzen I'm seeing a
> minimum of 8ns between two threads of the same core (L1 probably split in
> two halves), 25ns between two L2 and 60ns between the two halves (CCX)
> of the L3. This certainly makes it much harder to trigger concurrency
> issues. Well let's continue by e-mail, it's a real pain to type in this
> awful interface.

Indeed, I get best (worst?) results from memory latency on multi-socket
systems. And these results were not subtle:

https://paulmck.livejournal.com/62071.html

All that aside, any advice on portably and usefully getting 2-3x clock
frequency differences into testing would be quite welcome.

Thanx, Paul