On Thu, 30 May 2024 at 15:57, Maciej W. Rozycki <macro@xxxxxxxxxxx> wrote:
>
> On Wed, 29 May 2024, Linus Torvalds wrote:
> >
> > The 21064 actually did atomicity with an external pin on the bus, the
> > same way people used to do before caches even existed.
>
>  Umm, 8086's LOCK#, anyone?

Well, yes and no.

So yes, exactly like 8086 did before having caches.

But no, not like the alpha contemporary PPro that did have caches. The
PPro already did locked cycles in the caches.

Yes, the PPro still did have an external lock pin (and in fact current
much more modern x86 CPUs do too), but it's only used for locked IO
accesses or possibly cacheline crossing accesses.

So x86 has supported atomic accesses on IO - and it is very very slow,
to this day. So slow, and problematic, in fact, that Intel is only now
trying to remove it (look up "split lock").

But the 21064 explicitly did not support locking on IO - and unaligned
LL/SC accesses obviously also did not work. So I really feel the 21064
was broken.

It's probably related to the whole cache coherency being designed to
be external to the built-in caches - or even the Bcache. The caches
basically are write-through, and the weak memory ordering was designed
for allowing this horrible model.

> > In fact, it's worse than "not thread safe". It's not even safe on UP
> > with interrupts, or even signals in user space.
>
>  Ouch, I find it a surprising oversight.

The sad part is that it doesn't seem to have been an oversight. It
really was broken-as-designed.

Basically, the CPU was designed for single-threaded Spec benchmarks
and absolutely nothing else. Classic RISC where you recompile to fix
problems like the atomicity thing ("just use a 32-bit sig_atomic_t and
you're fine").

The original alpha architecture handbook makes a big deal of how
clever the lack of byte and word operations is.

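
To make the "not even safe on UP with interrupts" point concrete, here
is a minimal sketch in C of what a byte store turns into when the
hardware only has word-sized stores. It is a model, not Alpha code:
the 32-bit container, the function names and the explicit "interrupt"
callback are all assumptions made for illustration, and the real
instruction sequence a compiler would emit is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: four one-byte fields packed into one 32-bit word. */
static uint32_t shared_word;

/* Plays the role of an interrupt or signal handler that arrives between
 * the load and the store of the emulated byte write. */
typedef void (*irq_fn)(void);

/* Byte store on a machine with only word-sized memory accesses:
 * load the whole word, splice the byte in, store the whole word back. */
void emulated_byte_store(unsigned idx, uint8_t val, irq_fn irq)
{
    assert(idx < 4);
    unsigned shift = idx * 8;
    uint32_t tmp = shared_word;             /* 1. load containing word  */
    if (irq)
        irq();                              /* "interrupt" lands here   */
    tmp &= ~((uint32_t)0xff << shift);      /* 2. clear the target byte */
    tmp |= (uint32_t)val << shift;          /* 3. insert the new byte   */
    shared_word = tmp;                      /* 4. store whole word back */
}

uint8_t read_byte(unsigned idx)
{
    assert(idx < 4);
    return (uint8_t)(shared_word >> (idx * 8));
}

/* The handler only touches byte 1, a different variable from byte 0. */
void handler_sets_byte1(void)
{
    emulated_byte_store(1, 0xBB, NULL);
}

/* Returns byte 1 after the main flow writes byte 0, with the handler
 * running either before that write or in the middle of it. */
uint8_t run_scenario(int interrupt_midway)
{
    shared_word = 0;
    if (!interrupt_midway)
        handler_sets_byte1();               /* handler runs first, cleanly */
    emulated_byte_store(0, 0xAA,
                        interrupt_midway ? handler_sets_byte1 : NULL);
    return read_byte(1);
}
```

In this model run_scenario(0) leaves byte 1 at 0xBB, while
run_scenario(1) leaves it at 0x00: the main flow writes its stale copy
of the word back over the handler's update, even though the two never
touched the same byte and no second CPU is involved. A store that
covers the whole word needs no read-modify-write, which is presumably
why the 32-bit sig_atomic_t workaround holds up.
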
I also remember reading an article by Dick Sites - one of the main
designers - talking a lot about how the lack of byte operations is
great, and encourages vectorizing byte accesses and doing string
operations in whole words.

              Linus