On Sun, 12 May 2024 07:44:25 -0700, Paul E. McKenney wrote:
> On Sun, May 12, 2024 at 08:02:59AM +0200, John Paul Adrian Glaubitz wrote:
>> On Sat, 2024-05-11 at 18:26 -0700, Paul E. McKenney wrote:
>> > And that breaks things because it can clobber concurrent stores to
>> > other bytes in that enclosing machine word.
>>
>> But pre-EV56 Alpha has always been like this. What makes it broken
>> all of a sudden?
>
> I doubt if it was sudden. Putting concurrently (but rarely) accessed
> small-value quantities into single bytes is a very natural thing to do,
> and I bet that there are quite a few places in the kernel where exactly
> this happens. I happen to know of a specific instance that went into
> mainline about two years ago.
>
> So why didn't the people running current mainline on pre-EV56 Alpha
> systems notice? One possibility is that they are upgrading their
> kernels only occasionally. Another possibility is that they are seeing
> the failures, but are not tracing the obtuse failure modes back to the
> change(s) in question. Yet another possibility is that the resulting
> failures are very low probability, with mean times to failure that are
> so long that you won't notice anything on a single system.

Another possibility is that the Jensen system was booted into
uniprocessor mode. Looking at the early boot log [1] provided by Ulrich
(+CCed) back in Sept. 2021, I see the following by running "grep -i cpu":

[1] https://marc.info/?l=linux-alpha&m=163265555616841&w=2

[    0.000000] Memory: 90256K/131072K available (8897K kernel code, 9499K rwdata, 2704K rodata, 312K init, 437K bss, 40816K reserved, 0K cma-reserved)
[    0.000000] random: get_random_u64 called from __kmem_cache_create+0x54/0x600 with crng_init=0
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000]                                             ^^^^^^

Without any concurrent atomic updates, the "broken" atomic accesses
won't matter, I guess.

Thanks, Akira
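
For illustration only, here is a minimal sketch of the clobbering discussed above. It is not kernel code and not Alpha assembly: it assumes a byte store is emulated as a load of the enclosing 64-bit word, a merge of the new byte, and a store of the whole word, which is how a CPU without byte-store instructions has to do it. All function names are hypothetical, and the two "CPUs" are interleaved by hand so the result is deterministic.

```c
#include <stdint.h>

/* Step 1 of an emulated byte store: load the enclosing 64-bit word. */
uint64_t rmw_load(const uint64_t *word)
{
	return *word;
}

/*
 * Step 2 of an emulated byte store: merge the new byte into the
 * previously loaded copy and write the whole word back.  Every other
 * byte in the word is overwritten with whatever was loaded earlier.
 */
void rmw_merge_store(uint64_t *word, uint64_t loaded,
		     unsigned int idx, uint8_t val)
{
	uint64_t mask = (uint64_t)0xff << (8 * idx);

	*word = (loaded & ~mask) | ((uint64_t)val << (8 * idx));
}

/*
 * Two writers update different bytes of the same word, but both load
 * the word before either stores.  B's write-back carries a stale
 * byte 0, so A's update is lost.
 */
uint64_t racy_byte_stores(void)
{
	uint64_t word = 0;
	uint64_t a = rmw_load(&word);		/* "CPU A" loads */
	uint64_t b = rmw_load(&word);		/* "CPU B" loads same value */

	rmw_merge_store(&word, a, 0, 0x11);	/* A writes byte 0 */
	rmw_merge_store(&word, b, 1, 0x22);	/* B clobbers byte 0 */
	return word;
}

/*
 * The same two updates with no overlap between them, as when there is
 * only a single writer at a time: both bytes survive.
 */
uint64_t serial_byte_stores(void)
{
	uint64_t word = 0;

	rmw_merge_store(&word, rmw_load(&word), 0, 0x11);
	rmw_merge_store(&word, rmw_load(&word), 1, 0x22);
	return word;
}
```

In this sketch racy_byte_stores() ends up with only the second writer's byte set, while serial_byte_stores() keeps both. The lost update needs a second writer whose load and store overlap with the first, which is why a CPUs=1 boot could plausibly hide the problem.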