On 2025-02-21 12:10:51 [-0500], Joe Damato wrote:
> On Fri, Feb 21, 2025 at 12:52:19PM +0100, Sebastian Andrzej Siewior wrote:
> > This is a follow-up on
> >   https://lore.kernel.org/all/20250213093925.x_ggH1aj@xxxxxxxxxxxxx/
> >
> > to convert the page_pool statistics to u64_stats_t to avoid u64 related
> > problems on 32bit architectures.
> > While looking over it, the comment for recycle_stat_inc() says that it
> > is safe to use in preemptible context.
>
> I wrote that comment because it's an increment of a per-cpu counter.
>
> The documentation in Documentation/core-api/this_cpu_ops.rst
> explains in more depth, but this_cpu_inc is safe to use without
> worrying about pre-emption and interrupts.

I don't dispute that it is safe to use in preemptible context today. I
am saying that it is no longer safe once the counters are reworked to
u64_stats_t.

> > The 32bit update is split into two 32bit writes and if we get
> > preempted in the middle and another one makes an update then the
> > value gets inconsistent and the previous update can overwrite the
> > following. (Rare but still).
>
> Have you seen this? Can you show the generated assembly which
> suggests that this occurs? It would be helpful if you could show the
> before and after 32-bit assembly code.
…

So there are two things:

First we have alloc_stat_inc(), which does

	#define alloc_stat_inc(pool, __stat)	(pool->alloc_stats.__stat++)

so on x86 32bit this turns into

| addl $1, 120(%ebx)	#, pool_15(D)->alloc_stats.fast
| adcl $0, 124(%ebx)	#, pool_15(D)->alloc_stats.fast

So the lower 4 bytes are incremented before the upper 4 bytes.

recycle_stat_inc() is using this_cpu_inc() and performs a similar
update.
On x86-32 it turns into

| movl 836(%ebx), %eax	# pool_15(D)->recycle_stats, s
| pushf ; pop %edx	# flags
| cli
| movl %fs:this_cpu_off, %ecx	# *_20,
| addl %ecx, %eax	#, _42
| addl $1, (%eax)	#, *_42
| adcl $0, 4(%eax)	#, *_42
| testb $2, %dh	#, flags
| je .L508	#,
| sti
|.L508:

so the update can be performed safely in preemptible context: the CPU
is determined within the IRQ-off section and so is the increment
itself. It always updates the value belonging to the local CPU.

Reading the values locally (on the CPU that is doing the update) is
okay, but reading the value from a remote CPU while an update might be
in progress can result in reading the lower 4 bytes before the upper 4
bytes are visible. This can lead to an inconsistent value on 32bit
which is likely "corrected" on the next read.

Thus my initial question: Do we care? If so, I suggest using u64_stats
like most of the networking stack. However, if we do so, the update
must be performed with BH disabled, as I assume the updates are done
in softirq context. It must be avoided that one update preempts
another.

Sebastian