On Sun, 17 Mar 2024 at 22:55, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > On Fri, Mar 08, 2024 at 02:31:53PM -0800, Paul E. McKenney wrote: > > On Fri, Mar 08, 2024 at 11:02:28PM +0100, Marco Elver wrote: > > > On Fri, 8 Mar 2024 at 22:41, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote: > > > > > > > > Tasks Trace RCU needs a single-byte cmpxchg(), but no such thing exists. > > > > > > Because not all architectures support 1-byte cmpxchg? > > > What prevents us from implementing it? > > > > Nothing that I know of, but I didn't want to put up with the KCSAN report > > in the interim. > > And here is a lightly tested patch to emulate one-byte and two-byte > cmpxchg() for architectures that do not support it. This is just the > emulation, and would be followed up with patches to make the relevant > architectures make use of it. > > The one-byte emulation has been lightly tested on x86. > > Thoughts? > > Thanx, Paul > > ------------------------------------------------------------------------ > > commit d72e54166b56d8b373676e1e92a426a07d53899a > Author: Paul E. McKenney <paulmck@xxxxxxxxxx> > Date: Sun Mar 17 14:44:38 2024 -0700 > > lib: Add one-byte and two-byte cmpxchg() emulation functions > > Architectures are required to provide four-byte cmpxchg() and 64-bit > architectures are additionally required to provide eight-byte cmpxchg(). > However, there are cases where one-byte and two-byte cmpxchg() > would be extremely useful. Therefore, provide cmpxchg_emu_u8() and > cmpxchg_emu_u16() that emulated one-byte and two-byte cmpxchg() in terms > of four-byte cmpxchg(). > > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx> > Cc: Marco Elver <elver@xxxxxxxxxx> > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > Cc: "Peter Zijlstra (Intel)" <peterz@xxxxxxxxxxxxx> > Cc: Douglas Anderson <dianders@xxxxxxxxxxxx> > Cc: Petr Mladek <pmladek@xxxxxxxx> > Cc: <linux-arch@xxxxxxxxxxxxxxx> > > diff --git a/arch/Kconfig b/arch/Kconfig > index 154f994547632..eef11e9918ec7 100644 > --- a/arch/Kconfig > +++ b/arch/Kconfig > @@ -1506,4 +1506,7 @@ config FUNCTION_ALIGNMENT > default 4 if FUNCTION_ALIGNMENT_4B > default 0 > > +config ARCH_NEED_CMPXCHG_1_2_EMU > + bool > + > endmenu > diff --git a/include/linux/cmpxchg-emu.h b/include/linux/cmpxchg-emu.h > new file mode 100644 > index 0000000000000..fee8171fa05eb > --- /dev/null > +++ b/include/linux/cmpxchg-emu.h > @@ -0,0 +1,16 @@ > +/* SPDX-License-Identifier: GPL-2.0+ */ > +/* > + * Emulated 1-byte and 2-byte cmpxchg operations for architectures > + * lacking direct support for these sizes. These are implemented in terms > + * of 4-byte cmpxchg operations. > + * > + * Copyright (C) 2024 Paul E. McKenney. > + */ > + > +#ifndef __LINUX_CMPXCHG_EMU_H > +#define __LINUX_CMPXCHG_EMU_H > + > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new); > +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new); > + > +#endif /* __LINUX_CMPXCHG_EMU_H */ > diff --git a/lib/Makefile b/lib/Makefile > index 6b09731d8e619..fecd7b8c09cbd 100644 > --- a/lib/Makefile > +++ b/lib/Makefile > @@ -238,6 +238,7 @@ obj-$(CONFIG_FUNCTION_ERROR_INJECTION) += error-inject.o > lib-$(CONFIG_GENERIC_BUG) += bug.o > > obj-$(CONFIG_HAVE_ARCH_TRACEHOOK) += syscall.o > +obj-$(CONFIG_ARCH_NEED_CMPXCHG_1_2_EMU) += cmpxchg-emu.o Since you add instrumentation explicitly, we need to suppress instrumentation somehow. For the whole file this can be done with: KCSAN_SANITIZE_cmpxchg-emu.o := n Note, since you use cmpxchg, which pulls in its own instrument_read_write(), we can't use a function attribute (like __no_kcsan) if the whole-file no-instrumentation seems like overkill. Alternatively the cmpxchg could be wrapped into a data_race() (like your original RCU use case was doing). But I think "KCSAN_SANITIZE_cmpxchg-emu.o := n" would be my preferred way. With the explicit "instrument_read_write()" also note that this would do double-instrumentation with other sanitizers (KASAN, KMSAN). But I think we actually want to instrument the whole real access with those tools - would it be bad if we accessed some memory out-of-bounds, but that memory isn't actually used? I don't have a clear answer to that. Also, it might be useful to have an alignment check somewhere, because otherwise we end up with split atomic accesses (or whatever other bad thing the given arch does if that happens). Thanks, -- Marco > obj-$(CONFIG_DYNAMIC_DEBUG_CORE) += dynamic_debug.o > #ensure exported functions have prototypes > diff --git a/lib/cmpxchg-emu.c b/lib/cmpxchg-emu.c > new file mode 100644 > index 0000000000000..508b55484c2b6 > --- /dev/null > +++ b/lib/cmpxchg-emu.c > @@ -0,0 +1,68 @@ > +/* SPDX-License-Identifier: GPL-2.0+ */ > +/* > + * Emulated 1-byte and 2-byte cmpxchg operations for architectures > + * lacking direct support for these sizes. These are implemented in terms > + * of 4-byte cmpxchg operations. > + * > + * Copyright (C) 2024 Paul E. McKenney. > + */ > + > +#include <linux/types.h> > +#include <linux/export.h> > +#include <linux/instrumented.h> > +#include <linux/atomic.h> > +#include <asm-generic/rwonce.h> > + > +union u8_32 { > + u8 b[4]; > + u32 w; > +}; > + > +/* Emulate one-byte cmpxchg() in terms of 4-byte cmpxchg. */ > +uintptr_t cmpxchg_emu_u8(volatile u8 *p, uintptr_t old, uintptr_t new) > +{ > + u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x3); > + int i = ((uintptr_t)p) & 0x3; > + union u8_32 old32; > + union u8_32 new32; > + u32 ret; > + > + old32.w = READ_ONCE(*p32); > + do { > + if (old32.b[i] != old) > + return old32.b[i]; > + new32.w = old32.w; > + new32.b[i] = new; > + instrument_atomic_read_write(p, 1); > + ret = cmpxchg(p32, old32.w, new32.w); > + } while (ret != old32.w); > + return old; > +} > +EXPORT_SYMBOL_GPL(cmpxchg_emu_u8); > + > +union u16_32 { > + u16 h[2]; > + u32 w; > +}; > + > +/* Emulate two-byte cmpxchg() in terms of 4-byte cmpxchg. */ > +uintptr_t cmpxchg_emu_u16(volatile u16 *p, uintptr_t old, uintptr_t new) > +{ > + u32 *p32 = (u32 *)(((uintptr_t)p) & ~0x1); > + int i = ((uintptr_t)p) & 0x1; > + union u16_32 old32; > + union u16_32 new32; > + u32 ret; > + > + old32.w = READ_ONCE(*p32); > + do { > + if (old32.h[i] != old) > + return old32.h[i]; > + new32.w = old32.w; > + new32.h[i] = new; > + instrument_atomic_read_write(p, 2); > + ret = cmpxchg(p32, old32.w, new32.w); > + } while (ret != old32.w); > + return old; > +} > +EXPORT_SYMBOL_GPL(cmpxchg_emu_u16);