On Sat, Sep 16, 2023 at 02:55:47PM +0800, Baokun Li wrote:
> On 2023/9/13 16:59, Yi Zhang wrote:
> > The issue can still be reproduced on the latest linux tree[2]. To
> > reproduce it I need to run blktests block/001 about 1000 times, and
> > bisect shows it was introduced with commit[1]; as it is not 100%
> > reproducible, I am not sure whether it is the culprit.
> >
> >
> > [1] 9257959a6e5b locking/atomic: scripts: restructure fallback ifdeffery
>
> Hello, everyone!
>
> We have confirmed that merging this patch causes hlist_bl_lock() (aka
> bit_spin_lock()) to fail, which in turn triggers the issue above.

Thanks for this! I believe I know what the issue is.

I took a look at the generated assembly for hlist_bl_lock() and
hlist_bl_unlock(), and for the latter I see a plain store rather than
the store-release that was intended.

I believe that in 9257959a6e5b, I messed up the fallback logic for
atomic*_set_release():

| static __always_inline void
| raw_atomic64_set_release(atomic64_t *v, s64 i)
| {
| #if defined(arch_atomic64_set_release)
| 	arch_atomic64_set_release(v, i);
| #elif defined(arch_atomic64_set)
| 	arch_atomic64_set(v, i);
| #else
| 	if (__native_word(atomic64_t)) {
| 		smp_store_release(&(v)->counter, i);
| 	} else {
| 		__atomic_release_fence();
| 		raw_atomic64_set(v, i);
| 	}
| #endif
| }

On arm64 we want to use smp_store_release(), and don't provide
arch_atomic64_set_release(). Unfortunately we *do* provide
arch_atomic64_set(), and the ifdeffery above will choose that in
preference, even though it has no ordering.

Prior to that commit, the ifdeffery did what we want:

| #ifndef arch_atomic64_set_release
| static __always_inline void
| arch_atomic64_set_release(atomic64_t *v, s64 i)
| {
| 	if (__native_word(atomic64_t)) {
| 		smp_store_release(&(v)->counter, i);
| 	} else {
| 		__atomic_release_fence();
| 		arch_atomic64_set(v, i);
| 	}
| }
| #define arch_atomic64_set_release arch_atomic64_set_release
| #endif

That explains the lock going wrong -- we lose the RELEASE semantic on
hlist_bl_unlock(), and so loads and stores within the critical section
aren't guaranteed to be visible to the next hlist_bl_lock(). On x86
this happens to work because of TSO.

I'm working on fixing that now; I'll try to have a patch shortly.

Thanks,
Mark.
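
P.S. for anyone who wants to trace it, hlist_bl_unlock() reaches the
affected fallback roughly as below. This is condensed from
include/linux/list_bl.h, include/linux/bit_spinlock.h, and
include/asm-generic/bitops/lock.h (debug checks, sparse annotations,
and the instrumentation wrappers elided), so treat it as a sketch
rather than the exact source:

| /* include/linux/list_bl.h */
| static inline void hlist_bl_unlock(struct hlist_bl_head *b)
| {
| 	__bit_spin_unlock(0, (unsigned long *)b);
| }
|
| /* include/linux/bit_spinlock.h, CONFIG_SMP case */
| static inline void __bit_spin_unlock(int bitnum, unsigned long *addr)
| {
| 	__clear_bit_unlock(bitnum, addr);
| 	preempt_enable();
| }
|
| /* include/asm-generic/bitops/lock.h */
| static __always_inline void
| arch___clear_bit_unlock(unsigned int nr, volatile unsigned long *p)
| {
| 	unsigned long old;
|
| 	p += BIT_WORD(nr);
| 	old = READ_ONCE(*p);
| 	old &= ~BIT_MASK(nr);
| 	/* on 64-bit this is atomic64_set_release() underneath */
| 	raw_atomic_long_set_release((atomic_long_t *)p, old);
| }

With the broken fallback, that final set_release() degrades to a plain
set, so on arm64 the store that publishes the unlock is a plain STR
rather than an STLR, and accesses from the critical section can be
reordered past it.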
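
P.P.S. to be explicit about the shape I expect the fix to take (these
headers are generated, so the real change will be to
scripts/atomic/gen-atomic-fallback.sh; take this as a sketch of the
intended output, not the patch itself): when
arch_atomic64_set_release() is absent we must fall back to the ordered
code, never to the relaxed arch_atomic64_set(), i.e. something like:

| static __always_inline void
| raw_atomic64_set_release(atomic64_t *v, s64 i)
| {
| #if defined(arch_atomic64_set_release)
| 	arch_atomic64_set_release(v, i);
| #else
| 	/* never prefer the plain, unordered arch_atomic64_set() */
| 	if (__native_word(atomic64_t)) {
| 		smp_store_release(&(v)->counter, i);
| 	} else {
| 		__atomic_release_fence();
| 		raw_atomic64_set(v, i);
| 	}
| #endif
| }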