Re: [syzbot] [io-uring?] KCSAN: data-race in io_wq_activate_free_worker / io_wq_worker_running

On 9/14/23 14:25, Marco Elver wrote:
On Thu, 14 Sept 2023 at 15:11, Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:

On 9/13/23 14:07, Marco Elver wrote:
On Wed, 13 Sept 2023 at 14:13, Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:

On 9/13/23 12:29, syzbot wrote:

syzbot found the following issue on:

HEAD commit:    f97e18a3f2fb Merge tag 'gpio-updates-for-v6.6' of git://gi..
git tree:       upstream
console output:
kernel config:
dashboard link:
compiler:       Debian clang version 15.0.6, GNU ld (GNU Binutils for Debian) 2.40

Unfortunately, I don't have any reproducer for this issue yet.

Downloadable assets:
disk image:
kernel image:

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+a36975231499dc24df44@xxxxxxxxxxxxxxxxxxxxxxxxx

BUG: KCSAN: data-race in io_wq_activate_free_worker / io_wq_worker_running

write to 0xffff888127f736c4 of 4 bytes by task 4731 on cpu 1:
    io_wq_worker_running+0x64/0xa0 io_uring/io-wq.c:668
    schedule_timeout+0xcc/0x230 kernel/time/timer.c:2167
    io_wq_worker+0x4b2/0x840 io_uring/io-wq.c:633
    ret_from_fork+0x2e/0x40 arch/x86/kernel/process.c:145
    ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:304

read to 0xffff888127f736c4 of 4 bytes by task 4719 on cpu 0:
    io_wq_get_acct io_uring/io-wq.c:168 [inline]
    io_wq_activate_free_worker+0xfa/0x280 io_uring/io-wq.c:267
    io_wq_enqueue+0x262/0x450 io_uring/io-wq.c:914

1) In the worst case scenario we'll choose the wrong type of
worker, which is inconsequential.

2) we're changing the IO_WORKER_F_RUNNING bit, but checking
for IO_WORKER_F_BOUND. The latter is set at the very
beginning, so it would require the compiler to be super inventive
to actually hit the problem.

I don't believe it's a problem, but it'd be nice to annotate
it properly: READ_ONCE()? Or split IO_WORKER_F_BOUND out into
a separate field.
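
To illustrate the "separate field" option, here is a minimal userspace
sketch. The struct and helper names are hypothetical, not the real
struct io_worker; the point is only that a value written once at
creation and never modified afterwards cannot race with later updates
to the flags word.

```c
#include <stdbool.h>

/* Hypothetical cut-down worker, NOT the kernel's struct io_worker. */
struct worker_split_sketch {
    unsigned int flags; /* RUNNING/FREE style bits, changed by the worker thread */
    bool bound;         /* set once at creation, only read afterwards */
};

/* Initialise the worker before it becomes visible to other tasks. */
static void sketch_worker_init(struct worker_split_sketch *w, bool bound)
{
    w->flags = 0;
    w->bound = bound;
}

/* Readers consult the separate field and never touch w->flags. */
static bool sketch_worker_is_bound(const struct worker_split_sketch *w)
{
    return w->bound;
}
```

With this layout the enqueue path would no longer load the word that
the worker thread keeps rewriting.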

It's a simple bit flag set & read, I'd go for READ_ONCE() (and
WRITE_ONCE() - but up to you, these bitflag sets & reads have been ok
with just the READ_ONCE(), and KCSAN currently doesn't care if there's
a WRITE_ONCE() or not).
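
A rough userspace sketch of what such an annotation could look like.
The READ_ONCE()/WRITE_ONCE() macros below are simplified stand-ins
(GCC/Clang only), not the kernel's definitions; the flag values and the
struct/function names are assumptions made for illustration.

```c
/* Simplified stand-ins for the kernel macros; they rely on __typeof__. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Assumed flag values; verify against io_uring/io-wq.c. */
enum {
    IO_WORKER_F_UP      = 1,
    IO_WORKER_F_RUNNING = 2,
    IO_WORKER_F_FREE    = 4,
    IO_WORKER_F_BOUND   = 8,
};

/* Hypothetical cut-down worker, NOT the kernel's struct io_worker. */
struct worker_sketch {
    unsigned int flags;
};

/* Writer side (worker thread only): set RUNNING with a marked store. */
static void sketch_worker_running(struct worker_sketch *w)
{
    WRITE_ONCE(w->flags, w->flags | IO_WORKER_F_RUNNING);
}

/* Reader side (any task): one marked load, then test the BOUND bit. */
static int sketch_worker_is_bound(struct worker_sketch *w)
{
    return (READ_ONCE(w->flags) & IO_WORKER_F_BOUND) != 0;
}
```

The marked load tells both the compiler and KCSAN that the concurrent
access is intentional; it does not add any ordering.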

value changed: 0x0000000d -> 0x0000000b

This is interesting though - it says that it observed 2 bits being
flipped. We don't see where IO_WORKER_F_FREE was unset though.

__io_worker_busy() clears it, so that should be it. I assume syz just
missed another false data race with this one. After init, only
the worker thread should be changing the flags, AFAIR.
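
For reference, the reported change 0x0000000d -> 0x0000000b can be
decoded by XOR-ing the two values. A minimal sketch, assuming the
IO_WORKER_F_* values are 1, 2, 4 and 8 for UP, RUNNING, FREE and BOUND
(worth verifying against io_uring/io-wq.c in the reported tree);
flags_changed() is a hypothetical helper.

```c
/* Assumed flag values; verify against io_uring/io-wq.c. */
enum {
    IO_WORKER_F_UP      = 1,
    IO_WORKER_F_RUNNING = 2,
    IO_WORKER_F_FREE    = 4,
    IO_WORKER_F_BOUND   = 8,
};

/* Return the set of bits that differ between two flag snapshots. */
static unsigned int flags_changed(unsigned int before, unsigned int after)
{
    return before ^ after;
}
```

Under those assumed values, flags_changed(0xd, 0xb) is 0x6, i.e.
IO_WORKER_F_RUNNING and IO_WORKER_F_FREE, while IO_WORKER_F_BOUND is
set in both snapshots.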

The data races reported are very real, i.e. it only reports if it
actually observes _real_ concurrency. I guess the question is if these

That's what I'm saying: I assume that syz is not completely
analytical, that triggering a race is subject to execution
randomness, and that races with IO_WORKER_F_FREE are harder
for syzkaller to hit.

are benign or not. If benign, you can choose to annotate with

Yes, it is benign, just like the one in the report.

READ/WRITE_ONCE [1], data_race, or leave as is (ignoring this report
should not make it re-report any time soon).


Pavel Begunkov

