On Sun, 4 Dec 2022 at 00:08, Dominique Martinet <asmadeus@xxxxxxxxxxxxx> wrote: > > Marco Elver wrote on Sat, Dec 03, 2022 at 05:46:46PM +0100: > > > But I can't really find a problem with what KCSAN complains about -- > > > we are indeed accessing status from two threads without any locks. > > > Instead of a lock, we're using a barrier so that: > > > - recv thread/cb: writes to req stuff || write to req status > > > - p9_client_rpc: reads req status || reads other fields from req > > > > > > Which has been working well enough (at least, without the barrier things > > > blow up quite fast). > > > > > > So can I'll just consider this a false positive, but if someone knows > > > how much one can read into this that'd be appreciated. > > > > The barriers only ensure ordering, but not atomicity of the accesses > > themselves (for one, the compiler is well in its right to transform > > plain accesses in ways that the concurrent algorithm wasn't designed > > for). In this case it looks like it's just missing > > READ_ONCE()/WRITE_ONCE(). > > Aha! Thanks for this! > > I've always believed plain int types accesses are always atomic and the > only thing to watch for would be compilers reordering instrucions, which > would be ensured by the barrier in this case, but I guess there are some > architectures or places where this isn't true? > > > I'm a bit confused though, I can only see five places where wait_event* > functions use READ_ONCE and I believe they more or less all would > require such a marker -- I guess non-equality checks might be safe > (waiting for a value to change from a known value) but if non-atomic > updates are on the table equality and comparisons checks all would need > to be decorated with READ_ONCE; afaiu, unlike usespace loops with > pthread_cond_wait there is nothing protecting the condition itself. > > Should I just update the wrapped condition, as below? > > - err = wait_event_killable(req->wq, req->status >= REQ_STATUS_RCVD); > + err = wait_event_killable(req->wq, > + READ_ONCE(req->status) >= REQ_STATUS_RCVD); Yes, this looks good! > The writes all are straightforward, there's all the error paths to > convert to WRITE_ONCE too but that's not difficult (leaving only the > init without such a marker); I'll send a patch when you've confirmed the > read looks good. > (the other reads are a bit less obvious as some are protected by a lock > in trans_fd, which should cover all cases of possible concurrent updates > there as far as I can see, but this mixed model is definitely hard to > reason with... Well, that's how it was written and I won't ever have time > to rewrite any of this. Enough ranting.) If the lock-protected accesses indeed are non-racy, they should be left unmarked. If some assumption here turns out to be wrong, KCSAN would (hopefully) tell us one way or another. Thanks! -- Marco