On Tue, Nov 16, 2021 at 12:59PM -0800, Shakeel Butt wrote: > On Tue, Nov 16, 2021 at 12:48 PM Mina Almasry <almasrymina@xxxxxxxxxx> wrote: [...] > > > Per above, probably unlikely, but allowed. WRITE_ONCE should prevent it, > > > and at least relieve you to not worry about it (and shift the burden to > > > WRITE_ONCE's implementation). > > > > > > > Thank you very much for the detailed response. I can add READ_ONCE() > > at the no-lock read site, that is no issue. > > > > However, for the writes that happen while holding the lock, the write > > is like so: > > + h_cg->nodeinfo[page_to_nid(page)]->usage[idx] += nr_pages; > > > > And like so: > > + h_cg->nodeinfo[page_to_nid(page)]->usage[idx] -= nr_pages; > > > > I.e. they are increments/decrements. Sorry if I missed it but I can't > > find an INC_ONCE(), and it seems wrong to me to do something like: > > > > + WRITE_ONCE(h_cg->nodeinfo[page_to_nid(page)]->usage[idx], > > + > > h_cg->nodeinfo[page_to_nid(page)] + nr_pages); >From what I gather there are no concurrent writers, right? WRITE_ONCE(a, a + X) is perfectly fine. What it says is that you can have concurrent readers here, but no concurrent writers (and KCSAN will still check that). Maybe we need a more convenient macro for this idiom one day.. Though I think for something like h_cg->nodeinfo[page_to_nid(page)]->usage[idx] += nr_pages; it seems there wants to be an temporary long* so that you could write WRITE_ONCE(*usage, *usage + nr_pages) or something. > > I know we're holding a lock anyway so there is no race, but to the > > casual reader this looks wrong as there is a race between the fetch of > > the value and the WRITE_ONCE(). What to do here? Seems to me the most > > reasonable thing to do is just READ_ONCE() and leave the write plain? > > > > > > How about atomic_long_t? That would work of course; if this is very hot path code it might be excessive if you don't have concurrent writers. Looking at the patch in more detail, the counter is a stat counter that can be read from a stat file, correct? Hypothetically, what would happen if the reader of 'usage' reads approximate values? If the answer is "nothing", then this could classify as an entirely "benign" data race and you could only use data_race() on the reader and leave the writers unmarked using normal +=/-=. Check if it fits "Data-Racy Reads for Approximate Diagnostics" [1]. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/access-marking.txt#n74