On Tue, Nov 16, 2021 at 1:48 PM Marco Elver <elver@xxxxxxxxxx> wrote: > > On Tue, Nov 16, 2021 at 12:59PM -0800, Shakeel Butt wrote: > > On Tue, Nov 16, 2021 at 12:48 PM Mina Almasry <almasrymina@xxxxxxxxxx> wrote: > [...] > > > > Per above, probably unlikely, but allowed. WRITE_ONCE should prevent it, > > > > and at least relieve you to not worry about it (and shift the burden to > > > > WRITE_ONCE's implementation). > > > > > > > > > > Thank you very much for the detailed response. I can add READ_ONCE() > > > at the no-lock read site, that is no issue. > > > > > > However, for the writes that happen while holding the lock, the write > > > is like so: > > > + h_cg->nodeinfo[page_to_nid(page)]->usage[idx] += nr_pages; > > > > > > And like so: > > > + h_cg->nodeinfo[page_to_nid(page)]->usage[idx] -= nr_pages; > > > > > > I.e. they are increments/decrements. Sorry if I missed it but I can't > > > find an INC_ONCE(), and it seems wrong to me to do something like: > > > > > > + WRITE_ONCE(h_cg->nodeinfo[page_to_nid(page)]->usage[idx], > > > + > > > h_cg->nodeinfo[page_to_nid(page)] + nr_pages); > > From what I gather there are no concurrent writers, right? > > WRITE_ONCE(a, a + X) is perfectly fine. What it says is that you can > have concurrent readers here, but no concurrent writers (and KCSAN will > still check that). Maybe we need a more convenient macro for this idiom > one day.. > > Though I think for something like > > h_cg->nodeinfo[page_to_nid(page)]->usage[idx] += nr_pages; > > it seems there wants to be an temporary long* so that you could write > WRITE_ONCE(*usage, *usage + nr_pages) or something. > Ah, perfect, OK I can do this, and maybe add a comment explaining that we don't have concurrent writers. > > > I know we're holding a lock anyway so there is no race, but to the > > > casual reader this looks wrong as there is a race between the fetch of > > > the value and the WRITE_ONCE(). What to do here? Seems to me the most > > > reasonable thing to do is just READ_ONCE() and leave the write plain? > > > > > > > > > > How about atomic_long_t? > > That would work of course; if this is very hot path code it might be > excessive if you don't have concurrent writers. > > Looking at the patch in more detail, the counter is a stat counter that > can be read from a stat file, correct? Hypothetically, what would happen > if the reader of 'usage' reads approximate values? > > If the answer is "nothing", then this could classify as an entirely > "benign" data race and you could only use data_race() on the reader and > leave the writers unmarked using normal +=/-=. Check if it fits > "Data-Racy Reads for Approximate Diagnostics" [1]. > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/memory-model/Documentation/access-marking.txt#n74 Thank you very much for your quick responses. I think if the usage returns a garbage/approximate value once in a while people will notice and I can see it causing issues. I think it's worth doing it 'properly' here. I'll upload another version with these changes.