> >>> >>>>>> But the example here is different, >>>>> >>>>> That is intentional. Wills discussion partially triggered this. Though I am wondering >>>>> if we should document that as well. >>>>> >>>>>> the compiler can not use the value loaded from line 5 >>>>>> unless the compiler can deduce that the tmp is equals to p in which case the address dependency >>>>>> doesn’t exist anymore. >>>>>> >>>>>> What am I missing here? >>>>> >>>>> Maybe you are trying to rationalize too much that the sequence mentioned cannot result >>>>> in a counter intuitive outcome like I did? >>>>> >>>>> The point AFAIU is not just about line 10 but that the compiler can replace any of the >>>>> lines after the plain access with the cached value. >>>> >>>> Well, IIUC, according to the C standard, the compiler can do anything if there is a data race (undefined behavior). >>>> >>>> However, what if a write is not protected with WRITE_ONCE and the read is marked with READ_ONCE? >>>> That’s also a data race, right? But the kernel considers it is Okay if the write is machine word aligned. >>> >>> Yes, but there is a compiler between the HLL code and what the >>> processor sees which can tear the write. How can not using >>> WRITE_ONCE() prevent store-tearing? See [1]. My understanding is that >>> it is OK only if the reader did a NULL check. In that case the torn >> >> Yes, a write-write data race where the value is the same is also fine. >> >> But they are still data race, if the compiler is within its right to do anything it likes (due to data race), >> we still need WRITE_ONCE() in these cases, though it’s semantically safe. >> >> IIUC, even with _ONCE(), the compiler is within its right do anything according to the standard (at least before the upcoming C23), because the standard doesn’t consider a volatile access to be atomic. >> >> However, the kernel consider the volatile access to be atomic, right? >> >> BTW, line 5 in the example is likely to be optimized away. And yes, the compiler can cache the value loaded from line 5 from the perspective of undefined behavior, even if I believe it would be a compiler bug from the perspective of kernel. > > I am actually a bit lost with what you are trying to say. Are you saying that mixing > plain accesses with marked accesses is an acceptable practice? I’m trying to say that sometimes data race is fine, that’s why we have the data_race(). Even if the standard says data race results in UB. And IMHO, the possible data race at line 5 in this example is also fine, unless the compiler deduces that the value of gp is always the same. > > I would like others to weight in as well since I am not seeing what Alan is suggesting. > AFAICS, in the absence of barrier(), any optimization caused by plain access > makes it a bad practice to mix it. > > Thanks, > > - Joel > > > >> >>> result will not change the semantics of the program. But otherwise, >>> that's bad. >>> >>> [1] https://lwn.net/Articles/793253/#Store%20Tearing >>> >>> thanks, >>> >>> - Joel >>> >>> >>>> >>>>> >>>>> Thanks. >>>>> >>>>> >>>>> >>>>>> >>>>>>> +plain accesses of a memory location with rcu_dereference() of the same memory >>>>>>> +location, in code involved in a data race. >>>>>>> + >>>>>>> In short, updaters use rcu_assign_pointer() and readers use >>>>>>> rcu_dereference(), and these two RCU API elements work together to >>>>>>> ensure that readers have a consistent view of newly added data elements. >>>>>>> -- >>>>>>> 2.41.0.585.gd2178a4bd4-goog