On Wed, Nov 13, 2024 at 01:50:14PM +0100, Arnd Bergmann wrote:
> On Wed, Nov 13, 2024, at 13:03, Suravee Suthikulpanit wrote:
> >
> > +static void write_dte_upper128(struct dev_table_entry *ptr, struct dev_table_entry *new)
> > +{
> > +	struct dev_table_entry old = {};
> > +
> > +	old.data128[1] = __READ_ONCE(ptr->data128[1]);
>
> The __READ_ONCE() in place of READ_ONCE() does make this a
> lot simpler. After seeing how it is used though, I wonder if
> this should just be an open-coded volatile pointer access
> to avoid complicating __unqual_scalar_typeof() further.

I've been skeptical we even need the READ_ONCE. This is all under a
lock, what is READ_ONCE even protecting against? It is safe to double
read.

> > +	do {
> > +		/*
> > +		 * Preserve DTE_DATA2_INTR_MASK. This needs to be
> > +		 * done here since it requires to be inside
> > +		 * spin_lock(&dev_data->dte_lock) context.
> > +		 */
> > +		new->data[2] &= ~DTE_DATA2_INTR_MASK;
> > +		new->data[2] |= old.data[2] & DTE_DATA2_INTR_MASK;
> > +
> > +		/* Note: try_cmpxchg inherently update &old.data128[1] on failure */
> > +	} while (!try_cmpxchg128(&ptr->data128[1], &old.data128[1], new->data128[1]));
>
> Since this is always done under the lock, is there ever
> a chance that the try_cmpxchg128() fails?

No, but if something goes wrong and it does fail it still has to
progress.

> I see that the existing code doesn't have the loop, which makes
> sense if this is just meant to be an atomic store.

I think AMD architecture imagined this would be done with a SSE 256
bit store operation. cmpxchg128 is sort of a hacky stand in since we
don't have that available.

A more understandable version of all of this might be to have a
store128 wrapper function that invokes cmpxchg internally and
guarantees the new value is stored regardless.

How hard would it be to invoke the SSE 256 bit store instruction from
the kernel? And do all AMD CPUs with IOMMU have it?

Jason
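
For illustration only, here is a rough userspace sketch of the "store
wrapper built on cmpxchg" idea discussed above. It is not taken from the
patch. The names dte_word_t, store_preserving() and
store_preserving_selftest() are hypothetical, and a 64-bit C11 atomic
stands in for the 128-bit DTE half so that it builds without cmpxchg16b
support. A kernel version would presumably loop on try_cmpxchg128()
instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for one 128-bit half of a DTE. A 64-bit word is
 * an assumption made so the sketch compiles anywhere.
 */
typedef _Atomic uint64_t dte_word_t;

/*
 * Store new_val into *slot while keeping the bits selected by keep_mask
 * from whatever is currently in memory. Loops until the store lands and
 * returns the value that was actually written.
 */
static uint64_t store_preserving(dte_word_t *slot, uint64_t new_val,
				 uint64_t keep_mask)
{
	uint64_t old = atomic_load_explicit(slot, memory_order_relaxed);
	uint64_t merged;

	do {
		/* Recompute on every pass: 'old' may have been refreshed. */
		merged = (new_val & ~keep_mask) | (old & keep_mask);
		/* On failure the compare-exchange reloads 'old' for us. */
	} while (!atomic_compare_exchange_weak(slot, &old, merged));

	return merged;
}

/* Tiny self-check: bits under the mask survive, the rest are replaced. */
static void store_preserving_selftest(void)
{
	dte_word_t slot = 0xF0F0;

	store_preserving(&slot, 0x0A0A, 0xFF00);
	assert(atomic_load(&slot) == 0xF00A);
}
```

The loop is what makes the store succeed regardless: under the lock the
first compare-exchange is expected to win, and if it ever does not, the
refreshed old value is merged again and the store is retried.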