On 2/15/23 22:49, Arnd Bergmann wrote:
> On Wed, Feb 15, 2023, at 14:24, Greg Kroah-Hartman wrote:
>> On Wed, Feb 15, 2023 at 09:18:48PM +0900, Damien Le Moal wrote:
>>> On 2/15/23 21:01, Greg Kroah-Hartman wrote:
>>>>>>> @@ -330,6 +330,10 @@ static int pci_epf_test_copy(struct pci_epf_test *epf_test, bool use_dma)
>>>>>>>  	enum pci_barno test_reg_bar = epf_test->test_reg_bar;
>>>>>>>  	volatile struct pci_epf_test_reg *reg = epf_test->reg[test_reg_bar];
>>>>>>
>>>>>> note, volatile is almost always wrong, please fix that up.
>>>>>
>>>>> OK. Will think of something else.
>>>>
>>>> If this is io memory, use the proper accessors to access it. If it is
>>>> not io memory, then why is it marked volatile at all?
>>>
>>> This is a PCI bar memory. So I can simply copy the structure locally with
>>> memcpy_fromio() and memcpy_toio().
>>
>> Great, please do so instead of trying to access it directly like this,
>> which will break on some platforms.
>
> I think the reverse is true here: looking at where the pointer comes
> from, 'reg' is actually the result of dma_alloc_coherent() in the
> memory of the local (endpoint) machine, though it appears as a BAR on
> the remote (host) side and gets mapped with ioremap() there.
>
> This means that the host must use readl/write/memcpy_fromio/memcpy_toio
> to access the buffer, matching the __iomem token there, while the
> endpoint side not use those. On some machines, readl/write take
> arguments that are not compatible with normal pointers, and will
> do something completely different there.
>
> A volatile access is not the worst option here, though this conflicts
> with the '__packed' annotation in the structure definition that
> may require bytewise access on architectures without unaligned
> access.
>
> I would drop the __packed in the definition, possibly annotating
> only the 64-bit src_addr and dst_addr members as __packed to ensure
> the layout is unchanged but the structure as a whole is 32-bit
> aligned, and then use READ_ONCE()/WRITE_ONCE() to atomically
> access each member in the coherent buffer.

I guess that would work too. But given that there are accesses to
individual members all over the place, I think it would be easier to get a
local copy of the reg structure in pci_epf_test_cmd_handler() and pass a
pointer to that local copy to the pci_epf_test_xxx() functions. The only
READ_ONCE() needed would be to test the command field on entry to
pci_epf_test_cmd_handler() to be sure that we have a valid command.

The host side always sets the reg command field last, which I think kind of
assumes an ordered update on the EP side (all other fields set before the
command field). That does seem a bit fragile to me, as my understanding is
that PCI does not necessarily guarantee ordering of IO TLPs. But I may be
wrong here.

> If ordering between the accesses is required, you can add
> dma_rmb() and dma_wmb() barriers.

Which I guess is the one thing we need after testing the reg command field
in pci_epf_test_cmd_handler() and before making the local copy, to avoid
problems with ordering of the reg field writes from the host. Will use that
in v2.

-- 
Damien Le Moal
Western Digital Research
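
For illustration only, the sequence described above (test the command field
once, issue a read barrier, then take a local copy) could look roughly like
the userspace sketch below. Everything named sketch_* is hypothetical: the
structure is a simplified stand-in and not the real struct pci_epf_test_reg
layout, and the two helpers only approximate READ_ONCE() and dma_rmb(), which
are what kernel code would use. The sketch orders the endpoint-side reads
only; it does not address whether the host's writes arrive in order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical register block; NOT the real driver layout. */
struct sketch_test_reg {
	uint32_t command;
	uint32_t size;
	uint64_t src_addr;
	uint64_t dst_addr;
};

/* Userspace stand-in for READ_ONCE(): a single volatile load. */
static inline uint32_t sketch_read_once_u32(const uint32_t *p)
{
	return *(const volatile uint32_t *)p;
}

/* Userspace stand-in for dma_rmb(): an acquire fence (an approximation). */
static inline void sketch_dma_rmb(void)
{
	atomic_thread_fence(memory_order_acquire);
}

/*
 * Test the command field once, then snapshot the whole block.
 * Returns true and fills *local when a command is pending.
 */
static bool sketch_fetch_command(const struct sketch_test_reg *shared,
				 struct sketch_test_reg *local)
{
	uint32_t command;

	assert(shared && local);

	command = sketch_read_once_u32(&shared->command);
	if (!command)
		return false;

	/* Keep the reads of the other fields after the command read. */
	sketch_dma_rmb();

	memcpy(local, shared, sizeof(*local));

	/* Use the value that was tested, not a second read of the field. */
	local->command = command;
	return true;
}
```

The handlers would then take a pointer to the local copy, so none of them
needs to touch the shared buffer while processing the command.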