Hi Yong,

On Wed, May 10, 2017 at 12:27 AM, Tomasz Figa <tfiga@xxxxxxxxxxxx> wrote:
> On Tue, May 2, 2017 at 9:00 PM, Sakari Ailus <sakari.ailus@xxxxxx> wrote:
>> Hi Yong,
>>
>> Thanks for the patches! Some comments below.
>>
>> On Sat, Apr 29, 2017 at 06:34:36PM -0500, Yong Zhi wrote:
>>> +
>>> +/**************** FBPT operations ****************/
>>> +
>>> +static void cio2_fbpt_exit_dummy(struct cio2_device *cio2)
>>> +{
>>> +	if (cio2->dummy_lop) {
>>> +		dma_free_noncoherent(&cio2->pci_dev->dev, PAGE_SIZE,
>>> +				     cio2->dummy_lop, cio2->dummy_lop_bus_addr);
>>> +		cio2->dummy_lop = NULL;
>>> +	}
>>> +	if (cio2->dummy_page) {
>>> +		dma_free_noncoherent(&cio2->pci_dev->dev, PAGE_SIZE,
>>> +				     cio2->dummy_page, cio2->dummy_page_bus_addr);
>>> +		cio2->dummy_page = NULL;
>>> +	}
>>> +}
>>> +
>>> +static int cio2_fbpt_init_dummy(struct cio2_device *cio2)
>>> +{
>>> +	unsigned int i;
>>> +
>>> +	cio2->dummy_page = dma_alloc_noncoherent(&cio2->pci_dev->dev, PAGE_SIZE,
>>> +					&cio2->dummy_page_bus_addr, GFP_KERNEL);
>>> +	cio2->dummy_lop = dma_alloc_noncoherent(&cio2->pci_dev->dev, PAGE_SIZE,
>>> +					&cio2->dummy_lop_bus_addr, GFP_KERNEL);
>>> +	if (!cio2->dummy_page || !cio2->dummy_lop) {
>>> +		cio2_fbpt_exit_dummy(cio2);
>>> +		return -ENOMEM;
>>> +	}
>>> +	/*
>>> +	 * List of Pointers(LOP) contains 1024x32b pointers to 4KB page each
>>> +	 * Initialize each entry to dummy_page bus base address.
>>> +	 */
>>> +	for (i = 0; i < PAGE_SIZE / sizeof(*cio2->dummy_lop); i++)
>>> +		cio2->dummy_lop[i] = cio2->dummy_page_bus_addr >> PAGE_SHIFT;
>>> +
>>> +	dma_sync_single_for_device(&cio2->pci_dev->dev,	/* DMA phy addr */
>>> +			cio2->dummy_lop_bus_addr, PAGE_SIZE, DMA_TO_DEVICE);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static void cio2_fbpt_entry_enable(struct cio2_device *cio2,
>>> +			struct cio2_fbpt_entry entry[CIO2_MAX_LOPS])
>>> +{
>>> +	dma_wmb();
>>
>> Is there a particular reason to have this?
>>
>> The documentation states that (Documentation/memory-barriers.txt):
>>
>>      These are for use with consistent memory to guarantee the ordering
>>      of writes or reads of shared memory accessible to both the CPU and a
>>      DMA capable device.
>>
>> This is while the device does not do cache-coherent DMA.
>
> I think the first question we should ask is why the driver allocates
> non-consistent memory for this kind of structure. It looks like a waste
> of cachelines, given that this is primarily write-only memory (from the
> CPU's point of view), with rare read-backs of entry[0] to check the
> flags, which can be modified by the device. Moreover, the fact that the
> memory is cached actually defeats the barrier: you first write
> entry[1..N], call the barrier, write entry[0], and then flush the cache
> starting from entry[0], which means that the hardware sees entry[0]
> (+/- the size of a cacheline) before the further entries.
>
> Other than that, I think the barrier makes sense, as the compiler (or
> even the CPU or buses) could reorder the writes without it, regardless
> of whether the memory is consistent or not. (Of course the cache
> flushing problem above remains if the memory is non-consistent.)

Any updates on this and the other comments?

Best regards,
Tomasz
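
For reference, here is a minimal sketch of the coherent allocation
suggested in the review above, reusing the names from the quoted patch
(cio2_device, dummy_page, dummy_lop, the *_bus_addr fields). It is an
illustration of the idea, not the actual driver code, and
cio2_fbpt_exit_dummy() would have to switch to dma_free_coherent() to
match:

/*
 * Sketch only: allocate the dummy page and LOP from coherent memory
 * (<linux/dma-mapping.h>), so no explicit cache maintenance is needed.
 */
static int cio2_fbpt_init_dummy(struct cio2_device *cio2)
{
	struct device *dev = &cio2->pci_dev->dev;
	unsigned int i;

	cio2->dummy_page = dma_alloc_coherent(dev, PAGE_SIZE,
					      &cio2->dummy_page_bus_addr,
					      GFP_KERNEL);
	cio2->dummy_lop = dma_alloc_coherent(dev, PAGE_SIZE,
					     &cio2->dummy_lop_bus_addr,
					     GFP_KERNEL);
	if (!cio2->dummy_page || !cio2->dummy_lop) {
		cio2_fbpt_exit_dummy(cio2);
		return -ENOMEM;
	}

	/* Point every LOP entry at the dummy page, as in the patch. */
	for (i = 0; i < PAGE_SIZE / sizeof(*cio2->dummy_lop); i++)
		cio2->dummy_lop[i] = cio2->dummy_page_bus_addr >> PAGE_SHIFT;

	/*
	 * No dma_sync_single_for_device() here: coherent memory is
	 * visible to the device without flushing the CPU cache.
	 */
	return 0;
}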
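
Similarly, a sketch of the publish pattern the barrier discussion
points at, assuming the FBPT itself lives in coherent memory;
CIO2_FBPT_CTRL_VALID and the ctrl field are hypothetical names for the
valid bit in entry[0]:

/*
 * Sketch only: with a coherent FBPT, dma_wmb() matches its documented
 * use. entry[1..N] are filled in first; the barrier keeps those writes
 * ordered before the valid bit that hands the list to the device.
 */
static void cio2_fbpt_entry_enable(struct cio2_device *cio2,
				   struct cio2_fbpt_entry entry[CIO2_MAX_LOPS])
{
	/* Writes to entry[1..N] must not be reordered past this point. */
	dma_wmb();
	entry[0].ctrl |= CIO2_FBPT_CTRL_VALID;	/* hypothetical flag name */
}

If the FBPT stays non-coherent instead, the equivalent would be to
dma_sync_single_for_device() entry[1..N] first, then write and sync
entry[0] on its own, so the device cannot observe the valid bit before
the entries it guards.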