Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant

On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe
<clabbe.montjoie@xxxxxxxxx> wrote:
>
> Le Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland a écrit :
> > On 4/16/22 2:35 AM, Corentin Labbe wrote:
> > > Le Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland a écrit :
> > >> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> > >>> Le Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner a écrit :
> > >>>> This series is based on the alternatives changes done in my svpbmt series
> > >>>> and thus also depends on Atish's isa-extension parsing series.
> > >>>>
> > >>>> It implements cache management using the instructions from the Zicbom
> > >>>> extension to handle cache flush, etc. actions on platforms that need them.
> > >>>>
> > >>>> SoCs using CPU cores from T-Head, like the Allwinner D1, implement a
> > >>>> different set of cache instructions. But while the instructions are
> > >>>> different, they provide the same functionality, so a variant can
> > >>>> easily hook into the existing alternatives mechanism on those SoCs.
> > >>>>
> > >>>>
> > >>>
> > >>> Hello
> > >>>
> > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contains this series.
> > >>>
> > >>> I am hitting a buffer corruption problem with DMA.
> > >>> The sun8i-ce crypto driver fails its self-tests due to "device overran destination buffer".
> > >>> In fact the buffer is not overrun by the device but by the dma_map_single() operation.
> > >>>
> > >>> The following small code shows the problem:
> > >>>
> > >>> dma_addr_t dma;
> > >>> u8 *buf;
> > >>> int i;
> > >>> #define BSIZE 2048
> > >>> #define DMASIZE 16
> > >>>
> > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> > >>> for (i = 0; i < BSIZE; i++)
> > >>>     buf[i] = 0xFE;
> > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> > >>
> > >> This function (through dma_direct_map_page()) ends up calling
> > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> > >> cache. This is the same thing other architectures do (at least arm, arm64,
> > >> openrisc, and powerpc). So this appears to be working as intended.
> > >
> > > This behaviour is not present at least on ARM and ARM64.
> > > The sample code I provided does not corrupt the buffer on them.
> >
> > That can be explained by the 0xFE bytes having been flushed to DRAM already in
> > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
> > a dirty cache line. The cache topology and implementation are totally different
> > across the SoCs, so this is not too surprising.
> >
> > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
> > unidirectional DMA transfer from the device into that buffer. So the contents of
> > the buffer are "undefined" until the DMA transfer completes. If you are also
> > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
> >
> > Regards,
> > Samuel
>
> +CC crypto mailing list + maintainer
>
> My problem is that the crypto selftests, for each buffer where I need to do a cipher operation,
> concatenate a poison buffer to check that the device does not write beyond the buffer.
>
> But the dma_map_sg(FROM_DEVICE) corrupts this poison buffer and the crypto selftests fail, thinking my device did a buffer overrun.
>
> So you mean that on the D1 SoC, this crypto API check strategy is impossible?

I think you could try to replace all CLEAN & INVAL ops with FLUSH ops
for the testing. (All cache block-aligned data from the device for the
CPU should be invalidated.)

+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	case DMA_FROM_DEVICE:
+		ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	case DMA_BIDIRECTIONAL:
+		ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	default:
+		break;
+	}
+}
+
+void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		break;
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	default:
+		break;
+	}
+}
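
For reference, a rough, untested sketch of that experiment could look like
the following, with every CLEAN/INVAL in the two helpers above replaced by
FLUSH (clean + invalidate), so dirty 0xFE lines are written back to DRAM
instead of being discarded:

void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_TO_DEVICE:
	case DMA_FROM_DEVICE:
	case DMA_BIDIRECTIONAL:
		/* FLUSH never throws away dirty data; debugging aid only */
		ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
		break;
	default:
		break;
	}
}

void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_TO_DEVICE:
		break;
	case DMA_FROM_DEVICE:
	case DMA_BIDIRECTIONAL:
		ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
		break;
	default:
		break;
	}
}

That only hides the symptom for testing; as Samuel pointed out above, a
buffer the CPU writes and expects to survive the mapping really needs to
be mapped with DMA_BIDIRECTIONAL.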



-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/



