Re: [PATCH v2 5/8] accel/qaic: Add datapath

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 3/1/2023 10:05 AM, Stanislaw Gruszka wrote:
On Wed, Mar 01, 2023 at 09:08:03AM -0700, Jeffrey Hugo wrote:
This looks a bit suspicious. Are you sure you can modify
sg->dma_address and still use it as valid value ?

A single entry in the sg table is a contiguous mapping of memory.  If it
wasn't contiguous, it would have to be broken up into multiple entries.  In
the simple case, a driver is going to take the dma_address/len pair and hand
that directly to the device.  Then the device is going to access every
address in that range.

If the device can access every address from dma_address to dma_address +
len, why can't it access a subset of that?

Required address alignment can be broken. Not sure if only that.

AIC100 doesn't have required alignment. AIC100 can access any 64-bit address, at a byte level granularity. The only restriction AIC100 has is that the size of a transfer is restricted to a 32-bit value, so max individual transfer size of 4GB. Transferring more than 4GB requires multiple transactions.

Are you suggesting renaming
this function?  I guess I'm not quite understanding your comment here. Can
you elaborate?

Renaming would be nice. I was thinking by simplifying it, not sure
now if that's easy achievable, though.

Ok.  I'll think on this.

Maybe this function could be removed ? And create sg lists
that hardware can handle without any modification.
Just idea to consider, not any requirement.

Ok, so this is part of our "slicing" operation, and thus required.

Maybe how slicing works is not clear.

Lets say that we have a workload on AIC100 that can identify a license plate in a picture (aka lprnet). Lets assume this workload only needs the RGB values of a RGBA file (a "jpeg" we are processing).

Userspace allocates a BO to hold the entire file. A quarter of the file is R values, a quarter is G values, etc. For simplicity, lets assume the R values are all sequentially listed, then the G values, then the B values, finally the A values. When we allocate the BO, we map it once. If we have an IOMMU, this optimizes the IOMMU mappings. BOs can be quite large. We have some test workloads based on real world workloads where each BO is 16-32M in size, and there are multiple BOs. I don't want to map a 32M BO N duplicate times in the IOMMU.

So, now userspace slices the BO. It tells us we need to transfer the RGB values (the first 75% of the BO), but not the A values. So, we create a copy of the mapped SG and edit it to represent this transfer, which is a subset of the entire BO. Using the slice information and the mapping information, we construct the DMA engine commands that can be used to transfer the relevant portions of the BO to the device.

It sounds like you are suggesting, lets flip this around. Don't map the entire BO once. Instead, wait for the slice info from userspace, construct a sg list based on the parts of the BO for the slice, and map that. Then the driver gets a mapped SG it can just use. The issue I see with that is slices can overlap. You can transfer the same part of a BO multiple times. Maybe lprnet has multiple threads on AIC100 where thread A consumes R data, thread B consumes R and G data, and thread C consumes B data. We need to transfer the R data twice to different device locations so that threads A and B can consume the R data independently.

If we map per slice, we are going to map the R part of the BO twice in the IOMMU. Is that valid? It feels possible that there exists some IOMMU implementation that won't allow multiple IOVAs to map to the same DDR PA because that is weird and the implementer thinks its a software bug. I don't want to run into that. Assuming it is valid, that is multiple mappings in the IOMMU TLB which could have been a single mapping. We are wasting IOMMU resources.

There are some ARM systems we support with limited IOVA space in the IOMMU, and we've had some issues with exhausting that space. The current implementation is influenced by those experiences.

I appreciate the outsider perspective (here and elsewhere). Re-evaluating some of these things is resulting in improvements.

-Jeff



[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux