On Mon, Sep 13, 2010 at 10:01:20PM +0200, Michael S. Tsirkin wrote: > So I think the following will give the idea of what an API > might look like that will let us avoid the scary hacks in > e.g. the ide layer and other generic layers that need to do DMA, > without either binding us to pci, adding more complexity with > callbacks, or losing type safety with casts and void*. > > Basically we have DMADevice that we can use container_of on > to get a PCIDevice from, and DMAMmu that will get instanciated > in a specific MMU. > > This is not complete code - just a header - I might complete > this later if/when there's interest or hopefully someone interested > in iommu emulation will. Hi, I personally like this approach better. It also seems to make poisoning cpu_physical_memory_*() easier if we convert every device to this API. We could then ban cpu_physical_memory_*(), perhaps by requiring a #define and #ifdef-ing those declarations. > Notes: > the IOMMU_PERM_RW code seem unused, so I replaced > this with plain is_write. Is it ever useful? The original idea made provisions for stuff like full R/W memory maps. In that case cpu_physical_memory_map() would call the translation / checking function with perms == IOMMU_PERM_RW. That's not there yet so it can be removed at the moment, especially since it only affects these helpers. Also, I'm not sure if there are other sorts of accesses besides reads and writes we want to check or translate. > It seems that invalidate callback should be able to > get away with just a device, so I switched to that > from a void pointer for type safety. > Seems enough for the users I saw. I think this makes matters too complicated. Normally, a single DMADevice should be embedded within a <bus>Device, so doing this makes it really hard to invalidate a specific map when there are more of them. It forces device code to act as a bus, provide fake 'DMADevice's for each map and dispatch translation to the real DMATranslateFunc. I see no other way. If you really want more type-safety (although I think this is a case of a true opaque identifying something only device code understands), I have another proposal: have a DMAMap embedded in the opaque. Example from dma-helpers.c: typedef struct { DMADevice *owner; [...] } DMAMap; typedef struct { [...] DMAMap map; [...] } DMAAIOCB; /* The callback. */ static void dma_bdrv_cancel(DMAMap *map) { DMAAIOCB *dbs = container_of(map, DMAAIOCB, map); [...] } The upside is we only need to pass the DMAMap. That can also contain details of the actual map in case the device wants to release only the relevant range and remap the rest. > I saw devices do stl_le_phys and such, these > might need to be wrapped as well. stl_le_phys() is defined and used only by hw/eepro100.c. That's already dealt with by converting the device. Thanks, Eduard > Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx> > > --- > > diff --git a/hw/dma_rw.h b/hw/dma_rw.h > new file mode 100644 > index 0000000..d63fd17 > --- /dev/null > +++ b/hw/dma_rw.h > @@ -0,0 +1,122 @@ > +#ifndef DMA_RW_H > +#define DMA_RW_H > + > +#include "qemu-common.h" > + > +/* We currently only have pci mmus, but using > + a generic type makes it possible to use this > + e.g. from the generic ide code without callbacks. */ > +typedef uint64_t dma_addr_t; > + > +typedef struct DMAMmu DMAMmu; > +typedef struct DMADevice DMADevice; > + > +typedef int DMATranslateFunc(DMAMmu *mmu, > + DMADevice *dev, > + dma_addr_t addr, > + dma_addr_t *paddr, > + dma_addr_t *len, > + int is_write); > + > +typedef int DMAInvalidateMapFunc(DMADevice *); > +struct DMAMmu { > + /* invalidate, etc. */ > + DmaTranslateFunc *translate; > +}; > + > +struct DMADevice { > + DMAMmu *mmu; > + DMAInvalidateMapFunc *invalidate; > +}; > + > +void dma_device_init(DMADevice *, DMAMmu *, DMAInvalidateMapFunc *); > + > +static inline void dma_memory_rw(DMADevice *dev, > + dma_addr_t addr, > + void *buf, > + uint32_t len, > + int is_write) > +{ > + uint32_t plen; > + /* Fast-path non-iommu. > + * More importantly, makes it obvious what this function does. */ > + if (!dev->mmu) { > + cpu_physical_memory_rw(paddr, buf, plen, is_write); > + return; > + } > + while (len) { > + err = dev->mmu->translate(iommu, dev, addr, &paddr, &plen, is_write); > + if (err) { > + return; > + } > + > + /* The translation might be valid for larger regions. */ > + if (plen > len) { > + plen = len; > + } > + > + cpu_physical_memory_rw(paddr, buf, plen, is_write); > + > + len -= plen; > + addr += plen; > + buf += plen; > + } > +} > + > +void *dma_memory_map(DMADevice *dev, > + dma_addr_t addr, > + uint32_t *len, > + int is_write); > +void dma_memory_unmap(DMADevice *dev, > + void *buffer, > + uint32_t len, > + int is_write, > + uint32_t access_len); > + > + > ++#define DEFINE_DMA_LD(suffix, size) \ > ++uint##size##_t dma_ld##suffix(DMADevice *dev, dma_addr_t addr) \ > ++{ \ > ++ int err; \ > ++ target_phys_addr_t paddr, plen; \ > ++ if (!dev->mmu) { \ > ++ return ld##suffix##_phys(addr, val); \ > ++ } \ > ++ \ > ++ err = dev->mmu->translate(dev->bus->iommu, dev, \ > ++ addr, &paddr, &plen, IOMMU_PERM_READ); \ > ++ if (err || (plen < size / 8)) \ > ++ return 0; \ > ++ \ > ++ return ld##suffix##_phys(paddr); \ > ++} > ++ > ++#define DEFINE_DMA_ST(suffix, size) \ > ++void dma_st##suffix(DMADevice *dev, dma_addr_t addr, uint##size##_t val) \ > ++{ \ > ++ int err; \ > ++ target_phys_addr_t paddr, plen; \ > ++ \ > ++ if (!dev->mmu) { \ > ++ st##suffix##_phys(addr, val); \ > ++ return; \ > ++ } \ > ++ err = dev->mmu->translate(dev->bus->iommu, dev, \ > ++ addr, &paddr, &plen, IOMMU_PERM_WRITE); \ > ++ if (err || (plen < size / 8)) \ > ++ return; \ > ++ \ > ++ st##suffix##_phys(paddr, val); \ > ++} > + > +DEFINE_DMA_LD(ub, 8) > +DEFINE_DMA_LD(uw, 16) > +DEFINE_DMA_LD(l, 32) > +DEFINE_DMA_LD(q, 64) > + > +DEFINE_DMA_ST(b, 8) > +DEFINE_DMA_ST(w, 16) > +DEFINE_DMA_ST(l, 32) > +DEFINE_DMA_ST(q, 64) > + > +#endif > diff --git a/hw/pci.h b/hw/pci.h > index 1c6075e..9737f0e 100644 > --- a/hw/pci.h > +++ b/hw/pci.h > @@ -5,6 +5,7 @@ > #include "qobject.h" > > #include "qdev.h" > +#include "dma_rw.h" > > /* PCI includes legacy ISA access. */ > #include "isa.h" > @@ -119,6 +120,10 @@ enum { > > struct PCIDevice { > DeviceState qdev; > + > + /* For devices that do DMA. */ > + DMADevice dma; > + > /* PCI config space */ > uint8_t *config; > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html