On Thu, Mar 24, 2011 at 10:25 AM, Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> wrote:
> On Thu, Mar 24, 2011 at 08:52:20AM +0100, Thomas Hellstrom wrote:
>> On 03/23/2011 03:52 PM, Konrad Rzeszutek Wilk wrote:
>> >On Wed, Mar 23, 2011 at 02:17:18PM +0100, Thomas Hellstrom wrote:
>> >>On 03/23/2011 01:51 PM, Konrad Rzeszutek Wilk wrote:
>> >>>>>I was thinking about this a bit after I found that the PowerPC requires
>> >>>>>the 'struct dev'. But I have a question first: what do you do with pages
>> >>>>>that were allocated for a device that can do 64-bit DMA when you then
>> >>>>>move them to a device that can only do 32-bit DMA? Obviously the 32-bit
>> >>>>>card would set the TTM_PAGE_FLAG_DMA32 flag, but the 64-bit one would not.
>> >>>>>What is the process then? Allocate a new page from the 32-bit device,
>> >>>>>copy over the page from the 64-bit TTM, and put the 64-bit TTM page?
>> >>>>Yes, in certain situations we need to copy, and if it's necessary in
>> >>>>some cases to use coherent memory with a struct device associated
>> >>>>with it, I agree it may be reasonable to do a copy in that case as
>> >>>>well. I'm against, however, making that the default case when
>> >>>>running on bare metal.
>> >>>This situation could occur on native/bare metal. When you say 'default
>> >>>case', do you mean for every type of page, without consulting whether it
>> >>>has the TTM_PAGE_FLAG_DMA32 flag?
>> >>No. Basically I mean that a device that runs perfectly fine with
>> >>alloc_pages(DMA32) on bare metal shouldn't need to be using
>> >>dma_alloc_coherent() on bare metal, because that would mean we'd need
>> >>to take the copy path above.
>> >I think we got the scenarios confused (or I did at least).
>> >In the scenario I used ("I was thinking..."), the 64-bit device would do
>> >alloc_page(GFP_HIGHUSER), and if you were to move the page to a 32-bit
>> >device, a copy would have to be made, as the 32-bit device could not
>> >reach the GFP_HIGHUSER page.
>> >
>> >The other scenario, which I think is the one you are using, is that
>> >we have a 32-bit device allocating a page, so TTM_PAGE_FLAG_DMA32 is set,
>> >and if we were to move it to a 64-bit device it would need to be
>> >copied. But I don't think that is the case - the page would be
>> >reachable by the 64-bit device. Help me out please if I am misunderstanding this.
>>
>> Yes, this is completely correct.
>>
>> Now, with a struct dev attached to each page in a 32-bit system
>> (coherent memory), we would need to always copy in the 32-bit case,
>> since you can't hand over pages belonging to other physical devices.
>> But on bare metal you don't need coherent memory, yet in this case you
>> would need to copy anyway, because you chose to allocate coherent memory.
>
> Ok, can we go back one more step. I am unclear about one other
> thing. Let's think about this in terms of 2.6.38 code.
>
> When a page in the TTM pool is being moved back and forth and also changes
> its caching model, what happens on the free part? Is the original caching
> state put back on it? Say I allocate a DMA32 page (GFP_DMA32) and move it
> to another pool for another radeon device. I also do some cache changes:
> make it write-back, then un-cached, then write-back again, and when I am
> done, I return it back to the pool (DMA32). Once that is done I want to
> unload the DRM/TTM driver. Does that page get its caching state reverted
> back to what it originally had (probably un-cached)? And where is this done?
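For reference, the 2.6.38 pool free path boils down to roughly this (a
simplified sketch of ttm_pages_put() in drivers/gpu/drm/ttm/ttm_page_alloc.c,
with the pool bookkeeping around it elided):

static void ttm_pages_put(struct page *pages[], unsigned npages)
{
        unsigned i;

        /* Whatever caching state the pages had while sitting in a wc/uc
         * pool, transition the whole array back to write-back, the
         * default attribute, before handing them to the page allocator. */
        if (set_pages_array_wb(pages, npages))
                printk(KERN_ERR "Failed to set %d pages to wb!\n", npages);

        for (i = 0; i < npages; ++i)
                __free_page(pages[i]);
}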
When ultimately freed, all pages are set to write-back again, as that is
the default state of all allocated pages (see ttm_pages_put above).
ttm_put_pages will add the pages back to the correct pool (uc or wc).

>
>>
>> I see a sort of a hackish way around these problems.
>>
>> Let's say ttm were trying to detect a hypervisor dummy virtual
>> device sitting on the pci bus. That device would perhaps provide pci
>> information detailing for which GFP masks coherent memory needs to
>> be allocated. The TTM page pool could then grab that device and
>> create a struct dev to use for allocating "anonymous" TTM BO memory.
>>
>> Could that be a way forward? The struct dev would then be private to
>> the page pool code, and bare metal wouldn't need to allocate coherent
>> memory, since the virtual device wouldn't be present. The page pool
>> code would need to be updated to be able to cache coherent pages as well.
>>
>> Xen would need to create such a device in the guest with a suitable
>> PCI ID that it would be explicitly willing to share with other
>> hypervisor suppliers....
>>
>> It's ugly, I know, but it might work...
>
> Or just create a phantom 'struct device' that is used with the TTM
> layer, one that has the lowest common denominator of the different
> graphics cards that exist. But that is a bit strange, because on some
> machines (IBM boxes) you have per-device DMA API specific calls,
> depending on which PCI bus the card sits on, as some of these boxes
> have multiple IOMMUs. So that wouldn't work that nicely...
>
> How about a backend TTM alloc API? The calls to 'ttm_get_page' and
> 'ttm_put_page' would go through a TTM-alloc API to do the allocation.
>
> The default one would be the native one, with those 'dma_alloc_coherent'
> calls removed. When booting in a virtualized environment, a
> virtualization-"friendly" backend TTM alloc would register itself, and
> all calls to 'put/get/probe' would be diverted to it. 'probe' would
> obviously check whether it should use this backend or not.
>
> It would mean two new files, drivers/gpu/drm/ttm/ttm-memory-xen.c and
> a ttm-memory-generic.c, plus some header work.
>
> It would still need to keep the 'dma_address[i]' around so that those
> can be passed to the radeon/nouveau GTT, but for native it could just
> contain BAD_DMA_ADDRESS - and the code in the radeon/nouveau GTT
> binding would be smart enough to figure out that it should do
> 'pci_map_single' if the dma_addr_t is BAD_DMA_ADDRESS.
>
> The issue here is with the caching I had a question about. We would
> need to reset the caching state back to the original one before
> freeing the page. So does the TTM pool de-alloc code deal with this?

Sounds doable. Though I don't understand why you want a virtualized
guest to be able to use the hw directly. From my point of view, all
devices in a virtualized guest should be virtualized devices that talk
to the host system driver.

Cheers,
Jerome
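P.S.: to make the backend idea above concrete, here is a minimal sketch
of what such an interface could look like. All of these names
(ttm_page_alloc_backend, get_pages, put_pages, probe) are hypothetical;
nothing like this exists in the tree:

struct ttm_page_alloc_backend {
        const char *name;

        /* Return true if this backend should handle the allocations,
         * e.g. a Xen backend would check for running as a PV guest. */
        bool (*probe)(void);

        /* Allocate 'npages' pages and fill in dma_address[i] for each,
         * or BAD_DMA_ADDRESS if the page was not DMA-mapped at
         * allocation time. */
        int (*get_pages)(struct page **pages, dma_addr_t *dma_address,
                         unsigned npages, int flags, gfp_t gfp_mask);

        /* Return pages to the pool / free them, undoing any DMA
         * mapping or caching change made by get_pages. */
        void (*put_pages)(struct page **pages, dma_addr_t *dma_address,
                          unsigned npages);
};

A ttm-memory-generic.c would implement get_pages with plain alloc_pages()
and leave dma_address[] set to BAD_DMA_ADDRESS, while a ttm-memory-xen.c
would use dma_alloc_coherent() and fill in real bus addresses.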
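The driver-side fallback in the radeon/nouveau GTT bind path could then
look roughly like this (equally hypothetical - BAD_DMA_ADDRESS is the
proposed sentinel, not an existing define, and the function name is
made up for illustration):

static int ttm_bind_pages_sketch(struct pci_dev *pdev, struct page **pages,
                                 dma_addr_t *dma_address, unsigned num_pages)
{
        unsigned i;

        for (i = 0; i < num_pages; ++i) {
                /* If the allocator did not provide a bus address for
                 * this page, map it ourselves. */
                if (dma_address[i] == BAD_DMA_ADDRESS) {
                        dma_address[i] = pci_map_page(pdev, pages[i], 0,
                                                      PAGE_SIZE,
                                                      PCI_DMA_BIDIRECTIONAL);
                        if (pci_dma_mapping_error(pdev, dma_address[i]))
                                return -ENOMEM;
                }
                /* ... program dma_address[i] into GTT entry i ... */
        }
        return 0;
}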