Re: vm binding interfaces and parallel with mmap

On Mon, Aug 22, 2022 at 8:27 AM Christian König <christian.koenig@xxxxxxx> wrote:
On 22.08.22 at 10:34, Bas Nieuwenhuizen wrote:
> On Mon, Aug 22, 2022 at 9:28 AM Dave Airlie <airlied@xxxxxxxxx> wrote:
>> On Mon, 22 Aug 2022 at 17:05, Dave Airlie <airlied@xxxxxxxxx> wrote:
>>> Hey,
>>>
>>> I've just been looking at the vm bind type interfaces and wanted to at
>>> least document how we think the unmapping API should work. I know I've
>>> talked about this on irc before, but wanted to solidify things a bit
>>> more around what is required vs what is a nice-to-have.
>>>
>>> My main concerns/thoughts are around the unbind interfaces and how
>>> close to munmap they should be.
>>>
>>> I think the mapping operation is mostly consistent:
>>> MAP(bo handle, offset into bo, range, VM offset, VM flags)
>>> which maps the given range of the bo, starting at the bo offset, at
>>> the VM offset in the current VM
>>> (maybe take an optional vm_id).
>>>
>>> now the simplest unmap I can see is one that parallels munmap
>>> UNMAP(vmaddr, range);
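
As a concrete strawman (purely hypothetical naming, not an existing
uAPI), the argument structs for those two operations might look like:

    /* __u32/__u64 come from <linux/types.h>, as in other uAPI structs */
    struct drm_vm_map {
            __u32 bo_handle;   /* GEM handle of the BO to map */
            __u32 vm_id;       /* optional: which VM to map into */
            __u64 bo_offset;   /* offset into the BO, in bytes */
            __u64 range;       /* length of the mapping, in bytes */
            __u64 vm_addr;     /* GPU virtual address to map at */
            __u64 flags;       /* VM flags: readonly, PRT, ... */
    };

    struct drm_vm_unmap {
            __u64 vm_addr;     /* start of the range to unmap */
            __u64 range;       /* length of the range, in bytes */
            __u32 vm_id;       /* optional: which VM to unmap from */
            __u32 pad;
    };
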
>>>
>>> But it raises the question of how much the kernel needs to deal
>>> with here: if we support an arbitrary vmaddr,range then we really
>>> need to be able to do everything munmap does for CPU VMAs, which
>>> means splitting ranges, joining ranges etc.
>>>
>>> like
>>> MAP(1, 0, 0x8000, 0xc0000)
>>> UNMAP(0xc1000, 0x1000)
>>> should that be possible?
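
If it were, the kernel would be left holding two fragments of the
original mapping, something like:

    before UNMAP:  [0xc0000, 0xc8000)  -> bo 1, bo_offset 0x0
    after UNMAP:   [0xc0000, 0xc1000)  -> bo 1, bo_offset 0x0
                   [0xc2000, 0xc8000)  -> bo 1, bo_offset 0x2000

and a later MAP over the hole could in principle ask for the fragments
to be merged back.
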
>>>
>>> Do we have any API usage (across Vulkan/CL/CUDA/ROCm etc) that
>>> requires this sort of control, or should we be fine with only
>>> unmapping objects exactly like how they were mapped in the first
>>> place, and not have any splitting/joining?
> Vulkan allows for this, though I haven't checked to what extent apps use it.

This is massively used for partially resident textures under OpenGL as
far as I know.

E.g. you map a range like 1->10 as PRT and then map real textures at
2, 5 and 7, or something like that.

That said, functionality to map/enable PRT for a range is necessary as
well. On amdgpu we have a special flag for that, and in this case the
BO to map can be NULL.

NVIDIA has similar bits.  I don't remember if Intel does or not.  Treating this as a map with BO=0 and a PRT flag of some sort seems like a perfectly reasonable approach.
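
In the MAP() notation from above, with a hypothetical PRT flag and a
texture-sized unit TEX_SZ, the pattern Christian describes would look
roughly like:

    MAP(0,    0, 10 * TEX_SZ, BASE,              PRT)  /* reserve, no BO */
    MAP(tex2, 0, TEX_SZ,      BASE + 2 * TEX_SZ, 0)    /* real textures  */
    MAP(tex5, 0, TEX_SZ,      BASE + 5 * TEX_SZ, 0)
    MAP(tex7, 0, TEX_SZ,      BASE + 7 * TEX_SZ, 0)
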
 
> We could technically split all mapping/unmapping into single-tile
> operations in the userspace driver, which avoids the need for
> splitting/merging, but that could very much be a pessimization.

That would be pretty much a NAK from my side. A couple of hardware
optimizations require mappings to be as large as possible.

Otherwise we wouldn't be able to use huge/giant (2MiB, 1GiB) pages,
power-of-two TLB reach optimizations (8KiB, 16KiB, 32KiB, ...) or
texture fetcher optimizations.
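
For a sense of scale (rough numbers, assuming an x86-like 4-level
page-table layout):

    4 KiB pages:  1 GiB / 4 KiB = 262,144 PTEs, full-depth walk on a TLB miss
    2 MiB pages:  1 GiB / 2 MiB = 512 entries, one level shallower
    1 GiB pages:  a single entry, two levels shallower

None of that is available if userspace fragments the range into
tile-sized mappings.
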

Agreed.  Intel has such optimizations as well and they really do matter.  IDK about NVIDIA but I'd be surprised if they don't at least have a 2MiB variant or something.  Reducing page-table depth matters a lot for latency.
 
>> I suppose it also asks the question around paralleling
>>
>> fd = open()
>> ptr = mmap(fd, ...)
>> close(fd)
>> the mapping is still valid.
>>
>> I suppose our equivalent is
>> handle = bo_alloc()
>> gpu_addr = vm_bind(handle, ...)
>> gem_close(handle)
>> Is the gpu_addr still valid? Does the VM hold a reference on the
>> kernel bo internally?
> For Vulkan it looks like this is undefined and the above is not necessary:
>
> "It is important to note that freeing a VkDeviceMemory object with
> vkFreeMemory will not cause resources (or resource regions) bound to
> the memory object to become unbound. Applications must not access
> resources bound to memory that has been freed."
> (32.7.6)

I'm not sure about this particular question.  We need to be sure that maps get cleaned up eventually.  On the one hand, I think it's probably a valid API implementation to have each mapped page hold a reference, similar to mmap, and have vkDestroyImage or vkDestroyBuffer do an unmap to clean up the range.  However, clients may be surprised when they destroy a large memory object and can't reap the memory because of extra BO references they don't know about.

If BOs unmapped themselves on close, or if we had a way to take a VM+BO and say "unmap this BO from everywhere, please", we could clean up the memory pretty easily.  Without that, it's a giant PITA to do entirely inside the userspace driver, because it requires us to globally track every mapping, and that means data structures and locks.  Yes, such an ioctl would require the kernel to track things, but the kernel is already tracking everything that's bound, so hopefully it doesn't add much.
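
A sketch of what that ioctl could look like, again with hypothetical
naming:

    struct drm_vm_unmap_bo {
            __u32 vm_id;       /* VM to scrub */
            __u32 bo_handle;   /* unmap this BO from every range it
                                  currently backs in that VM */
    };

Since, as noted above, the kernel already tracks everything that's
bound, servicing this should mostly be a lookup plus the normal unmap
teardown path.
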

--Jason
 
In addition to what was discussed here so far, we need arrays of in and
out drm_syncobjs for both map and unmap.

E.g. to control when the mapping/unmapping should happen and to signal
when it has completed, etc.
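
In terms of the strawman struct from earlier in the thread, that could
look something like this (field names hypothetical):

    struct drm_vm_map {
            /* ... mapping fields from above ... */
            __u64 in_syncobjs;       /* userspace pointer to an array of
                                        __u32 syncobj handles to wait on
                                        before the (un)map executes */
            __u64 out_syncobjs;      /* userspace pointer to an array of
                                        __u32 syncobj handles, signaled
                                        once the (un)map has completed */
            __u32 num_in_syncobjs;
            __u32 num_out_syncobjs;
    };
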

Christian.

>
>
>> Dave.
>>> Dave.

