On 20.09.23 at 16:02, Thomas Hellström wrote:
[SNIP]
By "relocation" list, do you refer to what gpuvm calls the "evict"
list, or something else? Like the relocation/validation list that
used to be sent from user-space for non-VM_BIND VMs?
The BOs sent into the kernel with each command submission on the
classic IOCTLs.
The VM BOs plus the external/shared BOs bound to the VM (the
external list) are the BOs referenced by the current batch. So the
BOs on the VM's external list are the ones being locked, fenced and
checked for eviction. If they weren't, they could be evicted before
the current batch completes?
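
Roughly this, as a hedged sketch of what I mean happens at submission
time (all names here are made up for illustration, not the actual
gpuvm or xe symbols; error unwinding omitted):

#include <linux/dma-resv.h>
#include <linux/list.h>
#include <drm/ttm/ttm_bo.h>

/*
 * Hypothetical sketch: walk the VM's external-BO list, reserve each
 * BO, make sure it is in a valid placement, and attach the batch
 * fence so eviction has to wait for the batch to complete.
 */
struct vm_external_bo {
	struct list_head link;
	struct ttm_buffer_object *bo;
};

/* Hypothetical: move the BO back to a valid placement if evicted. */
static int revalidate_placement(struct ttm_buffer_object *bo);

static int lock_and_fence_externals(struct list_head *externals,
				    struct dma_fence *batch_fence,
				    struct ww_acquire_ctx *ticket)
{
	struct vm_external_bo *e;
	int ret;

	list_for_each_entry(e, externals, link) {
		ret = ttm_bo_reserve(e->bo, true, false, ticket);
		if (ret)
			return ret;

		ret = revalidate_placement(e->bo);
		if (ret)
			return ret;

		ret = dma_resv_reserve_fences(e->bo->base.resv, 1);
		if (ret)
			return ret;

		/* Fenced: eviction now waits until the batch completes. */
		dma_resv_add_fence(e->bo->base.resv, batch_fence,
				   DMA_RESV_USAGE_BOOKKEEP);
	}
	return 0;
}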
That only applies to certain use cases, e.g. Vulkan or user-mode
queues.
Multimedia APIs and especially OpenGL work differently: there, only
the BOs mentioned in the relocation list are guaranteed not to be
evicted.
This is intentional because those APIs tend to over-allocate memory
all the time, so for good performance you need to be able to evict
BOs from the VM while other parts of the VM are in use. Without
that, OpenGL performance in particular would be completely crippled,
at least on amdgpu.
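
As a simplified, hypothetical sketch of that classic CS path (not
the real amdgpu_cs code; validate_and_fence() is a made-up helper):
only the BOs named in the relocation list are looked up, validated
and fenced for this submission, everything else bound to the VM
stays evictable.

#include <linux/dma-fence.h>
#include <drm/drm_file.h>
#include <drm/drm_gem.h>

struct reloc_entry {
	u32 handle;	/* GEM handle from the user-space reloc list */
};

/* Hypothetical: lock, move to a valid placement, fence the BO. */
static int validate_and_fence(struct drm_gem_object *obj,
			      struct dma_fence *batch_fence);

static int process_reloc_list(struct drm_file *file,
			      const struct reloc_entry *relocs,
			      unsigned int count,
			      struct dma_fence *batch_fence)
{
	unsigned int i;
	int ret;

	for (i = 0; i < count; i++) {
		struct drm_gem_object *obj =
			drm_gem_object_lookup(file, relocs[i].handle);

		if (!obj)
			return -ENOENT;

		ret = validate_and_fence(obj, batch_fence);
		drm_gem_object_put(obj);
		if (ret)
			return ret;
	}
	return 0;
}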
OK, I've always wondered how overcommitting a local VM would be
handled with VM_BIND, where we don't have the relocation list, at
least not in xe, so we have what you refer to as user-mode queues.
I figure the APIs that suffer from overcommitting would maintain a
"current working set" in user-space and send changes to the kernel
as bind/unbind deltas. Or at least send "can be unbound / can no
longer be unbound" advisories.
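
Something like this, as a purely hypothetical user-space sketch:
track which BOs the next batch needs, diff that against what is
currently bound, and send only the changes. vm_bind()/vm_unbind()
are made-up stand-ins for the driver's real VM_BIND ioctl wrappers.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

void vm_bind(int fd, uint32_t handle);		/* hypothetical */
void vm_unbind(int fd, uint32_t handle);	/* hypothetical */

struct ws_entry {
	uint32_t handle;
	bool bound;	/* currently bound in the kernel VM */
	bool needed;	/* referenced by the next submission */
};

/* Send only the working-set deltas down to the kernel. */
static void sync_working_set(int fd, struct ws_entry *set, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (set[i].bound && !set[i].needed) {
			vm_unbind(fd, set[i].handle);
			set[i].bound = false;
		} else if (!set[i].bound && set[i].needed) {
			vm_bind(fd, set[i].handle);
			set[i].bound = true;
		}
	}
}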
This may turn out to be interesting.
Essentially this is how Windows used to work until (I think) Windows
8. Basically, the kernel is responsible for figuring out which BOs
to move in and out of VRAM for each submission an application makes.
And it is perfectly acceptable for an application to allocate 8GiB
of VRAM when only 4GiB is physically available.
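
As a rough, hypothetical sketch of that residency model (all types
and helpers made up; a real implementation would also pin the
already-resident referenced BOs first): for every submission the
kernel makes each referenced BO resident, evicting least-recently-
used BOs to system memory when VRAM is overcommitted.

#include <linux/list.h>

struct res_bo {
	struct list_head link;		/* on the submission's list */
	struct list_head lru_link;	/* on the global VRAM LRU */
	size_t size;
};

static bool vram_has_room(size_t size);		/* hypothetical */
static void evict_to_sysmem(struct res_bo *bo);	/* hypothetical */
static void move_to_vram(struct res_bo *bo);	/* hypothetical */

static int make_resident(struct list_head *referenced,
			 struct list_head *vram_lru)
{
	struct res_bo *bo;

	list_for_each_entry(bo, referenced, link) {
		/* Evict LRU victims until this BO fits in VRAM. */
		while (!vram_has_room(bo->size)) {
			struct res_bo *victim =
				list_first_entry_or_null(vram_lru,
							 struct res_bo,
							 lru_link);

			if (!victim)
				return -ENOMEM;
			evict_to_sysmem(victim);
		}
		move_to_vram(bo);
	}
	return 0;
}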
To be honest, I think it's one of the worst things ever invented, but
we somehow have to support it for some use cases.
Christian.
/Thomas