On 1/16/24 15:05, Christian König wrote:
On 16.01.24 at 14:48, Joshua Ashton wrote:
[SNIP]
Going to adjust the implementation accordingly.
Awesome, please CC me when you have something.
Sure, going to keep that in mind.
In the short-term I have changed if (r && r != -ENODATA) to if (r)
and that seems to work nicely for me.
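
To spell out what that one-line change does, here is a minimal sketch.
The helper names are hypothetical and this is not the actual driver
code; it only assumes that r holds 0 or a negative errno from the
submit path.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helpers for illustration only, not the real driver code. */

/* Old check: -ENODATA from the submission is not treated as a failure. */
static bool submit_failed_old(int r)
{
    return r && r != -ENODATA;
}

/* New check: any non-zero return, including -ENODATA, is a failure. */
static bool submit_failed_new(int r)
{
    return r != 0;
}
```

The only input on which the two differ is r == -ENODATA, which the new
check now reports as a failed submission.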
One problem with solely relying on the CS submission return value
from userspace is cancelled syncobj waits.
For example, suppose an application makes a submission on one thread
and then calls vkWaitSemaphores on another thread to wait on a
semaphore. If that submission hangs, the syncobj behind the
vkWaitSemaphores should be signalled, which is fine, but we need to
return VK_ERROR_DEVICE_LOST if the context loss is what caused the
VkSemaphore to signal.
The way this was previously integrated was with the query thing,
which as we have established does not provide correct information
regarding soft recovery at the moment.
Unless you have an alternative for us to get some error out of the
syncobj (eg. -ENODATA), then right now we still require the query.
I think fixing the -ENODATA reporting back for submit is a good
step, but I believe we still need the query to report the same
information as we would have gotten from that here.
Yeah, exactly that was one of the reasons why the guilty handling
approach didn't solve the problem sufficiently.
If I remember correctly at least for the syncobj used on Android you can
actually query the result of the execution after waiting for them to
signal.
So not only the issuer of a submission can get the result, but also
everybody waiting for the result. In other words Wayland, X, etc... can
implement graceful handling should an application send them nonsense.
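
As a rough sketch of what such a query can look like from userspace,
assuming the Android-style object meant here is a sync_file fd: the
SYNC_IOC_FILE_INFO ioctl from <linux/sync_file.h> reports a status of
0 (still active), 1 (signalled) or a negative error code. The helper
name sync_file_status is made up for this example.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/sync_file.h>

/*
 * Hypothetical helper: fetch the overall status of a sync_file fd.
 * Returns 0 and stores the status (0 active, 1 signalled, <0 error)
 * in *status, or returns -errno if the ioctl itself fails.
 */
static int sync_file_status(int fd, int *status)
{
    struct sync_file_info info;

    /* num_fences = 0: only the overall status and fence count are filled in. */
    memset(&info, 0, sizeof(info));

    if (ioctl(fd, SYNC_IOC_FILE_INFO, &info) < 0)
        return -errno;

    *status = info.status;
    return 0;
}
```

A compositor could call this after the wait completes and disregard
the buffer when the stored status is negative.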
That would be great. Right now the vkQueueSubmit for a hanging
application ends up going through as the Wayland commit has been made,
and the fence gets signalled.
This means we can get an image that is complete garbage from a hung app
-- no DCC retile occurred for it, which is pretty funny as the scanout
image looks like garbage, but it displays whatever it last was in
composite. xD
Getting the fence error on the compositor side would be great to
disregard such images.
We could also do more useful device loss handling with that on the
driver side.
Thanks!
- Joshie 🐸✨
I don't think we ever implemented something similar for drm_syncobj and
we might need error forwarding in the dma_fence_chain container, but
stuff like that is trivial to implement should the requirement for this
ever come up.
Hmmm, actually the spec states that VK_SUCCESS is valid in this
situation:
Commands that wait indefinitely for device execution (namely
vkDeviceWaitIdle, vkQueueWaitIdle, vkWaitForFences with a maximum
timeout, and vkGetQueryPoolResults with the VK_QUERY_RESULT_WAIT_BIT
bit set in flags) must return in finite time even in the case of a
lost device, and return either VK_SUCCESS or VK_ERROR_DEVICE_LOST.
...
Once a device is lost, command execution may fail, and certain
commands that return a VkResult may return VK_ERROR_DEVICE_LOST.
I guess for now disregard my last email regarding us potentially needing
the query; it does seem that returning VK_SUCCESS is completely valid.
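
To make the two permitted outcomes concrete, a small sketch of how a
userspace driver might pick a wait result once the fence has signalled.
The enum and function names are hypothetical; the values only mirror
VK_SUCCESS (0) and VK_ERROR_DEVICE_LOST (-4). It assumes fence_status
is 1 for a clean signal and negative when the fence completed with an
error.

```c
#include <stdbool.h>

/* Hypothetical stand-ins mirroring VK_SUCCESS and VK_ERROR_DEVICE_LOST. */
enum wait_result {
    WAIT_SUCCESS = 0,
    WAIT_DEVICE_LOST = -4,
};

/*
 * Pick the result of a completed wait. Per the spec text quoted above,
 * both results are allowed after a device loss, so report_loss only
 * selects which of the two valid answers is given.
 */
static enum wait_result wait_result_from_status(int fence_status,
                                                bool report_loss)
{
    if (fence_status < 0 && report_loss)
        return WAIT_DEVICE_LOST;

    return WAIT_SUCCESS;
}
```

With report_loss set to false this matches the behaviour of always
returning success from the wait, which the quoted text permits.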
Well, how you interpret the information the kernel gives you in userspace
is up to you. But we should at least make sure that the general
approach inside the kernel has a design which can handle those
requirements.
Christian.
Marek