On 25.11.24 at 18:27, Matthew Brost wrote:
On Mon, Nov 25, 2024 at 05:19:54PM +0100, Christian König wrote:
On 25.11.24 at 16:29, Matthew Brost wrote:
On Fri, Nov 15, 2024 at 10:27:59AM -0800, Matthew Brost wrote:
[SNIP]
We use this interface to read a BO marked with a dumpable flag during a
GPU hang in our error capture code. This is an internal KMD feature, not
directly exposed to user space. Would adding this helper be acceptable
for this use case? I can add kernel-doc indicating the current restrictions
of the helper (do not directly expose to user space) too if that would
help.
Christian - ping on above.
Sorry, I will try to give those mailing list tasks a bit more time before
the Christmas holidays.
That is an acceptable use case, but the problem is that this helper won't
work for that.
See, during a GPU hang you can't lock BOs, so how do you want to look into
their content with the peek helper?
Agreed, we cannot lock the BO directly in the GPU hang path (TDR). Our error
capture code takes a snapshot of some of the GPU state which is small and
safe to capture in TDR and kicks a worker which opportunistically
captures the VM state which has been marked to be captured. This is
where the helper is called and it is safe to lock the BO.
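
(Editorial illustration, not part of the original message.) The two-stage
capture described above can be sketched roughly as follows. This is a
minimal userspace model under stated assumptions, not the driver's code:
`sketch_bo`, `sketch_capture`, `capture_in_hang_path` and `capture_worker`
are hypothetical names, a pthread mutex stands in for the BO lock, and a
pending flag stands in for kicking a kernel worker.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BO_SIZE 64

/* Hypothetical stand-in for a buffer object; not the TTM structure. */
struct sketch_bo {
	pthread_mutex_t lock;               /* models the BO lock */
	int dumpable;                       /* models the "dumpable" flag */
	unsigned char data[SKETCH_BO_SIZE];
};

struct sketch_capture {
	unsigned int hang_reg;              /* small state taken in stage 1 */
	struct sketch_bo *bo;               /* BO to dump later in stage 2 */
	int worker_pending;                 /* stands in for a queued worker */
	unsigned char dump[SKETCH_BO_SIZE];
	size_t dump_len;
};

/* Stage 1: hang (TDR) path. Records small state only, takes no BO lock. */
static void capture_in_hang_path(struct sketch_capture *c,
				 struct sketch_bo *bo, unsigned int reg)
{
	assert(c && bo);
	c->hang_reg = reg;
	c->bo = bo;
	c->dump_len = 0;
	c->worker_pending = 1;              /* "kick" the deferred worker */
}

/* Stage 2: deferred worker. Here it is safe to block on the BO lock. */
static void capture_worker(struct sketch_capture *c)
{
	if (!c->worker_pending)
		return;
	c->worker_pending = 0;

	if (!c->bo->dumpable)
		return;                     /* only flagged BOs are captured */

	pthread_mutex_lock(&c->bo->lock);
	memcpy(c->dump, c->bo->data, SKETCH_BO_SIZE);
	c->dump_len = SKETCH_BO_SIZE;
	pthread_mutex_unlock(&c->bo->lock);
}
```

The point of the split is that stage 1 never touches the lock, so it can
run where locking is forbidden, while stage 2 runs later in a context
where sleeping on the lock is allowed.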
Yeah that sounds like it should work.
No objections from my side for that use case, but I would rather
keep the code inside ttm_bo_vm.c.
Crash dumping is usually something associated with the VMA even if it's
a bit special here for the VM state.
Regards,
Christian.
Matt
The only thing you could potentially do is to trylock the BO and then dump,
but that would most likely be a bit unreliable.
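
(Editorial illustration, not part of the original message.) A rough
userspace sketch of the trylock-then-dump idea, assuming a pthread mutex
in place of the BO lock; `trylock_bo` and `try_dump_bo` are hypothetical
names, not TTM API. It shows why the approach is unreliable: whenever the
lock happens to be held, the dump is simply skipped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TRYLOCK_BO_SIZE 64

/* Hypothetical stand-in for a buffer object; not the TTM structure. */
struct trylock_bo {
	pthread_mutex_t lock;
	unsigned char data[TRYLOCK_BO_SIZE];
};

/*
 * Copy up to len bytes of the BO into out without ever blocking.
 * Returns true if the contents were dumped, false if the lock was
 * contended and nothing was captured.
 */
static bool try_dump_bo(struct trylock_bo *bo, unsigned char *out, size_t len)
{
	assert(bo && out);
	if (len > TRYLOCK_BO_SIZE)
		len = TRYLOCK_BO_SIZE;

	if (pthread_mutex_trylock(&bo->lock) != 0)
		return false;               /* lock held elsewhere: no dump */

	memcpy(out, bo->data, len);
	pthread_mutex_unlock(&bo->lock);
	return true;
}
```

A caller would have to treat a `false` return as "contents unavailable"
rather than retrying in the hang path.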
Regards,
Christian.
Matt