Re: [PATCH] drm/amdgpu: cache in more vm fault information

"Khatri, Sunil" <sukhatri@xxxxxxx> · Wed, 6 Mar 2024 21:56:48 +0530

On 3/6/2024 9:49 PM, Christian König wrote:
Am 06.03.24 um 17:06 schrieb Khatri, Sunil:

On 3/6/2024 9:07 PM, Christian König wrote:
Am 06.03.24 um 16:13 schrieb Khatri, Sunil:

On 3/6/2024 8:34 PM, Christian König wrote:
Am 06.03.24 um 15:29 schrieb Alex Deucher:
On Wed, Mar 6, 2024 at 8:04 AM Khatri, Sunil <sukhatri@xxxxxxx> wrote:

On 3/6/2024 6:12 PM, Christian König wrote:
Am 06.03.24 um 11:40 schrieb Khatri, Sunil:
On 3/6/2024 3:37 PM, Christian König wrote:
Am 06.03.24 um 10:04 schrieb Sunil Khatri:
When a page fault interrupt is raised, there
is a lot more information that is useful for
developers to analyse the page fault.
Well, actually that information is not that interesting because
it is hw generation specific.

You should probably rather use the decoded strings here, e.g. hub,
client, xcc_id, node_id etc...

See gmc_v9_0_process_interrupt() for an example.
I saw that v9 does provide more information than v10 and v11,
like node_id and which die the fault came from, but that is again
very specific to IP_VERSION(9, 4, 3). I don't know why that
information is not there in v10 and v11.
I agree with your point, but as of now during a page fault we are
dumping this information, which is useful: which client generated
the interrupt, for which src, and other information like the
address. So I think we should provide similar information in the
devcoredump.

Currently we do not have all this information from either the job
or the vm derived from the job during a reset. We surely could add
more relevant information later on as per request, but this
information is useful, as eventually it is only developers who would
use the dump file provided by a customer to debug.

Below is the information that I dump in devcoredump. I feel that it
is good information, but new information could be added later.

Page fault information
[gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32773)
in page starting at address 0x0000000000000000 from client 0x1b (UTCL2)
This is a perfect example of what I mean. What you record in the patch
is the client_id, but this is basically meaningless unless you have
access to the AMD internal hw documentation.

What you really need is the client in decoded form, in this case
UTCL2. You can keep the client_id additionally, but the decoded
client string is mandatory to have, I think.

Sure, I am capturing that information. I am trying to keep memory
accesses to a minimum as we are still in interrupt context here.
That is why I recorded the integer values instead of decoding and
writing strings right there, and postponed the decoding until we dump.

For example, decoding to gfxhub/mmhub based on vmhub/vmid_src, and the
client string from the client id. So are we good to go with sharing
the above details in devcoredump using the additional information
cached from the page fault?
I think amdgpu_vm_fault_info() has everything you need already 
(vmhub,
status, and addr).  client_id and src_id are just tokens in the
interrupt cookie so we know which IP to route the interrupt to. We
know what they will be because otherwise we'd be in the interrupt
handler for a different IP.  I don't think ring_id has any useful
information in this context and vmid and pasid are probably not too
useful because they are just tokens to associate the fault with a
process.  It would be better to have the process name.

Just to share context here Alex: I am preparing this for
devcoredump. My intention was to replicate the information which the
KMD shares in dmesg for page faults. If we do not add the client id
in particular, we would not be able to share enough information in
devcoredump.
It would be just the address and hub (gfxhub/mmhub), and I think that
is partial information, as src id, client id and ip block provide
good information.

For process related information, we capture that as part of the
dump with existing functionality.
**** AMDGPU Device Coredump ****
version: 1
kernel: 6.7.0-amd-staging-drm-next
module: amdgpu
time: 45.084775181
process_name: soft_recovery_p PID: 1780

Ring timed out details
IP Type: 0 Ring Name: gfx_0.0.0

Page fault information
[gfxhub] page fault (src_id:0 ring:24 vmid:3 pasid:32773)
in page starting at address 0x0000000000000000 from client 0x1b (UTCL2)
VRAM is lost due to GPU reset!

Regards
Sunil


The decoded client name would be really useful, I think, since the
fault handler is a catch-all and handles a whole bunch of
different clients.

But ideally that should be passed in as a const string instead of
the hw generation specific client_id.

As long as it's only a pointer we also don't run into the trouble 
that we need to allocate memory for it.
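
A minimal user-space sketch of the const string idea, not driver code. Everything here is hypothetical (the struct, the two name tables and their contents, the helper names); it only illustrates that the hw generation specific handler can resolve the name itself and hand over a pointer to static storage, so caching it costs one pointer store and no allocation or string copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name tables for two made-up hw generations. The same
 * numeric id maps to a different client in each table, which is why
 * the raw id alone is ambiguous outside the generation specific code. */
static const char *const gen_a_client_names[] = { "CB", "DB", "UTCL2" };
static const char *const gen_b_client_names[] = { "UTCL2", "CPF", "SDMA0" };

/* Hypothetical cached fault record. */
struct fault_info_sketch {
	uint64_t addr;
	uint32_t status;
	const char *client_name; /* points at static storage, never freed */
};

/* Generation specific lookup; out-of-range ids fall back to a fixed string. */
static const char *lookup_client_name(const char *const *table, size_t n,
				      uint32_t cid)
{
	return cid < n ? table[cid] : "unknown";
}

/* Generic caching helper: stores the pointer only, no copy, no allocation. */
static void cache_fault(struct fault_info_sketch *info, uint64_t addr,
			uint32_t status, const char *client_name)
{
	assert(info != NULL);
	info->addr = addr;
	info->status = status;
	info->client_name = client_name;
}
```

With these made-up tables, id 2 resolves to "UTCL2" for the first generation and to "SDMA0" for the second, so the dump code can print info->client_name without knowing which generation produced it.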

I agree, but I would prefer adding the client id and decoding it in
devcoredump using soc15_ih_clientid_name[fault_info->client_id].
Otherwise we have to sprintf this string into fault_info in
irq context, which writes more bytes to memory, I guess, compared
to an integer :)

Well, I totally agree that we shouldn't fiddle too much in the
interrupt handler, but exactly what you suggest here won't work.

The client_id is hw generation specific, so the only place that has
the decoding is the hw generation specific fault handler. Just compare
the defines here:

https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c#L83 


and here:

https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/gfxhub_v11_5_0.c#L38 


Got your point. Let me see, but this is a lot of work in irq context.
Either we can drop the client id entirely, as Alex is suggesting
here, since it will always be the same client and src id, or let me
come up with a patch and see if it is acceptable.

Wait a second, I now realize that you are mixing something up here.
As Alex said, the src_id and client_id in the IV are always the same,
e.g. the VMC or the UTCL2.

This is the client_id which sends the IV to the IH so that the IH can
write it into the ring buffer and we end up in the fault handler.

But in addition to that we also have a client_id inside the fault, and
that is the value printed in the logs. This is the client which caused
the fault inside the VMC or UTCL2.

Yes, the value remains the same irrespective of the family. The client
will always be VMC/UTCL2, so I think, as Alex suggested, we can drop this
information or just add a hardcoded string for information purposes only.

Also, as Alex pointed out, we need to decode from the status register
which kind of page fault it is (permission, read, write etc). This is
all again family specific, and it is all in IRQ context. I am not
feeling good about it, but let me try to share all that in a new patch.

Yeah, but that is all hw specific. I'm not sure how best to put it 
into a devcoredump.

Maybe just record the 32bit value and re-design the GMC code to have 
that decoded into a string for both the system log and the devcoredump.

Alex suggested a good way: just share the value of the status register,
add family information, and let the developer use the family/asic id to
check the register value and decode it manually.
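
A rough user-space sketch of that approach, under stated assumptions. The struct, helper names and family number are hypothetical, and the bit layout below is made up for illustration; real VM_L2_PROTECTION_FAULT_STATUS layouts are hw generation specific and must be taken from the register headers of the family in question. The fault path only stores integers, the dump prints them raw, and field extraction happens later on the developer side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* HYPOTHETICAL field layout for one made-up family, illustration only. */
#define SKETCH_PERM_SHIFT 4
#define SKETCH_PERM_MASK  (0xfu << SKETCH_PERM_SHIFT)
#define SKETCH_CID_SHIFT  9
#define SKETCH_CID_MASK   (0x1ffu << SKETCH_CID_SHIFT)
#define SKETCH_RW_SHIFT   18
#define SKETCH_RW_MASK    (0x1u << SKETCH_RW_SHIFT)

/* Hypothetical cached record: raw values only, nothing decoded. */
struct cached_fault_sketch {
	uint64_t addr;
	uint32_t status;
	unsigned int family;
};

/* Fault path: plain integer stores. A zero status is ignored so that it
 * does not replace an earlier useful one, as the patch does. */
static void cache_raw_fault(struct cached_fault_sketch *f, uint64_t addr,
			    uint32_t status, unsigned int family)
{
	assert(f != NULL);
	if (!status)
		return;
	f->addr = addr;
	f->status = status;
	f->family = family;
}

/* Dump path: print the raw register value together with the family. */
static int dump_fault(char *buf, size_t len,
		      const struct cached_fault_sketch *f)
{
	return snprintf(buf, len, "family:%u status:0x%08X addr:0x%016llX",
			f->family, (unsigned int)f->status,
			(unsigned long long)f->addr);
}

/* Offline decode: pick one field out of the raw value using the mask and
 * shift that belong to the reported family. */
static uint32_t get_field(uint32_t reg, uint32_t mask, unsigned int shift)
{
	return (reg & mask) >> shift;
}
```

The interrupt handler then stays free of string handling, at the cost that whoever reads the dump needs the register definitions for the reported family.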

Regards

Sunil.




Regards
Sunil.

Regards,
Christian.


We can discuss whether values like pasid, vmid and ring id should be
dropped if they are not useful at all.

Regards
Sunil


Christian.


Alex

regards
sunil

Regards,
Christian.

Regards
Sunil Khatri

Regards,
Christian.

Add all such information in the last cached
pagefault from an interrupt handler.

Signed-off-by: Sunil Khatri <sunil.khatri@xxxxxxx>
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 9 +++++++--
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 7 ++++++-
   drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 2 +-
   drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c | 2 +-
   drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c  | 2 +-
   drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  | 2 +-
   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 2 +-
   7 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 4299ce386322..b77e8e28769d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2905,7 +2905,7 @@ void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct seq_file *m)
    * Cache the fault info for later use by userspace in debugging.
    */
   void amdgpu_vm_update_fault_cache(struct amdgpu_device *adev,
-                  unsigned int pasid,
+                  struct amdgpu_iv_entry *entry,
                     uint64_t addr,
                     uint32_t status,
                     unsigned int vmhub)
@@ -2915,7 +2915,7 @@ void amdgpu_vm_update_fault_cache(struct amdgpu_device *adev,
       xa_lock_irqsave(&adev->vm_manager.pasids, flags);

-    vm = xa_load(&adev->vm_manager.pasids, pasid);
+    vm = xa_load(&adev->vm_manager.pasids, entry->pasid);
       /* Don't update the fault cache if status is 0.  In the multiple
        * fault case, subsequent faults will return a 0 status which is
        * useless for userspace and replaces the useful fault status, so
@@ -2924,6 +2924,11 @@ void amdgpu_vm_update_fault_cache(struct amdgpu_device *adev,
       if (vm && status) {
           vm->fault_info.addr = addr;
           vm->fault_info.status = status;
+        vm->fault_info.client_id = entry->client_id;
+        vm->fault_info.src_id = entry->src_id;
+        vm->fault_info.vmid = entry->vmid;
+        vm->fault_info.pasid = entry->pasid;
+        vm->fault_info.ring_id = entry->ring_id;
           if (AMDGPU_IS_GFXHUB(vmhub)) {
               vm->fault_info.vmhub = AMDGPU_VMHUB_TYPE_GFX;
               vm->fault_info.vmhub |=
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 047ec1930d12..c7782a89bdb5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -286,6 +286,11 @@ struct amdgpu_vm_fault_info {
       uint32_t    status;
       /* which vmhub? gfxhub, mmhub, etc. */
       unsigned int    vmhub;
+    unsigned int    client_id;
+    unsigned int    src_id;
+    unsigned int    ring_id;
+    unsigned int    pasid;
+    unsigned int    vmid;
   };

   struct amdgpu_vm {
@@ -605,7 +610,7 @@ static inline void amdgpu_vm_eviction_unlock(struct amdgpu_vm *vm)
   }

   void amdgpu_vm_update_fault_cache(struct amdgpu_device *adev,
-                  unsigned int pasid,
+                  struct amdgpu_iv_entry *entry,
                     uint64_t addr,
                     uint32_t status,
                     unsigned int vmhub);
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index d933e19e0cf5..6b177ce8db0e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -150,7 +150,7 @@ static int gmc_v10_0_process_interrupt(struct amdgpu_device *adev,
           status = RREG32(hub->vm_l2_pro_fault_status);
           WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1);

-        amdgpu_vm_update_fault_cache(adev, entry->pasid, addr, status,
+        amdgpu_vm_update_fault_cache(adev, entry, addr, status,
                            entry->vmid_src ? AMDGPU_MMHUB0(0) : AMDGPU_GFXHUB(0));
       }

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index 527dc917e049..bcf254856a3e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
@@ -121,7 +121,7 @@ static int gmc_v11_0_process_interrupt(struct amdgpu_device *adev,
           status = RREG32(hub->vm_l2_pro_fault_status);
           WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1);

-        amdgpu_vm_update_fault_cache(adev, entry->pasid, addr, status,
+        amdgpu_vm_update_fault_cache(adev, entry, addr, status,
                            entry->vmid_src ? AMDGPU_MMHUB0(0) : AMDGPU_GFXHUB(0));
       }

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
index 3da7b6a2b00d..e9517ebbe1fd 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c
@@ -1270,7 +1270,7 @@ static int gmc_v7_0_process_interrupt(struct amdgpu_device *adev,
       if (!addr && !status)
           return 0;

-    amdgpu_vm_update_fault_cache(adev, entry->pasid,
+    amdgpu_vm_update_fault_cache(adev, entry,
                        ((u64)addr) << AMDGPU_GPU_PAGE_SHIFT, status, AMDGPU_GFXHUB(0));

       if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_FIRST)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index d20e5f20ee31..a271bf832312 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -1438,7 +1438,7 @@ static int gmc_v8_0_process_interrupt(struct amdgpu_device *adev,
       if (!addr && !status)
           return 0;

-    amdgpu_vm_update_fault_cache(adev, entry->pasid,
+    amdgpu_vm_update_fault_cache(adev, entry,
                        ((u64)addr) << AMDGPU_GPU_PAGE_SHIFT, status, AMDGPU_GFXHUB(0));

       if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_FIRST)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 47b63a4ce68b..dc9fb1fb9540 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -666,7 +666,7 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
       rw = REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, RW);
       WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1);

-    amdgpu_vm_update_fault_cache(adev, entry->pasid, addr, status, vmhub);
+    amdgpu_vm_update_fault_cache(adev, entry, addr, status, vmhub);

       dev_err(adev->dev,
           "VM_L2_PROTECTION_FAULT_STATUS:0x%08X\n",