Re: [PATCH v8 2/6] drm/xe/xe_gt_pagefault: Move pagefault struct to header

Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx> · Sat, 15 Mar 2025 15:45:13 +0100

On 14.03.2025 23:06, Cavitt, Jonathan wrote:
> -----Original Message-----
> From: Wajdeczko, Michal <Michal.Wajdeczko@xxxxxxxxx> 
> Sent: Friday, March 14, 2025 10:02 AM
> To: Cavitt, Jonathan <jonathan.cavitt@xxxxxxxxx>; intel-xe@xxxxxxxxxxxxxxxxxxxxx
> Cc: Gupta, saurabhg <saurabhg.gupta@xxxxxxxxx>; Zuo, Alex <alex.zuo@xxxxxxxxx>; joonas.lahtinen@xxxxxxxxxxxxxxx; Brost, Matthew <matthew.brost@xxxxxxxxx>; Zhang, Jianxun <jianxun.zhang@xxxxxxxxx>; Lin, Shuicheng <shuicheng.lin@xxxxxxxxx>; dri-devel@xxxxxxxxxxxxxxxxxxxxx; michal.mzorek@xxxxxxxxx
> Subject: Re: [PATCH v8 2/6] drm/xe/xe_gt_pagefault: Move pagefault struct to header
>>
>> On 13.03.2025 19:34, Jonathan Cavitt wrote:
>>> Move the pagefault struct from xe_gt_pagefault.c to the
>>> xe_gt_pagefault_types.h header file, along with the associated enum values.
>>>
>>> v2:
>>> - Normalize names for common header (Matt Brost)
>>>
>>> v3:
>>> - s/Migrate/Move (Michal W)
>>> - s/xe_pagefault/xe_gt_pagefault (Michal W)
>>> - Create new header file, xe_gt_pagefault_types.h (Michal W)
>>> - Add kernel docs (Michal W)
>>>
>>> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@xxxxxxxxx>
>>> Cc: Michal Wajdeczko <Michal.Wajdeczko@xxxxxxxxx>
>>> ---
>>
>> ...
>>
>>> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.h b/drivers/gpu/drm/xe/xe_gt_pagefault.h
>>> index 839c065a5e4c..69b700c4915a 100644
>>> --- a/drivers/gpu/drm/xe/xe_gt_pagefault.h
>>> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.h
>>> @@ -8,6 +8,8 @@
>>>  
>>>  #include <linux/types.h>
>>>  
>>> +#include "xe_gt_pagefault_types.h"
>>
>> it's not needed here, move it to .c
>>
>>> +
>>>  struct xe_gt;
>>>  struct xe_guc;
>>>  
>>> diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault_types.h b/drivers/gpu/drm/xe/xe_gt_pagefault_types.h
>>> new file mode 100644
>>> index 000000000000..90b7085d4b8e
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/xe/xe_gt_pagefault_types.h
>>> @@ -0,0 +1,67 @@
>>> +/* SPDX-License-Identifier: MIT */
>>> +/*
>>> + * Copyright (c) 2022-2025 Intel Corporation
>>> + */
>>> +
>>> +#ifndef _XE_GT_PAGEFAULT_TYPES_H_
>>> +#define _XE_GT_PAGEFAULT_TYPES_H_
>>> +
>>
>> don't forget to
>>
>> #include <linux/types.h>
> 
> That explains why the kernel failed to compile on CI.  It was compiling just
> fine locally, so that's why I missed this.
> 
>>
>>> +/**
>>> + * struct xe_gt_pagefault - Structure of pagefaults returned by the
>>> + * pagefault handler
>>> + */
>>> +struct xe_gt_pagefault {
>>> +	/** @page_addr: faulted address of this pagefault */
>>> +	u64 page_addr;
>>> +	/** @asid: ASID of this pagefault */
>>> +	u32 asid;
>>> +	/** @pdata: PDATA of this pagefault */
>>> +	u16 pdata;
>>> +	/** @vfid: VFID of this pagefault */
>>> +	u8 vfid;
>>
>> btw, IIRC the VFID from the descriptor will be zero'ed
>> does it make sense to keep it here?
> 
> Is the argument that every time pf->vfid is accessed, it's guaranteed to be

it's FIELD_GET(PFD_VFID, desc->dw2) guaranteed to be zero

> zero?  I can't counter that claim, but wouldn't it be safer to keep reporting
> the VFID in case we ever hit a case where it's no longer zero?

it can't be non-zero, look at GuC ABI spec that says:

".. with the VF number being zero’d out in the descriptor."

> 
> Also, did we know it would always be zero when we were making the
> pagefault struct originally?  If so, why did we include the vfid originally?

dunno, ask authors, it was pushed part of the initial commit
dd08ebf6c352 that was *not* going through normal review cycle

> 
>>
>>> +	/**
>>> +	 * @access_type: access type of this pagefault, as a value
>>> +	 * from xe_gt_pagefault_access_type
>>> +	 */
>>> +	u8 access_type;
>>> +	/**
>>> +	 * @fault_type: fault type of this pagefault, as a value
>>> +	 * from xe_gt_pagefault_fault_type
>>> +	 */
>>> +	u8 fault_type;
>>> +	/** @fault_level: fault level of this pagefault */
>>> +	u8 fault_level;
>>> +	/** @engine_class: engine class this pagefault was reported on */
>>> +	u8 engine_class;
>>> +	/** @engine_instance: engine instance this pagefault was reported on */
>>> +	u8 engine_instance;
>>> +	/** @fault_unsuccessful: flag for if the pagefault recovered or not */
>>> +	u8 fault_unsuccessful;
>>> +	/** @prefetch: unused */
>>> +	bool prefetch;
>>> +	/** @trva_fault: is set if this is a TRTT fault */
>>> +	bool trva_fault;
>>> +};
>>> +
>>> +/**
>>> + * enum xe_gt_pagefault_access_type - Access type reported to the xe_gt_pagefault
>>> + * struct.  Saved to xe_gt_pagefault@access_type
>>
>> this seems to be copied from G2H descriptor as-is.
>> so shouldn't this be part of the GuC ABI?
>> or based on HW ABI if GuC is just a proxy
> 
> What information should I be including in the kernel docs for these enums?

are you asking about xe enums or GuC ABI defs/enums?

for the xe enums, we can say this is what HW/GuC provides
for the HW defs, we can likely drop kernel-doc
we can just point to the Bspec in the commit msg

> -Jonathan Cavitt
> 
>>
>>> + */
>>> +enum xe_gt_pagefault_access_type {
>>> +	XE_GT_PAGEFAULT_ACCESS_TYPE_READ = 0,
>>> +	XE_GT_PAGEFAULT_ACCESS_TYPE_WRITE = 1,
>>> +	XE_GT_PAGEFAULT_ACCESS_TYPE_ATOMIC = 2,
>>> +	XE_GT_PAGEFAULT_ACCESS_TYPE_RESERVED = 3,
>>> +};
>>> +
>>> +/**
>>> + * enum xe_gt_pagefault_fault_type - Fault type reported to the xe_gt_pagefault
>>> + * struct.  Saved to xe_gt_pagefault@fault_type
>>
>> ditto
>>
>>> + */
>>> +enum xe_gt_pagefault_fault_type {
>>> +	XE_GT_PAGEFAULT_TYPE_NOT_PRESENT = 0,
>>> +	XE_GT_PAGEFAULT_TYPE_WRITE_ACCESS_VIOLATION = 1,
>>> +	XE_GT_PAGEFAULT_TYPE_ATOMIC_ACCESS_VIOLATION = 2,
>>> +};
>>> +
>>> +#endif
>>
>>