On Mon, Nov 22, 2021 at 9:23 AM Liang, Prike <Prike.Liang@xxxxxxx> wrote:
>
> [Public]
>
> > -----Original Message-----
> > From: Alex Deucher <alexdeucher@xxxxxxxxx>
> > Sent: Friday, November 19, 2021 12:18 AM
> > To: Lazar, Lijo <Lijo.Lazar@xxxxxxx>
> > Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Christian König
> > <ckoenig.leichtzumerken@xxxxxxxxx>; Liang, Prike <Prike.Liang@xxxxxxx>;
> > Huang, Ray <Ray.Huang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> > Subject: Re: [PATCH] drm/amdgpu: reset asic after system-wide suspend
> > aborted
> >
> > On Thu, Nov 18, 2021 at 10:01 AM Lazar, Lijo <Lijo.Lazar@xxxxxxx> wrote:
> > >
> > > [Public]
> > >
> > >
> > > BTW, I'm not sure if 'reset always' on resume is a good idea for GPUs
> > > in a hive (assuming those systems also get suspended and get hiccups).
> > > At this point the hive isn't reinitialized.
> >
> > Yeah, we should probably not reset if we are part of a hive.
> >
> > Alex
> >
> Does the GPU hive reset need special treatment in this suspend-abort case
> because the hive has to take care of each node's reset dependencies and
> reset the nodes synchronously? If so, can we skip the hive reset case and
> only do the GPU reset when adev->gmc.xgmi.num_physical_nodes == 0?

Yes, exactly. For the aborted suspend reset, we can check the value
before doing a reset. I think you want to check if
adev->gmc.xgmi.num_physical_nodes <= 1.

Alex

>
> > >
> > > Thanks,
> > > Lijo
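
For illustration, the check discussed above could look roughly like the
following. This is a standalone sketch, not the actual patch: the mock
structs only mirror the adev->gmc.xgmi.num_physical_nodes field path
named in the thread, and the helper name is hypothetical.

```c
#include <stdbool.h>

/*
 * Mock types (assumption): they only mirror the field path
 * adev->gmc.xgmi.num_physical_nodes mentioned in the thread and are
 * not the real amdgpu structures.
 */
struct mock_xgmi {
	unsigned int num_physical_nodes;
};

struct mock_gmc {
	struct mock_xgmi xgmi;
};

struct mock_adev {
	struct mock_gmc gmc;
};

/*
 * Hypothetical helper: returns true when a reset after an aborted
 * suspend would be attempted, i.e. when the device is not part of a
 * multi-node hive. Uses the "<= 1" comparison suggested in the reply
 * rather than "== 0".
 */
static bool should_reset_after_aborted_suspend(const struct mock_adev *adev)
{
	return adev->gmc.xgmi.num_physical_nodes <= 1;
}
```

With "<= 1", a device that reports a node count of 1 is still reset,
whereas the "== 0" test would skip it.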