[AMD Official Use Only - Internal Distribution Only]

> -----Original Message-----
> From: Borislav Petkov <bp@xxxxxxxxx>
> Sent: Thursday, May 13, 2021 10:58 AM
> To: Alex Deucher <alexdeucher@xxxxxxxxx>
> Cc: Joshi, Mukul <Mukul.Joshi@xxxxxxx>; x86-ml <x86@xxxxxxxxxx>;
> Kasiviswanathan, Harish <Harish.Kasiviswanathan@xxxxxxx>;
> lkml <linux-kernel@xxxxxxxxxxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [PATCH] drm/amdgpu: Register bad page handler for Aldebaran
>
> [CAUTION: External Email]
>
> On Thu, May 13, 2021 at 10:32:45AM -0400, Alex Deucher wrote:
> > Right. The sys admin can query the bad page count and decide when to
> > retire the card.
>
> Yap, although the driver should actively "tell" the sysadmin when some critical
> counts of retired VRAM pages are reached because I doubt all admins would go
> look at those counts on their own.
>
> Btw, you say "admin" - am I to understand that those are some high end GPU
> cards with ECC memory? If consumer grade stuff has this too, then the driver
> should very much warn on such levels on its own because normal users won't
> know what and where to look.
>
> Other than that, the big picture sounds good to me.
>

Now that you are OK with how page retirement works, let's revisit the
original question. Are you OK with a new MCE priority (MCE_PRIO_ACCEL), or
do you want us to use something else?

Thanks,
Mukul

> Thx.
>
> --
> Regards/Gruss,
> Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette