On Wed, Feb 8, 2017 at 9:42 AM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> On Wed, Feb 8, 2017 at 7:10 AM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
>> Dan Williams <dan.j.williams@xxxxxxxxx> writes:
>>
>>> If the platform supports machine-check-recovery then there is little
>>> reason to kick off opportunistic scrubs to collect a media error list.
>>> That initial scrub is only useful when it might prevent a kernel panic
>>> from consuming poison (a media error from memory).
>>
>> How expensive is the scrub?
>
> The ACPI spec is not clear, but it could range from benign to
> expensive and degrading system performance for tens of minutes after
> boot.
>
>> Even on platforms that support recoverable
>> machine checks, it's possible that you get one that is not recoverable.
>> You haven't sold me on this change. ;-)
>>
>
> Adding Tony so he can either confirm, or point and laugh at my
> assumptions. In general you're right that there are machine check
> events that are not recoverable, but I'm thinking of problems like bus
> lockups and other disasters out of the direct cpu-to-memory data path.
> The question is whether we should avoid the cpu consuming media errors
> at all costs regardless of machine-check recovery. Tony, might there be
> system-fatal gaps in memcpy_mcsafe() or userspace poison consumption
> handling such that you would recommend aggressively trying to avoid
> media errors?
>

I was able to chat with Ashok and he warned that not all instructions
that consume poison can generate a recovery point. So, thanks for
prompting the double-check; we should definitely try to collect the
badblocks list regardless of the machine check recovery capability of
the system.