> Adding Tony so he can either confirm, or point and laugh at my
> assumptions. In general you're right that there are machine check
> events that are not recoverable, but I'm thinking of problems like bus
> lockups and other disasters out of the direct cpu-to-memory data path.
> The question is whether we should avoid the cpu consuming media errors
> at all costs regardless of machine-check recovery. Tony, might there be
> system-fatal gaps in memcpy_mcsafe() or userspace poison consumption
> handling that would lead you to recommend aggressively trying to avoid
> media errors?

TL;DR - I think it is worth it ... but I worry more about errors than
most people.

In current generation systems the two most common sources of machine
checks are memory and I/O. They dwarf all the others like cache and
bus lockups. So it is worth trying to avoid memory issues.

Whether you can recover from a machine check triggered by a CPU read
of memory depends on which instructions you use, and on the alignment
of the access. That's why memcpy_mcsafe() will start with a few byte
reads if needed to align the source address, while other copy routines
prefer to align the destination ... memory writes that straddle cache
lines are more expensive than reads that do that ... but the point of
the routine is to be safe, so we drop a tiny amount of performance in
the unaligned case to make sure we will be able to recover.

We can't control how userspace will access memory ... so if we can
find the errors before userspace stumbles into them, it is a win.

-Tony
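
The source-alignment idea in Tony's reply can be sketched in plain C.
This is only a userspace illustration under stated assumptions, not the
kernel's memcpy_mcsafe(): the function name copy_src_aligned() and the
8-byte word size are hypothetical choices, and nothing here can catch or
recover from a machine check. It only shows the ordering of accesses:
byte reads until the source is aligned, then aligned word reads, with
the destination allowed to stay unaligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch: copy len bytes from src to dst, aligning the
 * SOURCE pointer first. Returns the number of leading single-byte
 * reads that were needed to reach 8-byte source alignment.
 */
static size_t copy_src_aligned(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t head = 0;

    /* Byte reads until the source address is 8-byte aligned. */
    while (len > 0 && ((uintptr_t)s & 7) != 0) {
        *d++ = *s++;
        len--;
        head++;
    }

    /*
     * Main loop: every read now starts on an 8-byte source boundary.
     * The destination may be unaligned, so go through memcpy() rather
     * than casting d to a uint64_t pointer.
     */
    while (len >= 8) {
        uint64_t word;

        memcpy(&word, s, sizeof(word));
        memcpy(d, &word, sizeof(word));
        s += 8;
        d += 8;
        len -= 8;
    }

    /* Trailing bytes, again one at a time. */
    while (len > 0) {
        *d++ = *s++;
        len--;
    }

    return head;
}
```

A routine that aligns the destination instead would run the first loop
on d rather than s. The trade-off described in the reply is that this
version may issue writes that straddle cache lines, in exchange for
reads that never do.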