On 08/11/12 08:09, Andi Kleen wrote:
> Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> writes:
>
>> Current memory error handling on dirty pagecache has a bug that user
>> processes who use corrupted pages via read() or write() can't be aware
>> of the memory error and result in discarding dirty data silently.
>>
>> The following patch is to improve handling/reporting memory errors on
>> this case, but as a short term solution I suggest that we should undo
>> the present error handling code and just leave errors for such cases
>> (which expect the 2nd MCE to panic the system) to ensure data consistency.
>
> Not sure that's the right approach. It's not worse than any other IO
> errors isn't it?

IMO, it's worse in certain cases.
For example, consider a producer-consumer type program that uses a file as
temporary storage. The current memory-failure.c drops the produced data from
the dirty pagecache and allows the reader to consume old or empty data from
disk (silently!). That is what I think HWPOISON should prevent.

A similar thing could theoretically happen with disk I/O errors. In
practice, though, those errors are often persistent, and the reader will
likely get an error again instead of bad data.
Also, ext3/ext4 has an option to panic when an error is detected, for
people who want to avoid corruption on intermittent errors.

-- 
Jun'ichi Nomura, NEC Corporation
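
To make the producer-consumer case above concrete, here is a minimal userspace sketch of a producer that refuses to hand data to the consumer unless write() and fsync() both succeed. The names produce(), consume() and handoff() are hypothetical and only illustrate the pattern. Whether fsync() actually reports a memory error on a dirty page depends on the kernel's handling, which is the point under discussion, so this is not a fix: a consumer using plain read() can still see old or empty data if the error goes unreported.

```python
import os
import tempfile


def produce(path: str, payload: bytes) -> bool:
    """Write payload to path and flush it to stable storage.

    Returns False if write() or fsync() fails with OSError (e.g. EIO),
    so the caller can avoid signalling the consumer.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(payload)
        while view:
            # os.write() may write fewer bytes than requested.
            written = os.write(fd, view)
            view = view[written:]
        # Force dirty pagecache to disk; a pending write-back error
        # recorded against this file is expected to surface here.
        os.fsync(fd)
        return True
    except OSError:
        return False
    finally:
        os.close(fd)


def consume(path: str) -> bytes:
    """Read back whatever the file currently contains."""
    with open(path, "rb") as f:
        return f.read()


def handoff(path: str, payload: bytes) -> bytes:
    """Hypothetical handoff: consume only after a verified produce."""
    if not produce(path, payload):
        raise RuntimeError("producer could not persist data; not handing off")
    return consume(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        spool = os.path.join(tmp, "spool.bin")
        print(handoff(spool, b"record-1\n"))
```

The sketch only narrows the window: once fsync() has returned successfully the data no longer lives solely in dirty pagecache, so a later memory error on that page does not lose the only copy.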