Hello,

On Fri, Aug 10, 2012 at 04:13:03PM -0700, Andi Kleen wrote:
> Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> writes:
>
> > Current error reporting of memory errors on dirty pagecache has silent
> > data lost problem because AS_EIO in struct address_space is cleared
> > once checked.
>
> Seems very complicated. I think I would prefer something simpler
> if possible, especially unless it's proven the case is common.
> It's hard to maintain rarely used error code when it's complicated.

I'm not sure whether memory errors are rare events, because I don't
have any numbers from real systems. But assuming that hwpoison events
are not rare, errors on dirty pagecache cannot be ignored, because the
dirty page ratio is typically ~10% of total physical memory on average
systems. That may be small, but it is not negligible.

> Maybe try Fengguang's simple proposal first? That would fix other IO
> errors too.

In my understanding, Fengguang's patch (referenced in this patch's
description) only fixes memory error reporting. And I'm not sure that a
similar approach (like making AS_EIO sticky) would really fix the IO
errors, because such a change could break userspace applications that
expect the current behavior.

Anyway, OK, I agree to start with Fengguang's patch and to separate out
the additional suggestion about "making dirty pagecache error
recoverable". If possible, I would also like your feedback on that
additional part of my idea. Could I ask you that favor?

Thanks,
Naoya