On Thu, Mar 13, 2014 at 05:39:42PM -0400, Naoya Horiguchi wrote: > Unifying error reporting between memory error and normal IO errors is ideal > in a long run, but at first let's solve it separately. I hope that some code > in this patch will be helpful when thinking of the unification. The mechanisms should be very similar, right? It may be better to do both at the same time. > index 60829565e552..1e8966919044 100644 > --- v3.14-rc6.orig/include/linux/fs.h > +++ v3.14-rc6/include/linux/fs.h > @@ -475,6 +475,9 @@ struct block_device { > #define PAGECACHE_TAG_DIRTY 0 > #define PAGECACHE_TAG_WRITEBACK 1 > #define PAGECACHE_TAG_TOWRITE 2 > +#ifdef CONFIG_MEMORY_FAILURE > +#define PAGECACHE_TAG_HWPOISON 3 > +#endif No need to ifdef defines > @@ -1133,6 +1139,10 @@ static void do_generic_file_read(struct file *filp, loff_t *ppos, > if (unlikely(page == NULL)) > goto no_cached_page; > } > + if (unlikely(PageHWPoison(page))) { > + error = -EHWPOISON; > + goto readpage_error; > + } Didn't we need this check before independent of the rest of the patch? > if (PageReadahead(page)) { > page_cache_async_readahead(mapping, > ra, filp, page, > @@ -2100,6 +2110,10 @@ inline int generic_write_checks(struct file *file, loff_t *pos, size_t *count, i > if (unlikely(*pos < 0)) > return -EINVAL; > > + if (unlikely(mapping_hwpoisoned_range(file->f_mapping, *pos, > + *pos + *count))) > + return -EHWPOISON; How expensive is that check? This will happen on every write. Can it be somehow combined with the normal page cache lookup? > * Dirty pagecache page > + * > + * Memory error reporting (important especially on dirty pagecache error > + * because dirty data is lost) with AS_EIO flag has some problems: It doesn't make sense to have changelogs in comments. That is what git is for. At some point noone will care about the previous code. > + * To solve these, we handle dirty pagecache errors by replacing the error This part of the comment is good. > + pgoff_t index; > + struct inode *inode = NULL; > + struct page *new; > > SetPageError(p); > - /* TBD: print more information about the file. */ > if (mapping) { > + index = page_index(p); > + /* > + * we take inode refcount to keep it's pagecache or mapping > + * on the memory until the error is resolved. How does that work? Who "resolves" the error? > + */ > + inode = igrab(mapping->host); > + pr_info("MCE %#lx: memory error on dirty pagecache (page offset:%lu, inode:%lu, dev:%s)\n", Add the word file somewhere, you need to explain this in terms normal sysadmins and not only kernel hackers can understand. -Andi -- ak@xxxxxxxxxxxxxxx -- Speaking for myself only. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>