Hi, On Thu, 2013-08-22 at 17:30 +0300, Kirill A. Shutemov wrote: [snip] > > Andrew's proposed solution makes sense to me, and is probably the > > easiest way to solve this. > > Move check to no_cached_page? Yes > I don't see how it makes any difference for > page cache miss case: we anyway exclude ->readpage() if it's beyond local > i_size. > And for cache hit local i_size will be most likely cover locally cached > pages. The difference is that as the function is currently written, you cannot get to no_cached_page without first calling page_cache_sync_readahead(), i.e. ->readpages() so that i_size will have been updated, even if ->readpages() doesn't return any read-ahead pages. I guess that it is not very obvious that a call to ->readpages is hidden in page_cache_sync_readahead() but that is the path that should in the common case provide the pages from the fs, rather than the ->readpage call thats further down do_generic_file_read() > > Should we introduce an aop which can be called before i_size check in > no_cached_page path to refresh local i_size? > No, due to the call to ->readpages via page_cache_sync_readahead() it should not be necessary. [snip] > > Yes, it is a different use case, but the same issue applies that without > > a prior call to ->readpage() or ->readpages() then i_size may not be > > correct. > > I believe it's correct, but stale. I think it makes difference in the > context. Use stale value for read vs. write race is okay, but not for read > vs. truncate. > Well stale means possibly incorrect. We just don't know until we've got the cluster lock whether the data is correct or not, so it might as well be incorrect for all the difference it makes. As soon as we drop the cluster lock, the i_size may change. We always invalidate the page cache for the inode if we drop the cluster lock though, so that we know that ->readpage(s) will be called again on the next read to refresh the information. Obviously if someone called fstat, for example, on the inode in the mean time, that would also result in a refresh, as would a number of other fs operations. In both the page_ok: and proposed no_cached_page: situations the i_size can change as soon as the ->readpages call has completed, but the time difference is very small, so that returning an EOF indication in that case is not a problem. The issue with not doing a ->readpages() call prior to the i_size check is that the size may have been sitting around stale for hours, days or weeks, having been updated on other nodes, and thus it would give the wrong result. > Will i_size be set to correct value on the first file open (struct inode > creation)? > Yes, open has to read the inode in order to know what the permissions are. So the i_size will be set then, along with all the other inode information. Steve. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html