On Tue, Jan 18, 2011 at 01:15:27PM +0800, Li, Shaohua wrote: > On Tue, 2011-01-18 at 12:41 +0800, Wu, Fengguang wrote: > > On Mon, Jan 17, 2011 at 09:32:37AM +0800, Li, Shaohua wrote: > > > On Sun, 2011-01-16 at 11:38 +0800, Wu, Fengguang wrote: > > > > On Wed, Jan 12, 2011 at 10:55:16AM +0800, Li, Shaohua wrote: > > > > > On Tue, Jan 11, 2011 at 05:13:53PM +0800, Wu, Fengguang wrote: > > > > > > On Tue, Jan 11, 2011 at 11:27:33AM +0800, Li, Shaohua wrote: > > > > > > > On Tue, 2011-01-11 at 11:07 +0800, Wu, Fengguang wrote: > > > > > > > > On Tue, Jan 11, 2011 at 10:03:16AM +0800, Li, Shaohua wrote: > > > > > > > > > > > fincore can takes a parameter or it returns a bit to distinguish > > > > > > > > > referenced pages, but I don't think it's a good API. This should be > > > > > > > > > transparent to userspace. > > > > > > > > > > > > > > > > Users care about the "cached" status may well be interested in the > > > > > > > > "active/referenced" status. They are co-related information. fincore() > > > > > > > > won't be a simple replication of mincore() anyway. fincore() has to > > > > > > > > deal with huge sparsely accessed files. The accessed bits of a file > > > > > > > > page are normally more meaningful than the accessed bits of mapped > > > > > > > > (anonymous) pages. > > > > > > > if all filesystems have the bit set, I'll buy-in. Otherwise, this isn't generic enough. > > > > > > > > > > > > It's a reasonable thing to set the accessed bits. So I believe the > > > > > > various filesystems are calling mark_page_accessed() on their metadata > > > > > > inode, or can be changed to do it. > > > > > yes, we can, with a lot of pain. And filesystems must be smart to avoid marking the bit > > > > > for pages which are readahead in but actually are invalid. The second patch in the series > > > > > > > > "invalid" means !PG_uptodate? I wonder why there is a need to test > > > > that bit at all. !PG_uptodate seems an unrelated transitional state. > > > not PG_update, it's referenced bit. A readahead metadata page will have update bit set, > > > but it might not have referenced bit if it's an obsolete page. btrfs > > > doesn't use the buffer_head > > > > I do see PageUptodate() tests in your patch, perhaps they be removed? > uptodate bit isn't really needed, but I added it to make sure the page > is valid. It may be nit pick, but I always try to remove optional code. The PageUptodate() looks like an irrelevant test and a good candidate to remove. > > > > > has more detailed infomation about this issue. The problem is if this is really worthy > > > > > for metadata readahead. Some filesystems might don't care about metadata readahead. If > > > > > we make fincore check the bit, then fincore syscall will not work for such filesystems, > > > > > which is bad. > > > > > > > > fincore() will always work as is. If the filesystem don't care about > > > > metadata readahead, then the metadata readahead that makes use of the > > > > bits will naturally not work for them? > > > yes, they don't care about readahead, but they do care about fincore > > > output. > > > > fincore() just reports the accessed bits as is. If the filesystem does > > not use blockdev or export its internal metadata inode, the user won't > > be able to run fincore() on the metadata inode at all. > > > > > if fincore() checks the bits, it doesn't work even for normal file > > > pages, if the pages get deactivated. > > > > That's a problem independent of the interface. And for user space > > readahead, it can be nicely fixed by collecting the pages-to-readahead > > before the free pages drop low, ie. before any page reclaim actions. > > It's "nice" because you don't want to readahead more data than > > cache-able anyway and avoid thrashing for small memory systems. > My point is fincore() isn't designed only for readahead. People will use > it like mincore, which is its normal usage. Checking the bits will break > its normal usage, because fincore just doesn't check if the fd means a > metadata inode. Sorry, you missed my point :) I mean to export the accessed bits as-is via the fincore() interface, not to check the accessed bits and then report "page not cached" to user space for !PG_referenced pages. Thanks, Fengguang -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html