On Fri, 2009-02-27 at 12:26 +0100, Nick Piggin wrote:
> On Thu, Feb 26, 2009 at 08:21:45AM -0500, Chris Mason wrote:
> > > > One problem I have with the btrfs extent state code is that I might
> > > > choose to release the extent state in releasepage, but the VM might not
> > > > choose to free the page. So I've got an up to date page without any of
> > > > the rest of my state.
> > > 
> > > I'm not sure. What semantics do you want there? In most cases (including
> > > fsblock default case where the filesystem does not have a pin), we're
> > > happy to leave clean, uptodate pages in pagecache in that case.
> > 
> > Right, but it really limits the state that we can keep outside the page
> > bits. Take a subpage block, where we know the first 1k is up to date.
> > releasepage comes and we free our tracking that says the first 1k is up
> > to date, but the VM doesn't free the page.
> > 
> > Now we have a page where the uptodate bit isn't set, but the first 1k
> > has valid data. We have to reread it.
> 
> Well I don't see how that limits us? Either we prefer to keep the
> metadata, or we throw it away and it is inevitable that we lose
> information.
> 

We can't have metadata that isn't freed by releasepage unless we want
to pin the page completely. There was a time when the btrfs metadata
had a bit for 'this block needs defrag', and I ended up not being able
to use it because releasepage was consistently freeing my extra data
while the page was still around.

> Regardless of whether you store the data in a tree of extends in the
> inode, or per-page buffers, you have the same problem (buffer heads
> have that same problem too).
> 

Right.

>
> > I'd like a form of releasepage that knows if the vm is going to really
> > get rid of the page. Or another callback that happens when the VM is
> > sure the page will be freed so we can drop extra metadata that doesn't
> > pin the page, but we always want to stay with the page.
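To make the subpage case concrete, here is a rough userspace simulation in C. Everything in it is hypothetical (`struct sim_page`, `sim_releasepage`, `sim_reclaim` and `run_scenario` are illustrative names, not kernel APIs); it only assumes the ordering described in this mail, where reclaim calls releasepage first and checks the page count afterwards. A page with only its first 1k block read in loses its private per-block bitmap, and if another reference keeps the page alive, that 1k has to be reread even though the data is still in memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SIM_PAGE_SIZE 4096
#define SIM_BLOCK_SIZE 1024
#define SIM_BLOCKS (SIM_PAGE_SIZE / SIM_BLOCK_SIZE)

/* Hypothetical private state: which 1k blocks of the page hold valid data. */
struct subpage_state {
    bool uptodate[SIM_BLOCKS];
};

/* Hypothetical stand-in for struct page. count == 1 means only the
 * page cache holds a reference. */
struct sim_page {
    int count;
    bool page_uptodate;             /* whole-page uptodate bit */
    struct subpage_state *private;  /* filesystem-owned, may be NULL */
};

/* Filesystem hook: drop the private state so the page can be freed. */
static bool sim_releasepage(struct sim_page *page)
{
    free(page->private);
    page->private = NULL;
    return true;
}

/* Reclaim sketch: release private state first, check the reference
 * count afterwards. Returns true if the page would be freed. */
static bool sim_reclaim(struct sim_page *page)
{
    if (page->private && !sim_releasepage(page))
        return false;
    return page->count == 1;  /* extra reference: page stays cached */
}

/* Can block blk be served from the cached page without a reread? */
static bool sim_block_valid(const struct sim_page *page, int blk)
{
    if (page->page_uptodate)
        return true;
    return page->private != NULL && page->private->uptodate[blk];
}

/* Page with only block 0 read in; extra_refs simulates other users
 * (e.g. a racing lookup) holding the page while reclaim runs. */
static void run_scenario(int extra_refs, bool *freed, bool *block0_valid)
{
    assert(extra_refs >= 0);
    struct sim_page page = { .count = 1 + extra_refs };
    page.private = calloc(1, sizeof(*page.private));
    assert(page.private != NULL);
    page.private->uptodate[0] = true;

    *freed = sim_reclaim(&page);
    /* If the page survived, block 0 is still valid in memory, but
     * nothing records that any more, so it would have to be reread. */
    *block0_valid = sim_block_valid(&page, 0);
}
```

Calling `run_scenario(1, &freed, &valid)` (one extra reference held during reclaim) leaves both `freed` and `valid` false: the page survives, but block 0 is no longer known to be valid. With `run_scenario(0, ...)` the page is simply freed, which is the case where dropping the state costs nothing.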
>
> Well, for page reclaim/invalidate/truncate, we have releasepage that you
> can use even if the metadata is stored outside the page, just set PagePrivate
> and it will still get called when the page is about to be freed.
> 

For clean pages, shrink_page_list seems to check the page count after
the releasepage call, so if anyone else holds a reference at that
point, our private state is already gone but the page stays in the
page cache. The window was big enough for me to hit it in practice
under normal workloads.

> There are *some* races that can result in the page subsequently not being
> freed, but I don't think that should be a big deal. I don't want to add
> a callback in the pagecache remove path if possible, but we can try to
> rework or improve things if btrfs needs something specific..

Btrfs doesn't need it today, but it should help once I finally get
subpage blocks going again (and metadata defrag as well).

-chris