Re: [PATCH 2/3] Add shrink_pagecache_parent

Li Wang <liwang@xxxxxxxxxxxxxxx> · Wed, 08 Jan 2014 10:06:31 +0800

Hi,

On 01/03/2014 07:55 AM, Andrew Morton wrote:
On Mon, 30 Dec 2013 21:45:17 +0800 Li Wang <liwang@xxxxxxxxxxxxxxx> wrote:

Analogous to shrink_dcache_parent except that it collects inodes.
It is not very appropriate to be put in dcache.c, but d_walk can only
be invoked from here.

Please cc Dave Chinner on future revisions.  He be da man.

The overall intent of the patchset seems reasonable and I agree that it
can't be efficiently done from userspace with the current kernel API.
We *could* do it from userspace by providing facilities for userspace to
query the VFS caches: "is this pathname in the dentry cache" and "is
this inode in the inode cache".

Even if we had these available, I am afraid it would still introduce
non-negligible overhead due to the frequent system calls needed for a
directory walk, especially with massive numbers of small files.
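
To make the overhead concrete, a userspace walk might look roughly like
the sketch below. It is not part of this patchset and uses only calls
that exist today: nftw(3) to walk the tree and
posix_fadvise(POSIX_FADV_DONTNEED) as an advisory stand-in for
"drop this file's page cache". The names drop_one(),
drop_tree_pagecache() and files_visited are made up for illustration.
Every regular file costs at least an open, a fadvise and a close, on top
of the stat calls nftw() issues itself.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <ftw.h>
#include <sys/stat.h>
#include <unistd.h>

/* nftw() has no user-data argument, so the counter lives at file scope. */
static long files_visited;

/* Hypothetical per-entry callback: three system calls per regular file. */
static int drop_one(const char *path, const struct stat *sb,
		    int type, struct FTW *ftwbuf)
{
	(void)ftwbuf;

	/* Only regular files; opening a FIFO read-only could block. */
	if (type != FTW_F || !S_ISREG(sb->st_mode))
		return 0;

	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return 0;	/* unreadable file: skip it, keep walking */

	/* Advisory only: the kernel may keep dirty or mapped pages. */
	posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	close(fd);
	files_visited++;
	return 0;
}

/*
 * Walk 'root' without following symlinks. Returns 0 on success and
 * stores the number of regular files handled in *nfiles (if non-NULL),
 * or returns -1 if the walk itself failed.
 */
int drop_tree_pagecache(const char *root, long *nfiles)
{
	assert(root != NULL);

	files_visited = 0;
	if (nftw(root, drop_one, 16, FTW_PHYS) != 0)
		return -1;
	if (nfiles)
		*nfiles = files_visited;
	return 0;
}
```

With a tree of N small files this is on the order of 3N system calls
plus the directory reads, even before any "is this inode cached" query
is added, which is the cost I am worried about.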

--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1318,6 +1318,42 @@ void shrink_dcache_parent(struct dentry *parent)
  }
  EXPORT_SYMBOL(shrink_dcache_parent);

+static enum d_walk_ret gather_inode(void *data, struct dentry *dentry)
+{
+	struct list_head *list = data;
+	struct inode *inode = dentry->d_inode;
+
+	if ((inode == NULL) || ((!inode_owner_or_capable(inode)) &&
+				(!capable(CAP_SYS_ADMIN))))
+		goto out;
+	spin_lock(&inode->i_lock);
+	if ((inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) ||

It's unclear what rationale lies behind this particular group of tests.

+		(inode->i_mapping->nrpages == 0) ||
+		(!list_empty(&inode->i_lru))) {

arg, the "Inode locking rules" at the top of fs/inode.c needs a
refresh, I suspect.  It is too vague.

Formally, inode->i_lru is protected by
i_sb->s_inode_lru->node[nid].lock, not by ->i_lock.  I guess you can
just do a list_lru_add() and that will atomically add the inode to your
local list_lru if ->i_lru wasn't being used for anything else.

I *think* that your use of i_lock works OK, because code which fiddles
with i_lru and s_inode_lru also takes i_lock.  However we need to
decide which is the preferred and official lock.  ie: what is the
design here??

However...  most inodes will be on an LRU list, won't they?  Doesn't
this reuse of i_lru mean that many inodes will fail to be processed?
If so, we might need to add a new list_head to the inode, which will be
problematic.

As far as I know (correct me if I am wrong), an inode is put on the
superblock LRU list only when its reference count drops to zero. In most
cases there is at least one dentry referring to it, so it will not
be on any LRU list.
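
The reasoning above can be sketched as a toy userspace model. This is
not kernel code: struct toy_inode and the toy_*() helpers are
hypothetical stand-ins, and the model deliberately ignores how an inode
leaves the LRU again once it is re-referenced. It only shows the
assumption being made, namely that an inode pinned by a dentry passes
the list_empty(&inode->i_lru) test, and that only the last put parks it
on the LRU.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_inode {
	int count;	/* stands in for i_count */
	long nrpages;	/* stands in for i_mapping->nrpages */
	bool on_lru;	/* stands in for !list_empty(&inode->i_lru) */
};

/* A dentry (or any other user) takes a reference. */
void toy_iget(struct toy_inode *inode)
{
	inode->count++;
}

/* Drop a reference; only the last put parks the inode on the LRU. */
void toy_iput(struct toy_inode *inode)
{
	assert(inode->count > 0);
	if (--inode->count == 0)
		inode->on_lru = true;
}

/* Mirrors the two list-related tests quoted from gather_inode(). */
bool toy_can_gather(const struct toy_inode *inode)
{
	return inode->nrpages != 0 && !inode->on_lru;
}
```

In this model an inode with cached pages is gatherable for as long as
something holds a reference to it, and stops being gatherable once
toy_iput() drops the last one.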


Aside: inode_lru_isolate() fiddles directly with inode->i_lru without
taking i_sb->s_inode_lru->node[nid].lock.  Why doesn't this make a
concurrent s_inode_lru walker go oops??  Should we be using
list_lru_del() in there?  (which should have been called
list_lru_del_init(), sigh).

It seems inode_lru_isolate() is only called via prune_icache_sb(), as
a callback function. By the time it is called, the caller already
holds the lock.
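
The pattern I have in mind can be sketched in userspace as follows.
This is a toy model, not the list_lru implementation: struct toy_lru,
toy_lru_walk() and toy_isolate() are made-up names, the lock is a C11
atomic_flag spinlock, and the 'held' field is only a debugging aid. The
point is that the walker takes the lock once and invokes the callback
with it held, so the callback may unlink the node directly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_node {
	struct toy_node *prev, *next;
};

struct toy_lru {
	atomic_flag lock;	/* stands in for the per-node LRU lock */
	bool held;		/* debug aid: true while the lock is held */
	struct toy_node head;
	long nr_items;
};

static void toy_lock(struct toy_lru *l)
{
	while (atomic_flag_test_and_set(&l->lock))
		;	/* spin */
	l->held = true;
}

static void toy_unlock(struct toy_lru *l)
{
	l->held = false;
	atomic_flag_clear(&l->lock);
}

void toy_lru_init(struct toy_lru *l)
{
	atomic_flag_clear(&l->lock);
	l->held = false;
	l->head.prev = l->head.next = &l->head;
	l->nr_items = 0;
}

/* Adding takes the lock itself, like a normal list user would. */
void toy_lru_add(struct toy_lru *l, struct toy_node *n)
{
	toy_lock(l);
	n->next = &l->head;
	n->prev = l->head.prev;
	l->head.prev->next = n;
	l->head.prev = n;
	l->nr_items++;
	toy_unlock(l);
}

typedef bool (*toy_isolate_cb)(struct toy_lru *l, struct toy_node *n);

/* Callback: unlinks the node without locking; the walker holds the lock. */
bool toy_isolate(struct toy_lru *l, struct toy_node *n)
{
	assert(l->held);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
	return true;	/* tell the walker the node was removed */
}

/* Walker: takes the lock once and calls 'cb' for every node under it. */
long toy_lru_walk(struct toy_lru *l, toy_isolate_cb cb)
{
	long removed = 0;

	toy_lock(l);
	for (struct toy_node *n = l->head.next, *next; n != &l->head; n = next) {
		next = n->next;	/* saved first: cb may unlink n */
		if (cb(l, n)) {
			l->nr_items--;
			removed++;
		}
	}
	toy_unlock(l);
	return removed;
}
```

If the callback were ever called from a path that does not hold the
lock, the assert in toy_isolate() would fire; that is the situation the
aside is asking about.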
