On 05/11/2016 10:17 PM, Soumya Koduri wrote:
On 05/11/2016 06:12 PM, Raghavendra Gowdappa wrote:
----- Original Message -----
From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
To: "Soumya Koduri" <skoduri@xxxxxxxxxx>
Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
Sent: Wednesday, May 11, 2016 4:28:28 PM
Subject: Re: gfapi, readdirplus and forced lookup
after inode_link
----- Original Message -----
From: "Soumya Koduri" <skoduri@xxxxxxxxxx>
To: "Mohammed Rafi K C" <rkavunga@xxxxxxxxxx>, "Raghavendra Gowdappa"
<rgowdapp@xxxxxxxxxx>, "Niels de Vos"
<ndevos@xxxxxxxxxx>, "Raghavendra Talur" <rtalur@xxxxxxxxxx>, "Poornima
Gurusiddaiah" <pgurusid@xxxxxxxxxx>
Cc: "+rhs-zteam" <rhs-zteam@xxxxxxxxxx>, "Rajesh Joseph"
<rjoseph@xxxxxxxxxx>, "jtho >> Jiffin Thottan"
<jthottan@xxxxxxxxxx>
Sent: Wednesday, May 11, 2016 3:55:05 PM
Subject: Re: gfapi, readdirplus and forced lookup after inode_link
On 05/11/2016 12:41 PM, Mohammed Rafi K C wrote:
On 05/11/2016 12:28 PM, Soumya Koduri wrote:
Hi Raghavendra,
On 05/11/2016 12:01 PM, Raghavendra Gowdappa wrote:
Hi all,
There are certain code-paths where the layers managing inodes
(gfapi,
fuse, nfsv3 etc) need to do a lookup even though the inode is found
in inode-table. readdirplus is one such codepath (but not only one).
The reason for doing this is that
1. not all xlators have enough information in readdirp_cbk to make
inode usable (for eg., dht cannot build layout for directory
inodes).
2. There are operations (like dht directory self-healing) which are
needed for maintaining internal consistency and these operations
cannot be done in readdirp.
This forcing of lookup on a linked inode is normally achieved in two
ways:
1. lower layers (like dht) setting entry->inode to NULL (without
entry->inode, interface layers cannot link the inode).
Rafi (CC'ed) had made changes to fix readdirp specific issue
(required
for tiered volumes) as part of
http://review.gluster.org/#/c/14109/ to
do explicit lookup if either entry->inode is set to NULL or inode_ctx
is NULL in gfapi. And I think he had made similar changes for
gluster-NFS as well to provide support for tiered volumes. I am not
sure if it is handled in common resolver code-path. Have to look at
the code. Rafi shall be able to confirm it.
The changes I made in the three access layers are for inodes which was
linked from lower layers. Which means the inodes linked from lower
layer
won't have inode ctx set in upper xlators, ie, during resolving we
will
send explicit lookup.
With this changes during resolve if inode_ctx is not set then it will
send a lookup + if set_need_lookup flag is set in inode_ctx, then also
we will send a lookup
That's correct. I think gfapi and fuse-bridge are handling this
properly i.e., sending a lookup before resuming fop if:
1. No context of xlator (fuse/gfapi) is present in inode.
Or
2. Context is set and it says resolution is necessary.
Note that case 1 is necessary as inode_linking is done in dht during
directory healing. So, other fops might encounter an inode on which
resolution is still in progress and not complete yet. As inode-context
is set in fuse-bridge/gfapi only after a successful lookup, absence of
context can be used as a hint for resolution being in progress.
I am not sure NFSv3 server is doing this.
Case (1) is definitely handled in gluster-NFS. In fact looks like it was
as part of single BZ# 1297311 [1], all the required changes were made in
these layers (fuse,gNFS, gfapi) to handle the cases you had mentioned
above. However, at-least when compared the patches [2], [3] & [4], I do
not see need_lookup changes in gluster-NFS.
Rafi, do you recall why its been so? Was it intentional?
I think the reason being, in gluster-NFS, " nfs_fix_generation()", where
inode_ctx is set, is being called only at selective places. So at-least
in readdirp_cbk(), we seem to be just linking inodes but not setting
inode_ctx. So that would result in forced lookup next time any fop is
performed on that inode.
Rafi,
Could you please confirm if that was indeed the case?
Thanks,
Soumya
Thanks,
Soumya
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1297311
[2] http://review.gluster.org/#/c/13224/
[3] http://review.gluster.org/#/c/13225
[4] http://review.gluster.org/#/c/13226
Also, on bricks too quota-enforcer and bit-rot does inode-linking. So,
protocol/server also needs to do similar things.
As Du mentioned, readdirp set need_lookup everytime for entries in
readdirp, I saw that code in fuse, and gfapi. But I don't remember
such
code in gNFS.
There are checks for "entry->inode == NULL" in gNFS case as well. Looks
like it was Jiffin who made those changes (again wrt to tiered volumes)
- http://review.gluster.org/#/c/12960/
But all these checks seem to be in only readdirp_cbk codepath where
directory entries are filled. What are other fops which need such
special handling?
There are some codepaths, where linking is done by xlators who don't do
resolution. A rough search shows following components:
1. quota enforcer
2. bitrot
3. dht/tier (needed, but currently not doing).
4. trash (for .trash I suppose)
However, none of these are explicitly setting need_lookup. So, there are
windows of time where lookup is partially complete in an xlator
graph, but
other fops start using them. I am currently working on a fix to solve
the
issue for dht/tier on fuse. We have to do similar work on other
xlators/interface layers too.
Thanks,
Soumya
Regards
Rafi KC
Thanks,
Soumya
2. interface layers (at least fuse) setting a flag in inode to let
resolver know that a lookup is to be done before resuming the fop.
I am sure that fuse-bridge does this correctly. Need inputs from you
about the behavior of other interface layers like gfapi, nfsv3 etc.
regards,
Raghavendra
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel