----- Original Message ----- > From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx> > To: "Soumya Koduri" <skoduri@xxxxxxxxxx> > Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx> > Sent: Wednesday, May 11, 2016 4:28:28 PM > Subject: Re: gfapi, readdirplus and forced lookup after inode_link > > > > ----- Original Message ----- > > From: "Soumya Koduri" <skoduri@xxxxxxxxxx> > > To: "Mohammed Rafi K C" <rkavunga@xxxxxxxxxx>, "Raghavendra Gowdappa" > > <rgowdapp@xxxxxxxxxx>, "Niels de Vos" > > <ndevos@xxxxxxxxxx>, "Raghavendra Talur" <rtalur@xxxxxxxxxx>, "Poornima > > Gurusiddaiah" <pgurusid@xxxxxxxxxx> > > Cc: "+rhs-zteam" <rhs-zteam@xxxxxxxxxx>, "Rajesh Joseph" > > <rjoseph@xxxxxxxxxx>, "jtho >> Jiffin Thottan" > > <jthottan@xxxxxxxxxx> > > Sent: Wednesday, May 11, 2016 3:55:05 PM > > Subject: Re: gfapi, readdirplus and forced lookup after inode_link > > > > > > > > On 05/11/2016 12:41 PM, Mohammed Rafi K C wrote: > > > > > > > > > On 05/11/2016 12:28 PM, Soumya Koduri wrote: > > >> Hi Raghavendra, > > >> > > >> > > >> > > >> On 05/11/2016 12:01 PM, Raghavendra Gowdappa wrote: > > >>> Hi all, > > >>> > > >>> There are certain code-paths where the layers managing inodes (gfapi, > > >>> fuse, nfsv3 etc) need to do a lookup even though the inode is found > > >>> in inode-table. readdirplus is one such codepath (but not only one). > > >>> The reason for doing this is that > > >>> 1. not all xlators have enough information in readdirp_cbk to make > > >>> inode usable (for eg., dht cannot build layout for directory inodes). > > >>> 2. There are operations (like dht directory self-healing) which are > > >>> needed for maintaining internal consistency and these operations > > >>> cannot be done in readdirp. > > >>> > > >>> This forcing of lookup on a linked inode is normally achieved in two > > >>> ways: > > >>> 1. lower layers (like dht) setting entry->inode to NULL (without > > >>> entry->inode, interface layers cannot link the inode). > > >> > > >> Rafi (CC'ed) had made changes to fix readdirp specific issue (required > > >> for tiered volumes) as part of http://review.gluster.org/#/c/14109/ to > > >> do explicit lookup if either entry->inode is set to NULL or inode_ctx > > >> is NULL in gfapi. And I think he had made similar changes for > > >> gluster-NFS as well to provide support for tiered volumes. I am not > > >> sure if it is handled in common resolver code-path. Have to look at > > >> the code. Rafi shall be able to confirm it. > > > > > > The changes I made in the three access layers are for inodes which was > > > linked from lower layers. Which means the inodes linked from lower layer > > > won't have inode ctx set in upper xlators, ie, during resolving we will > > > send explicit lookup. > > > > > > With this changes during resolve if inode_ctx is not set then it will > > > send a lookup + if set_need_lookup flag is set in inode_ctx, then also > > > we will send a lookup That's correct. I think gfapi and fuse-bridge are handling this properly i.e., sending a lookup before resuming fop if: 1. No context of xlator (fuse/gfapi) is present in inode. Or 2. Context is set and it says resolution is necessary. Note that case 1 is necessary as inode_linking is done in dht during directory healing. So, other fops might encounter an inode on which resolution is still in progress and not complete yet. As inode-context is set in fuse-bridge/gfapi only after a successful lookup, absence of context can be used as a hint for resolution being in progress. I am not sure NFSv3 server is doing this. Also, on bricks too quota-enforcer and bit-rot does inode-linking. So, protocol/server also needs to do similar things. > > > > > > As Du mentioned, readdirp set need_lookup everytime for entries in > > > readdirp, I saw that code in fuse, and gfapi. But I don't remember such > > > code in gNFS. > > > > There are checks for "entry->inode == NULL" in gNFS case as well. Looks > > like it was Jiffin who made those changes (again wrt to tiered volumes) > > - http://review.gluster.org/#/c/12960/ > > > > But all these checks seem to be in only readdirp_cbk codepath where > > directory entries are filled. What are other fops which need such > > special handling? > > There are some codepaths, where linking is done by xlators who don't do > resolution. A rough search shows following components: > 1. quota enforcer > 2. bitrot > 3. dht/tier (needed, but currently not doing). > 4. trash (for .trash I suppose) > > However, none of these are explicitly setting need_lookup. So, there are > windows of time where lookup is partially complete in an xlator graph, but > other fops start using them. I am currently working on a fix to solve the > issue for dht/tier on fuse. We have to do similar work on other > xlators/interface layers too. > > > > > Thanks, > > Soumya > > > > > > > > > > Regards > > > Rafi KC > > > > > >> > > >> > > >> Thanks, > > >> Soumya > > >> > > >>> 2. interface layers (at least fuse) setting a flag in inode to let > > >>> resolver know that a lookup is to be done before resuming the fop. > > >>> > > >>> I am sure that fuse-bridge does this correctly. Need inputs from you > > >>> about the behavior of other interface layers like gfapi, nfsv3 etc. > > >>> > > >>> regards, > > >>> Raghavendra > > >>> > > > > > > _______________________________________________ > Gluster-devel mailing list > Gluster-devel@xxxxxxxxxxx > http://www.gluster.org/mailman/listinfo/gluster-devel > _______________________________________________ Gluster-devel mailing list Gluster-devel@xxxxxxxxxxx http://www.gluster.org/mailman/listinfo/gluster-devel