Re: Inconsistent behavior due to lack of lookup on entry followed by readdirp

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 






From: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
To: "Krutika Dhananjay" <kdhananj@xxxxxxxxxx>
Cc: "Mohammed Rafi K C" <rkavunga@xxxxxxxxxx>, "Gluster Devel" <gluster-devel@xxxxxxxxxxx>, "Dan Lambright" <dlambrig@xxxxxxxxxx>, "Nithya Balachandran" <nbalacha@xxxxxxxxxx>, "Ben Turner" <bturner@xxxxxxxxxx>, "Ben England" <bengland@xxxxxxxxxx>, "Manoj Pillai" <mpillai@xxxxxxxxxx>, "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>, "Ravishankar Narayanankutty" <ranaraya@xxxxxxxxxx>, xhernandez@xxxxxxxxxx
Sent: Thursday, August 13, 2015 9:06:37 AM
Subject: Re: Inconsistent behavior due to lack of lookup on entry followed by readdirp



----- Original Message -----
> From: "Krutika Dhananjay" <kdhananj@xxxxxxxxxx>
> To: "Mohammed Rafi K C" <rkavunga@xxxxxxxxxx>
> Cc: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>, "Dan Lambright" <dlambrig@xxxxxxxxxx>, "Nithya Balachandran"
> <nbalacha@xxxxxxxxxx>, "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>, "Ben Turner" <bturner@xxxxxxxxxx>, "Ben
> England" <bengland@xxxxxxxxxx>, "Manoj Pillai" <mpillai@xxxxxxxxxx>, "Pranith Kumar Karampuri"
> <pkarampu@xxxxxxxxxx>, "Ravishankar Narayanankutty" <ranaraya@xxxxxxxxxx>, xhernandez@xxxxxxxxxx
> Sent: Wednesday, August 12, 2015 9:02:44 PM
> Subject: Re: Inconsistent behavior due to lack of lookup on entry followed by readdirp
>
> I faced the same issue with the sharding translator. I fixed it by making its
> readdirp callback initialize individual entries' inode ctx, some of these
> being xattr values, which are filled in entry->dict by the posix translator.
> Here is the patch that got merged recently: http://review.gluster.org/11854
> Would that be as easy to do in DHT as well?

The problem is not just filling out state in the inode. The bigger problem is healing, which is supposed to maintain a directory/file to be in state consistent with our design before a successful reply to lookup. The operations can involve creating directories on missing subvols, setting appropriate layout, etc. Effectively for readdirp to replace lookup, it should be calling dht_lookup on each of the dentry it is passing back to application.
OK.
>
> As far as AFR is concerned, it indirectly forces LOOKUP on entries which are
> being retrieved for the first time through a READDIRP (and as a result do
> not have their inode ctx etc initialised yet) by setting entry->inode to
> NULL. See afr_readdir_transform_entries().

Hmm. Then we already have "disabled" readdirp through code :). Without an inode corresponding to entry, readdirp will be effectively readdir stripping any performance benefits by having readdirp as a "batched" lookup (of all the dentries).
No. Not every single READDIRP will be transformed into a READDIR by AFR. AFR resets the inode corresponding to an entry, before responding to its parent, _only_ under the following two conditions:
1) if this entry in question is being retrieved by this client for the first time through a READDIRP. In other words, this client has not _yet_ performed a LOOKUP on it.
2) if that sub-volume of AFR on which the parent directory is being READDIRP'd (remember AFR would only need to serve inode and directory reads from one of the replicas) does _not_ contain a good copy of the entry.
    In other words this entry needs to be healed on parent's read child. This is because we do not want the caching translators or the application itself to get incorrect entry attributes.

This means that more often than not, AFR _would_ be leaving the inode corresponding to the entry as it is, and not setting it to NULL.


> This is the default behavior which is being made optional as part of
> http://review.gluster.org/#/c/11846/ which is still under review (see BZ
> 1250803, a performance bug :) ).

If it is made optional, when we enable setting entry->inode we still see consistency issues. Also, it seems to me that there is no point in having each individual xlator option controlling this behaviour. Instead we can make each xlator behave in compliance to global mount option "--use-readdirp=yes/no". Is there any specific reason to have an option to control this behaviour in afr?
Agreed that there will be consistency issues.
The reason to move away from letting 1) and 2) above be the default behavior is performance. :) And I guess it is also partly because AFR-v1 does not have these restrictions in READDIRP.  But the patch is still under review.

-Krutika



>
> -Krutika
>
> ----- Original Message -----
>
> > From: "Mohammed Rafi K C" <rkavunga@xxxxxxxxxx>
> > To: "Gluster Devel" <gluster-devel@xxxxxxxxxxx>
> > Cc: "Dan Lambright" <dlambrig@xxxxxxxxxx>, "Nithya Balachandran"
> > <nbalacha@xxxxxxxxxx>, "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>, "Ben
> > Turner" <bturner@xxxxxxxxxx>, "Ben England" <bengland@xxxxxxxxxx>, "Manoj
> > Pillai" <mpillai@xxxxxxxxxx>, "Pranith Kumar Karampuri"
> > <pkarampu@xxxxxxxxxx>, "Ravishankar Narayanankutty" <ranaraya@xxxxxxxxxx>,
> > kdhananj@xxxxxxxxxx, xhernandez@xxxxxxxxxx
> > Sent: Wednesday, August 12, 2015 7:29:48 PM
> > Subject: Inconsistent behavior due to lack of lookup on entry followed by
> > readdirp
>
> > Hi All,
>
> > We are facing some inconsistent behavior for fops like rename, unlink
> > etc due to lack of lookup followed by a readdirp, more specifically if
> > inodes/gfid are populated via readdirp call and this nodeid is shared
> > with kernal, md-cache will cache this based on base-name. Then
> > subsequent named lookup will be served from md-cache and it winds-back
> > immediately. So there is a chance to have an FOP triggered with out
> > having a lookup on an entry. DHT does lot of things like creating link
> > files and populate inode_ctx etc, during lookup. In such scenario it is
> > must to have at least one lookup to be happened on an entry. Since
> > readdirp preventing the lookup, it has been very hard for fops to
> > proceed without a first lookup on the entry. We are also suspecting some
> > problems due to same with afr/ec self healing also. So If we remove
> > readdirp from md-cache ([1], [2]) it causes, an additional hop for first
> > lookup for every entry. I'm mostly concerned with this one extra network
> > call, and the performance degradation caused by the same.
>
> > Now with this, the only advantage with readdirp is, it removes one
> > context switch between kernal and userspace. Is it really worth to
> > sacrifice this for consistency ?
>
> > What do you think about removing readdirp functionality?
>
> > Please provide your input/suggestion/ideas.
>
> > [1] : http://review.gluster.org/#/c/11892/
>
> > [2] : http://review.gluster.org/#/c/11894/
>
> > Thanks in Advance
> > Rafi KC
>

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel

[Index of Archives]     [Gluster Users]     [Ceph Users]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux