Re: Impact of reduced lookup on other xlators

Anuradha Talur <atalur@xxxxxxxxxx> · Wed, 24 Aug 2016 02:49:02 -0400 (EDT)

Response inline.

----- Original Message -----
> From: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> To: "Anuradha Talur" <atalur@xxxxxxxxxx>
> Cc: "Poornima Gurusiddaiah" <pgurusid@xxxxxxxxxx>, "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>, "Susant Palai"
> <spalai@xxxxxxxxxx>, "Ashish Pandey" <aspandey@xxxxxxxxxx>, "Ravishankar N" <ravishankar@xxxxxxxxxx>, "Krutika
> Dhananjay" <kdhananj@xxxxxxxxxx>, "Nithya Balachandran" <nbalacha@xxxxxxxxxx>, "Dan Lambright" <dlambrig@xxxxxxxxxx>
> Sent: Tuesday, August 23, 2016 4:27:56 PM
> Subject: Re: Impact of reduced lookup on other xlators
> 
> On Tue, Aug 23, 2016 at 1:56 PM, Anuradha Talur <atalur@xxxxxxxxxx> wrote:
> 
> > Hi,
> >
> > As we (Poornima, Krutika, Ravi and I) discussed, listing down the
> > dependencies from AFR's perspective:
> >
> > 1) AFR needs lookups to detect if a file's inode ctx is fresh, otherwise
> > it might lead to stale reads. This problem might be exposed frequently when
> > timeout is increased.
> >
> 
> I didn't get this part. Could you elaborate? Is it because of readdirp's
> need-heal? Other than this I don't see any other requirement for lookups.
> This is something we can fix in different ways like getting need-heal in
> other fops which are wound to bricks.
> 
The easiest case to understand is the multi-client scenario.

Two clients have accessed a file and have its details in their inode_ctx.
If one client sees a failure in a case where no brick went down, the other
client won't be aware of it. Lookup was how AFR detected such a failure.
Now that the md-cache timeout is increased, lookups won't reach AFR.
This means there is a chance that AFR's inode_ctx for a file is not up to date.
As event-gen doesn't change on the unaware client, reads might be served from a stale brick.
This problem exists even now, but increasing the timeout will expose it more often.
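
To make the scenario concrete, here is a toy Python model of it. This is only
a sketch and not GlusterFS code: Brick, Client, pending_heal and demo() are
hypothetical names, and the "readable" list merely stands in for the per-file
state AFR keeps in inode_ctx.

```python
# Toy model only: every name here is hypothetical, none of it is GlusterFS code.

class Brick:
    def __init__(self, name, data):
        self.name = name
        self.data = data


class Client:
    """A client whose cached view says which bricks are good to read from."""

    def __init__(self, bricks):
        self.bricks = bricks
        self.readable = []  # stands in for the per-file state in AFR's inode_ctx

    def lookup(self, pending_heal):
        # A lookup that reaches the replicate layer refreshes the cached view.
        self.readable = [b.name for b in self.bricks if b.name not in pending_heal]

    def write(self, data, failing=()):
        # The write lands only on bricks where it does not fail; just the
        # writing client learns which bricks are now stale.
        for b in self.bricks:
            if b.name not in failing:
                b.data = data
        self.readable = [b.name for b in self.bricks if b.name not in failing]
        return set(failing)

    def read(self):
        # Serve the read from the first brick the cached view considers good.
        return next(b.data for b in self.bricks if b.name in self.readable)


def demo():
    bricks = [Brick("b0", "v1"), Brick("b1", "v1")]
    c1, c2 = Client(bricks), Client(bricks)
    pending_heal = set()          # stands in for heal markers kept on the bricks
    c1.lookup(pending_heal)
    c2.lookup(pending_heal)
    pending_heal |= c1.write("v2", failing={"b0"})  # fails on b0, no brick down
    stale = c2.read()             # no lookup reached c2, so it still trusts b0
    c2.lookup(pending_heal)       # a lookup would have refreshed the view
    fresh = c2.read()
    return stale, fresh
```

In this model the second client reads the old contents from b0 until a lookup
refreshes its view, which is the window a longer md-cache timeout widens.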

Upcall can send invalidations. AFR should act on these invalidations selectively,
based on whether the file really needs heal or not.
We discussed some ways of doing this, but haven't arrived at any conclusion yet.

Along with this, we should also get need-heal in other fops.
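
Purely as an illustration of what "selectively" could mean (this is not one of
the approaches we discussed, and none of these names exist in GlusterFS), the
Python sketch below marks the cached state as suspect only when a need-heal
hint arrives, whether it comes from an invalidation or from a fop reply, and
refreshes it lazily before the next read. ReplicaCtx, note_need_heal and
read_subvols are all assumed names.

```python
# Hypothetical sketch; none of these names are GlusterFS functions or fields.

class ReplicaCtx:
    """Stand-in for the per-inode state a replicate layer keeps on a client."""

    def __init__(self, readable):
        self.readable = set(readable)
        self.needs_refresh = False


def note_need_heal(ctx, need_heal):
    """Record a need-heal hint from an invalidation or from any fop reply.

    Notifications without the hint are ignored, so the cached state survives.
    """
    if need_heal:
        ctx.needs_refresh = True
    return ctx.needs_refresh


def read_subvols(ctx, refresh):
    """Return the bricks to read from, re-inspecting them first if suspect.

    `refresh` is a caller-supplied callable that returns the good bricks.
    """
    if ctx.needs_refresh:
        ctx.readable = set(refresh())
        ctx.needs_refresh = False
    return ctx.readable
```

With this shape, an invalidation for a healthy file costs nothing, while one
that carries need-heal triggers a single refresh on the next read.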
> 
> > 2) Client side healing needs stats to be wound down to AFR. (This can be
> > handled by switching off stat-prefetch or reducing its timeout only for
> > healing)
> >
> 
> I think we should do something similar to EC where we heal on getfattr on a
> virtual xattr or something. It is important to keep I/O as performant as
> possible.
> 
Hmm, correct.
> 
> >
> > Requirement on AFR from md-cache that Poornima pointed out:
> > 1) How to distinguish child_modified due to child_down from child_modified
> > from child_up so that cache is invalidated only on child_down, not on up?
> >
> 
> Didn't understand this part either. Maybe you discussed it in the
> meeting. Could you clarify why we need to invalidate cache when child is
> down? If a child is down and we invalidate the cache, it will send a lookup
> which still goes to the remaining bricks right?
> 

In a non-replicate scenario, consider that md-cache has cached details of X files.
Now, the child to which these files are mapped goes down. ls on the dir will succeed
and list these files, but when the application tries to perform any operation on
one of these files, it will fail. This would be incorrect.
This is why md-cache's cache needs to be invalidated on child-down.

In a replicate scenario too, this will happen when both subvols of AFR die/disconnect.
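
A minimal Python sketch of the behaviour being asked for, assuming the
notification can say whether the child went down or came up (which is exactly
the open question). ChildEvent and MetadataCache are hypothetical names, not
md-cache code.

```python
# Hypothetical sketch; the event names and cache class are illustrative only.
from enum import Enum


class ChildEvent(Enum):
    UP = "child_up"
    DOWN = "child_down"


class MetadataCache:
    """Toy metadata cache keyed by path."""

    def __init__(self):
        self.entries = {}

    def put(self, path, stat):
        self.entries[path] = stat

    def get(self, path):
        return self.entries.get(path)

    def notify(self, event):
        # On child-down the cached files may no longer be reachable, so drop
        # everything. A child coming up does not make cached entries wrong,
        # so they are kept.
        if event is ChildEvent.DOWN:
            self.entries.clear()
```

For example, after cache.put("/d/f", {"size": 1}), a notify(ChildEvent.UP)
leaves the entry in place, while notify(ChildEvent.DOWN) makes get("/d/f")
return None so the next access has to go to the bricks.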
> 
> I think the goal should be to prevent dependency on lookups as much as
> possible. So let us work on the things that need to be taken care of for that.
Correct.
> 
> >
> > If there are no objections about the first 2 points mentioned, I will
> > raise bugs to track them.
> > ------------------------------
> >
> > *From: *"Poornima Gurusiddaiah" <pgurusid@xxxxxxxxxx>
> > *To: *"Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>, "Susant Palai" <
> > spalai@xxxxxxxxxx>, "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>,
> > "Ashish Pandey" <aspandey@xxxxxxxxxx>, "Anuradha Talur"
> > <atalur@xxxxxxxxxx>,
> > "Ravishankar N" <ravishankar@xxxxxxxxxx>, "Krutika Dhananjay" <
> > kdhananj@xxxxxxxxxx>, "Nithya Balachandran" <nbalacha@xxxxxxxxxx>, "Dan
> > Lambright" <dlambrig@xxxxxxxxxx>
> > *Sent: *Tuesday, August 23, 2016 10:43:40 AM
> > *Subject: *Impact of reduced lookup on other xlators
> >
> > Hi All,
> >
> > Because of certain improvements in md-cache and longer caching time in
> > md-cache,
> > we will have reduced lookups sent to the cluster xlators and below. The
> > guarantee that every IO fop
> > will be preceded by a lookup will not hold true anymore (it was the case
> > earlier as well, but the
> > timeout was 1s). This may break certain assumptions made by cluster
> > xlators: dht, afr, tier, ec, etc.
> >
> > This thread is intended to identify and discuss those issues and have bugs
> > filed. Request the cluster
> > xlator maintainers to identify if there are any such issues that need to
> > be fixed.
> >
> >
> > Thank you,
> > Poornima
> >
> >
> >
> >
> > --
> > Thanks,
> > Anuradha.
> >
> 
> 
> 
> --
> Pranith
> 

-- 
Thanks,
Anuradha.
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel