hi,
I think this is the RCA for the issue:
Basically, with distributed ec as the cold tier and distributed replicate
as the hot tier: tier sends a lookup which fails on ec (by this time the
dict already contains ec xattrs). After this, the lookup_everywhere code
path is hit in tier, which triggers a lookup on each of distribute's
hashed subvolumes; these fail too, which leads to the cold and hot dht's
lookup_everywhere running in two parallel epoll threads. When ec's thread
tries to set trusted.ec.version/dirty/size in the dictionary, the older
values against the same keys get erased. While this erasing is going on,
if the thread doing the lookup on afr's subvolume accesses these members,
either in dict_copy_with_ref or in the client xlator while serializing,
that can lead to either a crash or a hang, depending on when the
spin/mutex lock is called on invalid memory.
For the moment I sent http://review.gluster.org/13680 (I am pressed for
time because I need to provide a build for our customer with a fix),
which avoids the parallel accesses that step on each other.
Raghavendra G and I discussed this problem, and the right way to fix it
is for dict_foreach to take a copy of the dictionary (without using
dict_foreach itself) inside the lock and then loop over that local copy.
I am worried about the performance implications of this, so I am
wondering if anyone has a better idea.
I have also included Xavi, who earlier said we need to change dict.c but
that it is a bigger change. Maybe the time has come? I would love to
gather all your inputs and implement a better version of dict if we need
one.
Pranith
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel