On 12/17/2011 12:40 AM, Emmanuel Dreyfus wrote:
Pranith Kumar K <pranithk@xxxxxxxxxxx> wrote:
2) When using AFR, if a peer goes down, processes that have I/O pending
will see an error. Just retrying the same operation works, but that is
a bit frustrating.
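The retry mentioned above can be sketched on the application side as follows. This is a hypothetical workaround, not GlusterFS code: the name pread_retry_eio, the attempt limit and the one-second pause between attempts (meant to give the client time to notice the dead brick) are all assumptions. It only masks the AFR problem rather than fixing it.

```c
#define _XOPEN_SOURCE 700   /* expose pread() and sleep() from unistd.h */
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical application-side workaround (not part of GlusterFS):
 * retry pread() when it fails with EIO (or is interrupted with EINTR),
 * up to max_tries attempts. Any other error is returned immediately. */
static ssize_t pread_retry_eio(int fd, void *buf, size_t count, off_t offset,
                               int max_tries)
{
    ssize_t ret = -1;

    assert(max_tries >= 1);
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        ret = pread(fd, buf, count, offset);
        if (ret >= 0)
            return ret;             /* success (0 means end of file) */
        if (errno != EIO && errno != EINTR)
            return ret;             /* not a transient error: give up now */
        if (attempt < max_tries)
            sleep(1);               /* assumed pause to let AFR fail over */
    }
    return ret;                     /* still EIO/EINTR after max_tries */
}
```

Retrying only on EIO and EINTR keeps genuine errors such as EBADF or ESPIPE visible to the caller on the first attempt.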
Emmanuel,
Could you give me the test case for the AFR issue?
It is quite big; I have not yet narrowed it down to something simple.
My test case is building NetBSD. You can grab the tarballs here:
http://ftp.fr.netbsd.org/pub/NetBSD/NetBSD-5.1/source/sets/src.tgz
http://ftp.fr.netbsd.org/pub/NetBSD/NetBSD-5.1/source/sets/sharesrc.tgz
http://ftp.fr.netbsd.org/pub/NetBSD/NetBSD-5.1/source/sets/syssrc.tgz
http://ftp.fr.netbsd.org/pub/NetBSD/NetBSD-5.1/source/sets/gnusrc.tgz
Unpack, then: cd usr/src && ./build.sh -U release
I wait for the build to actually start, then run pkill glusterfsd on one
replica, and the build stops with an I/O error.
Emmanuel,
I think I will get a chance to take a closer look at this issue
next week, if you are willing to wait.
I suspect the following code. Could you make the change below and
check whether this is the issue you are hitting? Also, could you send me
the client logs from the next run?
struct iatt *buf = NULL;
struct iatt *postparent = NULL;
dict_t **xattr = NULL;
+ afr_private_t *priv = NULL;
GF_ASSERT (local);
+ priv = this->private;
buf = &local->cont.lookup.buf;
postparent = &local->cont.lookup.postparent;
@@ -787,6 +789,9 @@ afr_lookup_build_response_params (afr_local_t *local, xlator_t *this)
read_child = afr_read_child (this, local->cont.lookup.inode);
gf_log (this->name, GF_LOG_DEBUG, "Building lookup response from %d",
read_child);
+ GF_ASSERT (afr_is_child_present (local->cont.lookup.child_success,
+ priv->child_count, read_child));
+ GF_ASSERT (local->cont.lookup.sources[read_child]);
//honor the xattr set by data-self-heal
if (!*xattr)
*xattr = dict_ref (local->cont.lookup.xattrs[read_child]);
Pranith
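To make the two new GF_ASSERTs concrete, here is a standalone C model of the conditions they check. It is a sketch, not GlusterFS code: the names child_is_present, read_child_is_usable and assert_read_child_ok are hypothetical, and the model assumes that local->cont.lookup.child_success is an array of the indices of the children whose lookup succeeded, terminated by -1, and that sources[] is nonzero for children holding good data.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of afr_is_child_present(): assumes success_children
 * lists the indices of children whose lookup succeeded, ending with -1
 * (or filling all child_count slots when every child succeeded). */
static bool child_is_present(const int *success_children, int child_count,
                             int child)
{
    for (int i = 0; i < child_count; i++) {
        if (success_children[i] == -1)
            break;                  /* end of the successful-children list */
        if (success_children[i] == child)
            return true;
    }
    return false;
}

/* True when it looks safe to build the lookup reply from read_child:
 * the child answered the lookup and is marked as a source. */
static bool read_child_is_usable(const int *success_children,
                                 const int *sources, int child_count,
                                 int read_child)
{
    if (read_child < 0 || read_child >= child_count)
        return false;
    return child_is_present(success_children, child_count, read_child) &&
           sources[read_child] != 0;
}

/* Mirror of the patch's two GF_ASSERTs, using the standard assert(). */
static void assert_read_child_ok(const int *success_children,
                                 const int *sources, int child_count,
                                 int read_child)
{
    assert(read_child >= 0 && read_child < child_count);
    assert(child_is_present(success_children, child_count, read_child));
    assert(sources[read_child] != 0);
}
```

For instance, with two replicas where only child 0 answered the lookup (success list {0, -1}), read_child_is_usable() returns false for child 1. If the killed brick were still chosen as the read child, that is the situation the first new assertion would flag.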