Hi,

I have been tracking down a bug reported by /tests/basic/afr/entry-self-heal.t on NetBSD, and now I wonder how glustershd is supposed to work.

In xlators/cluster/afr/src/afr-self-heald.c, we create a healer for each AFR subvolume. In afr_selfheal_tryinodelk(), each healer performs the INODELK on each AFR subvolume, using AFR_ONALL(). The result is that the healers compete for the locks on the same inodes in the subvolumes. They sometimes conflict, and if we have only two subvolumes, we run into this condition:

        if (ret < AFR_SH_MIN_PARTICIPANTS) {
                /* Either less than two subvols available, or another
                 * selfheal (from another server) is in progress. Skip
                 * for now in any case there isn't anything to do.
                 */
                ret = -ENOTCONN;
                goto unlock;
        }

Since there is no glustershd doing the work on another server, the entry remains unhealed. I believe this is exactly the same problem I am trying to address in http://review.gluster.org/9074

What is wrong here? Should there really be a healer for each subvolume, or is it the AFR_ONALL() usage that is wrong? Or did I completely miss the point? A standalone sketch of the interleaving I suspect follows below my signature.

-- 
Emmanuel Dreyfus
manu@xxxxxxxxxx
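For reference, here is a minimal standalone model of the race, using plain pthreads rather than gluster code. NBRICKS, brick_lock and healer() are made-up names; only the AFR_SH_MIN_PARTICIPANTS value of 2 comes from the real sources. Each "healer" thread does an AFR_ONALL()-style fan-out, trying a nonblocking lock on every "brick", and skips the heal when it wins fewer than AFR_SH_MIN_PARTICIPANTS locks:

#include <pthread.h>
#include <stdio.h>

#define NBRICKS                 2
#define AFR_SH_MIN_PARTICIPANTS 2

/* one lock per brick, standing in for the INODELK on each subvolume */
static pthread_mutex_t brick_lock[NBRICKS] = {
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};
static pthread_barrier_t start;

static void *
healer(void *arg)
{
        int id = *(int *)arg;
        int won[NBRICKS] = { 0 };
        int ret = 0;
        int i;

        pthread_barrier_wait(&start);   /* let both healers race */

        /* AFR_ONALL()-style fan-out: try the lock on every brick,
         * counting how many we won */
        for (i = 0; i < NBRICKS; i++) {
                if (pthread_mutex_trylock(&brick_lock[i]) == 0) {
                        won[i] = 1;
                        ret++;
                }
        }

        if (ret < AFR_SH_MIN_PARTICIPANTS)
                printf("healer %d: only %d lock(s), skipping (-ENOTCONN)\n",
                       id, ret);
        else
                printf("healer %d: %d locks, healing\n", id, ret);

        for (i = 0; i < NBRICKS; i++)
                if (won[i])
                        pthread_mutex_unlock(&brick_lock[i]);
        return NULL;
}

int
main(void)
{
        pthread_t t[2];
        int id[2] = { 0, 1 };

        pthread_barrier_init(&start, NULL, 2);
        pthread_create(&t[0], NULL, healer, &id[0]);
        pthread_create(&t[1], NULL, healer, &id[1]);
        pthread_join(t[0], NULL);
        pthread_join(t[1], NULL);
        return 0;
}

In one possible interleaving each thread wins exactly one lock, so both bail out with ret == 1 and neither performs the heal, which is what I think happens to entry-self-heal.t with two subvolumes and no glustershd on another server to retry.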