> I understand this code checks that the layout covers the whole space,
> is that right? Then it must be upset that layout->list[0] does not cover
> anything. Since the error is transient, I suspect a race condition:
> the layout would be filled after that check. Is it possible? Where is
> the layout crafted?

I improved my test by completely deleting and re-creating the volume
before adding a brick. Here is what happens when I add a brick:

1-vndfs-client-1: Connected to 192.0.2.103:24027, attached to remote volume '/export/vnd1a'.
1-vndfs-client-1: Server and Client lk-version numbers are not same, reopening the fds
0-fuse: switched to graph 1
1-vndfs-client-1: Server lk version = 1
1-vndfs-dht: missing disk layout on vndfs-client-0. err = -1
1-dht_layout_merge: ==> layout[0] 0 - 0 err -1
1-dht_layout_merge: ==> layout[1] 0 - 0 err 0
1-vndfs-dht: missing disk layout on vndfs-client-1. err = -1
1-dht_layout_merge: ==> layout[0] 0 - 0 err -1
1-dht_layout_merge: ==> layout[1] 0 - 0 err -1

I am not sure this is expected behavior. This broken layout does not
raise EINVAL to processes using the filesystem, but similar treatment
later on will.

After playing with it a bit, I tested the race condition hypothesis
with this patch:

--- a/xlators/cluster/dht/src/dht-common.c
+++ b/xlators/cluster/dht/src/dht-common.c
@@ -477,6 +477,11 @@ unlock:
                 ret = dht_layout_normalize (this, &local->loc, layout);
 
                 if (ret != 0) {
+                        if (strcmp(local->loc.path, "/") == 0) {
+                                gf_log (this->name, GF_LOG_WARNING,
+                                        "wait 2s for DHT to settle...");
+                                sleep(2);
+                        }
                         gf_log (this->name, GF_LOG_DEBUG,
                                 "fixing assignment on %s",
                                 local->loc.path);

Here is the kind of log it produces. I do not always see the EINVAL in
the log, but it is never seen by processes using the filesystem, at
least during the tests I did.

[2012-08-19 06:04:06.288131] I [fuse-bridge.c:4195:fuse_graph_setup] 0-fuse: switched to graph 1
[2012-08-19 06:04:06.289052] I [client-handshake.c:453:client_set_lk_version_cbk] 1-vndfs-client-1: Server lk version = 1
[2012-08-19 06:04:06.294234] W [dht-common.c:482:dht_lookup_dir_cbk] 1-vndfs-dht: wait 2s for DHT to settle...
[2012-08-19 06:04:08.306937] I [client.c:2151:notify] 0-vndfs-client-0: current graph is no longer active, destroying rpc_client
[2012-08-19 06:04:08.308114] I [client.c:2090:client_rpc_notify] 0-vndfs-client-0: disconnected
[2012-08-19 06:04:08.309833] W [fuse-resolve.c:151:fuse_resolve_gfid_cbk] 0-fuse: 4e4b4110-a585-4aae-b919-b2416355f5d1: failed to resolve (Invalid argument)
[2012-08-19 06:04:08.310275] E [fuse-bridge.c:353:fuse_lookup_resume] 0-fuse: failed to resolve path (null)

But this probably does not really fix the problem. For instance, I got
an unreproducible ENOENT for a directory while copying a hierarchy.

-- 
Emmanuel Dreyfus
manu@xxxxxxxxxx