Emmanuel Dreyfus <manu@xxxxxxxxxx> wrote:

> Since I am porting glusterfs, I am never fully sure it really works as
> intended. When I try gluster volume rebalance, I get no error reported,
> but the operation seems to take forever.

I found a probable cause for the problem. This is glusterfsd on NetBSD
with a UFS1 filesystem. Extended attributes are stored in backing files
located in a .attribute directory at the root of the filesystem.

Of course, if glusterfs starts distributing the content of .attribute
among servers, chaos ensues, since on a given server some extended
attributes will disappear during the operation. The logs indeed suggest
that something goes wrong with .attribute (see below).

I am not sure how this should be fixed. We could have a dht.ignore-dir
xlator option to specify a directory to ignore (or a comma-separated
list?). I can contribute that if it is considered the right way. I just
wonder how the administrator could request that such an option be added
to the glusterfs/glusterfsd arguments when glusterd launches them. Any
suggestions?

Another approach, simpler to implement but less general, is to add an
#ifdef __NetBSD__ code section to the dht xlator to skip processing of
.attribute. The advantage of this approach is that it avoids the problem
of specifying the xlator option from glusterd.

These two alternatives can also be implemented in the storage/posix
xlator rather than in cluster/dht, so that .attribute is simply absent
from any client view. In fact, I think it is better to do it in
storage/posix than in cluster/dht, since the same problem may show up
later with other xlators.

And finally, if it is considered that addressing this should not be
glusterfs' job, I can add a special case in the NetBSD FUSE
implementation so that .attribute just disappears from glusterfs' view,
but I do not think this is the right place to do that.
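As a rough sketch of the first option (an ignore list given as a
comma-separated string), the name matching could look like the code
below. The function names entry_is_ignored and filter_entries are made
up for illustration and are not existing glusterfs functions. Hooking
this into an xlator's lookup and readdir paths, and parsing the option
itself, are not shown.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of glusterfs: returns true when `name`
 * exactly matches one element of the comma-separated `ignore_list`.
 * No globbing and no whitespace trimming is done.
 */
bool
entry_is_ignored(const char *name, const char *ignore_list)
{
    size_t name_len;
    const char *p;

    if (name == NULL || ignore_list == NULL)
        return false;

    name_len = strlen(name);
    p = ignore_list;

    while (*p != '\0') {
        /* Length of the current element, up to the next comma or end. */
        size_t elem_len = strcspn(p, ",");

        if (elem_len > 0 && elem_len == name_len &&
            strncmp(p, name, elem_len) == 0)
            return true;

        p += elem_len;
        if (*p == ',')
            p++;    /* skip the separator */
    }
    return false;
}

/*
 * Compacts `names` in place, dropping ignored entries, and returns the
 * number of entries kept. This stands in for filtering the result of a
 * readdir so that ignored directories never reach upper layers.
 */
size_t
filter_entries(const char **names, size_t count, const char *ignore_list)
{
    size_t kept = 0;

    for (size_t i = 0; i < count; i++) {
        if (!entry_is_ignored(names[i], ignore_list))
            names[kept++] = names[i];
    }
    return kept;
}
```

With the #ifdef __NetBSD__ variant, the list would simply be the fixed
string ".attribute" instead of a value coming from an option.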
W [afr-common.c:634:afr_lookup_self_heal_check] 0-gfs1-replicate-1: /.attribute: gfid different on subvolume
W [afr-common.c:634:afr_lookup_self_heal_check] 0-gfs1-replicate-1: /.attribute: gfid different on subvolume
W [dht-common.c:177:dht_lookup_dir_cbk] 0-gfs1-dht: /.attribute: gfid different on gfs1-replicate-1
W [afr-common.c:634:afr_lookup_self_heal_check] 0-gfs1-replicate-0: /.attribute: gfid different on subvolume
W [dht-common.c:177:dht_lookup_dir_cbk] 0-gfs1-dht: /.attribute: gfid different on gfs1-replicate-0
I [client3_1-fops.c:411:client3_1_stat_cbk] 0-gfs1-client-2: remote operation failed: No such file or directory
I [client3_1-fops.c:411:client3_1_stat_cbk] 0-gfs1-client-1: remote operation failed: No such file or directory
I [client3_1-fops.c:411:client3_1_stat_cbk] 0-gfs1-client-3: remote operation failed: No such file or directory
I [client3_1-fops.c:2132:client3_1_opendir_cbk] 0-gfs1-client-1: remote operation failed: No such file or directory
W [client3_1-fops.c:5041:client3_1_readdir] 0-gfs1-client-1: (2795520): failed to get fd ctx. EBADFD
W [client3_1-fops.c:5106:client3_1_readdir] 0-gfs1-client-1: failed to send the fop: Bad file descriptor
I [afr-dir-read.c:120:afr_examine_dir_readdir_cbk] 0-gfs1-replicate-0: /.attribute: failed to do opendir on gfs1-client-1
I [afr-dir-read.c:174:afr_examine_dir_readdir_cbk] 0-gfs1-replicate-0: entry self-heal triggered. path: /.attribute, reason: checksums of directory differ, forced merge option set
I [client3_1-fops.c:2132:client3_1_opendir_cbk] 0-gfs1-client-2: remote operation failed: No such file or directory
I [client3_1-fops.c:1303:client3_1_entrylk_cbk] 0-gfs1-client-1: remote operation failed: No such file or directory

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@xxxxxxxxxx