Hello All-

The rebalancing completed successfully with version 3.1.1, but quite a lot of files ended up with root ownership. Doing a recursive chown on each user's directory solved the problem, but it did take a long time (roughly the loop sketched below).
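For reference, this is a minimal sketch of the kind of recursive chown I have been running. It assumes that each top-level directory under /glusterfs/atmos/users is named after its owner and that essc is the right group for everyone, which happens to be true here but is obviously site-specific:

# restore ownership after the rebalance
# assumes each directory name matches the username and essc is the correct group
for dir in /glusterfs/atmos/users/*; do
    user=$(basename "$dir")
    chown -R "$user":essc "$dir"
done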
I am using 3.1.2 now, and there is a rebalance of the same volume going on at the moment. As before, some files have ended up being owned by root, and I have had to recursively chown some users' directories more than once to allow them to continue working. This has proved to be quite disruptive, more so than before, because last time the rebalancing was done over the Christmas holidays. We have to continue using the volume while rebalancing is taking place because it is taking so long - one week and counting, and still only 2/3 of the way through. Has anybody else experienced this file ownership problem during and/or after rebalancing?

The nfs.log file on the server is full of errors like the following.

[2011-01-30 15:07:52.890125] I [dht-common.c:369:dht_revalidate_cbk] atmos-dht: subvolume atmos-replicate-5 returned -1 (Invalid argument)

However, they don't seem to be generated in response to any particular operation, such as reading or modifying files (or none that I can reproduce, anyway).

There is something else I should mention that may or may not be relevant. On most NFS and GlusterFS clients the affected files show up as being owned by root. However, on one NFS client, which just happens to be the only one we have that runs SLES11, I get the following when listing a user's directory.

friesian:~ # ls -l /glusterfs/atmos/users/tjp
ls: cannot access /glusterfs/atmos/users/tjp/2hrprocessing.csh: No such file or directory
ls: cannot access /glusterfs/atmos/users/tjp/bbb.nc: No such file or directory
ls: cannot access /glusterfs/atmos/users/tjp/rrr.nc: No such file or directory
ls: cannot access /glusterfs/atmos/users/tjp/6hrprocessing.csh: No such file or directory
ls: cannot access /glusterfs/atmos/users/tjp/sss.nc: No such file or directory
ls: cannot access /glusterfs/atmos/users/tjp/aaa.nc: No such file or directory
ls: cannot access /glusterfs/atmos/users/tjp/ttt.nc: No such file or directory
total 1458552
-????????? ? ? ? ? ? 2hrprocessing.csh
-????????? ? ? ? ? ? 6hrprocessing.csh
-????????? ? ? ? ? ? aaa.nc
-rwxr----- 1 tjp essc 51709 2011-02-01 16:50 ALL_post_processing.csh
-????????? ? ? ? ? ? bbb.nc
-rwxr----- 1 tjp essc 5250 2011-02-01 15:00 DJF_2hr_post_processing.csh
-rwxr----- 1 tjp essc 5250 2011-02-01 15:00 DJF_6hr_post_processing.csh
drwxr-xr-x 2 tjp essc 24576 2011-01-30 12:42 interpQS
drwxr-x--- 2 tjp essc 24576 2011-01-30 13:06 interpWIND10M
-rwxr----- 1 tjp essc 5050 2011-02-01 15:00 JJA_2hr_post_processing.csh
-rwxr----- 1 tjp essc 5050 2011-02-01 15:00 JJA_6hr_post_processing.csh
-rwxr----- 1 tjp essc 5050 2011-02-01 15:00 MAM_2hr_post_processing.csh
-rwxr----- 1 tjp essc 5050 2011-02-01 15:00 MAM_6hr_post_processing.csh
drwxr-x--- 2 tjp essc 24576 2011-01-30 14:52 maskedWIND10M
drwxr-xr-x 2 tjp essc 24576 2011-01-30 13:23 maskedWIND10M_interpQS
drwxr-xr-x 2 tjp essc 24576 2011-01-30 13:12 NEWinterpQS
drwxr-xr-x 2 tjp essc 24576 2011-01-30 13:18 NEWmaskedWIND10M_interpQS
drwxr-xr-x 2 tjp essc 24576 2011-02-01 13:03 NEWQS-INTERIM
-rwxr----- 1 tjp essc 51709 2011-02-01 16:44 .nfs2cc8b82b3307859b0000000b
-rw-r--r-- 1 root root 1493004472 2011-01-30 12:37 qc.nc
drwxr-xr-x 2 tjp essc 24576 2011-01-30 14:32 QS-INTERIM
-????????? ? ? ? ? ? rrr.nc
drwxr-xr-x 4 tjp essc 24576 2011-01-25 20:08 seasonalQSncfiles
-rwxr----- 1 tjp essc 5051 2011-02-01 15:00 SON_2hr_post_processing.csh
-rwxr----- 1 tjp essc 5051 2011-02-01 15:00 SON_6hr_post_processing.csh
-????????? ? ? ? ? ? sss.nc
drwxr-xr-x 2 tjp essc 24576 2011-01-30 14:32 TEMP
-????????? ? ? ? ? ? ttt.nc

The funny files in this listing are the ones that the other clients see as owned by root. I know that SuSE Linux is not supported, so I am not going to pursue this particular machine's problem any more, but I just thought it might provide some useful information about the rebalancing problem.

-Dan.

On 02/01/2011 04:39 PM, Craig Carl wrote:
> Dan -
> Gluster 3.1.1 is out, can you recreate the issue using that version?
> Please let us know.
>
> http://www.gluster.com/community/documentation/index.php/Gluster_3.1_Filesystem_Installation_and_Configuration_Guide
>
> Thanks,
>
> Craig
>
> --
> Craig Carl
> Senior Systems Engineer
> Gluster
>
> On 11/21/2010 03:57 AM, Dan Bretherton wrote:
>> Hello all-
>> I have just added two replicated bricks (2x2) to a
>> distributed-replicated volume. When mounted the new size is correct
>> (11TB) and "gluster volume info" shows what I would expect. However,
>> rebalancing the volume seems to have no effect, and happens almost
>> instantaneously. I expected it to take several hours because the
>> original bricks have 5TB on them. These are the commands I ran to start
>> the rebalance and check the status.
>>
>> [root@bdan4 ~]# gluster volume rebalance atmos start
>> starting rebalance on volume atmos has been successful
>> [root@bdan4 ~]# gluster volume rebalance atmos status
>> rebalance completed
>>
>> These are the last two messages in etc-glusterfs-glusterd.vol.log:
>>
>> [2010-11-21 11:34:52.316470] I
>> [glusterd-rebalance.c:292:glusterd_defrag_start] rebalance: rebalance on
>> /etc/glusterd/mount/atmos complete
>> [2010-11-21 11:35:04.653629] I
>> [glusterd-rebalance.c:385:glusterd_handle_defrag_volume] glusterd:
>> Received rebalance volume on atmos
>>
>> The messages look to me as if they are in the wrong order. I confirmed
>> this with another "gluster volume rebalance atmos status", which results
>> in the message "Received rebalance volume on atmos".
>>
>> I wondered if it would have been better to add just one replicated
>> brick at a time instead of two at once, but when I went to remove one of
>> them I was scared off by the warning about possible data loss. The new
>> volume does seem to be working and some files are being put on the new
>> bricks, but most seem to be going onto the original bricks, which were
>> ~80% full before the expansion. I would like to press on and use the
>> volume, but I don't know what will happen when the original bricks get
>> completely full, and there is also likely to be a performance penalty in
>> using an un-balanced volume, as I understand it from reading previous
>> mailing list postings. Any suggestions would be much appreciated.
>>
>> Regards,
>> -Dan
>>
>> _______________________________________________
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users