Re: Is rebalance completely broken on 3.5.3 ?


 



Hi Alessandro,
what you describe here reminds me of this issue:
http://www.spinics.net/lists/gluster-users/msg20144.html

And now that you mention it, the mess on our cluster could indeed have been triggered by an aborted rebalance.
This is a very important clue, since apparently developers were never able to reproduce the issue in the lab. I also tried to reproduce the issue on a test cluster, but never succeeded.

The example you describe below seems relatively easy to fix. A rebalance fix-layout would eventually get rid of the sticky-bit link files (---------T) on your bricks 5 and 6, and you could manually remove the files created on 10/03, as long as you also remove the corresponding link file in the .glusterfs directory on that brick.
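A rough sketch of those two steps, assuming the volume is named "home" (a guess based on your brick paths) and using the GFID from your getfattr output further down. The .glusterfs link path is derived from the GFID as `<first 2 hex chars>/<next 2>/<full dashed gfid>`:

```shell
# Sketch only; verify paths on your own cluster before removing anything.
# 1) Rebuild the layout; over time this also cleans up stray ---------T
#    link files (run on any node of the trusted pool):
#      gluster volume rebalance home fix-layout start
#
# 2) When removing a stray file by hand, also remove its hard link under
#    .glusterfs on the same brick, derived from trusted.gfid:
GFID=14a1c10e-b147-4ef2-bf72-f4c6c64a90ce
BRICK=/data/glusterfs/home/brick2
LINK="$BRICK/.glusterfs/${GFID:0:2}/${GFID:2:2}/$GFID"
echo "$LINK"
# rm "$BRICK/seviri/.forward" "$LINK"   # only after backing up a good copy
```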

I wholeheartedly agree with you that this needs the developers' urgent attention before they start working on new features. A mess like this in a distributed file system makes it unusable for production. This should never happen, never! And if it does, a rebalance should be able to detect and fix it, fast and efficiently. I also agree that the rebalance status should be more informative and give a clear idea of how long it will still take to complete. On large clusters a rebalance often takes ages and leaves the entire cluster extremely vulnerable. (Another scary operation is remove-brick, but that is another story.)

What I did in our case, and maybe this could help you too as a quick fix for the most critical directories, is to rsync to a different storage (via a mount point). rsync only copies one of each pair of duplicated files, and you can separately copy a good version of the problem files (in the case below e.g.: -rw-r--r-- 2 seviri users 68 May 26 2014 /data/glusterfs/home/brick1/seviri/.forward). But probably, as soon as you remove the files created on 10/03 (including the gluster link file in .glusterfs), the listing via your NFS mount will be restored. Try this out first with a couple of files you have backed up, to be sure.

Hope this helps!

Cheers,
Olav
On 20/03/15 12:22, Alessandro Ipe wrote:

Hi,

 

 

After launching a "rebalance" on an idle gluster system one week ago, its status told me it had scanned more than 23 million files on each of my 6 bricks. However, without knowing at least the total number of files to be scanned, this status is USELESS from an end-user perspective, because it does not let you know WHEN the rebalance could eventually complete (one day, one week, one year or never). From my point of view, the total number of files per brick could be obtained and maintained when activating quota, since the whole filesystem has to be crawled anyway...

 

After one week offline and still no clue when the rebalance would complete, I decided to stop it... Enormous mistake... It seems that rebalance cannot avoid corrupting some files. For example, on the only client mounting the gluster system, "ls -la /home/seviri" returns

ls: cannot access /home/seviri/.forward: Stale NFS file handle
ls: cannot access /home/seviri/.forward: Stale NFS file handle
-????????? ? ? ? ? ? .forward
-????????? ? ? ? ? ? .forward

while this file could perfectly be accessed before (being rebalanced) and has not been modified for at least 3 years.

 

Getting the extended attributes on the various bricks 3, 4, 5, 6 (3-4 replicate, 5-6 replicate):

Brick 3:
ls -l /data/glusterfs/home/brick?/seviri/.forward
-rw-r--r-- 2 seviri users 68 May 26 2014 /data/glusterfs/home/brick1/seviri/.forward
-rw-r--r-- 2 seviri users 68 Mar 10 10:22 /data/glusterfs/home/brick2/seviri/.forward

getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick1/seviri/.forward
trusted.afr.home-client-8=0x000000000000000000000000
trusted.afr.home-client-9=0x000000000000000000000000
trusted.gfid=0xc1d268beb17443a39d914de917de123a

# file: data/glusterfs/home/brick2/seviri/.forward
trusted.afr.home-client-10=0x000000000000000000000000
trusted.afr.home-client-11=0x000000000000000000000000
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.quota.4138a9fa-a453-4b8e-905a-e02cce07d717.contri=0x0000000000000200
trusted.pgfid.4138a9fa-a453-4b8e-905a-e02cce07d717=0x00000001

 

Brick 4:
ls -l /data/glusterfs/home/brick?/seviri/.forward
-rw-r--r-- 2 seviri users 68 May 26 2014 /data/glusterfs/home/brick1/seviri/.forward
-rw-r--r-- 2 seviri users 68 Mar 10 10:22 /data/glusterfs/home/brick2/seviri/.forward

getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick1/seviri/.forward
trusted.afr.home-client-8=0x000000000000000000000000
trusted.afr.home-client-9=0x000000000000000000000000
trusted.gfid=0xc1d268beb17443a39d914de917de123a

# file: data/glusterfs/home/brick2/seviri/.forward
trusted.afr.home-client-10=0x000000000000000000000000
trusted.afr.home-client-11=0x000000000000000000000000
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.quota.4138a9fa-a453-4b8e-905a-e02cce07d717.contri=0x0000000000000200
trusted.pgfid.4138a9fa-a453-4b8e-905a-e02cce07d717=0x00000001

 

Brick 5:
ls -l /data/glusterfs/home/brick?/seviri/.forward
---------T 2 root root 0 Mar 18 08:19 /data/glusterfs/home/brick2/seviri/.forward

getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick2/seviri/.forward
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.dht.linkto=0x686f6d652d7265706c69636174652d3400

 

Brick 6:
ls -l /data/glusterfs/home/brick?/seviri/.forward
---------T 2 root root 0 Mar 18 08:19 /data/glusterfs/home/brick2/seviri/.forward

getfattr -d -m . -e hex /data/glusterfs/home/brick?/seviri/.forward
# file: data/glusterfs/home/brick2/seviri/.forward
trusted.gfid=0x14a1c10eb1474ef2bf72f4c6c64a90ce
trusted.glusterfs.dht.linkto=0x686f6d652d7265706c69636174652d3400
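(As an aside, the trusted.glusterfs.dht.linkto value is just a NUL-terminated string in hex; decoding it shows which replica pair DHT points to for the real file:

```shell
# Decode the hex xattr value; tr strips the trailing NUL byte.
printf '686f6d652d7265706c69636174652d3400' | xxd -r -p | tr -d '\0'
# -> home-replicate-4
```

so the ---------T entries on bricks 5 and 6 claim the real file lives on the home-replicate-4 pair.)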

 

Looking at the results from bricks 3 & 4 shows something weird. The file exists in two sub-brick storage directories, while it should only be found once on each brick server. Or does the issue lie in the results of bricks 5 & 6? How can I fix this, please? By the way, the split-brain tutorial only covers BASIC split-brain conditions, not complex (real-life) cases like this one. It would definitely benefit from being enriched with this one.

 

More generally, I think the concept of gluster is promising, but if basic commands (rebalance, absolutely needed after adding more storage) from its own CLI can put the system into an unstable state, I am really starting to question its suitability for a production environment. And from an end-user perspective, I do not care about new features, no matter how appealing they may be, if the basic ones are not almost totally reliable. Finally, testing gluster under high load on the brick servers (real-world conditions) would certainly give the developers insight into what is failing and what therefore needs to be fixed to improve gluster's reliability.

 

Forgive my harsh words/criticisms, but having struggled with gluster issues for two weeks now is getting on my nerves, since my colleagues cannot use the data stored on it and I cannot see when it will be back online.

 

 

Regards,

 

 

Alessandro.

 



_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users

