Hi Pranith,

We're running 3.4.2. I've attached the report from both servers. I don't
know why there's such a massive difference in the file sizes.

Regards,
John

On 27/06/14 03:32, Pranith Kumar Karampuri wrote:
>
> On 06/26/2014 04:10 AM, John Gardeniers wrote:
>> Hi Pranith,
>>
>> jupiter currently has no gluster processes running.
>> jupiter.om.net:/gluster_backup is a geo-replica.
>>
>> [root@nix ~]# gluster volume info
>> Volume Name: gluster-backup
>> Type: Distribute
>> Volume ID: 0905fb11-f95a-4533-ae1c-05be43a8fe1f
>> Status: Started
>> Number of Bricks: 1
>> Transport-type: tcp
>> Bricks:
>> Brick1: jupiter.om.net:/gluster_backup
>> Volume Name: gluster-rhev
>> Type: Replicate
>> Volume ID: b210cba9-56d3-4e08-a4d0-2f1fe8a46435
>> Status: Started
>> Number of Bricks: 1 x 2 = 2
>> Transport-type: tcp
>> Bricks:
>> Brick1: jupiter.om.net:/gluster_brick_1
>> Brick2: nix.om.net:/gluster_brick_1
>> Options Reconfigured:
>> geo-replication.indexing: on
>> nfs.disable: on
>
> I am extremely sorry, I should have asked for this information
> yesterday as well.
> 1) What is the version of gluster you are using? In 3.4.x there is
> an issue where self-heal wouldn't start if operations are happening
> on the VM, which is not the case in 3.5 I believe. So it is
> important. I remembered it only in the morning.
>
> 2) I believe the number of files on the bricks should be very small
> considering it is a rhevm setup. Could you please also attach the
> output of the following?
>
> For each brick:
> find <brick-path> | xargs getfattr -d -m. -e hex > file-you-need-to-send-us.txt
>
> This should help us see the xattrs of the files and advise you on how
> to fix the split-brains where necessary.
>
> Pranith
>
>> regards,
>> John
>>
>>
>> On 25/06/14 19:05, Pranith Kumar Karampuri wrote:
>>> On 06/25/2014 04:29 AM, John Gardeniers wrote:
>>>> Hi All,
>>>>
>>>> We're using Gluster as the storage for our virtualization. This
>>>> consists of 2 servers with a single brick each, configured as a
>>>> replica pair. We also have a geo-replica on one of those two
>>>> servers.
>>>>
>>>> For reasons that don't really matter, last weekend we had a
>>>> situation which caused one server to reboot a number of times,
>>>> which in turn resulted in a lot of heal-failed and split-brain
>>>> errors. Because VMs were being migrated across hosts at the same
>>>> time, we ended up with many crashed VMs.
>>>>
>>>> Due to the need to get the VMs up and running as quickly as
>>>> possible we decided to shut down one Gluster replica and use the
>>>> "primary" one alone. As the geo-replica is also on the node we
>>>> shut down, that leaves us with just a single copy, which makes us
>>>> rather nervous.
>>>>
>>>> As we have decided to treat the files on the currently running
>>>> node as "correct", I'd appreciate advice on the best way to get
>>>> the other node back into the replication. Should we simply bring
>>>> it back online and try to correct the errors, which I expect will
>>>> be many, or should we treat it as a failed server and bring it
>>>> back with an empty brick, rather than what is currently in the
>>>> existing brick? The volume/bricks are 5TB, of which we're
>>>> currently using around 2TB, and the servers are on a 10Gb network,
>>>> so I imagine it shouldn't take too long to rebuild and this would
>>>> all be done out of hours anyway.
>>> Considering you are saying there were split-brain related errors as
>>> well, I suggest you bring up an empty brick.
>>> Could you give the "gluster volume info" output and tell me which
>>> brick went down? Based on that I will tell you what you need to do.
>>>
>>> Pranith
>>>> regards,
>>>> John
>>>>
>>>> _______________________________________________
>>>> Gluster-users mailing list
>>>> Gluster-users@xxxxxxxxxxx
>>>> http://supercolony.gluster.org/mailman/listinfo/gluster-users
>>>
>>> ______________________________________________________________________
>>> This email has been scanned by the Symantec Email Security.cloud
>>> service.
>>> For more information please visit http://www.symanteccloud.com
>>> ______________________________________________________________________
Attachment:
nix_jupiter.tar.gz
Description: application/gzip
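For anyone reading the attached reports: they are the output of the getfattr command quoted above. Below is a minimal Python sketch for pulling the replication changelog xattrs out of such a dump. It assumes the usual AFR layout, where each trusted.afr.* value is 12 bytes holding three big-endian 32-bit pending counters (data, metadata, entry). The function names and the sample dump are made up for illustration, and the xattr names trusted.afr.gluster-rhev-client-0/1 are an assumption based on the volume name, not taken from the actual attachment.

```python
import struct


def decode_afr_changelog(hex_value):
    """Split one trusted.afr.* hex value into its three pending counters.

    Assumes the value is 12 bytes: data, metadata and entry counters,
    each a big-endian unsigned 32-bit integer.
    """
    digits = hex_value[2:] if hex_value.lower().startswith("0x") else hex_value
    raw = bytes.fromhex(digits)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes, got %d" % len(raw))
    data, metadata, entry = struct.unpack(">III", raw)
    return {"data": data, "metadata": metadata, "entry": entry}


def parse_getfattr_dump(text):
    """Map each '# file:' path in a getfattr dump to its decoded trusted.afr.* xattrs."""
    result = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# file: "):
            current = line[len("# file: "):]
            result[current] = {}
        elif current is not None and line.startswith("trusted.afr.") and "=" in line:
            name, value = line.split("=", 1)
            result[current][name] = decode_afr_changelog(value)
    return result


# Hypothetical sample in the format printed by `getfattr -d -m. -e hex`.
# The path and values are invented; they are not from the real reports.
SAMPLE = """\
# file: gluster_brick_1/example.img
trusted.afr.gluster-rhev-client-0=0x000000000000000000000000
trusted.afr.gluster-rhev-client-1=0x000000020000000100000000
trusted.gfid=0x0123456789abcdef0123456789abcdef
"""

if __name__ == "__main__":
    for path, xattrs in parse_getfattr_dump(SAMPLE).items():
        for name, counters in sorted(xattrs.items()):
            print(path, name, counters)
```

Non-zero counters on one brick mean that brick is recording pending operations against the other replica. If both bricks show non-zero counters against each other for the same file, that file is a likely split-brain candidate.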