Re: How many disks can fail before a catastrophic failure occurs?

Gilberto,

This totally depends on your setup.

With replica 2 you always have 2 copies of the same file.
So when you add bricks to your volume, you'll want to add Server1/disco1TB-0 and Server2/disco1TB-0 as a pair.
Meaning that each file ends up on one disk on each of the two servers.
Thus your system can survive the failure of one disk from any pair OR one whole server and still be up.
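
Just a sketch of what that pairing looks like on the command line, assuming your servers are called pve01 and pve02, the bricks are directories on the mounts from your df output, and the volume name gv0 is made up; adjust names and paths to your setup:

# each consecutive pair of bricks (same disk name, different server) is one replica set
gluster volume create gv0 replica 2 \
    pve01:/disco1TB-0/brick pve02:/disco1TB-0/brick \
    pve01:/disco1TB-1/brick pve02:/disco1TB-1/brick

# expanding later follows the same pattern: always add bricks in pairs
gluster volume add-brick gv0 \
    pve01:/disco1TB-2/brick pve02:/disco1TB-2/brick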

However, I recommend not using replica 2, as you'll run into split-brain problems when one server is down.
When it comes back up, you might have two diverging versions of the same file and you need a strategy to figure out which of the two copies is the valid one.
You can, however, make the volume read-only while one server is down; then you cannot get any split-brains, but depending on your use case this may mean downtime.
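
One way to get that read-only behaviour is client quorum; a sketch, again assuming the volume is called gv0 (check the option defaults of your Gluster version):

# with quorum-count 2 on a replica 2 volume, writes need both bricks up,
# so the volume effectively becomes read-only while one server is down
gluster volume set gv0 cluster.quorum-type fixed
gluster volume set gv0 cluster.quorum-count 2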

Hence you should use at least replica 2 + 1 arbiter.
The arbiter holds only metadata copies of each file, so the hardware requirements for that server are low and it doesn't need huge disks, yet it makes it easy to find the valid file copy and heal the invalid one. (I once had a NUC as arbiter, running totally fine.) When using an arbiter, be sure to create the XFS filesystem with imaxpct=75 on the arbiter, as its bricks will hold metadata only, not file data.
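
A rough sketch of that setup, assuming a small third node named arb and the same made-up volume name gv0 (adjust device names and paths to your environment):

# on the arbiter node only: more inode space, since its bricks hold metadata only
mkfs.xfs -i maxpct=75 /dev/sdX

# replica 2 + arbiter: every third brick in the list is the arbiter brick
gluster volume create gv0 replica 3 arbiter 1 \
    pve01:/disco1TB-0/brick pve02:/disco1TB-0/brick arb:/arb1TB-0/brick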

If you've got enough resources for 3 servers, replica 3 is best.

When you run
gluster v status
on a replica 2 volume, the first two bricks listed form a pair;
with replica 3, the first three bricks listed form a set and hold copies of the same file.
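
Illustrative only (brick paths as assumed above, other columns elided); with replica 2 the listed bricks group like this:

gluster v status gv0
Status of volume: gv0
Brick pve01:/disco1TB-0/brick   ...   <- replica pair 1
Brick pve02:/disco1TB-0/brick   ...   <- replica pair 1
Brick pve01:/disco1TB-1/brick   ...   <- replica pair 2
Brick pve02:/disco1TB-1/brick   ...   <- replica pair 2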

Cheers,
A.

On Saturday, 19.10.2024 at 12:25 -0300, Gilberto Ferreira wrote:
Hi there.
I have 2 servers with the following disks on each side:

pve01:~# df | grep disco
/dev/sdd          1.0T  9.4G 1015G   1% /disco1TB-0
/dev/sdh          1.0T  9.3G 1015G   1% /disco1TB-3
/dev/sde          1.0T  9.5G 1015G   1% /disco1TB-1
/dev/sdf          1.0T  9.4G 1015G   1% /disco1TB-2
/dev/sdg          2.0T   19G  2.0T   1% /disco2TB-1
/dev/sdc          2.0T   19G  2.0T   1% /disco2TB-0
/dev/sdj          1.0T  9.2G 1015G   1% /disco1TB-4

I have a Distributed-Replicate Gluster volume.
So my question is: how many disks can fail before I lose data?

Thanks in advance

---


Gilberto Nunes Ferreira

 




________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
