Re: Adding arbiter on a large existing replica 2 set

Hi,

The new cluster is set up with two physical servers using HDDs and a VM backed by an all-flash stretched vSAN.
The old cluster will be set up the same way.

The main volume I'm concerned about usually takes about 20-30 minutes to finish the self-heal; the network is 10 Gbps.


Best regards
-- 
THORGEIR MARTHINUSSEN
Senior Systems Consultant
BASEFARM

-----Original Message-----
From: Strahil <hunter86_bg@xxxxxxxxx>
Subject: Re: [Gluster-users] Adding arbiter on a large existing replica 2 set
Date: Wed, 16 Oct 2019 21:04:50 +0300

Hi Thorgeir,

Did you try adding an arbiter with an SSD brick/bricks?

SSD/NVMe is the best type of storage for an arbiter - yes, it's more expensive, but you will need fewer disks than for a data brick.

Of course, the arbiter is only one side of the equation, and the time to heal might also depend on your data bricks' IOPS.

How much time does a node in the cluster need to heal after being rebooted?

Best Regards,
Strahil Nikolov

On Oct 16, 2019 16:37, Thorgeir Marthinussen <thorgeir.marthinussen@xxxxxxxxxxxx> wrote:
Hi,

We have an old Gluster cluster setup, running a replica 2 volume across two datacenters, currently on version 4.1.5.

I need to add an arbiter to this setup, but I'm concerned about the performance impact of this on the volumes.

I recently set up a new cluster for a different purpose and decided to test adding an arbiter to the volume after loading in some data.
The volume had ~435,000 files totaling about 12 TB.
Adding the arbiter initiated a heal operation that took almost 3 hours.
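
For reference, a minimal sketch of the kind of command used for the conversion (the volume, host, and brick path names here are hypothetical):

    # convert an existing replica 2 volume to replica 3 with one arbiter brick;
    # the arbiter brick only holds metadata, so it can be much smaller than the data bricks
    gluster volume add-brick testvol replica 3 arbiter 1 arbiter-host:/bricks/arbiter/testvol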

On the older cluster, one of the volumes is about 14 TB, but holds ~45.5 million files.

Since the arbiter is only concerned with metadata and checksums, my worry is that we have 100 times as many files, i.e. 100 times as many I/O operations to execute during healing, and possibly 100 times the heal time, which would mean about 12.5 days.
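
The back-of-the-envelope arithmetic behind that estimate, assuming heal time scales linearly with the file count rather than the data volume:

    # ratio of file counts between the old volume and the test volume
    echo "45500000 / 435000" | bc -l   # ~104.6, i.e. roughly 100x as many files
    # scale the ~3 hour test heal by 100x and convert to days
    echo "3 * 100 / 24" | bc -l        # 12.5 days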

Another "issue" is that the 'gluster volume heal <vol-name> info summary' command seems to "count" all the files, so the command can take a very long time to complete.
The metrics-scraping script I created for us, with a timeout of 110seconds, fails to complete when a volume has over ~800-900 files unsynced (which happens regularily when taking one cluster-node down for patching).
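
For reference, a minimal sketch of the check the scraping script runs (hypothetical volume name; the real script also parses the output into metrics):

    # wrap the slow summary call in a timeout so the scrape job doesn't hang;
    # with more than ~800-900 unsynced entries this hits the 110 s limit and fails
    timeout 110 gluster volume heal myvol info summary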


Does anyone have experience with adding an arbiter after the fact: performance impact, time to heal, etc.?
I'd also be interested in other ways to get the status of healing.

Any advice would be appreciated.


Best regards
-- 
THORGEIR MARTHINUSSEN
Senior Systems Consultant
BASEFARM
