If you run a 'find /<full-path-to-brick>', it would probably take a lot of time. Now imagine that gluster also has to check the extended attributes of each file/dir, and it becomes quite slow.

Also, we do not know anything about the following:
- HDDs' IOPS rating
- mount options for the bricks
- I/O scheduler for your HDDs and DM devices
- is selinux enabled
- Network bandwidth
- Load on the 2 servers before and after the heal was started
- Volume settings
(a rough command sketch for gathering most of these follows after the quoted message)

I would expect that gluster heal runs with a lower ionice priority than the regular read/write operations coming from the clients - nobody wants a cluster dead due to an ongoing healing operation. Also, check the logs on all nodes for any errors during the healing - maybe there are some issues that were not noticed before.

Best Regards,
Strahil Nikolov


On Friday, 11 September 2020, 12:13:06 GMT+3, Martin Bähr <mbaehr+gluster@xxxxxxxxxx> wrote:

Excerpts from Gionatan Danti's message of 2020-09-11 08:34:04 +0200:
> > we have a 3-node replica cluster running version 3.12.9 with 32TiB
> > of space. each node has a single brick on top of a 7-disk raid5
> > (linux softraid)
> 3.12.9, while being the official RHEL 7 release, is very old now.

yes, i am aware. we didn't bother upgrading as we need to expand
capacity and it's cheaper to rent new servers than expand the old ones.

> > the accumulated changes of one month of operation took 4 months to
> > heal.
> Wow, 4 months is a lot... but you had at least internal redundancy
> (RAID5 bricks).

right, that, and we had 3 replicas. we could have just dropped the
third node and would still have been ok.

for the new cluster we decided that 2 nodes is enough, because the data
is all backups anyway. even if we lose both nodes, we can at least in
theory still recover all the data.

whether that's a good decision is a risk calculation. is a third server
worth the extra expense? we decided that, for what is essentially a
backup, it's not.

i considered 3 nodes but dropping the raid instead, but several
comments, including yours, convinced me that keeping the raid is good.
on the new servers we'll each have 3 bricks with 5 disks in a raid5
per brick.

> > the initial heal count was about 6 million files for one node and
> > 5.4 million for the other.
> > ...
> > we do have a few huge directories with 250000, 88000, 60000 and
> > 29000 subdirectories each. in total 26TiB of small files, but no
> > more than a few thousand per directory. (it's user data, some have
> > more, some have less)
> >
> > could those huge directories be responsible for the slow healing?
> The very high number of to-be-healed files surely has a negative
> impact on your heal speed.

that sounds like there is an inefficiency within the healing process
that makes the healing speed non-linear in the number of files.

greetings, martin.

--
general manager      realss.com
student mentor       fossasia.org
community mentor     blug.sh beijinglug.club
pike programmer      pike.lysator.liu.se caudium.net societyserver.org
Martin Bähr          working in china     http://societyserver.org/mbaehr/
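[Editor's note: most of the checks discussed above can be gathered with a
handful of standard commands. This is only a rough sketch: the volume name
'myvol', the brick path /data/brick and the sample file are placeholders,
and the 10-minute sampling interval is arbitrary.]

    # 'myvol' and /data/brick are placeholders - substitute your own
    # volume name and brick mount point

    # per-file AFR/changelog xattrs that the self-heal daemon inspects
    # (pick any file listed by 'gluster volume heal myvol info')
    getfattr -d -m . -e hex /data/brick/path/to/file

    # environment details: I/O scheduler, brick mount options, selinux,
    # and heal-related volume settings
    cat /sys/block/sd*/queue/scheduler
    grep /data/brick /proc/mounts
    getenforce
    gluster volume get myvol all | grep -E 'heal|shd'

    # current I/O priority of the self-heal daemon (it normally shows up
    # as a glusterfs process with 'glustershd' in its command line)
    ionice -p "$(pgrep -f glustershd | head -1)"

    # watch the heal backlog at a fixed interval; if the drain rate keeps
    # dropping as the backlog shrinks, that is the kind of non-linearity
    # Martin describes
    while true; do
        date
        gluster volume heal myvol statistics heal-count
        sleep 600
    done

Redirecting the heal-count loop to a file makes it easy to correlate the
drain rate with client load on the two servers and with any errors in the
glustershd and brick logs.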
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users