Re: gluster heal performance


 



On 2020-09-11 05:27 Martin Bähr wrote:
Excerpts from Gionatan Danti's message of 2020-09-11 00:35:52 +0200:
The main point was the potentially long heal time

could you (or anyone else) please elaborate on what long heal times are
to be expected?

Hi, there are multiple factors at work here:
- healing goes over the network (gluster) rather than over an internal bus (RAID rebuild);
- gluster is a user-space application, which carries a significant CPU load;
- healing proceeds per-file and not in LBA order (ie: it has to traverse all the affected files/dirs, which means scattered random IO for the most part);
- other things which I am surely missing.

we have a 3-node replica cluster running version 3.12.9 (we are building
a new cluster now) with 32TiB of space. each node has a single brick on
top of a 7-disk raid5 (linux softraid)

3.12.9, while being the official RHEL 7 release, is very old now.

at one point we had one node unavailable for one month (gluster failed
to start up properly on that node and we didn't have monitoring in place
to notice) and the accumulated changes of one month of operation took 4
months to heal. i would have expected this ideally to take 2 weeks or
less, one month at the worst (ie faster than or at least as fast as it
took to create the data but not slower, and especially not 4 times
slower)

Wow, 4 months is a lot... but at least you had internal redundancy (RAID5 bricks). The OP was asking about running with *no* internal redundancy, and this is why I advise against it: losing a disk while needing weeks to heal is not good.

the initial heal count was about 6million files for one node and
5.4million for the other.
...
we do have a few huge directories with 250000, 88000, 60000 and 29000
subdirectories each. in total 26TiB of small files, but no more than
a few 1000 per directory. (it's user data, some have more, some have
less)

could those huge directories be responsible for the slow healing?

The very high number of to-be-healed files surely has a negative impact on your heal speed.
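
To put the numbers quoted above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes 30-day months, that the larger backlog (about 6 million entries) was worked off over the whole 4 months, and that no new entries were added during the heal; none of these is confirmed in the thread, and the function names are purely illustrative.

```python
# Rough heal-rate arithmetic from the figures quoted in this thread.
# Assumptions (not from the thread): 30-day months, a static backlog,
# and the full 6 million entries healed over the whole period.

SECONDS_PER_DAY = 24 * 60 * 60


def heal_rate(files: int, days: float) -> float:
    """Average number of files healed per second over `days` days."""
    return files / (days * SECONDS_PER_DAY)


def days_to_heal(files: int, files_per_second: float) -> float:
    """Days needed to heal `files` entries at a constant rate."""
    return files / files_per_second / SECONDS_PER_DAY


backlog = 6_000_000                     # initial heal count on one node
observed = heal_rate(backlog, 4 * 30)   # heal took about 4 months
hoped_for = heal_rate(backlog, 14)      # the "2 weeks or less" expectation

print(f"observed:  {observed:.2f} files/s")   # 0.58 files/s
print(f"hoped for: {hoped_for:.2f} files/s")  # 4.96 files/s
print(f"speed-up needed: {hoped_for / observed:.1f}x")
```

Under these assumptions the observed heal averaged well under one file per second, which fits the per-file, random-IO behaviour described earlier rather than a bandwidth limit.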

Regards.

--
Danti Gionatan
Supporto Tecnico
Assyoma S.r.l. - www.assyoma.it
email: g.danti@xxxxxxxxxx - info@xxxxxxxxxx
GPG public key ID: FF5F32A8



