I killed glfsheal; after a day there were 218 processes, and they got
killed by the OOM killer during the weekend. Now there are no processes active.
Running "heal info" reports lots of files quite quickly but does not spawn
any glfsheal process, and neither does restarting glusterd.
Is there some way to selectively run glfsheal to fix one brick at a time?
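A hedged workaround (not an official per-brick glfsheal mode, as far as I know): pending entries can sometimes be healed one at a time by forcing lookups from a client mount, since a client-side lookup can kick off self-heal for that entry. The mount point below is an assumption, and `pending_paths` is a hypothetical helper:

```shell
# pending_paths: extract entry paths (lines starting with '/') from
# `gluster volume heal <vol> info` output. Hypothetical helper, shown
# only to illustrate the approach.
pending_paths() { grep '^/'; }

# On a real cluster (assuming the volume is mounted at /mnt/cluster_data),
# stat()ing each pending entry from a client triggers a lookup:
#   gluster volume heal cluster_data info | pending_paths \
#     | while read -r f; do stat "/mnt/cluster_data$f" >/dev/null 2>&1; done
```

This only covers entries that heal on lookup; real split-brain files still need manual resolution.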
Diego
On 21/03/2023 01:21, Strahil Nikolov wrote:
Theoretically it might help.
If possible, try to resolve any pending heals.
Best Regards,
Strahil Nikolov
On Thu, Mar 16, 2023 at 15:29, Diego Zuccato
<diego.zuccato@xxxxxxxx> wrote:
On Debian, stopping glusterd does not stop the brick processes: to stop
everything (and free the memory) I have to
systemctl stop glusterd
killall glusterfs{,d}
killall glfsheal
systemctl start glusterd
[This behaviour also hangs a simple reboot of a machine running glusterd...
not nice.]
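The four commands above can be wrapped in a small function (my own sketch, not a Gluster-provided script; `DRY_RUN=1` only prints the commands instead of running them):

```shell
# Fully stop glusterd AND the brick/heal processes it leaves behind on
# Debian, then start it again. Run as root. Sketch only, not an official
# Gluster helper; set DRY_RUN=1 to print the commands instead.
full_gluster_restart() {
    run=${DRY_RUN:+echo}
    $run systemctl stop glusterd
    # brick and heal processes survive the glusterd stop, so kill them
    # explicitly (same as `killall glusterfs{,d}`, but POSIX-sh safe):
    $run killall glusterfs glusterfsd
    $run killall glfsheal
    $run systemctl start glusterd
}
```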
For now I just restarted glusterd without killing the bricks:
root@str957-clustor00:~# ps aux|grep glfsheal|wc -l ; systemctl restart glusterd ; ps aux|grep glfsheal|wc -l
618
618
No change in either the glfsheal processes or free memory :(
Should I "killall glfsheal" before OOM kicks in?
Diego
On 16/03/2023 12:37, Strahil Nikolov wrote:
> Can you restart the glusterd service (first check that it was not
> modified to kill the bricks)?
>
> Best Regards,
> Strahil Nikolov
>
> On Thu, Mar 16, 2023 at 8:26, Diego Zuccato
> <diego.zuccato@xxxxxxxx> wrote:
> OOM is just a matter of time.
>
> Today mem use is up to 177G/187 and:
> # ps aux|grep glfsheal|wc -l
> 551
>
> (Well, one is actually the grep process, so "only" 550 glfsheal
> processes.)
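Side note: `ps aux|grep X|wc -l` counts the grep itself, hence the off-by-one. `pgrep` avoids that (assuming procps pgrep is installed; `count_procs` is my hypothetical helper):

```shell
# count processes whose name is exactly "glfsheal", without the
# off-by-one from grep matching its own command line:
count_procs() { pgrep -x -c "$1" || true; }

# usage:
#   count_procs glfsheal
```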
>
> I'll take the last 5:
> root 3266352 0.5 0.0 600292 93044 ? Sl 06:55 0:07
> /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> root 3267220 0.7 0.0 600292 91964 ? Sl 07:00 0:07
> /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> root 3268076 1.0 0.0 600160 88216 ? Sl 07:05 0:08
> /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> root 3269492 1.6 0.0 600292 91248 ? Sl 07:10 0:07
> /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> root 3270354 4.4 0.0 600292 93260 ? Sl 07:15 0:07
> /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
>
> -8<--
> root@str957-clustor00:~# ps -o ppid= 3266352
> 3266345
> root@str957-clustor00:~# ps -o ppid= 3267220
> 3267213
> root@str957-clustor00:~# ps -o ppid= 3268076
> 3268069
> root@str957-clustor00:~# ps -o ppid= 3269492
> 3269485
> root@str957-clustor00:~# ps -o ppid= 3270354
> 3270347
> root@str957-clustor00:~# ps aux|grep 3266345
> root 3266345 0.0 0.0 430536 10764 ? Sl 06:55 0:00
> gluster volume heal cluster_data info summary --xml
> root 3271532 0.0 0.0 6260 2500 pts/1 S+ 07:21 0:00 grep 3266345
> root@str957-clustor00:~# ps aux|grep 3267213
> root 3267213 0.0 0.0 430536 10644 ? Sl 07:00 0:00
> gluster volume heal cluster_data info summary --xml
> root 3271599 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 grep 3267213
> root@str957-clustor00:~# ps aux|grep 3268069
> root 3268069 0.0 0.0 430536 10704 ? Sl 07:05 0:00
> gluster volume heal cluster_data info summary --xml
> root 3271626 0.0 0.0 6260 2516 pts/1 S+ 07:22 0:00 grep 3268069
> root@str957-clustor00:~# ps aux|grep 3269485
> root 3269485 0.0 0.0 430536 10756 ? Sl 07:10 0:00
> gluster volume heal cluster_data info summary --xml
> root 3271647 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 grep 3269485
> root@str957-clustor00:~# ps aux|grep 3270347
> root 3270347 0.0 0.0 430536 10672 ? Sl 07:15 0:00
> gluster volume heal cluster_data info summary --xml
> root 3271666 0.0 0.0 6260 2568 pts/1 S+ 07:22 0:00 grep 3270347
> -8<--
>
> It seems glfsheal keeps spawning more processes.
> I can't rule out metadata corruption (or at least a desync), but it
> shouldn't happen...
>
> Diego
>
> On 15/03/2023 20:11, Strahil Nikolov wrote:
> > If you don't experience any OOM , you can focus on the heals.
> >
> > 284 processes of glfsheal seems odd.
> >
> > Can you check the ppid for 2-3 randomly picked ones?
> > ps -o ppid= <pid>
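To check all of them at once instead of 2-3 by hand, a loop over `pgrep` output should work (my sketch; `parent_of` is a hypothetical helper, not a standard tool):

```shell
# parent_of: print the PPID of a given PID (ps pads the field, so strip
# the whitespace).
parent_of() { ps -o ppid= -p "$1" | tr -d ' '; }

# print every glfsheal PID together with its parent's command line:
#   for pid in $(pgrep -x glfsheal); do
#       printf '%s <- %s\n' "$pid" "$(ps -o args= -p "$(parent_of "$pid")")"
#   done
```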
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > On Wed, Mar 15, 2023 at 9:54, Diego Zuccato
> > <diego.zuccato@xxxxxxxx> wrote:
> > I enabled it yesterday and that greatly reduced memory pressure.
> > Current volume info:
> > -8<--
> > Volume Name: cluster_data
> > Type: Distributed-Replicate
> > Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 45 x (2 + 1) = 135
> > Transport-type: tcp
> > Bricks:
> > Brick1: clustor00:/srv/bricks/00/d
> > Brick2: clustor01:/srv/bricks/00/d
> > Brick3: clustor02:/srv/bricks/00/q (arbiter)
> > [...]
> > Brick133: clustor01:/srv/bricks/29/d
> > Brick134: clustor02:/srv/bricks/29/d
> > Brick135: clustor00:/srv/bricks/14/q (arbiter)
> > Options Reconfigured:
> > performance.quick-read: off
> > cluster.entry-self-heal: on
> > cluster.data-self-heal-algorithm: full
> > cluster.metadata-self-heal: on
> > cluster.shd-max-threads: 2
> > network.inode-lru-limit: 500000
> > performance.md-cache-timeout: 600
> > performance.cache-invalidation: on
> > features.cache-invalidation-timeout: 600
> > features.cache-invalidation: on
> > features.quota-deem-statfs: on
> > performance.readdir-ahead: on
> > cluster.granular-entry-heal: enable
> > features.scrub: Active
> > features.bitrot: on
> > cluster.lookup-optimize: on
> > performance.stat-prefetch: on
> > performance.cache-refresh-timeout: 60
> > performance.parallel-readdir: on
> > performance.write-behind-window-size: 128MB
> > cluster.self-heal-daemon: enable
> > features.inode-quota: on
> > features.quota: on
> > transport.address-family: inet
> > nfs.disable: on
> > performance.client-io-threads: off
> > client.event-threads: 1
> > features.scrub-throttle: normal
> > diagnostics.brick-log-level: ERROR
> > diagnostics.client-log-level: ERROR
> > config.brick-threads: 0
> > cluster.lookup-unhashed: on
> > config.client-threads: 1
> > cluster.use-anonymous-inode: off
> > diagnostics.brick-sys-log-level: CRITICAL
> > features.scrub-freq: monthly
> > cluster.data-self-heal: on
> > cluster.brick-multiplex: on
> > cluster.daemon-log-level: ERROR
> > -8<--
> >
> > htop reports that memory usage is up to 143G; there are 602 tasks and
> > 5232 threads (~20 running) on clustor00, 117G/49 tasks/1565 threads on
> > clustor01, and 126G/45 tasks/1574 threads on clustor02.
> > I see quite a lot (284!) of glfsheal processes running on clustor00 (a
> > "gluster v heal cluster_data info summary" has been running on clustor02
> > since yesterday, still with no output). Shouldn't there be just one per
> > brick?
> >
> > On 15/03/2023 08:30, Strahil Nikolov wrote:
> > > Do you use brick multiplexing ?
> > >
> > > Best Regards,
> > > Strahil Nikolov
> > >
> > > On Tue, Mar 14, 2023 at 16:44, Diego Zuccato
> > > <diego.zuccato@xxxxxxxx> wrote:
> > > Hello all.
> > >
> > > Our Gluster 9.6 cluster is showing increasing problems.
> > > Currently it's composed of 3 servers (2x Intel Xeon 4210 [20 cores,
> > > dual thread, total 40 threads], 192GB RAM, 30x HGST HUH721212AL5200
> > > [12TB]), configured in replica 3 arbiter 1, using Debian packages
> > > from the Gluster 9.x latest repository.
> > >
> > > It seems 192GB of RAM is not enough to handle 30 data bricks + 15
> > > arbiters, and I often had to reload glusterfsd because glusterfs
> > > processes got killed for OOM.
> > > On top of that, performance has been quite bad, especially since we
> > > reached about 20M files, and one of the servers has had mobo issues
> > > that resulted in memory errors that corrupted some bricks'
> > > filesystems (XFS; it required "xfs_repair -L" to fix).
> > > Now I'm getting lots of "stale file handle" errors and other errors
> > > (like directories that seem empty from the client but still contain
> > > files on some bricks), and auto healing seems unable to complete.
> > >
> > > Since I can't keep up with manually fixing all the issues, I'm
> > > thinking about a backup+destroy+recreate strategy.
> > >
> > > I think that if I reduce the number of bricks per server to just 5
> > > (RAID1 of 6x12TB disks) I might resolve the RAM issues, at the cost
> > > of longer heal times in case a disk fails. Am I right, or is it
> > > useless? Other recommendations?
> > > Servers have space for another 6 disks. Maybe those could be used
> > > for some SSDs to speed up access?
> > >
> > > TIA.
> > >
> > > --
> > > Diego Zuccato
> > > DIFA - Dip. di Fisica e Astronomia
> > > Servizi Informatici
> > > Alma Mater Studiorum - Università di Bologna
> > > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> > > tel.: +39 051 20 95786
> > > ________
> > >
> > >
> > >
> > > Community Meeting Calendar:
> > >
> > > Schedule -
> > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > > Bridge: https://meet.google.com/cpu-eiue-hvk
> > > Gluster-users mailing list
> > > Gluster-users@xxxxxxxxxxx
> > > https://lists.gluster.org/mailman/listinfo/gluster-users
>
> >
> > >
> >
>
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786