Re: How to configure?

Diego Zuccato <diego.zuccato@xxxxxxxx> · Thu, 16 Mar 2023 07:25:50 +0100

OOM is just a matter of time.

Today memory use is up to 177G out of 187G, and:
# ps aux|grep glfsheal|wc -l
551

(Well, one is actually the grep process, so "only" 550 glfsheal processes.)
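
The off-by-one from grep matching itself can be avoided by matching on the command column only. A minimal sketch, assuming the usual `ps aux` layout in which the command starts in field 11; `count_glfsheal` is a made-up helper name, not part of Gluster:

```shell
# Count glfsheal processes in `ps aux`-style input read from stdin.
# Only the executable name (field 11) is tested, so a "grep glfsheal"
# line is not counted.
count_glfsheal() {
    awk '$11 ~ /(^|\/)glfsheal$/ { n++ } END { print n + 0 }'
}

# Usage on a live system:
#   ps aux | count_glfsheal
```

On systems with procps, `pgrep -c -x glfsheal` should report the same number.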

I'll take the last 5:
root     3266352  0.5  0.0 600292 93044 ?        Sl   06:55   0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
root     3267220  0.7  0.0 600292 91964 ?        Sl   07:00   0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
root     3268076  1.0  0.0 600160 88216 ?        Sl   07:05   0:08 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
root     3269492  1.6  0.0 600292 91248 ?        Sl   07:10   0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
root     3270354  4.4  0.0 600292 93260 ?        Sl   07:15   0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
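
Each of the five samples holds about 90 MB resident. If they are representative, 550 such processes would come to roughly 48 GiB, but that is an extrapolation rather than a measurement, and RSS counts shared pages once per process. A sketch for summing the real figure, assuming RSS is field 6 of `ps aux` in KiB; `glfsheal_rss` is a hypothetical helper name:

```shell
# Read `ps aux`-style lines on stdin and print
# "<number of glfsheal processes> <total RSS in KiB>".
glfsheal_rss() {
    awk '$11 ~ /(^|\/)glfsheal$/ { n++; kib += $6 }
         END { printf "%d %d\n", n, kib }'
}

# Usage on a live system:
#   ps aux | glfsheal_rss
```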

-8<--
root@str957-clustor00:~# ps -o ppid= 3266352
3266345
root@str957-clustor00:~# ps -o ppid= 3267220
3267213
root@str957-clustor00:~# ps -o ppid= 3268076
3268069
root@str957-clustor00:~# ps -o ppid= 3269492
3269485
root@str957-clustor00:~# ps -o ppid= 3270354
3270347
root@str957-clustor00:~# ps aux|grep 3266345
root     3266345  0.0  0.0 430536 10764 ?        Sl   06:55   0:00 gluster volume heal cluster_data info summary --xml
root     3271532  0.0  0.0   6260  2500 pts/1    S+   07:21   0:00 grep 3266345
root@str957-clustor00:~# ps aux|grep 3267213
root     3267213  0.0  0.0 430536 10644 ?        Sl   07:00   0:00 gluster volume heal cluster_data info summary --xml
root     3271599  0.0  0.0   6260  2480 pts/1    S+   07:22   0:00 grep 3267213
root@str957-clustor00:~# ps aux|grep 3268069
root     3268069  0.0  0.0 430536 10704 ?        Sl   07:05   0:00 gluster volume heal cluster_data info summary --xml
root     3271626  0.0  0.0   6260  2516 pts/1    S+   07:22   0:00 grep 3268069
root@str957-clustor00:~# ps aux|grep 3269485
root     3269485  0.0  0.0 430536 10756 ?        Sl   07:10   0:00 gluster volume heal cluster_data info summary --xml
root     3271647  0.0  0.0   6260  2480 pts/1    S+   07:22   0:00 grep 3269485
root@str957-clustor00:~# ps aux|grep 3270347
root     3270347  0.0  0.0 430536 10672 ?        Sl   07:15   0:00 gluster volume heal cluster_data info summary --xml
root     3271666  0.0  0.0   6260  2568 pts/1    S+   07:22   0:00 grep 3270347
-8<--
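
The per-PID lookups above can be done for all glfsheal processes in one pass. A minimal sketch, assuming input in the form produced by `ps -eo pid=,ppid=,args=`; `glfsheal_parents` is a made-up helper name, and the output order is whatever awk happens to yield, hence the `sort` in the usage line:

```shell
# Read "pid ppid args..." lines on stdin and print, for every glfsheal
# process, its PID, its PPID and the parent's command line ("?" if the
# parent is not in the input).
glfsheal_parents() {
    awk '
        {
            pid = $1; ppid = $2
            # Drop the two leading numeric columns, keep the command line.
            sub(/^[ \t]*[0-9]+[ \t]+[0-9]+[ \t]+/, "")
            cmd[pid] = $0
            parent[pid] = ppid
        }
        END {
            for (p in cmd)
                if (cmd[p] ~ /(^|\/)glfsheal( |$)/)
                    print p, parent[p], (parent[p] in cmd ? cmd[parent[p]] : "?")
        }
    '
}

# Usage on a live system:
#   ps -eo pid=,ppid=,args= | glfsheal_parents | sort -n
```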

It seems glfsheal is spawning more processes.
I can't rule out metadata corruption (or at least a desync), but it
shouldn't happen...

Diego

On 15/03/2023 20:11, Strahil Nikolov wrote:
If you don't experience any OOM, you can focus on the heals.

Having 284 glfsheal processes seems odd.

Can you check the ppid of 2-3 randomly picked ones?
ps -o ppid= <pid>

Best Regards,
Strahil Nikolov

    On Wed, Mar 15, 2023 at 9:54, Diego Zuccato
    <diego.zuccato@xxxxxxxx> wrote:
    I enabled it yesterday and that greatly reduced memory pressure.
    Current volume info:
    -8<--
    Volume Name: cluster_data
    Type: Distributed-Replicate
    Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a
    Status: Started
    Snapshot Count: 0
    Number of Bricks: 45 x (2 + 1) = 135
    Transport-type: tcp
    Bricks:
    Brick1: clustor00:/srv/bricks/00/d
    Brick2: clustor01:/srv/bricks/00/d
    Brick3: clustor02:/srv/bricks/00/q (arbiter)
    [...]
    Brick133: clustor01:/srv/bricks/29/d
    Brick134: clustor02:/srv/bricks/29/d
    Brick135: clustor00:/srv/bricks/14/q (arbiter)
    Options Reconfigured:
    performance.quick-read: off
    cluster.entry-self-heal: on
    cluster.data-self-heal-algorithm: full
    cluster.metadata-self-heal: on
    cluster.shd-max-threads: 2
    network.inode-lru-limit: 500000
    performance.md-cache-timeout: 600
    performance.cache-invalidation: on
    features.cache-invalidation-timeout: 600
    features.cache-invalidation: on
    features.quota-deem-statfs: on
    performance.readdir-ahead: on
    cluster.granular-entry-heal: enable
    features.scrub: Active
    features.bitrot: on
    cluster.lookup-optimize: on
    performance.stat-prefetch: on
    performance.cache-refresh-timeout: 60
    performance.parallel-readdir: on
    performance.write-behind-window-size: 128MB
    cluster.self-heal-daemon: enable
    features.inode-quota: on
    features.quota: on
    transport.address-family: inet
    nfs.disable: on
    performance.client-io-threads: off
    client.event-threads: 1
    features.scrub-throttle: normal
    diagnostics.brick-log-level: ERROR
    diagnostics.client-log-level: ERROR
    config.brick-threads: 0
    cluster.lookup-unhashed: on
    config.client-threads: 1
    cluster.use-anonymous-inode: off
    diagnostics.brick-sys-log-level: CRITICAL
    features.scrub-freq: monthly
    cluster.data-self-heal: on
    cluster.brick-multiplex: on
    cluster.daemon-log-level: ERROR
    -8<--

    htop reports that memory usage is up to 143G, with 602 tasks and
    5232 threads (~20 running) on clustor00, 117G/49 tasks/1565 threads on
    clustor01 and 126G/45 tasks/1574 threads on clustor02.
    I see quite a lot (284!) of glfsheal processes running on clustor00 (a
    "gluster v heal cluster_data info summary" has been running on clustor02
    since yesterday, still with no output). Shouldn't there be just one per
    brick?

    Diego

    On 15/03/2023 08:30, Strahil Nikolov wrote:
     > Do you use brick multiplexing ?
     >
     > Best Regards,
     > Strahil Nikolov
     >
     >    On Tue, Mar 14, 2023 at 16:44, Diego Zuccato
     >    <diego.zuccato@xxxxxxxx> wrote:
     >    Hello all.
     >
     >    Our Gluster 9.6 cluster is showing increasing problems.
     >    Currently it's composed of 3 servers (2x Intel Xeon 4210 [20 cores,
     >    dual thread, 40 threads total], 192GB RAM, 30x HGST HUH721212AL5200
     >    [12TB]), configured in replica 3 arbiter 1, using Debian packages
     >    from the Gluster 9.x latest repository.
     >
     >    It seems 192G of RAM is not enough to handle 30 data bricks + 15
     >    arbiters, and I often had to reload glusterfsd because glusterfs
     >    processes got killed by the OOM killer.
     >    On top of that, performance has been quite bad, especially when we
     >    reached about 20M files. In addition, one of the servers has had
     >    mobo issues that resulted in memory errors that corrupted the
     >    filesystem of some bricks (XFS; it required "xfs_repair -L" to fix).
     >    Now I'm getting lots of "stale file handle" errors and other errors
     >    (like directories that seem empty from the client but still contain
     >    files in some bricks), and auto healing seems unable to complete.
     >
     >    Since I can't keep up with manually fixing all the issues, I'm
     >    thinking about a backup+destroy+recreate strategy.
     >
     >    I think that if I reduce the number of bricks per server to just 5
     >    (RAID1 of 6x12TB disks) I might resolve the RAM issues, at the cost
     >    of longer heal times in case a disk fails. Am I right, or is it
     >    useless?
     >    Other recommendations?
     >    Servers have space for another 6 disks. Maybe those could be used
     >    for some SSDs to speed up access?
     >
     >    TIA.
     >
     >    --
     >    Diego Zuccato
     >    DIFA - Dip. di Fisica e Astronomia
     >    Servizi Informatici
     >    Alma Mater Studiorum - Università di Bologna
     >    V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
     >    tel.: +39 051 20 95786
     >    ________
     >
     >    Community Meeting Calendar:
     >
     >    Schedule -
     >    Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
     >    Bridge: https://meet.google.com/cpu-eiue-hvk
     >    Gluster-users mailing list
     >    Gluster-users@xxxxxxxxxxx
     >    https://lists.gluster.org/mailman/listinfo/gluster-users

--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786