I killed glfsheal; after a day there were 218 processes, and they got
killed by the OOM killer during the weekend. Now there are no processes active.
Running "heal info" reports lots of files quite quickly but does not spawn
any glfsheal process, and neither does restarting glusterd.
Is there some way to selectively run glfsheal to fix one brick at a time?
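A hedged workaround (not an official per-brick glfsheal mode, as far as I know): pending entries can sometimes be healed one at a time by forcing lookups from a client mount, since a client-side lookup can kick off self-heal for that entry. The mount point below is an assumption, and `pending_paths` is a hypothetical helper:

```shell
# pending_paths: extract entry paths (lines starting with '/') from
# `gluster volume heal <vol> info` output. Hypothetical helper, shown
# only to illustrate the approach.
pending_paths() { grep '^/'; }

# On a real cluster (assuming the volume is mounted at /mnt/cluster_data),
# stat()ing each pending entry from a client triggers a lookup:
#   gluster volume heal cluster_data info | pending_paths \
#     | while read -r f; do stat "/mnt/cluster_data$f" >/dev/null 2>&1; done
```

This only covers entries that heal on lookup; real split-brain files still need manual resolution.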
Diego
On 21/03/2023 01:21, Strahil Nikolov wrote:
Theoretically it might help.
If possible, try to resolve any pending heals.
Best Regards,
Strahil Nikolov
On Thu, Mar 16, 2023 at 15:29, Diego Zuccato
<diego.zuccato@xxxxxxxx> wrote:
On Debian, stopping glusterd does not stop the brick processes: to stop
everything (and free the memory) I have to
systemctl stop glusterd
killall glusterfs{,d}
killall glfsheal
systemctl start glusterd
[This behaviour also hangs a simple reboot of a machine running glusterd...
not nice.]
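The four commands above can be wrapped in a small function (my own sketch, not a Gluster-provided script; `DRY_RUN=1` only prints the commands instead of running them):

```shell
# Fully stop glusterd AND the brick/heal processes it leaves behind on
# Debian, then start it again. Run as root. Sketch only, not an official
# Gluster helper; set DRY_RUN=1 to print the commands instead.
full_gluster_restart() {
    run=${DRY_RUN:+echo}
    $run systemctl stop glusterd
    # brick and heal processes survive the glusterd stop, so kill them
    # explicitly (same as `killall glusterfs{,d}`, but POSIX-sh safe):
    $run killall glusterfs glusterfsd
    $run killall glfsheal
    $run systemctl start glusterd
}
```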
For now I just restarted glusterd without killing the bricks:
root@str957-clustor00:~# ps aux|grep glfsheal|wc -l ; systemctl restart glusterd ; ps aux|grep glfsheal|wc -l
618
618
No change in either the glfsheal processes or free memory :(
Should I "killall glfsheal" before OOM kicks in?
Diego
On 16/03/2023 12:37, Strahil Nikolov wrote:
> Can you restart the glusterd service (first check that it was not
> modified to kill the bricks)?
>
> Best Regards,
> Strahil Nikolov
>
> On Thu, Mar 16, 2023 at 8:26, Diego Zuccato
> <diego.zuccato@xxxxxxxx> wrote:
> OOM is just a matter of time.
>
> Today mem use is up to 177G/187 and:
> # ps aux|grep glfsheal|wc -l
> 551
>
> (Well, one is actually the grep process, so "only" 550 glfsheal
> processes.)
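Side note: `ps aux|grep X|wc -l` counts the grep itself, hence the off-by-one. `pgrep` avoids that (assuming procps pgrep is installed; `count_procs` is my hypothetical helper):

```shell
# count processes whose name is exactly "glfsheal", without the
# off-by-one from grep matching its own command line:
count_procs() { pgrep -x -c "$1" || true; }

# usage:
#   count_procs glfsheal
```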
>
> I'll take the last 5:
> root 3266352 0.5 0.0 600292 93044 ? Sl 06:55 0:07
> /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> root 3267220 0.7 0.0 600292 91964 ? Sl 07:00 0:07
> /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> root 3268076 1.0 0.0 600160 88216 ? Sl 07:05 0:08
> /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> root 3269492 1.6 0.0 600292 91248 ? Sl 07:10 0:07
> /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> root 3270354 4.4 0.0 600292 93260 ? Sl 07:15 0:07
> /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
>
> -8<--
> root@str957-clustor00:~# ps -o ppid= 3266352
> 3266345
> root@str957-clustor00:~# ps -o ppid= 3267220
> 3267213
> root@str957-clustor00:~# ps -o ppid= 3268076
> 3268069
> root@str957-clustor00:~# ps -o ppid= 3269492
> 3269485
> root@str957-clustor00:~# ps -o ppid= 3270354
> 3270347
> root@str957-clustor00:~# ps aux|grep 3266345
> root 3266345 0.0 0.0 430536 10764 ? Sl 06:55 0:00
> gluster volume heal cluster_data info summary --xml
> root 3271532 0.0 0.0 6260 2500 pts/1 S+ 07:21 0:00 grep 3266345
> root@str957-clustor00:~# ps aux|grep 3267213
> root 3267213 0.0 0.0 430536 10644 ? Sl 07:00 0:00
> gluster volume heal cluster_data info summary --xml
> root 3271599 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 grep 3267213
> root@str957-clustor00:~# ps aux|grep 3268069
> root 3268069 0.0 0.0 430536 10704 ? Sl 07:05 0:00
> gluster volume heal cluster_data info summary --xml
> root 3271626 0.0 0.0 6260 2516 pts/1 S+ 07:22 0:00 grep 3268069
> root@str957-clustor00:~# ps aux|grep 3269485
> root 3269485 0.0 0.0 430536 10756 ? Sl 07:10 0:00
> gluster volume heal cluster_data info summary --xml
> root 3271647 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 grep 3269485
> root@str957-clustor00:~# ps aux|grep 3270347
> root 3270347 0.0 0.0 430536 10672 ? Sl 07:15 0:00
> gluster volume heal cluster_data info summary --xml
> root 3271666 0.0 0.0 6260 2568 pts/1 S+ 07:22 0:00 grep 3270347
> -8<--
>
> It seems glfsheal keeps spawning more processes.
> I can't rule out metadata corruption (or at least a desync), but it
> shouldn't happen...
>
> Diego
>
> On 15/03/2023 20:11, Strahil Nikolov wrote:
> > If you don't experience any OOM , you can focus on the heals.
> >
> > 284 processes of glfsheal seems odd.
> >
> > Can you check the ppid for 2-3 randomly picked ones?
> > ps -o ppid= <pid>
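To check all of them at once instead of 2-3 by hand, a loop over `pgrep` output should work (my sketch; `parent_of` is a hypothetical helper, not a standard tool):

```shell
# parent_of: print the PPID of a given PID (ps pads the field, so strip
# the whitespace).
parent_of() { ps -o ppid= -p "$1" | tr -d ' '; }

# print every glfsheal PID together with its parent's command line:
#   for pid in $(pgrep -x glfsheal); do
#       printf '%s <- %s\n' "$pid" "$(ps -o args= -p "$(parent_of "$pid")")"
#   done
```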
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > On Wed, Mar 15, 2023 at 9:54, Diego Zuccato
> > <diego.zuccato@xxxxxxxx> wrote:
> > I enabled it yesterday and that greatly reduced memory pressure.
> > Current volume info:
> > -8<--
> > Volume Name: cluster_data
> > Type: Distributed-Replicate
> > Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a
> > Status: Started
> > Snapshot Count: 0
> > Number of Bricks: 45 x (2 + 1) = 135
> > Transport-type: tcp
> > Bricks:
> > Brick1: clustor00:/srv/bricks/00/d
> > Brick2: clustor01:/srv/bricks/00/d
> > Brick3: clustor02:/srv/bricks/00/q (arbiter)
> > [...]
> > Brick133: clustor01:/srv/bricks/29/d
> > Brick134: clustor02:/srv/bricks/29/d
> > Brick135: clustor00:/srv/bricks/14/q (arbiter)
> > Options Reconfigured:
> > performance.quick-read: off
> > cluster.entry-self-heal: on
> > cluster.data-self-heal-algorithm: full
> > cluster.metadata-self-heal: on
> > cluster.shd-max-threads: 2
> > network.inode-lru-limit: 500000
> > performance.md-cache-timeout: 600
> > performance.cache-invalidation: on
> > features.cache-invalidation-timeout: 600
> > features.cache-invalidation: on
> > features.quota-deem-statfs: on
> > performance.readdir-ahead: on
> > cluster.granular-entry-heal: enable
> > features.scrub: Active
> > features.bitrot: on
> > cluster.lookup-optimize: on
> > performance.stat-prefetch: on
> > performance.cache-refresh-timeout: 60
> > performance.parallel-readdir: on
> > performance.write-behind-window-size: 128MB
> > cluster.self-heal-daemon: enable
> > features.inode-quota: on
> > features.quota: on
> > transport.address-family: inet
> > nfs.disable: on
> > performance.client-io-threads: off
> > client.event-threads: 1
> > features.scrub-throttle: normal
> > diagnostics.brick-log-level: ERROR
> > diagnostics.client-log-level: ERROR
> > config.brick-threads: 0
> > cluster.lookup-unhashed: on
> > config.client-threads: 1
> > cluster.use-anonymous-inode: off
> > diagnostics.brick-sys-log-level: CRITICAL
> > features.scrub-freq: monthly
> > cluster.data-self-heal: on
> > cluster.brick-multiplex: on
> > cluster.daemon-log-level: ERROR
> > -8<--
> >
> > htop reports that memory usage is up to 143G; there are 602 tasks and
> > 5232 threads (~20 running) on clustor00, 117G/49 tasks/1565 threads on
> > clustor01, and 126G/45 tasks/1574 threads on clustor02.
> > I see quite a lot (284!) of glfsheal processes running on clustor00 (a
> > "gluster v heal cluster_data info summary" has been running on clustor02
> > since yesterday, still with no output). Shouldn't there be just one per
> > brick?
> >
> > On 15/03/2023 08:30, Strahil Nikolov wrote:
> > > Do you use brick multiplexing ?
> > >
> > > Best Regards,
> > > Strahil Nikolov
> > >
> > > On Tue, Mar 14, 2023 at 16:44, Diego Zuccato
> > > <diego.zuccato@xxxxxxxx> wrote:
> > > Hello all.
> > >
> > > Our Gluster 9.6 cluster is showing increasing problems.
> > > Currently it's composed of 3 servers (2x Intel Xeon 4210 [20 cores,
> > > dual thread, total 40 threads], 192GB RAM, 30x HGST HUH721212AL5200
> > > [12TB]), configured in replica 3 arbiter 1, using Debian packages
> > > from the Gluster 9.x latest repository.
> > >
> > > It seems 192GB of RAM is not enough to handle 30 data bricks + 15
> > > arbiters, and I often had to reload glusterfsd because glusterfs
> > > processes got killed for OOM.
> > > On top of that, performance has been quite bad, especially since we
> > > reached about 20M files, and one of the servers has had mobo issues
> > > that resulted in memory errors that corrupted some bricks'
> > > filesystems (XFS; it required "xfs_repair -L" to fix).
> > > Now I'm getting lots of "stale file handle" errors and other errors
> > > (like directories that seem empty from the client but still contain
> > > files on some bricks), and auto healing seems unable to complete.
> > >
> > > Since I can't keep up with manually fixing all the issues, I'm
> > > thinking about a backup+destroy+recreate strategy.
> > >
> > > I think that if I reduce the number of bricks per server to just 5
> > > (RAID1 of 6x12TB disks) I might resolve the RAM issues, at the cost
> > > of longer heal times in case a disk fails. Am I right, or is it
> > > useless? Other recommendations?
> > > Servers have space for another 6 disks. Maybe those could be used
> > > for some SSDs to speed up access?
> > >
> > > TIA.
> > >
> > > --
> > > Diego Zuccato
> > > DIFA - Dip. di Fisica e Astronomia
> > > Servizi Informatici
> > > Alma Mater Studiorum - Università di Bologna
> > > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> > > tel.: +39 051 20 95786
> > > ________
> > >
> > >
> > >
> > > Community Meeting Calendar:
> > >
> > > Schedule -
> > > Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
> > > Bridge: https://meet.google.com/cpu-eiue-hvk
> > > Gluster-users mailing list
> > > Gluster-users@xxxxxxxxxxx
> > > https://lists.gluster.org/mailman/listinfo/gluster-users
>
> >
> > >
> >
>
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786