Best Regards,
Strahil Nikolov
On Fri, Mar 24, 2023 at 15:59, Diego Zuccato <diego.zuccato@xxxxxxxx> wrote:

There are 285 files in /var/lib/glusterd/vols/cluster_data ... including
many files with names related to quorum bricks already moved to a
different path (like cluster_data.client.clustor02.srv-quorum-00-d.vol
that should already have been replaced by
cluster_data.clustor02.srv-bricks-00-q.vol -- and both vol files exist).
Is there something I should check inside the volfiles?

Diego

On 24/03/2023 13:05, Strahil Nikolov wrote:
> Can you check your volume file contents?
> Maybe it really can't find (or access) a specific volfile?
>
> Best Regards,
> Strahil Nikolov
>
> On Fri, Mar 24, 2023 at 8:07, Diego Zuccato <diego.zuccato@xxxxxxxx> wrote:
> In glfsheal-Connection.log I see many lines like:
> [2023-03-13 23:04:40.241481 +0000] E [MSGID: 104021]
> [glfs-mgmt.c:586:glfs_mgmt_getspec_cbk] 0-gfapi: failed to get the
> volume file [{from server}, {errno=2}, {error=File o directory non
> esistente}]
>
> And *lots* of gfid-mismatch errors in glustershd.log.
>
> Couldn't find anything that would prevent heal from starting. :(
>
> Diego
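For the gfid-mismatch errors mentioned above, one quick check is to compare
the trusted.gfid xattr of an affected path on every brick that holds a copy;
a minimal sketch (the file path below is a placeholder, not taken from the logs):

-8<--
# Run against the same relative path on each data brick and on the arbiter;
# all replicas should report the same gfid.
getfattr -n trusted.gfid -e hex /srv/bricks/00/d/path/to/affected/file
# If they differ, gluster's split-brain resolution CLI may help pick a source,
# e.g.: gluster volume heal cluster_data split-brain latest-mtime <path-on-volume>
-8<--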
> On 21/03/2023 20:39, Strahil Nikolov wrote:
> > I have no clue. Have you checked for errors in the logs? Maybe you
> > might find something useful.
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > On Tue, Mar 21, 2023 at 9:56, Diego Zuccato <diego.zuccato@xxxxxxxx> wrote:
> > Killed glfsheal; after a day there were 218 processes, then they got
> > killed by OOM during the weekend. Now there are no processes active.
> > Trying to run "heal info" reports lots of files quite quickly but does
> > not spawn any glfsheal process. And neither does restarting glusterd.
> > Is there some way to selectively run glfsheal to fix one brick at a
> > time?
> >
> > Diego
> >
> > On 21/03/2023 01:21, Strahil Nikolov wrote:
> > > Theoretically it might help.
> > > If possible, try to resolve any pending heals.
> > >
> > > Best Regards,
> > > Strahil Nikolov
> > >
> > > On Thu, Mar 16, 2023 at 15:29, Diego Zuccato <diego.zuccato@xxxxxxxx> wrote:
> > > In Debian stopping glusterd does not stop brick processes: to stop
> > > everything (and free the memory) I have to
> > > systemctl stop glusterd
> > > killall glusterfs{,d}
> > > killall glfsheal
> > > systemctl start glusterd
> > > [this behaviour hangs a simple reboot of a machine running glusterd...
> > > not nice]
> > >
> > > For now I just restarted glusterd w/o killing the bricks:
> > >
> > > root@str957-clustor00:~# ps aux|grep glfsheal|wc -l ; systemctl restart glusterd ; ps aux|grep glfsheal|wc -l
> > > 618
> > > 618
> > >
> > > No change, neither in glfsheal processes nor in free memory :(
> > > Should I "killall glfsheal" before OOM kicks in?
> > >
> > > Diego
> > >
> > > On 16/03/2023 12:37, Strahil Nikolov wrote:
> > > > Can you restart the glusterd service (first check that it was not
> > > > modified to kill the bricks)?
> > > >
> > > > Best Regards,
> > > > Strahil Nikolov
> > > >
> > > > On Thu, Mar 16, 2023 at 8:26, Diego Zuccato <diego.zuccato@xxxxxxxx> wrote:
> > > > OOM is just a matter of time.
> > > >
> > > > Today mem use is up to 177G/187 and:
> > > > # ps aux|grep glfsheal|wc -l
> > > > 551
> > > >
> > > > (well, one is actually the grep process, so "only" 550 glfsheal
> > > > processes.)
> > > >
> > > > I'll take the last 5:
> > > > root 3266352 0.5 0.0 600292 93044 ? Sl 06:55 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> > > > root 3267220 0.7 0.0 600292 91964 ? Sl 07:00 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> > > > root 3268076 1.0 0.0 600160 88216 ? Sl 07:05 0:08 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> > > > root 3269492 1.6 0.0 600292 91248 ? Sl 07:10 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> > > > root 3270354 4.4 0.0 600292 93260 ? Sl 07:15 0:07 /usr/libexec/glusterfs/glfsheal cluster_data info-summary --xml
> > > >
> > > > -8<--
> > > > root@str957-clustor00:~# ps -o ppid= 3266352
> > > > 3266345
> > > > root@str957-clustor00:~# ps -o ppid= 3267220
> > > > 3267213
> > > > root@str957-clustor00:~# ps -o ppid= 3268076
> > > > 3268069
> > > > root@str957-clustor00:~# ps -o ppid= 3269492
> > > > 3269485
> > > > root@str957-clustor00:~# ps -o ppid= 3270354
> > > > 3270347
> > > > root@str957-clustor00:~# ps aux|grep 3266345
> > > > root 3266345 0.0 0.0 430536 10764 ? Sl 06:55 0:00 gluster volume heal cluster_data info summary --xml
> > > > root 3271532 0.0 0.0 6260 2500 pts/1 S+ 07:21 0:00 grep 3266345
> > > > root@str957-clustor00:~# ps aux|grep 3267213
> > > > root 3267213 0.0 0.0 430536 10644 ? Sl 07:00 0:00 gluster volume heal cluster_data info summary --xml
> > > > root 3271599 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 grep 3267213
> > > > root@str957-clustor00:~# ps aux|grep 3268069
> > > > root 3268069 0.0 0.0 430536 10704 ? Sl 07:05 0:00 gluster volume heal cluster_data info summary --xml
> > > > root 3271626 0.0 0.0 6260 2516 pts/1 S+ 07:22 0:00 grep 3268069
> > > > root@str957-clustor00:~# ps aux|grep 3269485
> > > > root 3269485 0.0 0.0 430536 10756 ? Sl 07:10 0:00 gluster volume heal cluster_data info summary --xml
> > > > root 3271647 0.0 0.0 6260 2480 pts/1 S+ 07:22 0:00 grep 3269485
> > > > root@str957-clustor00:~# ps aux|grep 3270347
> > > > root 3270347 0.0 0.0 430536 10672 ? Sl 07:15 0:00 gluster volume heal cluster_data info summary --xml
> > > > root 3271666 0.0 0.0 6260 2568 pts/1 S+ 07:22 0:00 grep 3270347
> > > > -8<--
> > > >
> > > > Seems glfsheal is spawning more processes.
> > > > I can't rule out a metadata corruption (or at least a desync),
> > > > but it shouldn't happen...
> > > >
> > > > Diego
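To check at a glance whether the hundreds of glfsheal instances all hang off
stale "gluster volume heal ... info" invocations (rather than being orphaned),
something along these lines should work; a sketch, not output from these nodes:

-8<--
# For every glfsheal PID, print its parent's command line, then count the
# distinct parent commands.
ps -C glfsheal -o pid=,ppid= | while read pid ppid; do
    ps -o args= -p "$ppid"
done | sort | uniq -c
-8<--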
> > > > On 15/03/2023 20:11, Strahil Nikolov wrote:
> > > > > If you don't experience any OOM, you can focus on the heals.
> > > > >
> > > > > 284 processes of glfsheal seems odd.
> > > > >
> > > > > Can you check the ppid for 2-3 randomly picked ones?
> > > > > ps -o ppid= <pid>
> > > > >
> > > > > Best Regards,
> > > > > Strahil Nikolov
> > > > >
> > > > > On Wed, Mar 15, 2023 at 9:54, Diego Zuccato <diego.zuccato@xxxxxxxx> wrote:
> > > > > I enabled it yesterday and that greatly reduced memory pressure.
> > > > > Current volume info:
> > > > > -8<--
> > > > > Volume Name: cluster_data
> > > > > Type: Distributed-Replicate
> > > > > Volume ID: a8caaa90-d161-45bb-a68c-278263a8531a
> > > > > Status: Started
> > > > > Snapshot Count: 0
> > > > > Number of Bricks: 45 x (2 + 1) = 135
> > > > > Transport-type: tcp
> > > > > Bricks:
> > > > > Brick1: clustor00:/srv/bricks/00/d
> > > > > Brick2: clustor01:/srv/bricks/00/d
> > > > > Brick3: clustor02:/srv/bricks/00/q (arbiter)
> > > > > [...]
> > > > > Brick133: clustor01:/srv/bricks/29/d
> > > > > Brick134: clustor02:/srv/bricks/29/d
> > > > > Brick135: clustor00:/srv/bricks/14/q (arbiter)
> > > > > Options Reconfigured:
> > > > > performance.quick-read: off
> > > > > cluster.entry-self-heal: on
> > > > > cluster.data-self-heal-algorithm: full
> > > > > cluster.metadata-self-heal: on
> > > > > cluster.shd-max-threads: 2
> > > > > network.inode-lru-limit: 500000
> > > > > performance.md-cache-timeout: 600
> > > > > performance.cache-invalidation: on
> > > > > features.cache-invalidation-timeout: 600
> > > > > features.cache-invalidation: on
> > > > > features.quota-deem-statfs: on
> > > > > performance.readdir-ahead: on
> > > > > cluster.granular-entry-heal: enable
> > > > > features.scrub: Active
> > > > > features.bitrot: on
> > > > > cluster.lookup-optimize: on
> > > > > performance.stat-prefetch: on
> > > > > performance.cache-refresh-timeout: 60
> > > > > performance.parallel-readdir: on
> > > > > performance.write-behind-window-size: 128MB
> > > > > cluster.self-heal-daemon: enable
> > > > > features.inode-quota: on
> > > > > features.quota: on
> > > > > transport.address-family: inet
> > > > > nfs.disable: on
> > > > > performance.client-io-threads: off
> > > > > client.event-threads: 1
> > > > > features.scrub-throttle: normal
> > > > > diagnostics.brick-log-level: ERROR
> > > > > diagnostics.client-log-level: ERROR
> > > > > config.brick-threads: 0
> > > > > cluster.lookup-unhashed: on
> > > > > config.client-threads: 1
> > > > > cluster.use-anonymous-inode: off
> > > > > diagnostics.brick-sys-log-level: CRITICAL
> > > > > features.scrub-freq: monthly
> > > > > cluster.data-self-heal: on
> > > > > cluster.brick-multiplex: on
> > > > > cluster.daemon-log-level: ERROR
> > > > > -8<--
> > > > >
> > > > > htop reports that memory usage is up to 143G, there are 602 tasks and
> > > > > 5232 threads (~20 running) on clustor00, 117G/49 tasks/1565 threads on
> > > > > clustor01 and 126G/45 tasks/1574 threads on clustor02.
> > > > > I see quite a lot (284!) of glfsheal processes running on clustor00
> > > > > (a "gluster v heal cluster_data info summary" has been running on
> > > > > clustor02 since yesterday, still no output). Shouldn't it be just one
> > > > > per brick?
> > > > >
> > > > > Diego
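To see where memory like the reported ~143G actually goes, summing resident
memory per gluster process type on each node gives a rough picture; a sketch
(nothing below is measured on these servers):

-8<--
# Sum RSS per gluster process type and report it in GiB.
ps -C glusterfsd,glusterfs,glfsheal -o comm=,rss= \
  | awk '{s[$1]+=$2} END {for (c in s) printf "%-11s %.1f GiB\n", c, s[c]/1024/1024}'
-8<--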
> > > > > On 15/03/2023 08:30, Strahil Nikolov wrote:
> > > > > > Do you use brick multiplexing?
> > > > > >
> > > > > > Best Regards,
> > > > > > Strahil Nikolov
> > > > > >
> > > > > > On Tue, Mar 14, 2023 at 16:44, Diego Zuccato <diego.zuccato@xxxxxxxx> wrote:
> > > > > > Hello all.
> > > > > >
> > > > > > Our Gluster 9.6 cluster is showing increasing problems.
> > > > > > Currently it's composed of 3 servers (2x Intel Xeon 4210 [20 cores
> > > > > > dual thread, total 40 threads], 192GB RAM, 30x HGST HUH721212AL5200
> > > > > > [12TB]), configured in replica 3 arbiter 1. Using Debian packages
> > > > > > from the Gluster 9.x latest repository.
> > > > > >
> > > > > > Seems 192G RAM are not enough to handle 30 data bricks + 15 arbiters,
> > > > > > and I often had to reload glusterfsd because glusterfs processes got
> > > > > > killed for OOM.
> > > > > > On top of that, performance has been quite bad, especially when we
> > > > > > reached about 20M files. Moreover, one of the servers has had mobo
> > > > > > issues that resulted in memory errors that corrupted some bricks' fs
> > > > > > (XFS, it required "xfs_repair -L" to fix).
> > > > > > Now I'm getting lots of "stale file handle" errors and other errors
> > > > > > (like directories that seem empty from the client but still contain
> > > > > > files in some bricks) and auto healing seems unable to complete.
> > > > > >
> > > > > > Since I can't keep up continuing to manually fix all the issues, I'm
> > > > > > thinking about a backup+destroy+recreate strategy.
> > > > > >
> > > > > > I think that if I reduce the number of bricks per server to just 5
> > > > > > (RAID1 of 6x12TB disks) I might resolve the RAM issues - at the cost
> > > > > > of longer heal times in case a disk fails. Am I right, or is it
> > > > > > useless? Other recommendations?
> > > > > > Servers have space for another 6 disks. Maybe those could be used
> > > > > > for some SSDs to speed up access?
> > > > > >
> > > > > > TIA.
> > > > > >
> > > > > > --
> > > > > > Diego Zuccato
> > > > > > DIFA - Dip. di Fisica e Astronomia
> > > > > > Servizi Informatici
> > > > > > Alma Mater Studiorum - Università di Bologna
> > > > > > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> > > > > > tel.: +39 051 20 95786
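Whether brick multiplexing is actually in effect can be confirmed from the CLI
and from the process count: with cluster.brick-multiplex on, several bricks
share one glusterfsd, so each node should show far fewer glusterfsd processes
than bricks. A minimal check (only the volume name is taken from the thread):

-8<--
# Show the multiplexing option, then count brick processes on this node.
gluster volume get cluster_data cluster.brick-multiplex
ps -C glusterfsd --no-headers | wc -l
-8<--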
--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users