Re: Upgrade 10.4 -> 11.1 making problems

Gilberto Ferreira <gilberto.nunes32@xxxxxxxxx> · Fri, 19 Jan 2024 12:23:35 -0300

gluster volume set testvol diagnostics.brick-log-level WARNING
gluster volume set testvol diagnostics.brick-sys-log-level WARNING
gluster volume set testvol diagnostics.client-log-level ERROR
gluster --log-level=ERROR volume status
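
A minimal sketch of how the three "volume set" commands above could be wrapped for another volume (for example workdata from this thread) and the values read back afterwards. The function names are made up for illustration; only "gluster volume set" and "gluster volume get" are real commands here.

```shell
#!/bin/sh
# Sketch: lower the log verbosity of one volume, then read the settings back.
# quiet_volume_logs and show_volume_log_levels are illustrative names,
# not gluster commands.

quiet_volume_logs() {
    vol="$1"
    gluster volume set "$vol" diagnostics.brick-log-level WARNING &&
    gluster volume set "$vol" diagnostics.brick-sys-log-level WARNING &&
    gluster volume set "$vol" diagnostics.client-log-level ERROR
}

show_volume_log_levels() {
    vol="$1"
    for key in diagnostics.brick-log-level \
               diagnostics.brick-sys-log-level \
               diagnostics.client-log-level; do
        # Prints the option name and its current value.
        gluster volume get "$vol" "$key"
    done
}

# Usage (on a server node, as root):
#   quiet_volume_logs workdata && show_volume_log_levels workdata
```

These are per-volume options for bricks and clients; for debugging, the same keys can be set to DEBUG and lowered again afterwards.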
---
Gilberto Nunes Ferreira

On Fri, 19 Jan 2024 at 05:49, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
Hi Strahil,

Hm, don't get me wrong, it may sound a bit stupid, but... where do I
set the log level? I'm using Debian.

https://access.redhat.com/documentation/de-de/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level

ls /etc/glusterfs/
eventsconfig.json
glusterd.vol
glusterfs-georep-logrotate
glusterfs-logrotate
gluster-rsyslog-5.8.conf
gluster-rsyslog-7.2.conf
group-db-workload
group-distributed-virt
group-gluster-block
group-metadata-cache
group-nl-cache
group-samba
group-virt.example
gsyncd.conf
logger.conf.example
thin-arbiter.vol

checked: /etc/glusterfs/logger.conf.example

# To enable enhanced logging capabilities,
#
# 1. rename this file to /etc/glusterfs/logger.conf
#
# 2. rename /etc/rsyslog.d/gluster.conf.example to
#    /etc/rsyslog.d/gluster.conf
#
# This change requires restart of all gluster services/volumes and
# rsyslog.

To test, I created /etc/glusterfs/logger.conf containing
LOG_LEVEL='WARNING' and restarted glusterd on that node, but this
doesn't work: the log level stays at INFO.
/etc/rsyslog.d/gluster.conf.example does not exist; on Debian it is
probably /etc/rsyslog.conf instead. But first it would be better to
know where to set the log level for glusterd.
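
One way to check what glusterd was actually started with is to look at its command line, since the daemon accepts a --log-level option. The sketch below only parses such a command line. The helper name is made up, and how the Debian package passes the option (systemd unit, defaults file) is an assumption that is not confirmed anywhere in this thread.

```shell
#!/bin/sh
# Sketch: extract the log level from a glusterd command line.
# glusterd_log_level_from_args is an illustrative helper, not part of gluster.

glusterd_log_level_from_args() {
    prev=""
    # $1 is left unquoted on purpose so the command line splits into words.
    for word in $1; do
        case "$word" in
            --log-level=*) echo "${word#--log-level=}"; return 0 ;;
        esac
        case "$prev" in
            --log-level|-L) echo "$word"; return 0 ;;
        esac
        prev="$word"
    done
    return 1   # no explicit level found on the command line
}

# Usage on a running node (procps "ps" assumed):
#   glusterd_log_level_from_args "$(ps -C glusterd -o args=)" \
#       || echo "no --log-level given"
```

If a level such as INFO shows up here, it comes from whatever starts the daemon, which is then the place to change it.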

Depending on how much the DEBUG log level talks ;-) I could assign up
to 100G to /var.

Thx & best regards,

Hubert

On Thu, 18 Jan 2024 at 22:58, Strahil Nikolov
<hunter86_bg@xxxxxxxxx> wrote:

>

> Are you able to set the logs to debug level?
> It might provide a clue about what is going on.

>

> Best Regards,

> Strahil Nikolov

>

> On Thu, Jan 18, 2024 at 13:08, Diego Zuccato

> <diego.zuccato@xxxxxxxx> wrote:

> That's the same kind of errors I keep seeing on my 2 clusters,

> regenerated some months ago. Seems a pseudo-split-brain that should be

> impossible on a replica 3 cluster but keeps happening.

> Sadly going to ditch Gluster ASAP.

>

> Diego

>

> On 18/01/2024 07:11, Hu Bert wrote:

> > Good morning,

> > heal still not running. Pending heals now sum up to 60K per brick.

> > Heal was starting instantly e.g. after server reboot with version

> > 10.4, but doesn't with version 11. What could be wrong?

> >

> > I only see these errors on one of the "good" servers in glustershd.log:

> >

> > [2024-01-18 06:08:57.328480 +0000] W [MSGID: 114031]

> > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-0:

> > remote operation failed.

> > [{path=<gfid:cb39a1e4-2a4c-4727-861d-3ed9ef00681b>},
> > {gfid=cb39a1e4-2a4c-4727-861d-3ed9ef00681b},
> > {errno=2}, {error=No such file or directory}]

> > [2024-01-18 06:08:57.594051 +0000] W [MSGID: 114031]

> > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-1:

> > remote operation failed.

> > [{path=<gfid:3e9b178c-ae1f-4d85-ae47-fc539d94dd11>},
> > {gfid=3e9b178c-ae1f-4d85-ae47-fc539d94dd11},
> > {errno=2}, {error=No such file or directory}]

> >

> > About 7K today. Any ideas? Someone?
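
To see which bricks these warnings point at, the MSGID 114031 entries can be counted per client translator. This is a rough sketch, assuming each log entry sits on a single line in the file (the wrapping above comes from the mail) and that the log lives at the path shown in the usage comment; the function name is made up.

```shell
#!/bin/sh
# Sketch: count MSGID 114031 warnings per client translator in a glustershd.log.
# count_lookup_failures is an illustrative name.

count_lookup_failures() {
    logfile="$1"
    grep 'MSGID: 114031' "$logfile" |
        grep -o '0-[^ ]*-client-[0-9]*' |   # e.g. 0-workdata-client-0
        sort | uniq -c
}

# Usage (log path is an assumption; adjust to your installation):
#   count_lookup_failures /var/log/glusterfs/glustershd.log
```

Each output line is a count followed by the translator name, so a single client that dominates the list stands out quickly.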

> >

> >

> > Best regards,

> > Hubert

> >

> > On Wed, 17 Jan 2024 at 11:24, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:

> >>

> >> ok, finally managed to get all servers, volumes etc. running, but it took
> >> a couple of restarts, cksum checks etc.

> >>

> >> One problem: a volume doesn't heal automatically or doesn't heal at all.

> >>

> >> gluster volume status

> >> Status of volume: workdata
> >> Gluster process                          TCP Port  RDMA Port  Online  Pid
> >> -------------------------------------------------------------------------
> >> Brick glusterpub1:/gluster/md3/workdata  58832     0          Y       3436
> >> Brick glusterpub2:/gluster/md3/workdata  59315     0          Y       1526
> >> Brick glusterpub3:/gluster/md3/workdata  56917     0          Y       1952
> >> Brick glusterpub1:/gluster/md4/workdata  59688     0          Y       3755
> >> Brick glusterpub2:/gluster/md4/workdata  60271     0          Y       2271
> >> Brick glusterpub3:/gluster/md4/workdata  49461     0          Y       2399
> >> Brick glusterpub1:/gluster/md5/workdata  54651     0          Y       4208
> >> Brick glusterpub2:/gluster/md5/workdata  49685     0          Y       2751
> >> Brick glusterpub3:/gluster/md5/workdata  59202     0          Y       2803
> >> Brick glusterpub1:/gluster/md6/workdata  55829     0          Y       4583
> >> Brick glusterpub2:/gluster/md6/workdata  50455     0          Y       3296
> >> Brick glusterpub3:/gluster/md6/workdata  50262     0          Y       3237
> >> Brick glusterpub1:/gluster/md7/workdata  52238     0          Y       5014
> >> Brick glusterpub2:/gluster/md7/workdata  52474     0          Y       3673
> >> Brick glusterpub3:/gluster/md7/workdata  57966     0          Y       3653
> >> Self-heal Daemon on localhost            N/A       N/A        Y       4141
> >> Self-heal Daemon on glusterpub1          N/A       N/A        Y       5570
> >> Self-heal Daemon on glusterpub2          N/A       N/A        Y       4139

> >>

> >> "gluster volume heal workdata info" lists a lot of files per brick.

> >> "gluster volume heal workdata statistics heal-count" shows thousands

> >> of files per brick.

> >> "gluster volume heal workdata enable" has no effect.

> >>

> >> gluster volume heal workdata full

> >> Launching heal operation to perform full self heal on volume workdata

> >> has been successful

> >> Use heal info commands to check status.

> >>

> >> -> not doing anything at all. And nothing is happening on the 2 "good"
> >> servers in e.g. glustershd.log. Heal was working as expected on
> >> version 10.4, but here... silence. Does anyone have an idea?
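
To tell whether the heal is making any progress at all, the per-brick counts from "statistics heal-count" can be summed into one number and compared over time. This is a small sketch, assuming the command prints one "Number of entries: N" line per brick; the function name is made up.

```shell
#!/bin/sh
# Sketch: total the pending heal entries reported for a volume.
# sum_heal_entries is an illustrative name; it reads heal-count output on stdin.

sum_heal_entries() {
    # Non-numeric values (if any) count as 0 in awk arithmetic.
    awk -F': *' '/^Number of entries:/ { total += $2 } END { print total + 0 }'
}

# Usage:
#   gluster volume heal workdata statistics heal-count | sum_heal_entries
```

Running it every few minutes shows whether the total is shrinking, flat, or growing.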

> >>

> >>

> >> Best regards,

> >> Hubert

> >>

> >> On Tue, 16 Jan 2024 at 13:44, Gilberto Ferreira
> >> <gilberto.nunes32@xxxxxxxxx> wrote:

> >>>

> >>> Ah! Indeed! You need to perform an upgrade in the clients as well.

> >>>


> >>> On Tue, 16 Jan 2024 at 03:12, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:

> >>>>

> >>>> morning to those still reading :-)

> >>>>

> >>>> i found this: https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them

> >>>>

> >>>> there's a paragraph about "peer rejected" with the same error message,

> >>>> telling me: "Update the cluster.op-version" - i had only updated the

> >>>> server nodes, but not the clients. So upgrading the cluster.op-version

> >>>> wasn't possible at this time. So... upgrading the clients to version

> >>>> 11.1 and then the op-version should solve the problem?
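
A hedged sketch of how the op-version question could be checked from the CLI: read the cluster's current op-version and the maximum it supports, and only raise it when the two differ. The function names are made up, and the parsing assumes "gluster volume get all <key>" prints the option name and value in two columns.

```shell
#!/bin/sh
# Sketch: compare the cluster's current op-version with the maximum it supports.
# read_cluster_option and op_version_needs_bump are illustrative names.

read_cluster_option() {
    # Prints the value column for one cluster-wide option.
    gluster volume get all "$1" | awk -v key="$1" '$1 == key { print $2 }'
}

op_version_needs_bump() {
    current="$(read_cluster_option cluster.op-version)"
    maximum="$(read_cluster_option cluster.max-op-version)"
    echo "current=$current max=$maximum"
    # True only if both values were read and current is lower than max.
    [ -n "$current" ] && [ -n "$maximum" ] && [ "$current" -lt "$maximum" ]
}

# Usage, once all servers and clients run the new version:
#   op_version_needs_bump && \
#       gluster volume set all cluster.op-version \
#           "$(read_cluster_option cluster.max-op-version)"
```

The log excerpt further down in this thread shows op-version 100000 still in use after the package upgrade, which is the situation this check is meant to catch.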

> >>>>

> >>>>

> >>>> Thx,

> >>>> Hubert

> >>>>

> >>>> On Mon, 15 Jan 2024 at 09:16, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:

> >>>>>

> >>>>> Hi,

> >>>>> just upgraded some gluster servers from version 10.4 to version 11.1.

> >>>>> Debian bullseye & bookworm. When only installing the packages: good,

> >>>>> servers, volumes etc. work as expected.

> >>>>>

> >>>>> But one needs to test if the systems work after a daemon and/or server

> >>>>> restart. Well, did a reboot, and after that the rebooted/restarted

> >>>>> system is "out". Log message from working node:

> >>>>>

> >>>>> [2024-01-15 08:02:21.585694 +0000] I [MSGID: 106163]

> >>>>> [glusterd-handshake.c:1501:__glusterd_mgmt_hndsk_versions_ack]

> >>>>> 0-management: using the op-version 100000

> >>>>> [2024-01-15 08:02:21.589601 +0000] I [MSGID: 106490]

> >>>>> [glusterd-handler.c:2546:__glusterd_handle_incoming_friend_req]

> >>>>> 0-glusterd: Received probe from uuid:

> >>>>> b71401c3-512a-47cb-ac18-473c4ba7776e

> >>>>> [2024-01-15 08:02:23.608349 +0000] E [MSGID: 106010]

> >>>>> [glusterd-utils.c:3824:glusterd_compare_friend_volume] 0-management:

> >>>>> Version of Cksums sourceimages differ. local cksum = 2204642525,

> >>>>> remote cksum = 1931483801 on peer gluster190

> >>>>> [2024-01-15 08:02:23.608584 +0000] I [MSGID: 106493]

> >>>>> [glusterd-handler.c:3819:glusterd_xfer_friend_add_resp] 0-glusterd:

> >>>>> Responded to gluster190 (0), ret: 0, op_ret: -1

> >>>>> [2024-01-15 08:02:23.613553 +0000] I [MSGID: 106493]

> >>>>> [glusterd-rpc-ops.c:467:__glusterd_friend_add_cbk] 0-glusterd:

> >>>>> Received RJT from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e, host:

> >>>>> gluster190, port: 0

> >>>>>

> >>>>> peer status from rebooted node:

> >>>>>

> >>>>> root@gluster190 ~ # gluster peer status

> >>>>> Number of Peers: 2

> >>>>>

> >>>>> Hostname: gluster189

> >>>>> Uuid: 50dc8288-aa49-4ea8-9c6c-9a9a926c67a7

> >>>>> State: Peer Rejected (Connected)

> >>>>>

> >>>>> Hostname: gluster188

> >>>>> Uuid: e15a33fe-e2f7-47cf-ac53-a3b34136555d

> >>>>> State: Peer Rejected (Connected)

> >>>>>

> >>>>> So the rebooted gluster190 is not accepted anymore. And thus does not

> >>>>> appear in "gluster volume status". I then followed this guide:

> >>>>>

> >>>>> https://gluster-documentations.readthedocs.io/en/latest/Administrator%20Guide/Resolving%20Peer%20Rejected/

> >>>>>

> >>>>> Remove everything under /var/lib/glusterd/ (except glusterd.info) and

> >>>>> restart glusterd service etc. Data get copied from other nodes,

> >>>>> 'gluster peer status' is ok again - but the volume info is missing,

> >>>>> /var/lib/glusterd/vols is empty. When syncing this dir from another

> >>>>> node, the volume then is available again, heals start etc.
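
The "remove everything except glusterd.info" step can be sketched as below. This is an illustration only: the function name is made up, it defaults to a dry run, and it should only be pointed at /var/lib/glusterd with glusterd stopped and a backup in place. As described above, the vols directory may still need to be synced from a healthy node afterwards.

```shell
#!/bin/sh
# Sketch: list (or remove) everything in a glusterd state directory
# except glusterd.info. clean_glusterd_state is an illustrative name.
# Default mode is a dry run that only prints what would be removed.

clean_glusterd_state() {
    dir="$1"
    mode="${2:-dry-run}"
    # Refuse to touch a directory that does not look like glusterd state.
    [ -f "$dir/glusterd.info" ] || {
        echo "no glusterd.info in $dir, refusing" >&2
        return 1
    }
    if [ "$mode" = "delete" ]; then
        find "$dir" -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +
    else
        find "$dir" -mindepth 1 -maxdepth 1 ! -name glusterd.info -print
    fi
}

# Usage on the rejected node, with glusterd stopped and a backup taken first:
#   clean_glusterd_state /var/lib/glusterd            # show what would go
#   clean_glusterd_state /var/lib/glusterd delete     # actually remove it
```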

> >>>>>

> >>>>> Well, and just to be sure that everything's working as it should, I
> >>>>> rebooted that node again - the rebooted node is kicked out again, and
> >>>>> you have to start over to bring it back.

> >>>>>

> >>>>> Sry, but did i miss anything? Has someone experienced similar

> >>>>> problems? I'll probably downgrade to 10.4 again, that version was

> >>>>> working...

> >>>>>

> >>>>>

> >>>>> Thx,

> >>>>> Hubert

> >>>> ________

> >>>>

> >>>>

> >>>>

> >>>> Community Meeting Calendar:

> >>>>

> >>>> Schedule -

> >>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC

> >>>> Bridge: https://meet.google.com/cpu-eiue-hvk

> >>>> Gluster-users mailing list

> >>>> Gluster-users@xxxxxxxxxxx

> >>>> https://lists.gluster.org/mailman/listinfo/gluster-users


>

> --

> Diego Zuccato

> DIFA - Dip. di Fisica e Astronomia

> Servizi Informatici

> Alma Mater Studiorum - Università di Bologna

> V.le Berti-Pichat 6/2 - 40127 Bologna - Italy

> tel.: +39 051 20 95786

>
