Good morning,

thanks, Gilberto. I applied the first three (set to WARNING), but the last one doesn't work. Anyway, after setting those three, some new messages appear:

[2024-01-20 07:23:58.561106 +0000] W [MSGID: 114061] [client-common.c:796:client_pre_lk_v2] 0-workdata-client-11: remote_fd is -1. EBADFD [{gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb}, {errno=77}, {error=File descriptor in bad state}]
[2024-01-20 07:23:58.561177 +0000] E [MSGID: 108028] [afr-open.c:361:afr_is_reopen_allowed_cbk] 0-workdata-replicate-3: Failed getlk for faf59566-10f5-4ddd-8b0c-a87bc6a334fb [File descriptor in bad state]
[2024-01-20 07:23:58.562151 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-11: remote operation failed. [{path=<gfid:faf59566-10f5-4ddd-8b0c-a87bc6a334fb>}, {gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb}, {errno=2}, {error=No such file or directory}]
[2024-01-20 07:23:58.562296 +0000] W [MSGID: 114061] [client-common.c:530:client_pre_flush_v2] 0-workdata-client-11: remote_fd is -1. EBADFD [{gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb}, {errno=77}, {error=File descriptor in bad state}]
[2024-01-20 07:23:58.860552 +0000] W [MSGID: 114061] [client-common.c:796:client_pre_lk_v2] 0-workdata-client-8: remote_fd is -1. EBADFD [{gfid=60465723-5dc0-4ebe-aced-9f2c12e52642}, {errno=77}, {error=File descriptor in bad state}]
[2024-01-20 07:23:58.860608 +0000] E [MSGID: 108028] [afr-open.c:361:afr_is_reopen_allowed_cbk] 0-workdata-replicate-2: Failed getlk for 60465723-5dc0-4ebe-aced-9f2c12e52642 [File descriptor in bad state]
[2024-01-20 07:23:58.861520 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-8: remote operation failed. [{path=<gfid:60465723-5dc0-4ebe-aced-9f2c12e52642>}, {gfid=60465723-5dc0-4ebe-aced-9f2c12e52642}, {errno=2}, {error=No such file or directory}]
[2024-01-20 07:23:58.861640 +0000] W [MSGID: 114061] [client-common.c:530:client_pre_flush_v2] 0-workdata-client-8: remote_fd is -1. EBADFD [{gfid=60465723-5dc0-4ebe-aced-9f2c12e52642}, {errno=77}, {error=File descriptor in bad state}]

Not many log entries appear, only a few. Has anyone seen error messages like these? Setting diagnostics.brick-sys-log-level to DEBUG produces far more log entries; I uploaded them to https://file.io/spLhlcbMCzr8 - not sure whether that helps.

Thanks,
Hubert

On Fri, 19 Jan 2024 at 16:24, Gilberto Ferreira <gilberto.nunes32@xxxxxxxxx> wrote:
>
> gluster volume set testvol diagnostics.brick-log-level WARNING
> gluster volume set testvol diagnostics.brick-sys-log-level WARNING
> gluster volume set testvol diagnostics.client-log-level ERROR
> gluster --log-level=ERROR volume status
>
> ---
> Gilberto Nunes Ferreira
>
> On Fri, 19 Jan 2024 at 05:49, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
>>
>> Hi Strahil,
>> hm, don't get me wrong, it may sound a bit stupid, but... where do I
>> set the log level? I'm using Debian...
>>
>> https://access.redhat.com/documentation/de-de/red_hat_gluster_storage/3/html/administration_guide/configuring_the_log_level
>>
>> ls /etc/glusterfs/
>> eventsconfig.json glusterfs-georep-logrotate
>> gluster-rsyslog-5.8.conf group-db-workload group-gluster-block
>> group-nl-cache group-virt.example logger.conf.example
>> glusterd.vol glusterfs-logrotate
>> gluster-rsyslog-7.2.conf group-distributed-virt group-metadata-cache
>> group-samba gsyncd.conf thin-arbiter.vol
>>
>> checked: /etc/glusterfs/logger.conf.example
>>
>> # To enable enhanced logging capabilities,
>> #
>> # 1. rename this file to /etc/glusterfs/logger.conf
>> #
>> # 2. rename /etc/rsyslog.d/gluster.conf.example to
>> # /etc/rsyslog.d/gluster.conf
>> #
>> # This change requires restart of all gluster services/volumes and
>> # rsyslog.
>>
>> tried (as a test): /etc/glusterfs/logger.conf with LOG_LEVEL='WARNING'
>> and restarted glusterd on that node, but this doesn't work; the log level
>> stays at INFO. /etc/rsyslog.d/gluster.conf.example does not exist.
>> Probably /etc/rsyslog.conf on Debian. But first it would be better to know
>> where to set the log level for glusterd.
>>
>> Depending on how much the DEBUG log level talks ;-) I could assign up
>> to 100G to /var.
>>
>> Thanks & best regards,
>> Hubert
>>
>> On Thu, 18 Jan 2024 at 22:58, Strahil Nikolov
>> <hunter86_bg@xxxxxxxxx> wrote:
>> >
>> > Are you able to set the logs to debug level?
>> > It might provide a clue as to what is going on.
>> >
>> > Best Regards,
>> > Strahil Nikolov
>> >
>> > On Thu, Jan 18, 2024 at 13:08, Diego Zuccato
>> > <diego.zuccato@xxxxxxxx> wrote:
>> > That's the same kind of errors I keep seeing on my 2 clusters,
>> > regenerated some months ago. It seems to be a pseudo-split-brain that
>> > should be impossible on a replica 3 cluster, but it keeps happening.
>> > Sadly, I'm going to ditch Gluster ASAP.
>> >
>> > Diego
>> >
>> > On 18/01/2024 07:11, Hu Bert wrote:
>> > > Good morning,
>> > > heal is still not running. Pending heals now sum up to 60K per brick.
>> > > With version 10.4, heal started instantly, e.g. after a server reboot,
>> > > but with version 11 it doesn't. What could be wrong?
>> > >
>> > > I only see these errors on one of the "good" servers in glustershd.log:
>> > >
>> > > [2024-01-18 06:08:57.328480 +0000] W [MSGID: 114031]
>> > > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-0:
>> > > remote operation failed.
>> > > [{path=<gfid:cb39a1e4-2a4c-4727-861d-3ed9ef00681b>},
>> > > {gfid=cb39a1e4-2a4c-4727-861d-3ed9ef00681b}, {errno=2}, {error=No such file or directory}]
>> > > [2024-01-18 06:08:57.594051 +0000] W [MSGID: 114031]
>> > > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-1:
>> > > remote operation failed.
>> > > [{path=<gfid:3e9b178c-ae1f-4d85-ae47-fc539d94dd11>},
>> > > {gfid=3e9b178c-ae1f-4d85-ae47-fc539d94dd11}, {errno=2}, {error=No such file or directory}]
>> > >
>> > > About 7K today. Any ideas? Anyone?
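The glustershd.log errors above are MSGID 114031 lookups that returned ENOENT (errno=2), each tagged with the client translator (`0-workdata-client-N`) that issued it. To check whether the roughly 7K daily errors cluster on particular bricks, a small filter could tally them per client translator. This is a sketch, assuming each log entry sits on a single line in the real log file (the excerpt here was wrapped by the mail client); `count_failed_lookups` is a hypothetical helper name. Under the usual naming, `client-N` corresponds to the N-th brick (counting from 0) in `gluster volume info` order, which is consistent with client-8 and client-11 appearing under replicate-2 and replicate-3 in the newer log.

```shell
# Sketch: count "remote operation failed" lookups (MSGID 114031) per client
# translator. Assumes one log entry per line; count_failed_lookups is a
# hypothetical helper name.
count_failed_lookups() {
    awk '/MSGID: 114031/ && /remote operation failed/ {
             # The client translator field looks like "0-workdata-client-0:"
             for (i = 1; i <= NF; i++)
                 if ($i ~ /-client-[0-9]+:$/) {
                     name = $i
                     sub(/:$/, "", name)
                     count[name]++
                 }
         }
         END { for (c in count) print c, count[c] }' | sort
}

# Usage: count_failed_lookups < /var/log/glusterfs/glustershd.log
```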
>> > >
>> > > Best regards,
>> > > Hubert
>> > >
>> > > On Wed, 17 Jan 2024 at 11:24, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
>> > >>
>> > >> OK, I finally managed to get all servers, volumes etc. running, but it
>> > >> took a couple of restarts, cksum checks etc.
>> > >>
>> > >> One problem: a volume doesn't heal automatically, or doesn't heal at all.
>> > >>
>> > >> gluster volume status
>> > >> Status of volume: workdata
>> > >> Gluster process                             TCP Port  RDMA Port  Online  Pid
>> > >> ------------------------------------------------------------------------------
>> > >> Brick glusterpub1:/gluster/md3/workdata     58832     0          Y       3436
>> > >> Brick glusterpub2:/gluster/md3/workdata     59315     0          Y       1526
>> > >> Brick glusterpub3:/gluster/md3/workdata     56917     0          Y       1952
>> > >> Brick glusterpub1:/gluster/md4/workdata     59688     0          Y       3755
>> > >> Brick glusterpub2:/gluster/md4/workdata     60271     0          Y       2271
>> > >> Brick glusterpub3:/gluster/md4/workdata     49461     0          Y       2399
>> > >> Brick glusterpub1:/gluster/md5/workdata     54651     0          Y       4208
>> > >> Brick glusterpub2:/gluster/md5/workdata     49685     0          Y       2751
>> > >> Brick glusterpub3:/gluster/md5/workdata     59202     0          Y       2803
>> > >> Brick glusterpub1:/gluster/md6/workdata     55829     0          Y       4583
>> > >> Brick glusterpub2:/gluster/md6/workdata     50455     0          Y       3296
>> > >> Brick glusterpub3:/gluster/md6/workdata     50262     0          Y       3237
>> > >> Brick glusterpub1:/gluster/md7/workdata     52238     0          Y       5014
>> > >> Brick glusterpub2:/gluster/md7/workdata     52474     0          Y       3673
>> > >> Brick glusterpub3:/gluster/md7/workdata     57966     0          Y       3653
>> > >> Self-heal Daemon on localhost               N/A       N/A        Y       4141
>> > >> Self-heal Daemon on glusterpub1             N/A       N/A        Y       5570
>> > >> Self-heal Daemon on glusterpub2             N/A       N/A        Y       4139
>> > >>
>> > >> "gluster volume heal workdata info" lists a lot of files per brick.
>> > >> "gluster volume heal workdata statistics heal-count" shows thousands
>> > >> of files per brick.
>> > >> "gluster volume heal workdata enable" has no effect.
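Because `statistics heal-count` reports a separate figure per brick, a tiny filter that adds them up makes it easier to track whether the backlog moves at all between runs, for example before and after `heal ... full`. This is a sketch that assumes the output consists of `Brick <host>:<path>` lines, each followed by `Number of entries: <N>`; `sum_heal_count` is a hypothetical helper name, not part of the gluster CLI.

```shell
# Sketch: summarise "gluster volume heal <VOL> statistics heal-count" output.
# Assumes "Brick host:/path" lines, each followed by "Number of entries: N".
# sum_heal_count is a hypothetical helper name.
sum_heal_count() {
    awk '/^Brick / { brick = $2 }
         /^Number of entries:/ { print brick, $NF; total += $NF }
         END { print "total", total + 0 }'
}

# Usage: gluster volume heal workdata statistics heal-count | sum_heal_count
```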
>> > >>
>> > >> gluster volume heal workdata full
>> > >> Launching heal operation to perform full self heal on volume workdata
>> > >> has been successful
>> > >> Use heal info commands to check status.
>> > >>
>> > >> -> not doing anything at all. And nothing is happening on the 2 "good"
>> > >> servers either, e.g. in glustershd.log. Heal was working as expected on
>> > >> version 10.4, but here... silence. Does anyone have an idea?
>> > >>
>> > >> Best regards,
>> > >> Hubert
>> > >>
>> > >> On Tue, 16 Jan 2024 at 13:44, Gilberto Ferreira
>> > >> <gilberto.nunes32@xxxxxxxxx> wrote:
>> > >>>
>> > >>> Ah! Indeed! You need to perform an upgrade on the clients as well.
>> > >>>
>> > >>> On Tue, 16 Jan 2024 at 03:12, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
>> > >>>>
>> > >>>> Morning to those still reading :-)
>> > >>>>
>> > >>>> I found this: https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them
>> > >>>>
>> > >>>> There's a paragraph about "peer rejected" with the same error message,
>> > >>>> telling me: "Update the cluster.op-version". I had only updated the
>> > >>>> server nodes, but not the clients, so upgrading the cluster.op-version
>> > >>>> wasn't possible at that time. So... upgrading the clients to version
>> > >>>> 11.1 and then the op-version should solve the problem?
>> > >>>>
>> > >>>> Thanks,
>> > >>>> Hubert
>> > >>>>
>> > >>>> On Mon, 15 Jan 2024 at 09:16, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
>> > >>>>>
>> > >>>>> Hi,
>> > >>>>> I just upgraded some gluster servers from version 10.4 to version 11.1,
>> > >>>>> Debian bullseye & bookworm. When only installing the packages: good,
>> > >>>>> servers, volumes etc. work as expected.
>> > >>>>>
>> > >>>>> But one needs to test whether the systems work after a daemon and/or
>> > >>>>> server restart.
>> > >>>>> Well, I did a reboot, and after that the rebooted/restarted
>> > >>>>> system is "out". Log message from a working node:
>> > >>>>>
>> > >>>>> [2024-01-15 08:02:21.585694 +0000] I [MSGID: 106163]
>> > >>>>> [glusterd-handshake.c:1501:__glusterd_mgmt_hndsk_versions_ack]
>> > >>>>> 0-management: using the op-version 100000
>> > >>>>> [2024-01-15 08:02:21.589601 +0000] I [MSGID: 106490]
>> > >>>>> [glusterd-handler.c:2546:__glusterd_handle_incoming_friend_req]
>> > >>>>> 0-glusterd: Received probe from uuid:
>> > >>>>> b71401c3-512a-47cb-ac18-473c4ba7776e
>> > >>>>> [2024-01-15 08:02:23.608349 +0000] E [MSGID: 106010]
>> > >>>>> [glusterd-utils.c:3824:glusterd_compare_friend_volume] 0-management:
>> > >>>>> Version of Cksums sourceimages differ. local cksum = 2204642525,
>> > >>>>> remote cksum = 1931483801 on peer gluster190
>> > >>>>> [2024-01-15 08:02:23.608584 +0000] I [MSGID: 106493]
>> > >>>>> [glusterd-handler.c:3819:glusterd_xfer_friend_add_resp] 0-glusterd:
>> > >>>>> Responded to gluster190 (0), ret: 0, op_ret: -1
>> > >>>>> [2024-01-15 08:02:23.613553 +0000] I [MSGID: 106493]
>> > >>>>> [glusterd-rpc-ops.c:467:__glusterd_friend_add_cbk] 0-glusterd:
>> > >>>>> Received RJT from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e, host:
>> > >>>>> gluster190, port: 0
>> > >>>>>
>> > >>>>> Peer status from the rebooted node:
>> > >>>>>
>> > >>>>> root@gluster190 ~ # gluster peer status
>> > >>>>> Number of Peers: 2
>> > >>>>>
>> > >>>>> Hostname: gluster189
>> > >>>>> Uuid: 50dc8288-aa49-4ea8-9c6c-9a9a926c67a7
>> > >>>>> State: Peer Rejected (Connected)
>> > >>>>>
>> > >>>>> Hostname: gluster188
>> > >>>>> Uuid: e15a33fe-e2f7-47cf-ac53-a3b34136555d
>> > >>>>> State: Peer Rejected (Connected)
>> > >>>>>
>> > >>>>> So the rebooted gluster190 is no longer accepted, and thus does not
>> > >>>>> appear in "gluster volume status".
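The "Version of Cksums sourceimages differ" error means the two peers computed different checksums for the stored configuration of volume sourceimages. One way to narrow down why is to compare the volume's `info` file, typically `/var/lib/glusterd/vols/<VOLNAME>/info` and made of `key=value` lines, on the rejected node against a copy from a healthy peer. This sketch assumes that the checksum is derived from this file and that the file uses the `key=value` layout; `diff_volinfo` is a hypothetical helper, and the copy path in the usage line is a placeholder.

```shell
# Sketch: list keys whose values differ between two glusterd volume "info"
# files (key=value per line). diff_volinfo is a hypothetical helper name.
diff_volinfo() {
    awk -F= '
        # First file: remember local values (everything after the first "=")
        FILENAME == ARGV[1] { a[$1] = substr($0, index($0, "=") + 1); next }
        # Second file: remember remote values
        { b[$1] = substr($0, index($0, "=") + 1) }
        END {
            # Keys present locally whose remote value differs or is absent
            for (k in a)
                if (!(k in b) || a[k] != b[k])
                    printf "%s: local=%s remote=%s\n", k, a[k], ((k in b) ? b[k] : "<missing>")
            # Keys present only on the remote side
            for (k in b)
                if (!(k in a))
                    printf "%s: local=<missing> remote=%s\n", k, b[k]
        }' "$1" "$2" | sort
}

# Usage: scp gluster189:/var/lib/glusterd/vols/sourceimages/info /tmp/remote.info
#        diff_volinfo /var/lib/glusterd/vols/sourceimages/info /tmp/remote.info
```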
>> > >>>>>
>> > >>>>> I then followed this guide:
>> > >>>>>
>> > >>>>> https://gluster-documentations.readthedocs.io/en/latest/Administrator%20Guide/Resolving%20Peer%20Rejected/
>> > >>>>>
>> > >>>>> Remove everything under /var/lib/glusterd/ (except glusterd.info) and
>> > >>>>> restart the glusterd service etc. Data gets copied from the other nodes,
>> > >>>>> 'gluster peer status' is OK again - but the volume info is missing:
>> > >>>>> /var/lib/glusterd/vols is empty. After syncing this dir from another
>> > >>>>> node, the volume is available again, heals start etc.
>> > >>>>>
>> > >>>>> Well, and just to be sure that everything's working as it should, I
>> > >>>>> rebooted that node again - the rebooted node is kicked out again, and
>> > >>>>> you have to start over to bring it back.
>> > >>>>>
>> > >>>>> Sorry, but did I miss anything? Has anyone experienced similar
>> > >>>>> problems? I'll probably downgrade to 10.4 again; that version was
>> > >>>>> working...
>> > >>>>>
>> > >>>>> Thanks,
>> > >>>>> Hubert
>> > >>>> ________
>> > >>>>
>> > >>>> Community Meeting Calendar:
>> > >>>>
>> > >>>> Schedule -
>> > >>>> Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
>> > >>>> Bridge: https://meet.google.com/cpu-eiue-hvk
>> > >>>> Gluster-users mailing list
>> > >>>> Gluster-users@xxxxxxxxxxx
>> > >>>> https://lists.gluster.org/mailman/listinfo/gluster-users
>> >
>> > --
>> > Diego Zuccato
>> > DIFA - Dip. di Fisica e Astronomia
>> > Servizi Informatici
>> > Alma Mater Studiorum - Università di Bologna
>> > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
>> > tel.: +39 051 20 95786