Have you checked whether a client is not allowed to update all 3 copies?
If it's only 1 system, you can remove the brick, reinitialize it and then bring it back for a full sync.
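A hedged sketch of that single-brick reinitialization, based on gluster's reset-brick workflow. VOL, NODE and BRICK are placeholders, and the commands are printed rather than executed - verify the exact syntax against your version's documentation before running anything:

```shell
#!/bin/sh
# Sketch only: placeholders, and every command is echoed, not run.
VOL=workdata
NODE=glusterpub3
BRICK=/gluster/md3/workdata

run() { echo "WOULD RUN: $*"; }

run gluster volume reset-brick "$VOL" "$NODE:$BRICK" start
# ... wipe and re-create the brick filesystem on $NODE here ...
run gluster volume reset-brick "$VOL" "$NODE:$BRICK" "$NODE:$BRICK" commit force
run gluster volume heal "$VOL" full   # then watch 'heal info' until it drains
```

Expect the full sync to take a long time with millions of entries; the point of reset-brick over a manual wipe is that gluster resets the brick's pending-heal bookkeeping for you.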
Best Regards,
Strahil Nikolov
On Mon, Jan 29, 2024 at 8:44, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
Morning,

a few bad apples - but which ones? Checked glustershd.log on the "bad"
server and counted today's "gfid mismatch" entries (2800 in total):

44 <gfid:faeea007-2f41-4a72-959f-e9e14e6a9ea4>/212>,
44 <gfid:faeea007-2f41-4a72-959f-e9e14e6a9ea4>/174>,
44 <gfid:d5c6d7b9-f217-4cc9-a664-448d034e74c2>/94037803>,
44 <gfid:d263ecc2-9c21-455c-9ba9-5a999c03adce>/94066216>,
44 <gfid:cbfd5d46-d580-4845-a544-e46fd82c1758>/249771609>,
44 <gfid:aecf217a-0797-43d1-9481-422a8ac8a5d0>/64235523>,
44 <gfid:a701d47b-b3fb-4e7e-bbfb-bc3e19632867>/185>,
etc. But as i said, these are pretty new and didn't appear when the
volume/servers started misbehaving. Are there scripts/snippets
available for handling this?

Healing would be very painful for the running system (still connected,
but not for much longer), as there are surely 4-5 million entries to
be healed. I can't do this now - maybe, once the replacement is in
production, one could give it a try.

Thx,
Hubert

On Sun, 28 Jan 2024 at 23:12, Strahil Nikolov wrote:
>
> From a gfid mismatch a manual effort is needed, but you can script it.
> I think that a few bad "apples" can break the healing, and if you fix them the healing might recover.
>
> Also, check why the client is not updating all copies. Most probably you have a client that is not able to connect to a brick.
>
> gluster volume status VOLUME_NAME clients
>
> Best Regards,
> Strahil Nikolov
>
> On Sun, Jan 28, 2024 at 20:55, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
> Hi Strahil,
> there's no arbiter: 3 servers with 5 bricks each.
>
> Volume Name: workdata
> Type: Distributed-Replicate
> Volume ID: 7d1e23e5-0308-4443-a832-d36f85ff7959
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 5 x 3 = 15
>
> The "problem" is: the number of files/entries to-be-healed has
> continuously grown since the beginning, and now we're talking about
> way too many files to do this manually.
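Counting and ranking the "gfid mismatch" entries, as done by hand above, is easy to script. A sketch - the sample lines below only mimic the glustershd.log format quoted in this thread; point the grep at the real /var/log/glusterfs/glustershd.log:

```shell
# Tally "Gfid mismatch" log entries per parent-directory gfid.
# /tmp/shd-sample.log stands in for the real glustershd.log.
cat > /tmp/shd-sample.log <<'EOF'
[2024-01-29 06:00:01 +0000] E [MSGID: 108008] 0-workdata-replicate-3: Gfid mismatch detected for <gfid:faeea007-2f41-4a72-959f-e9e14e6a9ea4>/212>,
[2024-01-29 06:00:02 +0000] E [MSGID: 108008] 0-workdata-replicate-3: Gfid mismatch detected for <gfid:faeea007-2f41-4a72-959f-e9e14e6a9ea4>/174>,
[2024-01-29 06:00:03 +0000] I [MSGID: 100000] 0-workdata: unrelated line
EOF

# Extract the parent gfid from each mismatch line and count occurrences.
grep 'Gfid mismatch' /tmp/shd-sample.log \
  | grep -o '<gfid:[0-9a-f-]*>' \
  | sort | uniq -c | sort -rn
```

On the sample data this prints one line, `2 <gfid:faeea007-2f41-4a72-959f-e9e14e6a9ea4>`, i.e. the count of mismatch entries per parent directory - the "which apples are bad" list.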
> Last time i checked: 700K per brick, should be >900K at the moment.
> The command 'gluster volume heal workdata statistics heal-count' is
> unable to finish. Doesn't look that good :D
>
> Interesting, the glustershd.log on the "bad" server now shows errors like these:
>
> [2024-01-28 18:48:33.734053 +0000] E [MSGID: 108008]
> [afr-self-heal-common.c:399:afr_gfid_split_brain_source]
> 0-workdata-replicate-3: Gfid mismatch detected for
> <gfid:70ab3d57-bd82-4932-86bf-d613db32c1ab>/803620716>,
> 82d7939a-8919-40ea-9459-7b8af23d3b72 on workdata-client-11 and
> bb9399a3-0a5c-4cd1-b2b1-3ee787ec835a on workdata-client-9
>
> Shouldn't the heals happen on the 2 "good" servers?
>
> Anyway... we're currently preparing a different solution for our data
> and we'll throw away this gluster volume - no critical data will be
> lost, as these are derived from source data (on a different volume on
> different servers). It will be a hard time (calculating tons of data),
> but the chosen solution should have way better performance.
>
> Well... thx to all for your efforts, really appreciate that :-)
>
> Hubert
>
> On Sun, 28 Jan 2024 at 08:35, Strahil Nikolov wrote:
> >
> > What about the arbiter node ?
> > Actually, check on all nodes and script it - you might need it in the future.
> >
> > Simplest way to resolve is to make the file disappear (rename it to something else and then rename it back).
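To apply the rename/read tricks you first need the real path behind a gfid. The trusted.gfid2path.* xattrs shown in the getfattr dumps quoted in this thread are simply hex-encoded "<parent-dir-gfid>/<basename>" strings, so they can be decoded directly. A sketch, assuming xxd is installed; the hex value is taken verbatim from Strahil's getfattr example in this thread:

```shell
# trusted.gfid2path.* values are hex for "<parent-dir-gfid>/<basename>".
# Hex string copied verbatim from a getfattr dump quoted in the thread
# (leading 0x stripped):
hex=30326333373930632d386637622d346436652d393464362d3936393132313930643131312f66696c656c6f636b696e672e7079

# Reverse the hex dump back into the original ASCII string.
decoded=$(printf '%s' "$hex" | xxd -r -p)
echo "$decoded"
# -> 02c3790c-8f7b-4d6e-94d6-96912190d111/filelocking.py
```

That gives the basename plus the gfid of the parent directory, which you can resolve recursively (or via the aux-gfid mount shown later in the thread) to get the full path.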
> > Another easy trick is to read the whole file: dd if=file of=/dev/null status=progress
> >
> > Best Regards,
> > Strahil Nikolov
> >
> > On Sat, Jan 27, 2024 at 8:24, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
> > Morning,
> >
> > gfid 1:
> > getfattr -d -e hex -m.
> > /gluster/md{3,4,5,6,7}/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
> >
> > glusterpub1 (good one):
> > getfattr: Removing leading '/' from absolute path names
> > # file: gluster/md6/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
> > trusted.afr.dirty=0x000000000000000000000000
> > trusted.afr.workdata-client-11=0x000000020000000100000000
> > trusted.gfid=0xfaf5956610f54ddd8b0ca87bc6a334fb
> > trusted.gfid2path.c2845024cc9b402e=0x38633139626234612d396236382d343532652d623434652d3664616331666434616465652f31323878313238732e6a7067
> > trusted.glusterfs.mdata=0x0100000000000000000000000065aaecff000000002695ebb70000000065aaecff000000002695ebb70000000065aaecff000000002533f110
> >
> > glusterpub3 (bad one):
> > getfattr: /gluster/md6/workdata/.glusterfs/fa/f5/faf59566-10f5-4ddd-8b0c-a87bc6a334fb:
> > No such file or directory
> >
> > gfid 2:
> > getfattr -d -e hex -m.
> > /gluster/md{3,4,5,6,7}/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642
> >
> > glusterpub1 (good one):
> > getfattr: Removing leading '/' from absolute path names
> > # file: gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642
> > trusted.afr.dirty=0x000000000000000000000000
> > trusted.afr.workdata-client-8=0x000000020000000100000000
> > trusted.gfid=0x604657235dc04ebeaced9f2c12e52642
> > trusted.gfid2path.ac4669e3c4faf926=0x33366463366137392d666135642d343238652d613738642d6234376230616662316562642f31323878313238732e6a7067
> > trusted.glusterfs.mdata=0x0100000000000000000000000065aaecfe000000000c5403bd0000000065aaecfe000000000c5403bd0000000065aaecfe000000000ad61ee4
> >
> > glusterpub3 (bad one):
> > getfattr: /gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642:
> > No such file or directory
> >
> > thx,
> > Hubert
> >
> > On Sat, 27 Jan 2024 at 06:13, Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:
> > >
> > > You don't need to mount it.
> > > Like this :
> > > # getfattr -d -e hex -m. /path/to/brick/.glusterfs/00/46/00462be8-3e61-4931-8bda-dae1645c639e
> > > # file: 00/46/00462be8-3e61-4931-8bda-dae1645c639e
> > > trusted.gfid=0x00462be83e6149318bdadae1645c639e
> > > trusted.gfid2path.05fcbdafdeea18ab=0x30326333373930632d386637622d346436652d393464362d3936393132313930643131312f66696c656c6f636b696e672e7079
> > > trusted.glusterfs.mdata=0x010000000000000000000000006170340c0000000025b6a745000000006170340c0000000020efb577000000006170340c0000000020d42b07
> > > trusted.glusterfs.shard.block-size=0x0000000004000000
> > > trusted.glusterfs.shard.file-size=0x00000000000000cd000000000000000000000000000000010000000000000000
> > >
> > > Best Regards,
> > > Strahil Nikolov
> > >
> > > On Thursday, 25 January 2024 at 09:42:46 GMT+2, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
> > >
> > > Good morning,
> > >
> > > hope i got it right...
> > > using:
> > >
> > > mount -t glusterfs -o aux-gfid-mount glusterpub1:/workdata /mnt/workdata
> > >
> > > gfid 1:
> > > getfattr -n trusted.glusterfs.pathinfo -e text
> > > /mnt/workdata/.gfid/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
> > > getfattr: Removing leading '/' from absolute path names
> > > # file: mnt/workdata/.gfid/faf59566-10f5-4ddd-8b0c-a87bc6a334fb
> > > trusted.glusterfs.pathinfo="(<DISTRIBUTE:workdata-dht>
> > > (<REPLICATE:workdata-replicate-3>
> > > <POSIX(/gluster/md6/workdata):glusterpub1:/gluster/md6/workdata/images/133/283/13328349/128x128s.jpg>
> > > <POSIX(/gluster/md6/workdata):glusterpub2:/gluster/md6/workdata/images/133/283/13328349/128x128s.jpg>))"
> > >
> > > gfid 2:
> > > getfattr -n trusted.glusterfs.pathinfo -e text
> > > /mnt/workdata/.gfid/60465723-5dc0-4ebe-aced-9f2c12e52642
> > > getfattr: Removing leading '/' from absolute path names
> > > # file: mnt/workdata/.gfid/60465723-5dc0-4ebe-aced-9f2c12e52642
> > > trusted.glusterfs.pathinfo="(<DISTRIBUTE:workdata-dht>
> > > (<REPLICATE:workdata-replicate-2>
> > > <POSIX(/gluster/md5/workdata):glusterpub2:/gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642>
> > > <POSIX(/gluster/md5/workdata):glusterpub1:/gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642>))"
> > >
> > > glusterpub1 + glusterpub2 are the good ones, glusterpub3 is the
> > > misbehaving (not healing) one.
> > >
> > > The file with gfid 1 is available under
> > > /gluster/md6/workdata/images/133/283/13328349/ on the glusterpub1+2
> > > bricks, but missing on the glusterpub3 brick.
> > >
> > > gfid 2: /gluster/md5/workdata/.glusterfs/60/46/60465723-5dc0-4ebe-aced-9f2c12e52642
> > > is present on glusterpub1+2, but not on glusterpub3.
> > >
> > > Thx,
> > > Hubert
> > >
> > > On Wed, 24 Jan 2024 at 17:36, Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Can you find and check the files with gfids:
> > > > 60465723-5dc0-4ebe-aced-9f2c12e52642
> > > > faf59566-10f5-4ddd-8b0c-a87bc6a334fb
> > > >
> > > > Use the 'getfattr -d -e hex -m.' command from https://docs.gluster.org/en/main/Troubleshooting/resolving-splitbrain/#analysis-of-the-output .
> > > >
> > > > Best Regards,
> > > > Strahil Nikolov
> > > >
> > > > On Sat, Jan 20, 2024 at 9:44, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
> > > > Good morning,
> > > >
> > > > thx Gilberto, did the first three (set to WARNING), but the last one
> > > > doesn't work. Anyway, with these three set some new messages appear:
> > > >
> > > > [2024-01-20 07:23:58.561106 +0000] W [MSGID: 114061]
> > > > [client-common.c:796:client_pre_lk_v2] 0-workdata-client-11: remote_fd
> > > > is -1. EBADFD [{gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb},
> > > > {errno=77}, {error=File descriptor in bad state}]
> > > > [2024-01-20 07:23:58.561177 +0000] E [MSGID: 108028]
> > > > [afr-open.c:361:afr_is_reopen_allowed_cbk] 0-workdata-replicate-3:
> > > > Failed getlk for faf59566-10f5-4ddd-8b0c-a87bc6a334fb [File descriptor
> > > > in bad state]
> > > > [2024-01-20 07:23:58.562151 +0000] W [MSGID: 114031]
> > > > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-11:
> > > > remote operation failed.
> > > > [{path=<gfid:faf59566-10f5-4ddd-8b0c-a87bc6a334fb>},
> > > > {gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb}, {errno=2}, {error=No such file or directory}]
> > > > [2024-01-20 07:23:58.562296 +0000] W [MSGID: 114061]
> > > > [client-common.c:530:client_pre_flush_v2] 0-workdata-client-11:
> > > > remote_fd is -1. EBADFD [{gfid=faf59566-10f5-4ddd-8b0c-a87bc6a334fb},
> > > > {errno=77}, {error=File descriptor in bad state}]
> > > > [2024-01-20 07:23:58.860552 +0000] W [MSGID: 114061]
> > > > [client-common.c:796:client_pre_lk_v2] 0-workdata-client-8: remote_fd
> > > > is -1. EBADFD [{gfid=60465723-5dc0-4ebe-aced-9f2c12e52642},
> > > > {errno=77}, {error=File descriptor in bad state}]
> > > > [2024-01-20 07:23:58.860608 +0000] E [MSGID: 108028]
> > > > [afr-open.c:361:afr_is_reopen_allowed_cbk] 0-workdata-replicate-2:
> > > > Failed getlk for 60465723-5dc0-4ebe-aced-9f2c12e52642 [File descriptor
> > > > in bad state]
> > > > [2024-01-20 07:23:58.861520 +0000] W [MSGID: 114031]
> > > > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-8:
> > > > remote operation failed.
> > > > [{path=<gfid:60465723-5dc0-4ebe-aced-9f2c12e52642>},
> > > > {gfid=60465723-5dc0-4ebe-aced-9f2c12e52642}, {errno=2}, {error=No such file or directory}]
> > > > [2024-01-20 07:23:58.861640 +0000] W [MSGID: 114061]
> > > > [client-common.c:530:client_pre_flush_v2] 0-workdata-client-8:
> > > > remote_fd is -1. EBADFD [{gfid=60465723-5dc0-4ebe-aced-9f2c12e52642},
> > > > {errno=77}, {error=File descriptor in bad state}]
> > > >
> > > > Not many log entries appear, only a few. Has someone seen error
> > > > messages like these? Setting diagnostics.brick-sys-log-level to DEBUG
> > > > shows way more log entries; uploaded them to:
> > > > https://file.io/spLhlcbMCzr8 - not sure if that helps.
> > > >
> > > > Thx,
> > > > Hubert
> > > >
> > > > On Fri, 19 Jan 2024 at 16:24, Gilberto Ferreira <gilberto.nunes32@xxxxxxxxx> wrote:
> > > > >
> > > > > gluster volume set testvol diagnostics.brick-log-level WARNING
> > > > > gluster volume set testvol diagnostics.brick-sys-log-level WARNING
> > > > > gluster volume set testvol diagnostics.client-log-level ERROR
> > > > > gluster --log-level=ERROR volume status
> > > > >
> > > > > ---
> > > > > Gilberto Nunes Ferreira
> > > > >
> > > > > On Fri, 19 Jan 2024 at 05:49, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
> > > > >>
> > > > >> Hi Strahil,
> > > > >> hm, don't get me wrong, it may sound a bit stupid, but... where do i
> > > > >> set the log level?
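In case it helps: the diagnostics.*-log-level options above cover bricks and clients, but glusterd itself typically takes its log level from the management volfile rather than from logger.conf. A hedged sketch, assuming the stock Debian layout; exact option support may vary by version, so verify against your version's troubleshooting docs before relying on it:

```
# /etc/glusterfs/glusterd.vol (excerpt) - sketch, not verified on 11.1.
# Add/adjust the log-level option, then restart glusterd.
volume management
    type mgmt/glusterd
    option working-directory /var/lib/glusterd
    option log-level WARNING
end-volume
```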
> > > > >> Using debian...
> > > > >>
> > > > >> ls /etc/glusterfs/
> > > > >> eventsconfig.json glusterfs-georep-logrotate
> > > > >> gluster-rsyslog-5.8.conf group-db-workload group-gluster-block
> > > > >> group-nl-cache group-virt.example logger.conf.example
> > > > >> glusterd.vol glusterfs-logrotate
> > > > >> gluster-rsyslog-7.2.conf group-distributed-virt group-metadata-cache
> > > > >> group-samba gsyncd.conf thin-arbiter.vol
> > > > >>
> > > > >> checked: /etc/glusterfs/logger.conf.example
> > > > >>
> > > > >> # To enable enhanced logging capabilities,
> > > > >> #
> > > > >> # 1. rename this file to /etc/glusterfs/logger.conf
> > > > >> #
> > > > >> # 2. rename /etc/rsyslog.d/gluster.conf.example to
> > > > >> #    /etc/rsyslog.d/gluster.conf
> > > > >> #
> > > > >> # This change requires restart of all gluster services/volumes and
> > > > >> # rsyslog.
> > > > >>
> > > > >> tried (to test): /etc/glusterfs/logger.conf with " LOG_LEVEL='WARNING' "
> > > > >>
> > > > >> restarted glusterd on that node, but this doesn't work, the log level
> > > > >> stays at INFO. /etc/rsyslog.d/gluster.conf.example does not exist. Probably
> > > > >> /etc/rsyslog.conf on debian. But first it would be better to know
> > > > >> where to set the log level for glusterd.
> > > > >>
> > > > >> Depending on how much the DEBUG log level talks ;-) i could assign up
> > > > >> to 100G to /var
> > > > >>
> > > > >> Thx & best regards,
> > > > >> Hubert
> > > > >>
> > > > >> On Thu, 18 Jan 2024 at 22:58, Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:
> > > > >> >
> > > > >> > Are you able to set the logs to debug level ?
> > > > >> > It might provide a clue what is going on.
> > > > >> >
> > > > >> > Best Regards,
> > > > >> > Strahil Nikolov
> > > > >> >
> > > > >> > On Thu, Jan 18, 2024 at 13:08, Diego Zuccato <diego.zuccato@xxxxxxxx> wrote:
> > > > >> > That's the same kind of errors I keep seeing on my 2 clusters,
> > > > >> > regenerated some months ago.
> > > > >> > Seems a pseudo-split-brain that should be
> > > > >> > impossible on a replica 3 cluster, but it keeps happening.
> > > > >> > Sadly going to ditch Gluster ASAP.
> > > > >> >
> > > > >> > Diego
> > > > >> >
> > > > >> > On 18/01/2024 07:11, Hu Bert wrote:
> > > > >> > > Good morning,
> > > > >> > > heal still not running. Pending heals now sum up to 60K per brick.
> > > > >> > > Heal was starting instantly e.g. after a server reboot with version
> > > > >> > > 10.4, but doesn't with version 11. What could be wrong?
> > > > >> > >
> > > > >> > > I only see these errors on one of the "good" servers in glustershd.log:
> > > > >> > >
> > > > >> > > [2024-01-18 06:08:57.328480 +0000] W [MSGID: 114031]
> > > > >> > > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-0:
> > > > >> > > remote operation failed.
> > > > >> > > [{path=<gfid:cb39a1e4-2a4c-4727-861d-3ed9ef00681b>},
> > > > >> > > {gfid=cb39a1e4-2a4c-4727-861d-3ed9ef00681b}, {errno=2}, {error=No such file or directory}]
> > > > >> > > [2024-01-18 06:08:57.594051 +0000] W [MSGID: 114031]
> > > > >> > > [client-rpc-fops_v2.c:2561:client4_0_lookup_cbk] 0-workdata-client-1:
> > > > >> > > remote operation failed.
> > > > >> > > [{path=<gfid:3e9b178c-ae1f-4d85-ae47-fc539d94dd11>},
> > > > >> > > {gfid=3e9b178c-ae1f-4d85-ae47-fc539d94dd11}, {errno=2}, {error=No such file or directory}]
> > > > >> > >
> > > > >> > > About 7K today. Any ideas? Someone?
> > > > >> > >
> > > > >> > > Best regards,
> > > > >> > > Hubert
> > > > >> > >
> > > > >> > > On Wed, 17 Jan 2024 at 11:24, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
> > > > >> > >>
> > > > >> > >> ok, finally managed to get all servers, volumes etc. running, but it
> > > > >> > >> took a couple of restarts, cksum checks etc.
> > > > >> > >>
> > > > >> > >> One problem: a volume doesn't heal automatically, or doesn't heal at all.
> > > > >> > >>
> > > > >> > >> gluster volume status
> > > > >> > >> Status of volume: workdata
> > > > >> > >> Gluster process                             TCP Port  RDMA Port  Online  Pid
> > > > >> > >> ------------------------------------------------------------------------------
> > > > >> > >> Brick glusterpub1:/gluster/md3/workdata     58832     0          Y       3436
> > > > >> > >> Brick glusterpub2:/gluster/md3/workdata     59315     0          Y       1526
> > > > >> > >> Brick glusterpub3:/gluster/md3/workdata     56917     0          Y       1952
> > > > >> > >> Brick glusterpub1:/gluster/md4/workdata     59688     0          Y       3755
> > > > >> > >> Brick glusterpub2:/gluster/md4/workdata     60271     0          Y       2271
> > > > >> > >> Brick glusterpub3:/gluster/md4/workdata     49461     0          Y       2399
> > > > >> > >> Brick glusterpub1:/gluster/md5/workdata     54651     0          Y       4208
> > > > >> > >> Brick glusterpub2:/gluster/md5/workdata     49685     0          Y       2751
> > > > >> > >> Brick glusterpub3:/gluster/md5/workdata     59202     0          Y       2803
> > > > >> > >> Brick glusterpub1:/gluster/md6/workdata     55829     0          Y       4583
> > > > >> > >> Brick glusterpub2:/gluster/md6/workdata     50455     0          Y       3296
> > > > >> > >> Brick glusterpub3:/gluster/md6/workdata     50262     0          Y       3237
> > > > >> > >> Brick glusterpub1:/gluster/md7/workdata     52238     0          Y       5014
> > > > >> > >> Brick glusterpub2:/gluster/md7/workdata     52474     0          Y       3673
> > > > >> > >> Brick glusterpub3:/gluster/md7/workdata     57966     0          Y       3653
> > > > >> > >> Self-heal Daemon on localhost               N/A       N/A        Y       4141
> > > > >> > >> Self-heal Daemon on glusterpub1             N/A       N/A        Y       5570
> > > > >> > >> Self-heal Daemon on glusterpub2             N/A       N/A        Y       4139
> > > > >> > >>
> > > > >> > >> "gluster volume heal workdata info" lists a lot of files per brick.
> > > > >> > >> "gluster volume heal workdata statistics heal-count" shows thousands
> > > > >> > >> of files per brick.
> > > > >> > >> "gluster volume heal workdata enable" has no effect.
> > > > >> > >>
> > > > >> > >> gluster volume heal workdata full
> > > > >> > >> Launching heal operation to perform full self heal on volume workdata
> > > > >> > >> has been successful
> > > > >> > >> Use heal info commands to check status.
> > > > >> > >>
> > > > >> > >> -> not doing anything at all. And nothing happening on the 2 "good"
> > > > >> > >> servers in e.g. glustershd.log. Heal was working as expected on
> > > > >> > >> version 10.4, but here... silence. Someone has an idea?
> > > > >> > >>
> > > > >> > >> Best regards,
> > > > >> > >> Hubert
> > > > >> > >>
> > > > >> > >> On Tue, 16 Jan 2024 at 13:44, Gilberto Ferreira <gilberto.nunes32@xxxxxxxxx> wrote:
> > > > >> > >>>
> > > > >> > >>> Ah! Indeed! You need to perform the upgrade on the clients as well.
> > > > >> > >>>
> > > > >> > >>> On Tue, 16 Jan 2024 at 03:12, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
> > > > >> > >>>>
> > > > >> > >>>> morning to those still reading :-)
> > > > >> > >>>>
> > > > >> > >>>> i found this: https://docs.gluster.org/en/main/Troubleshooting/troubleshooting-glusterd/#common-issues-and-how-to-resolve-them
> > > > >> > >>>>
> > > > >> > >>>> there's a paragraph about "peer rejected" with the same error message,
> > > > >> > >>>> telling me: "Update the cluster.op-version" - i had only updated the
> > > > >> > >>>> server nodes, but not the clients. So upgrading the cluster.op-version
> > > > >> > >>>> wasn't possible at this time. So... upgrading the clients to version
> > > > >> > >>>> 11.1 and then the op-version should solve the problem?
> > > > >> > >>>>
> > > > >> > >>>> Thx,
> > > > >> > >>>> Hubert
> > > > >> > >>>>
> > > > >> > >>>> On Mon, 15 Jan 2024 at 09:16, Hu Bert <revirii@xxxxxxxxxxxxxx> wrote:
> > > > >> > >>>>>
> > > > >> > >>>>> Hi,
> > > > >> > >>>>> just upgraded some gluster servers from version 10.4 to version 11.1.
> > > > >> > >>>>> Debian bullseye & bookworm. When only installing the packages: good,
> > > > >> > >>>>> servers, volumes etc. work as expected.
> > > > >> > >>>>>
> > > > >> > >>>>> But one needs to test if the systems work after a daemon and/or server
> > > > >> > >>>>> restart. Well, did a reboot, and after that the rebooted/restarted
> > > > >> > >>>>> system is "out". Log message from a working node:
> > > > >> > >>>>>
> > > > >> > >>>>> [2024-01-15 08:02:21.585694 +0000] I [MSGID: 106163]
> > > > >> > >>>>> [glusterd-handshake.c:1501:__glusterd_mgmt_hndsk_versions_ack]
> > > > >> > >>>>> 0-management: using the op-version 100000
> > > > >> > >>>>> [2024-01-15 08:02:21.589601 +0000] I [MSGID: 106490]
> > > > >> > >>>>> [glusterd-handler.c:2546:__glusterd_handle_incoming_friend_req]
> > > > >> > >>>>> 0-glusterd: Received probe from uuid:
> > > > >> > >>>>> b71401c3-512a-47cb-ac18-473c4ba7776e
> > > > >> > >>>>> [2024-01-15 08:02:23.608349 +0000] E [MSGID: 106010]
> > > > >> > >>>>> [glusterd-utils.c:3824:glusterd_compare_friend_volume] 0-management:
> > > > >> > >>>>> Version of Cksums sourceimages differ. local cksum = 2204642525,
> > > > >> > >>>>> remote cksum = 1931483801 on peer gluster190
> > > > >> > >>>>> [2024-01-15 08:02:23.608584 +0000] I [MSGID: 106493]
> > > > >> > >>>>> [glusterd-handler.c:3819:glusterd_xfer_friend_add_resp] 0-glusterd:
> > > > >> > >>>>> Responded to gluster190 (0), ret: 0, op_ret: -1
> > > > >> > >>>>> [2024-01-15 08:02:23.613553 +0000] I [MSGID: 106493]
> > > > >> > >>>>> [glusterd-rpc-ops.c:467:__glusterd_friend_add_cbk] 0-glusterd:
> > > > >> > >>>>> Received RJT from uuid: b71401c3-512a-47cb-ac18-473c4ba7776e, host:
> > > > >> > >>>>> gluster190, port: 0
> > > > >> > >>>>>
> > > > >> > >>>>> peer status from the rebooted node:
> > > > >> > >>>>>
> > > > >> > >>>>> root@gluster190 ~ # gluster peer status
> > > > >> > >>>>> Number of Peers: 2
> > > > >> > >>>>>
> > > > >> > >>>>> Hostname: gluster189
> > > > >> > >>>>> Uuid: 50dc8288-aa49-4ea8-9c6c-9a9a926c67a7
> > > > >> > >>>>> State: Peer Rejected (Connected)
> > > > >> > >>>>>
> > > > >> > >>>>> Hostname: gluster188
> > > > >> > >>>>> Uuid: e15a33fe-e2f7-47cf-ac53-a3b34136555d
> > > > >> > >>>>> State: Peer Rejected (Connected)
> > > > >> > >>>>>
> > > > >> > >>>>> So the rebooted gluster190 is not accepted anymore, and thus does not
> > > > >> > >>>>> appear in "gluster volume status". I then followed this guide:
> > > > >> > >>>>>
> > > > >> > >>>>> https://gluster-documentations.readthedocs.io/en/latest/Administrator%20Guide/Resolving%20Peer%20Rejected/
> > > > >> > >>>>>
> > > > >> > >>>>> Remove everything under /var/lib/glusterd/ (except glusterd.info) and
> > > > >> > >>>>> restart the glusterd service etc. Data get copied from the other nodes,
> > > > >> > >>>>> 'gluster peer status' is ok again - but the volume info is missing,
> > > > >> > >>>>> /var/lib/glusterd/vols is empty.
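That "Peer Rejected" recovery procedure can be condensed into a script. A sketch: commands are echoed rather than executed, goodnode is a placeholder for a healthy peer, and - as the thread shows - syncing vols/ may still be needed manually:

```shell
#!/bin/sh
# Sketch of the "Peer Rejected" recovery described in this thread.
# Run on the rejected node only; commands are printed, not executed.
run() { echo "WOULD RUN: $*"; }

run systemctl stop glusterd
# wipe everything under /var/lib/glusterd except glusterd.info
run "find /var/lib/glusterd -mindepth 1 -maxdepth 1 ! -name glusterd.info -exec rm -rf {} +"
run systemctl start glusterd    # peers should re-sync; verify with 'gluster peer status'
# if /var/lib/glusterd/vols stays empty, copy it from a healthy peer:
run "rsync -a goodnode:/var/lib/glusterd/vols/ /var/lib/glusterd/vols/"
run systemctl restart glusterd
```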
> > > > >> > >>>>> When syncing this dir from another
> > > > >> > >>>>> node, the volume then is available again, heals start etc.
> > > > >> > >>>>>
> > > > >> > >>>>> Well, and just to be sure that everything's working as it should, i
> > > > >> > >>>>> rebooted that node again - the rebooted node is kicked out again, and
> > > > >> > >>>>> you have to repeat the whole procedure to bring it back.
> > > > >> > >>>>>
> > > > >> > >>>>> Sry, but did i miss anything? Has someone experienced similar
> > > > >> > >>>>> problems? I'll probably downgrade to 10.4 again, that version was
> > > > >> > >>>>> working...
> > > > >> > >>>>>
> > > > >> > >>>>> Thx,
> > > > >> > >>>>> Hubert
> > > > >> >
> > > > >> > --
> > > > >> > Diego Zuccato
> > > > >> > DIFA - Dip. di Fisica e Astronomia
> > > > >> > Servizi Informatici
> > > > >> > Alma Mater Studiorum - Università di Bologna
> > > > >> > V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
> > > > >> > tel.: +39 051 20 95786

________

Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://meet.google.com/cpu-eiue-hvk
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users