Hey folks,

I am having a weird issue here. I am running a 3-node Gluster setup with these versions:
glusterfs-selinux-2.0.1-1.el8s.noarch
glusterfs-9.6-1.el8s.x86_64
centos-release-gluster9-1.0-1.el8.noarch
libglusterfs0-9.6-1.el8s.x86_64
libglusterd0-9.6-1.el8s.x86_64
glusterfs-cli-9.6-1.el8s.x86_64
glusterfs-server-9.6-1.el8s.x86_64
glusterfs-client-xlators-9.6-1.el8s.x86_64
glusterfs-fuse-9.6-1.el8s.x86_64

My volume info:

Volume Name: web-dir
Type: Replicate
Volume ID: 4ff57154-6ccb-45b0-97da-c12b8b5afa2b
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: wc-srv01.eulie.de:/var/lib/gluster/brick01
Brick2: wc-srv02.eulie.de:/var/lib/gluster/brick01
Brick3: wc-srv03.eulie.de:/var/lib/gluster/brick01
Options Reconfigured:
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 200000
performance.readdir-ahead: on
performance.parallel-readdir: on
performance.nl-cache: on
performance.nl-cache-timeout: 600
performance.nl-cache-positive-entry: on
performance.qr-cache-timeout: 600
performance.cache-size: 4096MB
performance.cache-max-file-size: 512KB
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on
performance.io-cache: on
performance.io-thread-count: 16
server.allow-insecure: on
cluster.lookup-optimize: on
client.event-threads: 8
server.event-threads: 4
cluster.readdir-optimize: on
performance.write-behind-window-size: 32MB

and all bricks are online:

Status of volume: web-dir
Gluster process TCP Port RDMA Port Online Pid
------------------------------------------------------------------------------
Brick wc-srv01.eulie.de:/var/lib/gluster/brick01 49152 0 Y 2671
Brick wc-srv02.eulie.de:/var/lib/gluster/brick01 49152 0 Y 2614
Brick wc-srv03.eulie.de:/var/lib/gluster/brick01 49152 0 Y 3223
Self-heal Daemon on localhost N/A N/A Y 2679
Self-heal Daemon on wc-srv02.dc-dus.dalason.net N/A N/A Y 41537
Self-heal Daemon on wc-srv03.dc-dus.dalason.net N/A N/A Y 78473
Task Status of Volume web-dir
------------------------------------------------------------------------------
There are no active volume tasks

SELinux is set to permissive. The systems run AlmaLinux 8 with current patches (as of today). The three servers wc-srv01, wc-srv02 and wc-srv03 are connected via 10Gbit, can see each other, and no connection issues arise. Network speed is nearly 10Gbit, tested.
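In case the client-side caching is relevant here, this is how I would double-check the timeouts that are in effect (just a quick sketch using the options already listed above; I am assuming "gluster volume get" is the right way to read them back):

# read back the cache-related options set on the volume
gluster volume get web-dir performance.md-cache-timeout
gluster volume get web-dir performance.nl-cache-timeout
gluster volume get web-dir performance.qr-cache-timeout
gluster volume get web-dir features.cache-invalidation-timeout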
I mounted the volume on each server via itself:

wc-srv01 fstab:
wc-srv01.eulie.de:/web-dir /var/www glusterfs defaults,_netdev 0 0

wc-srv02 fstab:
wc-srv02.eulie.de:/web-dir /var/www glusterfs defaults,_netdev 0 0

wc-srv03 fstab:
wc-srv03.eulie.de:/web-dir /var/www glusterfs defaults,_netdev 0 0

Mounting works, and the reported size is correct on all servers:

# df -h /var/www/
Filesystem                  Size  Used Avail Use% Mounted on
wc-srv01.eulie.de:/web-dir  100G   31G   70G  31% /var/www

Here is the weird issue:

wc01:
while sleep 1; do date > testfile ; done

wc02:
while sleep 1 ; do date ; cat testfile ; done
Wed 14 Sep 09:43:47 CEST 2022
Wed 14 Sep 09:43:45 CEST 2022
Wed 14 Sep 09:43:48 CEST 2022
Wed 14 Sep 09:43:45 CEST 2022
Wed 14 Sep 09:43:49 CEST 2022
Wed 14 Sep 09:43:45 CEST 2022
Wed 14 Sep 09:43:50 CEST 2022
Wed 14 Sep 09:43:45 CEST 2022
Wed 14 Sep 09:43:51 CEST 2022
Wed 14 Sep 09:43:45 CEST 2022
Wed 14 Sep 09:43:52 CEST 2022
Wed 14 Sep 09:43:45 CEST 2022

wc03:
while sleep 1 ; do date ; cat testfile ; done
Wed 14 Sep 09:43:43 CEST 2022
Wed 14 Sep 09:41:12 CEST 2022
Wed 14 Sep 09:43:45 CEST 2022
Wed 14 Sep 09:41:12 CEST 2022
Wed 14 Sep 09:43:46 CEST 2022
Wed 14 Sep 09:41:12 CEST 2022
Wed 14 Sep 09:43:47 CEST 2022
Wed 14 Sep 09:41:12 CEST 2022
Wed 14 Sep 09:43:48 CEST 2022
Wed 14 Sep 09:41:12 CEST 2022
Wed 14 Sep 09:43:49 CEST 2022
Wed 14 Sep 09:41:12 CEST 2022

So the file exists, and on the initial write the timestamps are correct. From the second iteration onward, I have three different versions of the file across the three servers: each reading node keeps returning stale content.
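To see whether the bricks themselves diverge or only the client view does, I would compare the file on the local brick path against the FUSE mount on each node (a quick sketch, using the brick path from the volume info above):

# run on each node: brick copy vs. what the mount returns
md5sum /var/lib/gluster/brick01/testfile /var/www/testfile
stat -c '%y %s %n' /var/lib/gluster/brick01/testfile /var/www/testfile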
Deleting the file is instant on all nodes, and editing the file in vim (doing :w) also updates it instantly on all nodes.
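That makes me suspect the difference is overwrite-in-place versus replace-by-rename; a small test I can run to compare the two (sketch only, testfile.tmp is just a throwaway name, and I have not verified how vim actually writes the file out):

# overwrite in place, same inode (what the date loop above does)
date > /var/www/testfile
# replace via rename, new inode
date > /var/www/testfile.tmp && mv /var/www/testfile.tmp /var/www/testfile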
# gluster volume heal web-dir info
Brick wc-srv01.eulie.de:/var/lib/gluster/brick01
Status: Connected
Number of entries: 0

Brick wc-srv02.eulie.de:/var/lib/gluster/brick01
Status: Connected
Number of entries: 0

Brick wc-srv03.eulie.de:/var/lib/gluster/brick01
Status: Connected
Number of entries: 0

What... Why... How? :-)

I need a synced three-way active-active-active cluster with consistent data across all nodes.
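For completeness, I can also check for split-brain explicitly, in case the plain heal info does not show it:

# list any files gluster considers split-brain on this volume
gluster volume heal web-dir info split-brain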
Any pointers from you gurus?

--
with kind regards,
mit freundlichen Gruessen,

Christian Reiss