Hello all.

I have a volume set up as:
-8<--
root@str957-biostor:~# gluster v info BigVol

Volume Name: BigVol
Type: Distributed-Replicate
Volume ID: c51926bd-6715-46b2-8bb3-8c915ec47e28
Status: Started
Snapshot Count: 0
Number of Bricks: 28 x (2 + 1) = 84
Transport-type: tcp
Bricks:
Brick1: str957-biostor2:/srv/bricks/00/BigVol
Brick2: str957-biostor:/srv/bricks/00/BigVol
Brick3: str957-biostq:/srv/arbiters/00/BigVol (arbiter)
[...]
Options Reconfigured:
cluster.granular-entry-heal: enable
client.event-threads: 8
server.event-threads: 8
server.ssl: on
client.ssl: on
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
features.bitrot: on
features.scrub: Active
features.scrub-freq: biweekly
auth.ssl-allow: str957-bio*
ssl.certificate-depth: 1
cluster.self-heal-daemon: enable
features.quota: on
features.inode-quota: on
features.quota-deem-statfs: on
server.manage-gids: on
features.scrub-throttle: aggressive
-8<--

After a couple of failures (a disk on biostor2 went "missing", and glusterd on biostq got killed by the OOM killer) I noticed that some files can't be accessed from the clients:
-8<--
$ ls -lh 1_germline_CGTACTAG_L005_R*
-rwxr-xr-x 1 e.f domain^users 2,0G apr 24 2015 1_germline_CGTACTAG_L005_R1_001.fastq.gz
-rwxr-xr-x 1 e.f domain^users 2,0G apr 24 2015 1_germline_CGTACTAG_L005_R2_001.fastq.gz
$ ls -lh 1_germline_CGTACTAG_L005_R1_001.fastq.gz
ls: cannot access '1_germline_CGTACTAG_L005_R1_001.fastq.gz': Input/output error
-8<--
(Note that if I ask ls for several files at once, it works...)

The files have exactly the same contents on both nodes (verified via md5sum). The only difference is in the getfattr output: trusted.bit-rot.version is 0x17000000000000005f3f9e670002ad5b on one node and 0x12000000000000005f3ce7af000dccad on the other.
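
For reference, here is a minimal sketch of that comparison, assuming the output of `getfattr -d -m . -e hex` for the file on each brick has been captured as text. The helper names `parse_getfattr_dump` and `diff_xattrs` are hypothetical, and the sample input contains only the one attribute quoted above; a real dump lists more.

```python
def parse_getfattr_dump(text):
    """Parse `getfattr -d -e hex` style output into a {name: value} dict."""
    attrs = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# file: ..." header that getfattr prints.
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:
            attrs[name] = value
    return attrs


def diff_xattrs(a, b):
    """Return {name: (value_in_a, value_in_b)} for attributes that differ."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# Sample input: only the attribute and values quoted in this message.
node_a = parse_getfattr_dump(
    "trusted.bit-rot.version=0x17000000000000005f3f9e670002ad5b\n"
)
node_b = parse_getfattr_dump(
    "trusted.bit-rot.version=0x12000000000000005f3ce7af000dccad\n"
)

for name, (value_a, value_b) in diff_xattrs(node_a, node_b).items():
    print(f"{name}: {value_a} != {value_b}")
```

An attribute that is present on one brick only shows up with `None` on the other side, so missing xattrs are reported as differences too.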

On the client, the log reports:
-8<--
[2020-08-21 11:32:52.208809] W [MSGID: 108008] [afr-self-heal-name.c:354:afr_selfheal_name_gfid_mismatch_check] 4-BigVol-replicate-13: GFID mismatch for <gfid:5217fe67-4dd0-47a1-8d27-143ae912ef4a>/1_germline_CGTACTAG_L005_R1_001.fastq.gz d70a4a6d-05fc-4988-8041-5e7f62155fe5 on BigVol-client-55 and f249f88a-909f-489d-8d1d-d428e842ee96 on BigVol-client-34
[2020-08-21 11:32:52.209768] W [fuse-bridge.c:471:fuse_entry_cbk] 0-glusterfs-fuse: 233606: LOOKUP() /[...]/1_germline_CGTACTAG_L005_R1_001.fastq.gz => -1 (Errore di input/output)
-8<--

As suggested on IRC, I tested the RAM, but the only result was a "Peer rejected" status caused by another OOM kill. No problem, I was able to resolve that, but the original problem remains.

What else can I do?

TIA!

-- 
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786