Hi David,
Do you have any bricks down? Could you please share the output of the following commands, along with the logs from the server and client nodes?
1) gluster volume info
2) gluster volume status
3) gluster volume bitrot <volume name> scrub status
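For convenience, here is a minimal Python sketch that runs the three commands above and gathers their output into one report. It assumes the `gluster` CLI is installed on the node where it runs. `diagnostic_commands` and `collect` are hypothetical helper names, and you must supply the real volume name yourself. The FUSE log further down suggests `archive1`, but that is only a guess.

```python
from __future__ import annotations

import shutil
import subprocess


def diagnostic_commands(volume: str) -> list[list[str]]:
    """Return the three gluster CLI invocations requested above, as argv lists."""
    return [
        ["gluster", "volume", "info"],
        ["gluster", "volume", "status"],
        ["gluster", "volume", "bitrot", volume, "scrub", "status"],
    ]


def collect(volume: str) -> str:
    """Run each command and concatenate its stdout/stderr into one report string."""
    if shutil.which("gluster") is None:
        raise RuntimeError("gluster CLI not found on PATH; run this on a server node")
    sections = []
    for argv in diagnostic_commands(volume):
        # No check=True: a failing command's error text is still worth reporting.
        result = subprocess.run(argv, capture_output=True, text=True)
        sections.append(f"$ {' '.join(argv)}\n{result.stdout}{result.stderr}")
    return "\n".join(sections)
```

For example, `print(collect("archive1"))` on one of the server nodes prints all three outputs in order. The server and client logs (typically under /var/log/glusterfs/) still need to be collected separately.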
A few more questions:
1) How many copies of the file were corrupted? (All of them, or just one?)
There are two things I am trying to understand:
A) IIUC, if only one copy is corrupted, the replication module in the Gluster client should serve the data from a remaining good copy.
B) If all the copies are corrupted (or, say, enough copies to break quorum, which means 2 in the case of 3-way replication), then the application will get an error. But the error reported should be 'Input/output error', not 'Transport endpoint is not connected'.
The 'Transport endpoint is not connected' error usually occurs when the brick that the operation is directed to is not connected to the client.
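To make A) and B) concrete, here is a toy Python model of which result a client read should produce. It is written from the behaviour described above, not from the AFR source. The `Copy` enum, the `expected_read_result` helper, and the simple majority-quorum rule are assumptions for illustration; real behaviour also depends on the volume's quorum settings.

```python
from __future__ import annotations

import errno
import os
from enum import Enum


class Copy(Enum):
    """State of one replica copy as seen by the client (toy model)."""
    GOOD = "good"        # brick connected, copy is intact
    CORRUPT = "corrupt"  # brick connected, copy marked bad by the scrubber
    DOWN = "down"        # brick not connected to the client


def expected_read_result(copies: list[Copy]) -> int:
    """Return 0 if the read should succeed, else the errno the application should see.

    Assumption: a simple majority quorum, i.e. more than half of the replicas.
    """
    quorum = len(copies) // 2 + 1
    good = copies.count(Copy.GOOD)
    corrupt = copies.count(Copy.CORRUPT)
    if good >= quorum:
        return 0               # case A: serve the data from a good copy
    if corrupt:
        return errno.EIO       # case B: copies are reachable but bad
    return errno.ENOTCONN      # too few bricks are connected to the client


if __name__ == "__main__":
    cases = {
        "replica 3, one corrupt": [Copy.CORRUPT, Copy.GOOD, Copy.GOOD],
        "replica 3, two corrupt": [Copy.CORRUPT, Copy.CORRUPT, Copy.GOOD],
        "replica 3, two bricks down": [Copy.DOWN, Copy.DOWN, Copy.GOOD],
        "replica 4, one corrupt": [Copy.CORRUPT, Copy.GOOD, Copy.GOOD, Copy.GOOD],
    }
    for name, copies in cases.items():
        code = expected_read_result(copies)
        # os.strerror gives the platform's message, e.g. "Input/output error" on Linux.
        print(f"{name}: {'data' if code == 0 else os.strerror(code)}")
```

Under this model, a replica 4 volume with one corrupt copy should still return the data, which is why 'Transport endpoint is not connected' on node1 looks like the unexpected part.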
Regards,
Raghavendra
On Mon, Feb 4, 2019 at 6:02 AM David Spisla <spisla80@xxxxxxxxx> wrote:
Hello Amar,
Sounds good. So far this patch has only been merged into master. I think it should be part of the next v5.x patch release!
Regards
David

On Mon, Feb 4, 2019 at 9:58 AM Amar Tumballi Suryanarayan <atumball@xxxxxxxxxx> wrote:
Hi David,
I guess https://review.gluster.org/#/c/glusterfs/+/21996/ helps to fix the issue. I will leave it to Raghavendra Bhat to reconfirm.
Regards,
Amar

On Fri, Feb 1, 2019 at 8:45 PM David Spisla <spisla80@xxxxxxxxx> wrote:
Hello Gluster Community,
I have a 4-node cluster with a replica 4 volume, so each node has a brick with a copy of a file. I tried out the bitrot functionality and corrupted the copy on the brick of node1. After this, I ran an on-demand scrub and the file was correctly marked as corrupted.
Now I try to read that file via FUSE on node1 (with the corrupt copy):
$ cat file1.txt
cat: file1.txt: Transport endpoint is not connected
FUSE log says:
[2019-02-01 15:02:19.191984] E [MSGID: 114031] [client-rpc-fops_v2.c:281:client4_0_open_cbk] 0-archive1-client-0: remote operation failed. Path: /data/file1.txt (b432c1d6-ece2-42f2-8749-b11e058c4be3) [Input/output error]
[2019-02-01 15:02:19.192269] W [dict.c:761:dict_ref] (-->/usr/lib64/glusterfs/5.3/xlator/performance/quick-read.so(+0x7329) [0x7fc642471329] -->/usr/lib64/glusterfs/5.3/xlator/performance/io-cache.so(+0xaaf5) [0x7fc642682af5] -->/usr/lib64/libglusterfs.so.0(dict_ref+0x58) [0x7fc64a78d218] ) 0-dict: dict is NULL [Invalid argument]
[2019-02-01 15:02:19.192714] E [MSGID: 108009] [afr-open.c:220:afr_openfd_fix_open_cbk] 0-archive1-replicate-0: Failed to open /data/file1.txt on subvolume archive1-client-0 [Input/output error]
[2019-02-01 15:02:19.193009] W [fuse-bridge.c:2371:fuse_readv_cbk] 0-glusterfs-fuse: 147733: READ => -1 gfid=b432c1d6-ece2-42f2-8749-b11e058c4be3 fd=0x7fc60408bbb8 (Transport endpoint is not connected)
[2019-02-01 15:02:19.193653] W [MSGID: 114028] [client-lk.c:347:delete_granted_locks_owner] 0-archive1-client-0: fdctx not valid [Invalid argument]
And from FUSE on node2 (with an intact copy):
$ cat file1.txt
file1
It seems that node1 tries to read the file from its own brick, but the copy there is broken. Node2 reads the file from its own brick, which holds an intact copy, so the read succeeds. But I am puzzled, because sometimes reading the file from node1 with the broken copy also succeeds.
What is the expected behaviour here? Is it possible to read files that have a corrupted copy from any client?
Regards
David Spisla
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users

--
Amar Tumballi (amarts)