Hello Soumya, hello everybody,

network.ping-timeout was set to 42 seconds; I set it to 0, but it made no difference. The problem was that after resetting the transport-type to rdma,tcp, some bricks went down after a few minutes. Even after restarting the volume, a few minutes later some other/different bricks went down again. Now, after re-creating my volume, the bricks stay up but, oddly, I am not able to write to the volume. In addition, I defined a distributed volume across 2 servers with 4 bricks of 250GB each, yet my final volume is reported as only 500GB, which is baffling. Here is some information:

# gluster volume status vol_workdir_amd
Status of volume: vol_workdir_amd
Gluster process                                        TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick ib-storage1:/export/brick_workdir/brick1/data    49185     49186      Y       23098
Brick ib-storage3:/export/brick_workdir/brick1/data    49158     49159      Y       3886
Brick ib-storage1:/export/brick_workdir/brick2/data    49187     49188      Y       23117
Brick ib-storage3:/export/brick_workdir/brick2/data    49160     49161      Y       3905

# gluster volume info vol_workdir_amd
Volume Name: vol_workdir_amd
Type: Distribute
Volume ID: 087d26ea-c6df-4cbe-94af-ecd87b59aedb
Status: Started
Number of Bricks: 4
Transport-type: tcp,rdma
Bricks:
Brick1: ib-storage1:/export/brick_workdir/brick1/data
Brick2: ib-storage3:/export/brick_workdir/brick1/data
Brick3: ib-storage1:/export/brick_workdir/brick2/data
Brick4: ib-storage3:/export/brick_workdir/brick2/data
Options Reconfigured:
performance.readdir-ahead: on

# pdsh -w storage[1,3] df -h /export/brick_workdir/brick{1,2}
storage3: Filesystem                            Size  Used Avail Use% Mounted on
storage3: /dev/mapper/st--block1-blk1--workdir  250G   34M  250G   1% /export/brick_workdir/brick1
storage3: /dev/mapper/st--block2-blk2--workdir  250G   34M  250G   1% /export/brick_workdir/brick2
storage1: Filesystem                            Size  Used Avail Use% Mounted on
storage1: /dev/mapper/st--block1-blk1--workdir  250G   33M  250G   1% /export/brick_workdir/brick1
storage1: /dev/mapper/st--block2-blk2--workdir  250G   33M  250G   1% /export/brick_workdir/brick2

# df -h /workdir/
Filesystem                      Size  Used Avail Use% Mounted on
localhost:vol_workdir_amd.rdma  500G   67M  500G   1% /workdir

# touch /workdir/test
touch: cannot touch '/workdir/test': No such file or directory

# tail -30l /var/log/glusterfs/workdir.log
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:33.927673] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:37.877231] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
[2015-07-21 21:10:37.880556] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
[2015-07-21 21:10:37.914661] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:37.923535] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:41.883925] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
[2015-07-21 21:10:41.887085] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
[2015-07-21 21:10:41.919394] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:41.932622] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:44.682636] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
[2015-07-21 21:10:44.682947] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
[2015-07-21 21:10:44.683240] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
[2015-07-21 21:10:44.683472] W [dht-diskusage.c:48:dht_du_info_cbk] 0-vol_workdir_amd-dht: failed to get disk info from vol_workdir_amd-client-0
[2015-07-21 21:10:44.683506] W [dht-diskusage.c:48:dht_du_info_cbk] 0-vol_workdir_amd-dht: failed to get disk info from vol_workdir_amd-client-2
[2015-07-21 21:10:44.683532] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
[2015-07-21 21:10:44.683551] W [fuse-bridge.c:1970:fuse_create_cbk] 0-glusterfs-fuse: 18: /test => -1 (No such file or directory)
[2015-07-21 21:10:44.683619] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
[2015-07-21 21:10:44.683846] W [dht-layout.c:189:dht_layout_search] 0-vol_workdir_amd-dht: no subvolume for hash (value) = 1072520554
[2015-07-21 21:10:45.886807] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-0: changing port to 49173 (from 0)
[2015-07-21 21:10:45.893059] I [rpc-clnt.c:1819:rpc_clnt_reconfig] 0-vol_workdir_amd-client-2: changing port to 49174 (from 0)
[2015-07-21 21:10:45.920434] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-0: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1021 peer:10.0.4.1:49173)
Host Unreachable, Check your connection with IPoIB
[2015-07-21 21:10:45.925292] W [rdma.c:1263:gf_rdma_cm_event_handler] 0-vol_workdir_amd-client-2: cma event RDMA_CM_EVENT_REJECTED, error 8 (me:10.0.4.1:1020 peer:10.0.4.1:49174)
Host Unreachable, Check your connection with IPoIB
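Note that the RDMA_CM_EVENT_REJECTED / "Host Unreachable" warnings all concern client-0 and client-2 (the two ib-storage1 bricks), which would explain both the 500GB df (DHT only counts the bricks it can reach, per the dht_du_info_cbk warnings) and the failed create (incomplete layout, hence "no subvolume for hash"). For anyone wanting to verify the same thing, here is a minimal sketch of checks one could run; the ib0 device name and the 10.0.4.3 address are assumptions on my part, as only 10.0.4.1 appears in the logs above:

# Check IPoIB connectivity from the client to both storage nodes
# (10.0.4.1 is taken from the log; 10.0.4.3 is a hypothetical address for ib-storage3)
ping -c 3 -I ib0 10.0.4.1
ping -c 3 -I ib0 10.0.4.3

# Check that the InfiniBand port is up and the RDMA connection manager is loaded
ibv_devinfo | grep -i state        # expect: PORT_ACTIVE
lsmod | grep rdma_cm

# Possible workaround: mount over tcp instead of rdma; on a tcp,rdma volume,
# omitting the ".rdma" suffix from the volume name selects the tcp transport
mkdir -p /mnt/workdir_tcp
mount -t glusterfs -o transport=tcp localhost:/vol_workdir_amd /mnt/workdir_tcp
touch /mnt/workdir_tcp/test

If the touch succeeds on the tcp mount, the volume and its DHT layout are fine and the problem is confined to the RDMA transport.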
I have been using GlusterFS in production for around 3 years without any blocking problems, but the situation has been disastrous for more than 3 weeks now. Indeed, our production has been down for roughly 3.5 weeks (first with many different problems under GlusterFS v3.5.3, and now under 3.7.2-3), and I need to get it back up.

Thanks in advance,
Geoffrey
------------------------------------------------------
Geoffrey Letessier
IT manager & systems engineer
UPR 9080 - CNRS - Laboratoire de Biochimie Théorique
Institut de Biologie Physico-Chimique
13, rue Pierre et Marie Curie - 75005 Paris
Tel: 01 58 41 50 93 - eMail: geoffrey.letessier@xxxxxxx

On 21 Jul 2015, at 19:36, Soumya Koduri <skoduri@xxxxxxxxxx> wrote:

> From the following errors,