There are also some other anomalies. Even when the files are visible and readable, many dirs are unwritable and/or undeletable.
for example:
====
Sat Jan 04 18:36:17 [0.02 0.08 0.12] root@hpc-s:/bio/mmacchie
1104 $ mkdir hjmtest
mkdir: cannot create directory `hjmtest': Invalid argument
Sat Jan 04 18:36:23 [0.02 0.08 0.12] root@hpc-s:/bio/mmacchie
====
The client log says this for that operation (note the offset times - UTC vs local): <http://pastie.org/8602365>
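One thing I can check directly on the servers (a sketch only - the brick-side paths are my assumption, with the brick roots taken from the volume status output further down) is the DHT layout xattr on each brick's copy of that dir:

# on each server, for each of its bricks
getfattr -n trusted.glusterfs.dht -e hex /raid1/bio/mmacchie
getfattr -n trusted.glusterfs.dht -e hex /raid2/bio/mmacchie

A missing or inconsistent layout range on one brick would line up with the 'holes=2' anomalies in the client log further down.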
And in many subdirs, other dirs can be made, but not deleted:
Sat Jan 04 18:41:45 [0.00 0.04 0.09] root@hpc-s:/bio/mmacchie/Nematodes2/phast/steiner_motifs/mmacchie_recovered
1109 $ mkdir j1
Sat Jan 04 18:42:00 [0.00 0.03 0.09] root@hpc-s:/bio/mmacchie/Nematodes2/phast/steiner_motifs/mmacchie_recovered
1110 $ rmdir j1
rmdir: failed to remove `j1': Transport endpoint is not connected
Sat Jan 04 18:42:09 [0.08 0.05 0.09] root@hpc-s:/bio/mmacchie/Nematodes2/phast/steiner_motifs/mmacchie_recovered
With the client log saying:
====
[2014-01-05 02:42:09.548263] W [client-rpc-fops.c:526:client3_3_stat_cbk] 0-gl-client-2: remote operation failed: Transport endpoint is not connected
[2014-01-05 02:42:09.549314] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-2: remote operation failed: Transport endpoint is not connected. Path: /bio/mmacchie/Nematodes2/phast/steiner_motifs/mmacchie_recovered/j1 (aebbf21f-37fe-4edc-be8a-0f57b057b516)
[2014-01-05 02:42:09.550124] W [client-rpc-fops.c:2541:client3_3_opendir_cbk] 0-gl-client-2: remote operation failed: Transport endpoint is not connected. Path: /bio/mmacchie/Nematodes2/phast/steiner_motifs/mmacchie_recovered/j1 (aebbf21f-37fe-4edc-be8a-0f57b057b516)
[2014-01-05 02:42:09.552439] W [fuse-bridge.c:1193:fuse_unlink_cbk] 0-glusterfs-fuse: 5805445: RMDIR() /bio/mmacchie/Nematodes2/phast/steiner_motifs/mmacchie_recovered/j1 => -1 (Transport endpoint is not connected)
[2014-01-05 02:42:12.175860] W [socket.c:514:__socket_rwv] 0-gl-client-2: readv failed (No data available)
[2014-01-05 02:42:15.181365] W [socket.c:514:__socket_rwv] 0-gl-client-2: readv failed (No data available)
[2014-01-05 02:42:18.186668] W [socket.c:514:__socket_rwv] 0-gl-client-2: readv failed (No data available)
====
This is odd - how can a dir be created OK, but the fs then lose track of it when it comes time to delete it?
And that dir (j1) can have /files/ created and deleted inside of it, but not other /dirs/ (same result as the parent dir).
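To see whether this mount really has lost its connection to that one brick, something like the following on the client should show the recent connect/disconnect history for gl-client-2 (the mount-log filename here is a guess - it's whichever log the pastes above came from):

# filename assumed; substitute the actual client mount log
egrep -i 'connected to|disconnected from' /var/log/glusterfs/bio.log | grep gl-client-2 | tail -20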
In looking thru the client log, I see instances of this:
====
[2014-01-05 02:27:20.721043] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-2: remote operation failed: Transport endpoint is not connected. Path: /bio/mmacchie/Nematodes (00000000-0000-0000-0000-000000000000)
[2014-01-05 02:27:20.769058] I [dht-layout.c:630:dht_layout_normalize] 0-gl-dht: found anomalies in /bio/mmacchie/Nematodes. holes=2 overlaps=0
[2014-01-05 02:27:20.769090] W [dht-selfheal.c:900:dht_selfheal_directory] 0-gl-dht: 1 subvolumes down -- not fixing
[2014-01-05 02:27:20.784335] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-2:
====
more at: <http://pastie.org/8602381>
This is alarming, since it says:
[2014-01-05 02:27:20.769090] W [dht-selfheal.c:900:dht_selfheal_directory] 0-gl-dht: 1 subvolumes down -- not fixing
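For reference, the client subvolumes should be numbered in brick order (gl-client-0 = first brick in the volume definition), so gl-client-2 ought to be the 3rd brick. That mapping can be read off (sketch) with:

gluster volume info gl | egrep '^Brick[0-9]'

So "1 subvolumes down" seems to be pointing at one specific brick, even though: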
All my servers and bricks appear to be up and online:
Sat Jan 04 18:54:09 [0.76 0.30 0.20] root@biostor1:~
1003 $ gluster volume status gl detail | egrep "Brick|Online"
Brick : Brick bs2:/raid1
Online : Y
Brick : Brick bs2:/raid2
Online : Y
Brick : Brick bs3:/raid1
Online : Y
Brick : Brick bs3:/raid2
Online : Y
Brick : Brick bs4:/raid1
Online : Y
Brick : Brick bs4:/raid2
Online : Y
Brick : Brick bs1:/raid1
Online : Y
Brick : Brick bs1:/raid2
Online : Y
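Another cross-check (just a sketch; I haven't included its output here) is to ask gluster which clients each brick thinks are connected, to see whether this particular client has dropped off one of them:

gluster volume status gl clients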
The gluster server logs seem to be fairly quiet thru this. The link below contains the logs for the last day or so from the 4 servers, reduced with the following command to eliminate the 'socket.c:2788' errors:
grep -v socket.c:2788 /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
<http://pastie.org/8602412>
hjm
On Saturday, January 04, 2014 10:45:29 PM Vijay Bellur wrote:
> On 01/04/2014 07:21 AM, harry mangalam wrote:
> > This is a distributed-only glusterfs on 4 servers with 2 bricks each on
> > an IPoIB network.
> >
> > Thanks to a misconfigured autoupdate script, when 3.4.2 was released
> > today, my gluster servers tried to update themselves. 2 succeeded, but
> > then failed to restart, the other 2 failed to update and kept running.
> >
> > Not realizing the sequence of events, I restarted the 2 that failed to
> > restart, which gave my fs 2 servers running 3.4.1 and 2 running 3.4.2.
> >
> > When I realized this after about 30m, I shut everything down and updated
> > the 2 remaining to 3.4.2 and then restarted but now I'm getting lots of
> > reports of file errors of the type 'endpoints not connected' and the like:
> >
> > [2014-01-04 01:31:18.593547] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-2: remote operation failed: Transport endpoint is not connected. Path: /bio/fishm/test_cuffdiff.sh (00000000-0000-0000-0000-000000000000)
> > [2014-01-04 01:31:18.594928] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-2: remote operation failed: Transport endpoint is not connected. Path: /bio/fishm/test_cuffdiff.sh (00000000-0000-0000-0000-000000000000)
> > [2014-01-04 01:31:18.595818] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-2: remote operation failed: Transport endpoint is not connected. Path: /bio/fishm/.#test_cuffdiff.sh (14c3b612-e952-4aec-ae18-7f3dbb422dcc)
> > [2014-01-04 01:31:18.597381] W [client-rpc-fops.c:2624:client3_3_lookup_cbk] 0-gl-client-2: remote operation failed: Transport endpoint is not connected. Path: /bio/fishm/test_cuffdiff.sh (00000000-0000-0000-0000-000000000000)
> > [2014-01-04 01:31:18.598212] W [client-rpc-fops.c:814:client3_3_statfs_cbk] 0-gl-client-2: remote operation failed: Transport endpoint is not connected
> > [2014-01-04 01:31:18.598236] W [dht-diskusage.c:45:dht_du_info_cbk] 0-gl-dht: failed to get disk info from gl-client-2
> > [2014-01-04 01:31:19.912210] W [socket.c:514:__socket_rwv] 0-gl-client-2: readv failed (No data available)
> > [2014-01-04 01:31:22.912717] W [socket.c:514:__socket_rwv] 0-gl-client-2: readv failed (No data available)
> > [2014-01-04 01:31:25.913208] W [socket.c:514:__socket_rwv] 0-gl-client-2: readv failed (No data available)
> >
> > The servers at the same time provided the following error 'E' messages:
> >
> > Fri Jan 03 17:46:42 [0.20 0.12 0.13] root@biostor1:~
> > 1008 $ grep ' E ' /var/log/glusterfs/bricks/raid1.log |grep '2014-01-03'
> >
> > [2014-01-03 06:11:36.251786] E [server-helpers.c:751:server_alloc_frame] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_notify+0x103) [0x3161e090d3] (-->/usr/lib64/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x245) [0x3161e08f85] (-->/usr/lib64/glusterfs/3.4.1/xlator/protocol/server.so(server3_3_lookup+0xa0) [0x7fa60e577170]))) 0-server: invalid argument: conn
> > [2014-01-03 06:11:36.251813] E [rpcsvc.c:450:rpcsvc_check_and_reply_error] 0-rpcsvc: rpc actor failed to complete successfully
> > [2014-01-03 17:48:44.236127] E [rpc-transport.c:253:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.4.1/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
> > [2014-01-03 19:15:26.643378] E [rpc-transport.c:253:rpc_transport_load] 0-rpc-transport: /usr/lib64/glusterfs/3.4.2/rpc-transport/rdma.so: cannot open shared object file: No such file or directory
>
> rdma.so seems to be missing here. Is glusterfs-rdma-3.4.2-1 rpm
> installed on the servers?
>
> -Vijay
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://supercolony.gluster.org/mailman/listinfo/gluster-users
---
Harry Mangalam - Research Computing, OIT, Rm 225 MSTB, UC Irvine [m/c 2225] / 92697
Google Voice Multiplexer: (949) 478-4487
415 South Circle View Dr, Irvine, CA, 92697 [shipping]
MSTB Lat/Long: (33.642025,-117.844414) (paste into Google Maps)
---