Just had an NFS crash on my test system running 3.5. Load of messages like this:

[2014-05-19 06:24:59.347147] E [rpc-drc.c:499:rpcsvc_add_op_to_cache] 0-rpc-service: DRC failed to detect duplicates
[2014-05-19 06:24:59.347240] E [rpc-drc.c:499:rpcsvc_add_op_to_cache] 0-rpc-service: DRC failed to detect duplicates
[2014-05-19 06:24:59.347340] E [rpc-drc.c:499:rpcsvc_add_op_to_cache] 0-rpc-service: DRC failed to detect duplicates
[2014-05-19 06:24:59.347408] E [rpc-drc.c:499:rpcsvc_add_op_to_cache] 0-rpc-service: DRC failed to detect duplicates

followed by:

....
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 6
time of crash: 2014-05-19 06:25:13
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.5.0
/lib64/libc.so.6(+0x329a0)[0x7f0e3d9249a0]
/lib64/libc.so.6(gsignal+0x35)[0x7f0e3d924925]
/lib64/libc.so.6(abort+0x175)[0x7f0e3d926105]
/lib64/libc.so.6(+0x70837)[0x7f0e3d962837]
/lib64/libc.so.6(+0x76166)[0x7f0e3d968166]
/usr/lib64/libgfrpc.so.0(+0x10e0f)[0x7f0e3f0e2e0f]
/usr/lib64/libglusterfs.so.0(rb_destroy+0x51)[0x7f0e3f331bc1]
/usr/lib64/libgfrpc.so.0(+0x10b5f)[0x7f0e3f0e2b5f]
/usr/lib64/libgfrpc.so.0(rpcsvc_drc_notify+0xe8)[0x7f0e3f0e2c98]
/usr/lib64/libgfrpc.so.0(rpcsvc_handle_disconnect+0x105)[0x7f0e3f0d9d35]
/usr/lib64/libgfrpc.so.0(rpcsvc_notify+0x1a0)[0x7f0e3f0db880]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x28)[0x7f0e3f0dcf98]
/usr/lib64/glusterfs/3.5.0/rpc-transport/socket.so(+0xa9a1)[0x7f0e3a93c9a1]
/usr/lib64/libglusterfs.so.0(+0x672f7)[0x7f0e3f3512f7]
/usr/sbin/glusterfs(main+0x564)[0x4075e4]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x7f0e3d910d1d]
/usr/sbin/glusterfs[0x404679]

Volume Name: data2
Type: Distribute
Volume ID: d958423f-bd25-49f1-81f8-f12e4edc6823
Status: Started
Number of Bricks: 8
Transport-type: tcp
Bricks:
Brick1: nas5-10g:/data17/gvol
Brick2: nas5-10g:/data18/gvol
Brick3: nas5-10g:/data19/gvol
Brick4: nas5-10g:/data20/gvol
Brick5: nas6-10g:/data21/gvol
Brick6: nas6-10g:/data22/gvol
Brick7: nas6-10g:/data23/gvol
Brick8: nas6-10g:/data24/gvol
Options Reconfigured:
cluster.min-free-disk: 5%
network.frame-timeout: 10800
cluster.readdir-optimize: on
nfs.disable: off
nfs.export-volumes: on
performance.readdir-ahead: off
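Looking at the backtrace, the abort is happening inside the NFS duplicate request cache (rpc-drc), so as a stopgap I'm tempted to just turn the DRC off for this volume and see if the crashes stop - assuming nfs.drc is the right option name in this 3.5.0 build and that disabling it actually avoids this code path:

    # untested; assumes the nfs.drc volume option is available in this build
    gluster volume set data2 nfs.drc off

No idea yet whether that helps, it's purely based on where the backtrace points.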
On Thu, 2014-05-01 at 09:55 +0800, Franco Broi wrote:
> Installed 3.4.3 exactly 2 weeks ago on all our brick servers and I'm
> happy to report that we've not had a crash since.
>
> Thanks for all the good work.
>
> On Tue, 2014-04-15 at 14:22 +0800, Franco Broi wrote:
> > The whole system came to a grinding halt today and no amount of
> > restarting daemons would make it work again. What was really odd was
> > that gluster vol status said everything was fine and yet all the
> > client mount points had hung.
> >
> > On the node that was exporting Gluster NFS I had zombie processes so
> > I decided to reboot. It took a while for the ZFS JBODs to sort
> > themselves out but I was relieved when it all came back up - except
> > that the df size on the clients was wrong...
> >
> > gluster vol info and gluster vol status said everything was fine but
> > it was obvious that 2 of my bricks were missing. I restarted
> > everything, and still 2 missing bricks. I remounted the fuse clients
> > and still no good.
> >
> > Just out of sheer desperation and for no good reason I disabled the
> > Gluster NFS export and magically the missing 2 bricks reappeared and
> > the filesystem was back to its normal size. I turned NFS exports
> > back on and everything stayed working.
> >
> > I'm not trying to belittle all the good work done by the Gluster
> > developers but this really doesn't look like a viable big data
> > filesystem at the moment. We've currently got 800TB and are about to
> > add another 400TB but quite honestly the prospect terrifies me.
> >
> > On Tue, 2014-04-15 at 08:35 +0800, Franco Broi wrote:
> > > On Mon, 2014-04-14 at 17:29 -0700, Harshavardhana wrote:
> > > >
> > > > > Just distributed.
> > > >
> > > > Pure distributed setup you have to take a downtime, since the
> > > > data isn't replicated.
> > >
> > > If I shut down the server processes, won't the clients just wait
> > > for it to come back up, i.e. like NFS hard mounts? I don't mind an
> > > interruption, I just want to avoid killing all jobs that are
> > > currently accessing the filesystem if at all possible; our users
> > > have suffered a lot recently with filesystem outages.
> > >
> > > By the way, how does one shut down the glusterfs processes without
> > > stopping a volume? It would be nice to have a quiesce or freeze
> > > option that just stalls all access while maintenance takes place.
> > >
> > > > >> > 3.4.1 to 3.4.3-3 shouldn't cause problems with existing
> > > > >> > clients and other servers, right?
> > > > >>
> > > > >> You mean 3.4.1 and 3.4.3 co-existent within a cluster?
> > > > >
> > > > > Yes, at least for the duration of the upgrade.
> > > >
> > > > Yeah, the 3.4.x series is backward compatible with each other in
> > > > any case.

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users