On Tue, Jul 18, 2017 at 10:48:45AM +0200, Jan Wrona wrote:
> Hi,
>
> I need to use rrdtool on top of a Gluster FUSE mount. rrdtool uses
> memory-mapped file I/O extensively (I know I can recompile rrdtool with
> mmap() disabled, but that is just a workaround). I have three FUSE mount
> points on three different servers. On one of them the command "rrdtool
> create test.rrd --start 920804400 DS:speed:COUNTER:600:U:U
> RRA:AVERAGE:0.5:1:24" works fine, on the other two servers the command is
> killed and "Bus error" is reported. With every Bus error, the following
> two lines appear in the mount log:
> [2017-07-18 08:30:22.470770] E [MSGID: 108008]
> [afr-transaction.c:2629:afr_write_txn_refresh_done] 0-flow-replicate-0:
> Failing FALLOCATE on gfid 6a675cdd-2ea1-473f-8765-2a4c935a22ad: split-brain
> observed. [Input/output error]
> [2017-07-18 08:30:22.470843] W [fuse-bridge.c:1291:fuse_err_cbk]
> 0-glusterfs-fuse: 56589: FALLOCATE() ERR => -1 (Input/output error)
>
> I'm not sure about the current state of mmap() on FUSE and Gluster, but
> it's strange that it works only on a certain mount of the same volume.

This can be caused when an mmap()'d region is not actually backed by the
file, for example when a read or write touches a part of the mapping that
lies beyond the end-of-file; the kernel then delivers SIGBUS, which the
shell reports as "Bus error". The failed FALLOCATE in your log suggests the
file is never extended to the size rrdtool expects, so its writes through
the mapping most likely hit exactly that case. I've seen issues like this
before (long ago), and those were fixed in the write-behind xlator.

Could you disable the performance.write-behind option for the volume and
try to reproduce the problem? If the issue is in write-behind, disabling it
should prevent the error.
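Something like this should toggle the option (untested here; the volume
name "flow" is taken from your output below):

  # show the current value (write-behind is enabled by default)
  gluster volume get flow performance.write-behind

  # disable it while testing, re-enable it once done
  gluster volume set flow performance.write-behind off
  gluster volume set flow performance.write-behind on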
If this helps, please file a bug and attach an strace of the failing
rrdtool command and a tcpdump that captures the GlusterFS traffic from the
start to the end of the reproducer:

https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS&component=write-behind
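Something along these lines should be enough to collect both; run them on
the client with the FUSE mount and adjust the ports to your environment
(24007 is the glusterd management port, the brick ports are taken from
your 'gluster volume status' output below):

  # system calls of rrdtool, including any child processes
  strace -f -tt -o rrdtool.strace \
      rrdtool create test.rrd --start 920804400 \
      DS:speed:COUNTER:600:U:U RRA:AVERAGE:0.5:1:24

  # full packets of the GlusterFS traffic, written to a pcap file
  tcpdump -i any -s 0 -w gluster.pcap \
      'port 24007 or portrange 49152-49156'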
HTH,
Niels

>
> version: glusterfs 3.10.3
>
> [root@dc1]# gluster volume info flow
> Volume Name: flow
> Type: Distributed-Replicate
> Volume ID: dc6a9ea0-97ec-471f-b763-1d395ece73e1
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 3 x 2 = 6
> Transport-type: tcp
> Bricks:
> Brick1: dc1.liberouter.org:/data/glusterfs/flow/brick1/safety_dir
> Brick2: dc2.liberouter.org:/data/glusterfs/flow/brick2/safety_dir
> Brick3: dc2.liberouter.org:/data/glusterfs/flow/brick1/safety_dir
> Brick4: dc3.liberouter.org:/data/glusterfs/flow/brick2/safety_dir
> Brick5: dc3.liberouter.org:/data/glusterfs/flow/brick1/safety_dir
> Brick6: dc1.liberouter.org:/data/glusterfs/flow/brick2/safety_dir
> Options Reconfigured:
> performance.parallel-readdir: on
> performance.client-io-threads: on
> cluster.nufa: enable
> network.ping-timeout: 10
> transport.address-family: inet
> nfs.disable: true
>
> [root@dc1]# gluster volume status flow
> Status of volume: flow
> Gluster process                             TCP Port  RDMA Port  Online  Pid
> ------------------------------------------------------------------------------
> Brick dc1.liberouter.org:/data/glusterfs/fl
> ow/brick1/safety_dir                        49155     0          Y       26441
> Brick dc2.liberouter.org:/data/glusterfs/fl
> ow/brick2/safety_dir                        49155     0          Y       26110
> Brick dc2.liberouter.org:/data/glusterfs/fl
> ow/brick1/safety_dir                        49156     0          Y       26129
> Brick dc3.liberouter.org:/data/glusterfs/fl
> ow/brick2/safety_dir                        49152     0          Y       8703
> Brick dc3.liberouter.org:/data/glusterfs/fl
> ow/brick1/safety_dir                        49153     0          Y       8722
> Brick dc1.liberouter.org:/data/glusterfs/fl
> ow/brick2/safety_dir                        49156     0          Y       26460
> Self-heal Daemon on localhost               N/A       N/A        Y       26493
> Self-heal Daemon on dc2.liberouter.org      N/A       N/A        Y       26151
> Self-heal Daemon on dc3.liberouter.org      N/A       N/A        Y       8744
>
> Task Status of Volume flow
> ------------------------------------------------------------------------------
> There are no active volume tasks
>
> _______________________________________________
> Gluster-users mailing list
> Gluster-users@xxxxxxxxxxx
> http://lists.gluster.org/mailman/listinfo/gluster-users