On Tue, Jul 18, 2017 at 01:55:17PM +0200, Jan Wrona wrote:
> On 18.7.2017 12:17, Niels de Vos wrote:
> > On Tue, Jul 18, 2017 at 10:48:45AM +0200, Jan Wrona wrote:
> > > Hi,
> > >
> > > I need to use rrdtool on top of a Gluster FUSE mount, rrdtool uses
> > > memory-mapped file IO extensively (I know I can recompile rrdtool with
> > > mmap() disabled, but that is just a workaround). I have three FUSE mount
> > > points on three different servers, on one of them the command "rrdtool
> > > create test.rrd --start 920804400 DS:speed:COUNTER:600:U:U
> > > RRA:AVERAGE:0.5:1:24" works fine, on the other two servers the command is
> > > killed and Bus error is reported. With every Bus error, following two lines
> > > rise in the mount log:
> > > [2017-07-18 08:30:22.470770] E [MSGID: 108008]
> > > [afr-transaction.c:2629:afr_write_txn_refresh_done] 0-flow-replicate-0:
> > > Failing FALLOCATE on gfid 6a675cdd-2ea1-473f-8765-2a4c935a22ad: split-brain
> > > observed. [Input/output error]
> > > [2017-07-18 08:30:22.470843] W [fuse-bridge.c:1291:fuse_err_cbk]
> > > 0-glusterfs-fuse: 56589: FALLOCATE() ERR => -1 (Input/output error)
> > >
> > > I'm not sure about current state of mmap() on FUSE and Gluster, but its
> > > strange that it works only on certain mount of the same volume.
> > This can be caused when a mmap()'d region is not written. For example,
> > trying to read/write the mmap()'d region that is after the end-of-file.
> > I've seen issues like this before (long ago), and that got fixed in the
> > write-behind xlator.
> >
> > Could you disable the performance.write-behind option for the volume and
> > try to reproduce the problem? If the issue is in write-behind, disabling
> > it should prevent the issue.
> >
> > If this helps, please file a bug with strace of the application and
> > tcpdump that contains the GlusterFS traffic from start to end when the
> > problem is observed.
> >
> I've disabled the performance.write-behind, umounted, stopped and started
> the volume, then mounted again, but no effect. After that I've been
> successively disabling/enabling options and xlators, and I've found that the
> problem is related to the cluster.nufa option. When NUFA translator is
> disabled, rrdtool works fine on all mounts. When enabled again, the problem
> shows up again.

Thanks for testing. NUFA is not used a lot, and I think it only benefits
very few workloads. I don't think we can recommend using NUFA.

In any case, this seems to be a bug in the NUFA xlator, so please file a
bug for it nevertheless. In the bug, please point to this discussion in
the mailing list archives:

    http://lists.gluster.org/pipermail/gluster-users/
    (find the URL there)

Thanks,
Niels

>
> >
> > https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS&component=write-behind
> >
> > HTH,
> > Niels
> >
> >
> > > version: glusterfs 3.10.3
> > >
> > > [root@dc1]# gluster volume info flow
> > > Volume Name: flow
> > > Type: Distributed-Replicate
> > > Volume ID: dc6a9ea0-97ec-471f-b763-1d395ece73e1
> > > Status: Started
> > > Snapshot Count: 0
> > > Number of Bricks: 3 x 2 = 6
> > > Transport-type: tcp
> > > Bricks:
> > > Brick1: dc1.liberouter.org:/data/glusterfs/flow/brick1/safety_dir
> > > Brick2: dc2.liberouter.org:/data/glusterfs/flow/brick2/safety_dir
> > > Brick3: dc2.liberouter.org:/data/glusterfs/flow/brick1/safety_dir
> > > Brick4: dc3.liberouter.org:/data/glusterfs/flow/brick2/safety_dir
> > > Brick5: dc3.liberouter.org:/data/glusterfs/flow/brick1/safety_dir
> > > Brick6: dc1.liberouter.org:/data/glusterfs/flow/brick2/safety_dir
> > > Options Reconfigured:
> > > performance.parallel-readdir: on
> > > performance.client-io-threads: on
> > > cluster.nufa: enable
> > > network.ping-timeout: 10
> > > transport.address-family: inet
> > > nfs.disable: true
> > >
> > > [root@dc1]# gluster volume status flow
> > > Status of volume: flow
> > > Gluster process                             TCP Port  RDMA Port  Online  Pid
> > > ------------------------------------------------------------------------------
> > > Brick dc1.liberouter.org:/data/glusterfs/fl
> > > ow/brick1/safety_dir                        49155     0          Y       26441
> > > Brick dc2.liberouter.org:/data/glusterfs/fl
> > > ow/brick2/safety_dir                        49155     0          Y       26110
> > > Brick dc2.liberouter.org:/data/glusterfs/fl
> > > ow/brick1/safety_dir                        49156     0          Y       26129
> > > Brick dc3.liberouter.org:/data/glusterfs/fl
> > > ow/brick2/safety_dir                        49152     0          Y       8703
> > > Brick dc3.liberouter.org:/data/glusterfs/fl
> > > ow/brick1/safety_dir                        49153     0          Y       8722
> > > Brick dc1.liberouter.org:/data/glusterfs/fl
> > > ow/brick2/safety_dir                        49156     0          Y       26460
> > > Self-heal Daemon on localhost               N/A       N/A        Y       26493
> > > Self-heal Daemon on dc2.liberouter.org      N/A       N/A        Y       26151
> > > Self-heal Daemon on dc3.liberouter.org      N/A       N/A        Y       8744
> > >
> > > Task Status of Volume flow
> > > ------------------------------------------------------------------------------
> > > There are no active volume tasks
> > >
> > > _______________________________________________
> > > Gluster-users mailing list
> > > Gluster-users@xxxxxxxxxxx
> > > http://lists.gluster.org/mailman/listinfo/gluster-users
> >
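
For the bug report, a small reproducer that does not depend on rrdtool may
help. The following is only a sketch: it assumes the failure can be triggered
by preallocating a file and then writing to it through mmap(), which is what
the FALLOCATE entries in the mount log suggest but has not been confirmed.
The function name fallocate_then_mmap and the 4096-byte size are made up for
illustration.

```python
import mmap
import os
import tempfile


def fallocate_then_mmap(directory, size=4096):
    """Preallocate a file, then write to it through a shared mapping.

    Returns the bytes read back from the mapping. Raises OSError if the
    preallocation fails (the mount log above shows FALLOCATE failing with
    an Input/output error).
    """
    # mkstemp() opens the new file read-write, which a writable mapping needs.
    fd, path = tempfile.mkstemp(dir=directory, suffix=".bin")
    try:
        # Extend the file first so the whole mapping is backed by file data.
        # Touching a mapped page that lies beyond end-of-file raises SIGBUS.
        os.posix_fallocate(fd, 0, size)
        with mmap.mmap(fd, size, access=mmap.ACCESS_WRITE) as region:
            region[:4] = b"test"
            region.flush()
            return bytes(region[:4])
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling fallocate_then_mmap() with the path of a FUSE mount point, once with
cluster.nufa enabled and once with it disabled, should show whether the
problem is independent of rrdtool. Running it under strace would also give
the trace asked for earlier in the thread.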