Re: Sporadic Bus error on mmap() on FUSE mount

On 18.7.2017 12:17, Niels de Vos wrote:
On Tue, Jul 18, 2017 at 10:48:45AM +0200, Jan Wrona wrote:
Hi,

I need to use rrdtool on top of a Gluster FUSE mount; rrdtool uses
memory-mapped file I/O extensively (I know I can recompile rrdtool with
mmap() disabled, but that is just a workaround). I have three FUSE mount
points on three different servers. On one of them the command "rrdtool
create test.rrd --start 920804400 DS:speed:COUNTER:600:U:U
RRA:AVERAGE:0.5:1:24" works fine; on the other two the command is killed
and a Bus error is reported. With every Bus error, the following two
lines appear in the mount log:
[2017-07-18 08:30:22.470770] E [MSGID: 108008]
[afr-transaction.c:2629:afr_write_txn_refresh_done] 0-flow-replicate-0:
Failing FALLOCATE on gfid 6a675cdd-2ea1-473f-8765-2a4c935a22ad: split-brain
observed. [Input/output error]
[2017-07-18 08:30:22.470843] W [fuse-bridge.c:1291:fuse_err_cbk]
0-glusterfs-fuse: 56589: FALLOCATE() ERR => -1 (Input/output error)
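
For reference, the split-brain state mentioned above can be inspected
with the heal command (a sketch, assuming the "flow" volume shown
below):

  # list files that the replicate (AFR) translator considers split-brained
  gluster volume heal flow info split-brain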

I'm not sure about the current state of mmap() on FUSE and Gluster, but
it's strange that it works only on certain mounts of the same volume.
This can be caused when an mmap()'d region is not backed by the file,
for example when reading/writing a part of the mmap()'d region that lies
beyond the end-of-file. I've seen issues like this before (long ago),
and those got fixed in the write-behind xlator.

Could you disable the performance.write-behind option for the volume and
try to reproduce the problem? If the issue is in write-behind, disabling
it should make the problem go away.
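
For example (a sketch, assuming the volume is called "flow" as in the
info below):

  # turn off the write-behind translator for the whole volume
  gluster volume set flow performance.write-behind off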

If this helps, please file a bug with an strace of the application and a
tcpdump containing the GlusterFS traffic from start to end while the
problem is observed.
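
Something along these lines should do (a sketch; the ports are taken
from the volume status below and may need adjusting):

  # trace the failing command, following forks, with timestamps
  strace -f -ttt -o rrdtool.strace \
      rrdtool create test.rrd --start 920804400 \
      DS:speed:COUNTER:600:U:U RRA:AVERAGE:0.5:1:24

  # capture GlusterFS traffic on the client (24007 is the management
  # port, 49152-49156 are the brick ports from 'gluster volume status')
  tcpdump -i any -s 0 -w gluster.pcap \
      'tcp and (port 24007 or portrange 49152-49156)'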

I've disabled performance.write-behind, unmounted, stopped and started
the volume, and mounted it again, but it had no effect. After that I
successively disabled and re-enabled other options and xlators, and I
found that the problem is related to the cluster.nufa option. When the
NUFA translator is disabled, rrdtool works fine on all mounts; when it
is enabled again, the problem shows up again.
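
For completeness, this is roughly how I toggled it (a sketch; the value
strings follow the "enable" shown in the volume info below):

  # Bus error disappears on all mounts:
  gluster volume set flow cluster.nufa disable

  # Bus error comes back:
  gluster volume set flow cluster.nufa enable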


   https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS&component=write-behind

HTH,
Niels


version: glusterfs 3.10.3

[root@dc1]# gluster volume info flow
Volume Name: flow
Type: Distributed-Replicate
Volume ID: dc6a9ea0-97ec-471f-b763-1d395ece73e1
Status: Started
Snapshot Count: 0
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: dc1.liberouter.org:/data/glusterfs/flow/brick1/safety_dir
Brick2: dc2.liberouter.org:/data/glusterfs/flow/brick2/safety_dir
Brick3: dc2.liberouter.org:/data/glusterfs/flow/brick1/safety_dir
Brick4: dc3.liberouter.org:/data/glusterfs/flow/brick2/safety_dir
Brick5: dc3.liberouter.org:/data/glusterfs/flow/brick1/safety_dir
Brick6: dc1.liberouter.org:/data/glusterfs/flow/brick2/safety_dir
Options Reconfigured:
performance.parallel-readdir: on
performance.client-io-threads: on
cluster.nufa: enable
network.ping-timeout: 10
transport.address-family: inet
nfs.disable: true

[root@dc1]# gluster volume status flow
Status of volume: flow
Gluster process                                                  TCP Port  RDMA Port  Online  Pid
--------------------------------------------------------------------------------------------------
Brick dc1.liberouter.org:/data/glusterfs/flow/brick1/safety_dir  49155     0          Y       26441
Brick dc2.liberouter.org:/data/glusterfs/flow/brick2/safety_dir  49155     0          Y       26110
Brick dc2.liberouter.org:/data/glusterfs/flow/brick1/safety_dir  49156     0          Y       26129
Brick dc3.liberouter.org:/data/glusterfs/flow/brick2/safety_dir  49152     0          Y       8703
Brick dc3.liberouter.org:/data/glusterfs/flow/brick1/safety_dir  49153     0          Y       8722
Brick dc1.liberouter.org:/data/glusterfs/flow/brick2/safety_dir  49156     0          Y       26460
Self-heal Daemon on localhost                                    N/A       N/A        Y       26493
Self-heal Daemon on dc2.liberouter.org                           N/A       N/A        Y       26151
Self-heal Daemon on dc3.liberouter.org                           N/A       N/A        Y       8744

Task Status of Volume flow
------------------------------------------------------------------------------
There are no active volume tasks


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users


