On 04/12/2017 01:57 PM, Mateusz Slupny wrote:
Hi,
I'm observing strange behavior when accessing a glusterfs 3.10.0
volume through a FUSE mount: while self-healing is in progress,
stat() on a file that I know has a non-zero size and is being
appended to returns 0, but reports st_size as 0 as well.
Next week I'm planning to find a minimal reproducible example and file
a bug report. I wasn't able to find any references to similar issues,
but I wanted to make sure that it isn't an already known problem.
Some notes about my current setup:
- Multiple applications are writing to multiple FUSE mounts pointing
to the same gluster volume. Only one of those applications writes to
a given file at a time. I am only appending to files, or to be
specific, calling pwrite() with the offset set to the file size
obtained by stat(); see the sketch after these notes. (I'm not sure
if using O_APPEND would change anything, but even then it would only
be a workaround, so it shouldn't matter.)
- The issue happens even when no reads are performed on those files,
i.e. the load is no higher than usual.
- Since I call stat() only before writing, and only one node writes
to a given file, stat() returns an invalid size even to the client
that is writing to the file.
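
For reference, a minimal sketch of the append pattern described in the
first note above (the path /mnt/gluster/data.log and the record
contents are placeholders, not my actual setup):

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/mnt/gluster/data.log"; /* placeholder path */
    const char record[] = "one more record\n";  /* placeholder payload */
    struct stat st;

    /* Obtain the current size, as the applications do before writing. */
    if (stat(path, &st) != 0) {
        perror("stat");
        return 1;
    }

    int fd = open(path, O_WRONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* If st.st_size is wrongly reported as 0 during self-heal, this
     * pwrite() lands at offset 0 instead of appending at the real end
     * of the file. */
    if (pwrite(fd, record, sizeof(record) - 1, st.st_size) < 0)
        perror("pwrite");

    close(fd);
    return 0;
}
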
Steps to reproduce:
0. Have multiple processes constantly appending data to files.
1. Stop one replica.
2. Wait a few minutes.
3. Start that replica again; shd starts self-healing.
4. stat() on some of the files that are being healed returns st_size
equal to 0.
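
A small checker along these lines can be used to catch the bad stat()
result in step 4 (again, the path and the 1-second polling interval
are arbitrary placeholders):

#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/mnt/gluster/data.log"; /* file known to be non-empty */
    struct stat st;

    for (;;) {
        if (stat(path, &st) != 0) {
            perror("stat");
        } else if (st.st_size == 0) {
            /* stat() succeeded but reports an empty file */
            printf("st_size == 0 reported for %s\n", path);
        }
        sleep(1);
    }
}
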
Setup:
- glusterfs 3.10.0
- volume type: replicas with arbiters
Type: Distributed-Replicate
Number of Bricks: 12 x (2 + 1) = 36
- FUSE mount configuration:
-o direct-io-mode=on passed explicitly to mount
- volume configuration:
cluster.consistent-metadata: yes
cluster.eager-lock: on
cluster.readdir-optimize: on
cluster.self-heal-readdir-size: 64KB
cluster.self-heal-daemon: on
cluster.read-hash-mode: 2
cluster.use-compound-fops: on
cluster.ensure-durability: on
cluster.granular-entry-heal: enable
cluster.entry-self-heal: off
cluster.data-self-heal: off
cluster.metadata-self-heal: off
performance.quick-read: off
performance.md-cache-timeout: 600
performance.cache-invalidation: on
performance.stat-prefetch: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
performance.flush-behind: off
performance.write-behind: off
performance.open-behind: off
cluster.background-self-heal-count: 1
network.inode-lru-limit: 1024
network.ping-timeout: 1
performance.io-cache: off
transport.address-family: inet
nfs.disable: on
cluster.locking-scheme: granular
I have already verified that the following options do not influence
this behavior:
- cluster.data-self-heal-algorithm (all possible values)
- cluster.eager-lock
- cluster.consistent-metadata
- performance.stat-prefetch
I would greatly appreciate any hints on what may be wrong with the
current setup, or what to focus on (or not) in minimal reproducible
example.
Would you be able to try and see if you can reproduce this on a
replica-3 volume? Since you are observing it on an arbiter config, the
bug could be that the stat is being served from the arbiter brick, but
we fixed that (http://review.gluster.org/13609) in one of the 3.7
releases, so maybe this is a new bug. In any case, please do raise the
bug with the gluster logs attached.
Regards,
Ravi
thanks and best regards,
Matt
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-devel