Re: Bitrot strange behavior

Hi Cedric,

The 120-second delay provides a window for things to settle. For example, imagine the following sequence of operations:

1) open the file (fd1 as file descriptor)
2) modify the file via fd1
3) close the file descriptor (fd1)
4) open the file again (fd2)
5) modify the file via fd2

In the above sequence, by the time the bitrot daemon tries to calculate the signature after the first file descriptor (fd1) is closed, active I/O could already be happening on the new file descriptor (fd2), and a signature calculated while active I/O is in progress might not be correct.
So the gluster bitrot daemon waits for 120 seconds after all the file descriptors associated with a file are closed before signing it.

Here is how the 120-second window works. When an application closes all the file descriptors associated with a file, the brick receives an operation called "release". On receiving the release, the brick process sends a notification to the bitrot daemon that the object (i.e. the file, with details about it) has been modified. The bitrot daemon then waits for 120 seconds after receiving that notification. If someone opens and modifies the file again within the wait time (i.e. before the file is signed), the brick process informs the bitrot daemon, so that the daemon does not attempt to sign a file that is actively being modified.
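The release-then-wait flow can be sketched as a toy debounce. This is purely for illustration and is not gluster's code: the class, names, and the injectable clock are all assumptions.

```python
import time

# Stand-in for the features.expiry-time volume option (default 120 seconds).
EXPIRY_SECONDS = 120

class SigningTracker:
    """Toy debounce: sign a file only after it has stayed quiet for the
    full expiry window following the last release notification."""

    def __init__(self, expiry=EXPIRY_SECONDS, clock=time.monotonic):
        self.expiry = expiry
        self.clock = clock          # injectable, so the logic is testable
        self.last_release = {}      # path -> time of last release notification

    def on_release(self, path):
        # Brick saw the last fd close: (re)start the expiry timer.
        self.last_release[path] = self.clock()

    def on_reopen(self, path):
        # File is being modified again before signing: cancel the pending sign.
        self.last_release.pop(path, None)

    def ready_to_sign(self, path):
        ts = self.last_release.get(path)
        return ts is not None and self.clock() - ts >= self.expiry
```

The point of the sketch is only the ordering: a reopen within the window cancels the pending signature, and a fresh release restarts the timer.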

This value is configurable and can be changed with the following command:

"gluster volume set <volume name> features.expiry-time <value>"
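For example, on a hypothetical volume named "vol1" (the volume name and the value of 1 second are assumptions for illustration), you could shorten the delay and read the setting back:

```shell
# Shorten the signing delay to 1 second on volume vol1
gluster volume set vol1 features.expiry-time 1
# Verify the current value of the option
gluster volume get vol1 features.expiry-time
```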

But as you said, the signature comparison done by the scrubber is currently local: while scrubbing, it calculates the checksum of the file and compares it with the stored checksum (kept as an extended attribute) to determine whether the object is corrupted or not.
So yes, if the object is corrupted before the signing happens, then as of now the scrubber does not have a mechanism to detect that.
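The local check can be illustrated with a short sketch. This is an assumption-laden illustration, not the scrubber's code: the function name is made up, the stored signature is passed in directly instead of being read from a trusted.bit-rot.* extended attribute, and sha256 stands in for whatever hash the daemon actually uses.

```python
import hashlib

def scrub_check(path, stored_signature):
    """Recompute the file's checksum and compare it with the stored one.
    Returns True if the file matches its recorded signature. Note the
    check involves only this one copy of the file -- no cross-brick
    comparison happens."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large files are not read into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == stored_signature
```

Because the comparison is purely local, a copy corrupted before it was ever signed will pass this check: its (wrong) content matches its (wrong) signature.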

Regards,
Raghavendra


On Wed, Apr 18, 2018 at 2:20 PM, Cedric Lemarchand <yipikai7@xxxxxxxxx> wrote:
Hi Sweta,

Thanks, this raises some more questions:

1. What is the reason for delaying signature creation?

2. Since the same file (replicated or dispersed) having different signatures across bricks is by definition an error, it would be good to detect this during a scrub, or with a different tool. Is something like this planned?

Cheers

Cédric Lemarchand

On 18 Apr 2018, at 07:53, Sweta Anandpara <sanandpa@xxxxxxxxxx> wrote:

Hi Cedric,

Any file is picked up for signing by the bitd process after the predetermined wait of 120 seconds. This default value is captured in the volume option 'features.expiry-time' and is configurable - in your case, it can be set to 0 or 1.

Point 2 is correct. A file corrupted before the bitrot signature is generated will not be successfully detected by the scrubber. That would require admin/manual intervention to explicitly heal the corrupted file.

-Sweta

On 04/16/2018 10:42 PM, Cedric Lemarchand wrote:
Hello,

I am playing around with the bitrot feature and have some questions:

1. when a file is created, the "trusted.bit-rot.signature" attribute
seems to be created only approximately 120 seconds after the file's
creation (the cluster is idle and there is only one file on it). Why?
Is there a way to have this attribute generated at the same time as
the file is created?

2. corrupting a file (adding a 0 locally on a brick) before the
creation of the "trusted.bit-rot.signature" attribute does not produce
any warning: its signature differs from the two other copies on the
other bricks. Starting a scrub did not show anything. I would have
thought that Gluster compares signatures between bricks for this
particular use case, but it seems the check is only local, so a file
corrupted before its bitrot signature is created stays corrupted, and
could thus be served to clients with bad data?

Gluster 3.12.8 on Debian Stretch, bricks on ext4.

Volume Name: vol1
Type: Replicate
Volume ID: 85ccfaf2-5793-46f2-bd20-3f823b0a2232
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gluster-01:/data/brick1
Brick2: gluster-02:/data/brick2
Brick3: gluster-03:/data/brick3
Options Reconfigured:
storage.build-pgfid: on
performance.client-io-threads: off
nfs.disable: on
transport.address-family: inet
features.bitrot: on
features.scrub: Active
features.scrub-throttle: aggressive
features.scrub-freq: hourly

Cheers,

Cédric
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users



