Hi Kotresh,
On Thursday, September 22, 2016, Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx> wrote:
> Hi Amudhan,
>
> No, the bitrot signer is a separate process and is not part of the brick process.
> I believe process 2280 is a brick process? Did you check with a dist-rep volume?
> Is the same behavior observed there as well? We need to figure out why the brick
> process is holding that fd for such a long time.
>
> Thanks and Regards,
> Kotresh H R
>
> ----- Original Message -----
>> From: "Amudhan P" <amudhan83@xxxxxxxxx>
>> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
>> Sent: Wednesday, September 21, 2016 8:15:33 PM
>> Subject: Re: [Gluster-users] 3.8.3 Bitrot signature process
>>
>> Hi Kotresh,
>>
>> As soon as the fd is closed by the brick1 process, I can see the bitrot
>> signature for the file on the brick.
>>
>> So it looks like the fd is opened by the brick process to calculate the signature.
>>
>> Output for the file:
>>
>> -rw-r--r-- 2 root root 250M Sep 21 18:32 /media/disk1/brick1/data/G/test59-bs10M-c100.nul
>>
>> getfattr: Removing leading '/' from absolute path names
>> # file: media/disk1/brick1/data/G/test59-bs10M-c100.nul
>> trusted.bit-rot.signature=0x010200000000000000e9474e4cc673c0c227a6e807e04aa4ab1f88d3744243950a290869c53daa65df
>> trusted.bit-rot.version=0x020000000000000057d6af3200012a13
>> trusted.ec.config=0x0000080501000200
>> trusted.ec.size=0x000000003e800000
>> trusted.ec.version=0x0000000000001f400000000000001f40
>> trusted.gfid=0x4c091145429448468fffe358482c63e1
>>
>> stat /media/disk1/brick1/data/G/test59-bs10M-c100.nul
>> File: ‘/media/disk1/brick1/data/G/test59-bs10M-c100.nul’
>> Size: 262144000 Blocks: 512000 IO Block: 4096 regular file
>> Device: 811h/2065d Inode: 402653311 Links: 2
>> Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
>> Access: 2016-09-21 18:34:43.722712751 +0530
>> Modify: 2016-09-21 18:32:41.650712946 +0530
>> Change: 2016-09-21 19:14:41.698708914 +0530
>> Birth: -
>>
>>
>> On the other 2 bricks in the same set, the signature is still not updated for
>> the same file.
>>
>>
>> On Wed, Sep 21, 2016 at 6:48 PM, Amudhan P <amudhan83@xxxxxxxxx> wrote:
>>
>> > Hi Kotresh,
>> >
>> > I am very sure no read was going on from the mount point.
>> >
>> > I did the same test again, but this time I unmounted the mount point after
>> > writing the data.
>> >
>> > After 120 seconds, I am seeing an fd entry for this file in the brick 1 process:
>> >
>> > getfattr -m. -e hex -d test59-bs10
>> > # file: test59-bs10M-c100.nul
>> > trusted.bit-rot.version=0x020000000000000057bed574000ed534
>> > trusted.ec.config=0x0000080501000200
>> > trusted.ec.size=0x000000003e800000
>> > trusted.ec.version=0x0000000000001f400000000000001f40
>> > trusted.gfid=0x4c091145429448468fffe358482c63e1
>> >
>> >
>> > ls -l /proc/2280/fd
>> > lr-x------ 1 root root 64 Sep 21 13:08 19 -> /media/disk1/brick1/.glusterfs/4c/09/4c091145-4294-4846-8fff-e358482c63e1
>> >
>> > The volume is EC 4+1.
>> >
>> > On Wed, Sep 21, 2016 at 6:17 PM, Kotresh Hiremath Ravishankar <
>> > khiremat@xxxxxxxxxx> wrote:
>> >
>> >> Hi Amudhan,
>> >>
>> >> As the ls output shows, some process has an fd open on the file in the backend.
>> >> That is why bitrot is not considering it for signing.
>> >> Could you please check whether signing happens 120 secs after
>> >> "/media/disk2/brick2/.glusterfs/6e/7c/6e7c49e6-094e-4435-85bf-f21f99fd8764"
>> >> is closed? If so, we need to figure out who holds this fd for
>> >> such a long time.
>> >> We also need to figure out whether this issue is specific to EC volumes.
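One way to time the signing, as requested here, is to poll the brick copy of the file for the `trusted.bit-rot.signature` xattr after the fd closes. The helper below, `wait_for_xattr`, is hypothetical and not part of GlusterFS. It assumes Linux (Python's `os.getxattr` is Linux-only) and root privileges, since `trusted.*` attributes are hidden from ordinary users. The reader function is injectable so the loop can be exercised without a brick.

```python
import os
import time
from typing import Callable, Optional


def wait_for_xattr(
    path: str,
    name: str = "trusted.bit-rot.signature",
    timeout: float = 600.0,
    interval: float = 5.0,
    read: Optional[Callable[[str, str], bytes]] = None,
) -> Optional[float]:
    """Poll `path` until xattr `name` exists.

    Returns the seconds waited, or None if `timeout` expired first.
    """
    if read is None:
        read = os.getxattr  # Linux only; trusted.* needs root
    start = time.monotonic()
    while True:
        try:
            read(path, name)
            return time.monotonic() - start
        except OSError:
            # ENODATA (not signed yet) or ENOTSUP (no xattr support): keep polling
            pass
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
```

Run as root right after the fd disappears from `/proc/<brick pid>/fd`, e.g. `wait_for_xattr("/media/disk2/brick2/data/G/test58-bs10M-c100.nul")`. A result close to 120 s would match the expiry behaviour described in this thread.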
>> >>
>> >> Thanks and Regards,
>> >> Kotresh H R
>> >>
>> >> ----- Original Message -----
>> >> > From: "Amudhan P" <amudhan83@xxxxxxxxx>
>> >> > To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
>> >> > Cc: "Gluster Users" <gluster-users@xxxxxxxxxxx>
>> >> > Sent: Wednesday, September 21, 2016 4:56:40 PM
>> >> > Subject: Re: [Gluster-users] 3.8.3 Bitrot signature process
>> >> >
>> >> > Hi Kotresh,
>> >> >
>> >> >
>> >> > Writing new file.
>> >> >
>> >> > getfattr -m. -e hex -d /media/disk2/brick2/data/G/test58-bs10M-c100.nul
>> >> > getfattr: Removing leading '/' from absolute path names
>> >> > # file: media/disk2/brick2/data/G/test58-bs10M-c100.nul
>> >> > trusted.bit-rot.version=0x020000000000000057da8b23000b120e
>> >> > trusted.ec.config=0x0000080501000200
>> >> > trusted.ec.size=0x000000003e800000
>> >> > trusted.ec.version=0x0000000000001f400000000000001f40
>> >> > trusted.gfid=0x6e7c49e6094e443585bff21f99fd8764
>> >> >
>> >> >
>> >> > Running ls -l on the brick 2 process's fd directory:
>> >> >
>> >> > ls -l /proc/30162/fd
>> >> >
>> >> > lr-x------ 1 root root 64 Sep 21 16:22 59 -> /media/disk2/brick2/.glusterfs/quanrantine
>> >> > lrwx------ 1 root root 64 Sep 21 16:22 6 -> /var/lib/glusterd/vols/glsvol1/run/10.1.2.2-media-disk2-brick2.pid
>> >> > lr-x------ 1 root root 64 Sep 21 16:25 60 -> /media/disk2/brick2/.glusterfs/6e/7c/6e7c49e6-094e-4435-85bf-f21f99fd8764
>> >> > lr-x------ 1 root root 64 Sep 21 16:22 61 -> /media/disk2/brick2/.glusterfs/quanrantine
>> >> >
>> >> >
>> >> > find /media/disk2/ -samefile /media/disk2/brick2/.glusterfs/6e/7c/6e7c49e6-094e-4435-85bf-f21f99fd8764
>> >> > /media/disk2/brick2/.glusterfs/6e/7c/6e7c49e6-094e-4435-85bf-f21f99fd8764
>> >> > /media/disk2/brick2/data/G/test58-bs10M-c100.nul
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Sep 21, 2016 at 3:28 PM, Kotresh Hiremath Ravishankar <
>> >> > khiremat@xxxxxxxxxx> wrote:
>> >> >
>> >> > > Hi Amudhan,
>> >> > >
>> >> > > Don't grep for the filename; glusterfs maintains a hardlink in the
>> >> > > .glusterfs directory for each file. Just check
>> >> > > 'ls -l /proc/<respective brick pid>/fd' for any fds opened on a file in
>> >> > > .glusterfs and check if it's the same file.
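The check suggested here can be scripted. First derive the `.glusterfs` hardlink from the file's `trusted.gfid`, then look for that path among the `/proc/<pid>/fd` symlinks. This is a minimal sketch: both function names are hypothetical, and the `.glusterfs/<aa>/<bb>/<uuid>` layout is inferred from the `ls` and `find -samefile` output in this thread, not from GlusterFS documentation.

```python
import os
import uuid
from typing import List, Tuple


def gfid_backend_path(brick_root: str, gfid_hex: str) -> str:
    """Map a trusted.gfid xattr value to its hardlink under <brick>/.glusterfs.

    Assumed layout (inferred from this thread): .glusterfs/<aa>/<bb>/<uuid>,
    where aa and bb are the first two bytes of the gfid in hex.
    """
    raw = gfid_hex[2:] if gfid_hex.lower().startswith("0x") else gfid_hex
    canonical = str(uuid.UUID(hex=raw))
    return os.path.join(brick_root, ".glusterfs", canonical[:2], canonical[2:4], canonical)


def fd_holders(target: str, proc_root: str = "/proc") -> List[Tuple[int, int]]:
    """Return sorted (pid, fd) pairs whose /proc/<pid>/fd/<fd> link points at target."""
    hits = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # skip /proc/self, /proc/meminfo, ...
            continue
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or not permitted (run as root)
            continue
        for fd in fds:
            try:
                link = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue
            if link == target:
                hits.append((int(entry), int(fd)))
    return sorted(hits)
```

For the test59 file discussed above, `fd_holders(gfid_backend_path("/media/disk1/brick1", "0x4c091145429448468fffe358482c63e1"))` run as root would be expected to list the brick process (pid 2280, fd 19) for as long as it keeps the file open.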
>> >> > >
>> >> > > Thanks and Regards,
>> >> > > Kotresh H R
>> >> > >
>> >> > > ----- Original Message -----
>> >> > > > From: "Amudhan P" <amudhan83@xxxxxxxxx>
>> >> > > > To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
>> >> > > > Cc: "Gluster Users" <gluster-users@xxxxxxxxxxx>
>> >> > > > Sent: Wednesday, September 21, 2016 1:33:10 PM
>> >> > > > Subject: Re: 3.8.3 Bitrot signature process
>> >> > > >
>> >> > > > Hi Kotresh,
>> >> > > >
>> >> > > > I have used the command below to check for any open fd on the file.
>> >> > > >
>> >> > > > "ls -l /proc/*/fd | grep filename".
>> >> > > >
>> >> > > > As soon as the write completes there are no open fds. If there is any
>> >> > > > alternate option, please let me know and I will try that too.
>> >> > > >
>> >> > > >
>> >> > > >
>> >> > > >
>> >> > > > Also, below is the scrub status of my test setup. The number of skipped
>> >> > > > files is slowly reducing day by day. I think files are skipped because the
>> >> > > > bitrot signature process has not completed yet.
>> >> > > >
>> >> > > > Where can I see the scrub-skipped files?
>> >> > > >
>> >> > > >
>> >> > > > Volume name : glsvol1
>> >> > > > State of scrub: Active (Idle)
>> >> > > > Scrub impact: normal
>> >> > > > Scrub frequency: daily
>> >> > > > Bitrot error log location: /var/log/glusterfs/bitd.log
>> >> > > > Scrubber error log location: /var/log/glusterfs/scrub.log
>> >> > > >
>> >> > > > =========================================================
>> >> > > > Node: localhost
>> >> > > > Number of Scrubbed files: 1644
>> >> > > > Number of Skipped files: 1001
>> >> > > > Last completed scrub time: 2016-09-20 11:59:58
>> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:39:26
>> >> > > > Error count: 0
>> >> > > >
>> >> > > > =========================================================
>> >> > > > Node: 10.1.2.3
>> >> > > > Number of Scrubbed files: 1644
>> >> > > > Number of Skipped files: 1001
>> >> > > > Last completed scrub time: 2016-09-20 10:50:00
>> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:38:17
>> >> > > > Error count: 0
>> >> > > >
>> >> > > > =========================================================
>> >> > > > Node: 10.1.2.4
>> >> > > > Number of Scrubbed files: 981
>> >> > > > Number of Skipped files: 1664
>> >> > > > Last completed scrub time: 2016-09-20 12:38:01
>> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:35:19
>> >> > > > Error count: 0
>> >> > > >
>> >> > > > =========================================================
>> >> > > > Node: 10.1.2.1
>> >> > > > Number of Scrubbed files: 1263
>> >> > > > Number of Skipped files: 1382
>> >> > > > Last completed scrub time: 2016-09-20 11:57:21
>> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:37:17
>> >> > > > Error count: 0
>> >> > > >
>> >> > > > =========================================================
>> >> > > > Node: 10.1.2.2
>> >> > > > Number of Scrubbed files: 1644
>> >> > > > Number of Skipped files: 1001
>> >> > > > Last completed scrub time: 2016-09-20 11:59:25
>> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:39:18
>> >> > > > Error count: 0
>> >> > > >
>> >> > > > =========================================================
>> >> > > >
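To follow the skipped count from day to day, the per-node figures can be pulled out of the `scrub status` output. A minimal sketch with a hypothetical `parse_scrub_status` helper; it only assumes the `Node:` and `Number of ... files:` line formats shown above. In that output, scrubbed plus skipped comes to 2645 on every node.

```python
import re
from typing import Dict, Optional


def parse_scrub_status(text: str) -> Dict[str, Dict[str, int]]:
    """Extract per-node scrubbed/skipped counts from `scrub status` output."""
    nodes: Dict[str, Dict[str, int]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if m := re.match(r"Node:\s*(\S+)", line):
            current = m.group(1)
            nodes[current] = {}
        elif current and (m := re.match(r"Number of (Scrubbed|Skipped) files:\s*(\d+)", line)):
            nodes[current][m.group(1).lower()] = int(m.group(2))
    return nodes
```

Saving the output once a day and diffing the parsed dictionaries would show how quickly the skipped counts fall on each node.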
>> >> > > >
>> >> > > >
>> >> > > >
>> >> > > > Thanks
>> >> > > > Amudhan
>> >> > > >
>> >> > > >
>> >> > > > On Wed, Sep 21, 2016 at 11:45 AM, Kotresh Hiremath Ravishankar <
>> >> > > > khiremat@xxxxxxxxxx> wrote:
>> >> > > >
>> >> > > > > Hi Amudhan,
>> >> > > > >
>> >> > > > > I don't think it's a limitation on reading data from the brick.
>> >> > > > > To limit CPU usage, throttling is done using a token bucket
>> >> > > > > algorithm; the log message you showed is related to it. But even then,
>> >> > > > > I think it should not take 12 minutes for the checksum calculation unless
>> >> > > > > there is an fd open (possibly an internal one). Could you please cross-verify
>> >> > > > > whether any fds are open on that file by looking into /proc? I will
>> >> > > > > also test it out in the meantime and get back to you.
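The `bitd.log` line Amudhan quoted (`tokens/sec (rate): 131072, maxlimit: 524288`) supplies the bucket parameters. The sketch below is a generic token bucket, not GlusterFS's implementation. It assumes, purely for illustration, that one token pays for one byte hashed, that the bucket starts full, that work proceeds in hypothetical 128 KiB chunks, and that hashing itself takes no time. Under those assumptions it estimates how long throttling alone would stall the signer.

```python
def throttle_delay(size: int, rate: float, capacity: float, chunk: int) -> float:
    """Seconds spent waiting for tokens while consuming `size` tokens in chunks.

    Illustrative assumptions: the bucket starts full, refills at `rate`
    tokens/sec up to `capacity`, and the work per chunk takes no time itself.
    """
    if not 0 < chunk <= capacity:
        raise ValueError("chunk must be positive and fit in the bucket")
    tokens = capacity
    waited = 0.0
    remaining = size
    while remaining > 0:
        need = min(chunk, remaining)
        if tokens < need:
            delay = (need - tokens) / rate  # time to refill the shortfall
            waited += delay
            tokens = need
        tokens -= need
        remaining -= need
    return waited


RATE, MAXLIMIT = 131072, 524288  # from the bitd.log line in this thread
CHUNK = 128 * 1024  # hypothetical chunk size
```

If those assumptions held, `throttle_delay(26214400, RATE, MAXLIMIT, CHUNK)` gives 196 s for the 25 MiB test54 fragment and 16 s for the 2.5 MiB test53 fragment. Both are well short of the roughly 12 minutes observed, which would be consistent with the view that throttling alone does not explain the delay.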
>> >> > > > >
>> >> > > > > Thanks and Regards,
>> >> > > > > Kotresh H R
>> >> > > > >
>> >> > > > > ----- Original Message -----
>> >> > > > > > From: "Amudhan P" <amudhan83@xxxxxxxxx>
>> >> > > > > > To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
>> >> > > > > > Cc: "Gluster Users" <gluster-users@xxxxxxxxxxx>
>> >> > > > > > Sent: Tuesday, September 20, 2016 3:19:28 PM
>> >> > > > > > Subject: Re: 3.8.3 Bitrot signature process
>> >> > > > > >
>> >> > > > > > Hi Kotresh,
>> >> > > > > >
>> >> > > > > > Please correct me if I am wrong: once a file write completes and all
>> >> > > > > > fds are closed, bitrot waits for 120 seconds, then starts hashing and
>> >> > > > > > updates the signature for the file on the brick.
>> >> > > > > >
>> >> > > > > > But I feel that bitrot takes too much time to complete hashing.
>> >> > > > > >
>> >> > > > > > Below is a test result I would like to share.
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > Writing data to the path below using dd:
>> >> > > > > >
>> >> > > > > > /mnt/gluster/data/G (mount point)
>> >> > > > > > -rw-r--r-- 1 root root 10M Sep 20 12:19 test53-bs10M-c1.nul
>> >> > > > > > -rw-r--r-- 1 root root 100M Sep 20 12:19 test54-bs10M-c10.nul
>> >> > > > > >
>> >> > > > > > No other write or read process is going on.
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > Checking the file data on one of the bricks:
>> >> > > > > >
>> >> > > > > > -rw-r--r-- 2 root root 2.5M Sep 20 12:23 test53-bs10M-c1.nul
>> >> > > > > > -rw-r--r-- 2 root root 25M Sep 20 12:23 test54-bs10M-c10.nul
>> >> > > > > >
>> >> > > > > > The files' stat and getfattr info from the brick after the write process
>> >> > > > > > completed:
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ stat test53-bs10M-c1.nul
>> >> > > > > > File: ‘test53-bs10M-c1.nul’
>> >> > > > > > Size: 2621440 Blocks: 5120 IO Block: 4096 regular file
>> >> > > > > > Device: 821h/2081d Inode: 536874168 Links: 2
>> >> > > > > > Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
>> >> > > > > > Access: 2016-09-20 12:23:28.798886647 +0530
>> >> > > > > > Modify: 2016-09-20 12:23:28.994886646 +0530
>> >> > > > > > Change: 2016-09-20 12:23:28.998886646 +0530
>> >> > > > > > Birth: -
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ stat test54-bs10M-c10.nul
>> >> > > > > > File: ‘test54-bs10M-c10.nul’
>> >> > > > > > Size: 26214400 Blocks: 51200 IO Block: 4096 regular file
>> >> > > > > > Device: 821h/2081d Inode: 536874169 Links: 2
>> >> > > > > > Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
>> >> > > > > > Access: 2016-09-20 12:23:42.902886624 +0530
>> >> > > > > > Modify: 2016-09-20 12:23:44.378886622 +0530
>> >> > > > > > Change: 2016-09-20 12:23:44.378886622 +0530
>> >> > > > > > Birth: -
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ sudo getfattr -m. -e hex -d test53-bs10M-c1.nul
>> >> > > > > > # file: test53-bs10M-c1.nul
>> >> > > > > > trusted.bit-rot.version=0x020000000000000057daa7b50002e5b4
>> >> > > > > > trusted.ec.config=0x0000080501000200
>> >> > > > > > trusted.ec.size=0x0000000000a00000
>> >> > > > > > trusted.ec.version=0x00000000000000500000000000000050
>> >> > > > > > trusted.gfid=0xe2416bd1aae4403c88f44286273bbe99
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ sudo getfattr -m. -e hex -d test54-bs10M-c10.nul
>> >> > > > > > # file: test54-bs10M-c10.nul
>> >> > > > > > trusted.bit-rot.version=0x020000000000000057daa7b50002e5b4
>> >> > > > > > trusted.ec.config=0x0000080501000200
>> >> > > > > > trusted.ec.size=0x0000000006400000
>> >> > > > > > trusted.ec.version=0x00000000000003200000000000000320
>> >> > > > > > trusted.gfid=0x54e018dd8c5a4bd79e0317729d8a57c5
>> >> > > > > >
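The hex xattrs can be decoded to cross-check the sizes above. This is a small sketch with hypothetical helper names. It assumes that `trusted.ec.size` holds the logical (pre-encoding) file size as a big-endian integer, and that a 4+1 disperse volume stores a quarter of it on each brick. It ignores any stripe-alignment padding. The thread does not state these assumptions, but the decoded values match the `ls` and `stat` output.

```python
def xattr_int(value: str) -> int:
    """Decode a `getfattr -e hex` value such as 0x0000000006400000."""
    return int(value, 16)  # base 16 accepts the 0x prefix


def fragment_size(ec_size_hex: str, data_bricks: int = 4) -> int:
    """Approximate per-brick fragment size (ceiling division, no padding)."""
    size = xattr_int(ec_size_hex)
    return -(-size // data_bricks)
```

For test54, `trusted.ec.size=0x0000000006400000` decodes to 104857600 bytes (the 100M file on the mount), and a quarter of that is 26214400, the `Size` that `stat` reports on the brick.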
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > The files' stat and getfattr info from the brick after the bitrot signature
>> >> > > > > > was updated:
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ stat test53-bs10M-c1.nul
>> >> > > > > > File: ‘test53-bs10M-c1.nul’
>> >> > > > > > Size: 2621440 Blocks: 5120 IO Block: 4096 regular file
>> >> > > > > > Device: 821h/2081d Inode: 536874168 Links: 2
>> >> > > > > > Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
>> >> > > > > > Access: 2016-09-20 12:25:31.494886450 +0530
>> >> > > > > > Modify: 2016-09-20 12:23:28.994886646 +0530
>> >> > > > > > Change: 2016-09-20 12:27:00.994886307 +0530
>> >> > > > > > Birth: -
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ sudo getfattr -m. -e hex -d test53-bs10M-c1.nul
>> >> > > > > > # file: test53-bs10M-c1.nul
>> >> > > > > > trusted.bit-rot.signature=0x0102000000000000006de7493c5c90f643357c268fbaaf461c1567e0334e4948023ce17268403aa37a
>> >> > > > > > trusted.bit-rot.version=0x020000000000000057daa7b50002e5b4
>> >> > > > > > trusted.ec.config=0x0000080501000200
>> >> > > > > > trusted.ec.size=0x0000000000a00000
>> >> > > > > > trusted.ec.version=0x00000000000000500000000000000050
>> >> > > > > > trusted.gfid=0xe2416bd1aae4403c88f44286273bbe99
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ stat test54-bs10M-c10.nul
>> >> > > > > > File: ‘test54-bs10M-c10.nul’
>> >> > > > > > Size: 26214400 Blocks: 51200 IO Block: 4096 regular file
>> >> > > > > > Device: 821h/2081d Inode: 536874169 Links: 2
>> >> > > > > > Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
>> >> > > > > > Access: 2016-09-20 12:25:47.510886425 +0530
>> >> > > > > > Modify: 2016-09-20 12:23:44.378886622 +0530
>> >> > > > > > Change: 2016-09-20 12:38:05.954885243 +0530
>> >> > > > > > Birth: -
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ sudo getfattr -m. -e hex -d test54-bs10M-c10.nul
>> >> > > > > > # file: test54-bs10M-c10.nul
>> >> > > > > > trusted.bit-rot.signature=0x010200000000000000394c345f0b0c63ee652627a62eed069244d35c4d5134e4f07d4eabb51afda47e
>> >> > > > > > trusted.bit-rot.version=0x020000000000000057daa7b50002e5b4
>> >> > > > > > trusted.ec.config=0x0000080501000200
>> >> > > > > > trusted.ec.size=0x0000000006400000
>> >> > > > > > trusted.ec.version=0x00000000000003200000000000000320
>> >> > > > > > trusted.gfid=0x54e018dd8c5a4bd79e0317729d8a57c5
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > (Actual time taken to read the file from the brick with md5sum)
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ time md5sum test53-bs10M-c1.nul
>> >> > > > > > 8354dcaa18a1ecb52d0895bf00888c44 test53-bs10M-c1.nul
>> >> > > > > >
>> >> > > > > > real 0m0.045s
>> >> > > > > > user 0m0.007s
>> >> > > > > > sys 0m0.003s
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/brick2/data/G$ time md5sum test54-bs10M-c10.nul
>> >> > > > > > bed3c0a4a1407f584989b4009e9ce33f test54-bs10M-c10.nul
>> >> > > > > >
>> >> > > > > > real 0m0.166s
>> >> > > > > > user 0m0.062s
>> >> > > > > > sys 0m0.011s
>> >> > > > > >
>> >> > > > > > As you can see, the file 'test54-bs10M-c10.nul' took around 12 minutes to
>> >> > > > > > get its bitrot signature updated (please refer to the stat output for the file).
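The "around 12 minutes" figure can be reproduced from the stat output. Take the Change (ctime) timestamp as the moment the signature xattr was written, then subtract the 120 s expiry. Treating ctime this way is an assumption, since any metadata update bumps it. A minimal sketch with a hypothetical `seconds_between` helper:

```python
from datetime import datetime


def seconds_between(start: str, end: str) -> float:
    """Difference between two stat timestamps like '2016-09-20 12:23:44.378886622'.

    datetime keeps only microseconds, so nanosecond digits are truncated.
    """
    def parse(ts: str) -> datetime:
        whole, _, frac = ts.partition(".")
        return datetime.strptime(f"{whole}.{frac[:6] or '0'}", "%Y-%m-%d %H:%M:%S.%f")

    return (parse(end) - parse(start)).total_seconds()


EXPIRY = 120  # default signing delay after the last fd closes, per this thread

modify = "2016-09-20 12:23:44.378886622"  # test54 Modify
change = "2016-09-20 12:38:05.954885243"  # test54 Change after signing
total = seconds_between(modify, change)
print(f"total {total:.0f} s, beyond the 120 s expiry {total - EXPIRY:.0f} s")
# prints: total 862 s, beyond the 120 s expiry 742 s
```

By the same measure, test53 took 212 s in total, or about 92 s beyond the expiry.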
>> >> > > > > >
>> >> > > > > > What could be the cause of such a slow read? Is there any limitation on
>> >> > > > > > reading data from the brick?
>> >> > > > > >
>> >> > > > > > Also, I am seeing this line in bitd.log; what does it mean?
>> >> > > > > > [bit-rot.c:1784:br_rate_limit_signer] 0-glsvol1-bit-rot-0: [Rate Limit Info] "tokens/sec (rate): 131072, maxlimit: 524288
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > Thanks
>> >> > > > > > Amudhan P
>> >> > > > > >
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > On Mon, Sep 19, 2016 at 1:00 PM, Kotresh Hiremath Ravishankar <
>> >> > > > > > khiremat@xxxxxxxxxx> wrote:
>> >> > > > > >
>> >> > > > > > > Hi Amudhan,
>> >> > > > > > >
>> >> > > > > > > Thanks for testing out the bitrot feature and sorry for the delayed
>> >> > > > > > > response.
>> >> > > > > > > Please find the answers inline.
>> >> > > > > > >
>> >> > > > > > > Thanks and Regards,
>> >> > > > > > > Kotresh H R
>> >> > > > > > >
>> >> > > > > > > ----- Original Message -----
>> >> > > > > > > > From: "Amudhan P" <amudhan83@xxxxxxxxx>
>> >> > > > > > > > To: "Gluster Users" <gluster-users@xxxxxxxxxxx>
>> >> > > > > > > > Sent: Friday, September 16, 2016 4:14:10 PM
>> >> > > > > > > > Subject: Re: 3.8.3 Bitrot signature process
>> >> > > > > > > >
>> >> > > > > > > > Hi,
>> >> > > > > > > >
>> >> > > > > > > > Can anyone reply to this mail?
>> >> > > > > > > >
>> >> > > > > > > > On Tue, Sep 13, 2016 at 12:49 PM, Amudhan P <amudhan83@xxxxxxxxx> wrote:
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > > Hi,
>> >> > > > > > > >
>> >> > > > > > > > I am testing the bitrot feature in Gluster 3.8.3 with a disperse (EC) 4+1 volume.
>> >> > > > > > > >
>> >> > > > > > > > When I write a single small file (< 10MB), I can see the bitrot signature
>> >> > > > > > > > on the bricks for the file after 2 seconds, but when I write multiple files
>> >> > > > > > > > of different sizes (> 10MB), it takes a long time (> 24hrs) to see the
>> >> > > > > > > > bitrot signature on all the files.
>> >> > > > > > >
>> >> > > > > > > The default timeout for signing is 120 seconds, so the signing will happen
>> >> > > > > > > 120 secs after the last fd on that file is closed. If the file is being
>> >> > > > > > > written continuously, it will not be signed until 120 secs after its last
>> >> > > > > > > fd is closed.
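The rule described here can be modelled as a timer that restarts whenever the count of open fds drops back to zero. The sketch below is a hypothetical model of that rule, not bitd's code. It takes `(timestamp, "open" | "close")` events and returns when the file would become eligible for signing, using the 120 s default from this thread.

```python
from typing import Iterable, Optional, Tuple

EXPIRY_SECONDS = 120  # default quoted in this thread (features.expiry-time)


def eligible_at(events: Iterable[Tuple[float, str]],
                expiry: float = EXPIRY_SECONDS) -> Optional[float]:
    """Return when signing may start, or None while an fd is still open."""
    open_fds = 0
    last_close: Optional[float] = None
    # Sort by timestamp only; the stable sort keeps input order for ties.
    for ts, kind in sorted(events, key=lambda e: e[0]):
        if kind == "open":
            open_fds += 1
        elif kind == "close":
            if open_fds == 0:
                raise ValueError(f"close without open at t={ts}")
            open_fds -= 1
            if open_fds == 0:
                last_close = ts  # timer (re)starts at the last close
        else:
            raise ValueError(f"unknown event {kind!r}")
    if open_fds or last_close is None:
        return None
    return last_close + expiry
```

In this model a reopen pushes signing back: a file closed at t=10 becomes eligible at t=130, but if it is reopened at t=50 and closed at t=60, eligibility moves to t=180. A file whose fd is never closed, like the one the brick process held in the later tests, would not be signed at all.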
>> >> > > > > > > >
>> >> > > > > > > > My questions are:
>> >> > > > > > > > 1. I have set the scrub schedule to hourly and the throttle to normal.
>> >> > > > > > > > Does this have any impact in delaying the bitrot signature?
>> >> > > > > > > No.
>> >> > > > > > > > 2. Other than "bitd.log", where else can I watch the current status of
>> >> > > > > > > > bitrot, like the number of files added for signing and the file status?
>> >> > > > > > > Signing will happen 120 sec after the last fd is closed, as said above.
>> >> > > > > > > There is no status command which tracks the signing of files,
>> >> > > > > > > but there is a bitrot status command which tracks the number of files
>> >> > > > > > > scrubbed.
>> >> > > > > > >
>> >> > > > > > > #gluster vol bitrot <volname> scrub status
>> >> > > > > > >
>> >> > > > > > >
>> >> > > > > > > > 3. Where can I confirm that all the files on the brick are bitrot signed?
>> >> > > > > > >
>> >> > > > > > > As said, signing information for all the files is not tracked.
>> >> > > > > > >
>> >> > > > > > > > 4. Is there any file read size limit in bitrot?
>> >> > > > > > >
>> >> > > > > > > I didn't get that. Could you please elaborate on this?
>> >> > > > > > >
>> >> > > > > > > > 5. Are there options for tuning bitrot for faster signing of files?
>> >> > > > > > >
>> >> > > > > > > The bitrot feature is mainly meant to detect silent corruption (bitflips)
>> >> > > > > > > of files due to long-term storage. Hence, by default, signing happens
>> >> > > > > > > 120 sec after the last fd is closed. There is a tunable which can change
>> >> > > > > > > the default 120 sec, but that's only for testing purposes and we don't
>> >> > > > > > > recommend it.
>> >> > > > > > >
>> >> > > > > > > gluster vol get master features.expiry-time
>> >> > > > > > >
>> >> > > > > > > For testing purposes, you can change this default and test.
>> >> > > > > > > >
>> >> > > > > > > > Thanks
>> >> > > > > > > > Amudhan
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > > _______________________________________________
>> >> > > > > > > > Gluster-users mailing list
>> >> > > > > > > > Gluster-users@xxxxxxxxxxx
>> >> > > > > > > > http://www.gluster.org/mailman/listinfo/gluster-users
>> >> > > > > > >
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> >
>> >
>>
>
2280 is a brick process. I have not tried with a dist-rep volume.
I have not seen any fd in the bitd process on any of the nodes, and the bitd process's CPU usage is always 0%, occasionally going up to 0.3%.
Thanks,
Amudhan
On Thursday, September 22, 2016, Kotresh Hiremath Ravishankar <khiremat@xxxxxxxxxx> wrote:
> Hi Amudhan,
>
> No, bitrot signer is a different process by itself and is not part of brick process.
> I believe the process 2280 is a brick process ? Did you check with dist-rep volume?
> Is the same behavior being observed there as well? We need to figure out why brick
> process is holding that fd for such a long time.
>
> Thanks and Regards,
> Kotresh H R
>
> ----- Original Message -----
>> From: "Amudhan P" <amudhan83@xxxxxxxxx>
>> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
>> Sent: Wednesday, September 21, 2016 8:15:33 PM
>> Subject: Re: [Gluster-users] 3.8.3 Bitrot signature process
>>
>> Hi Kotresh,
>>
>> As soon as fd closes from brick1 pid, i can see bitrot signature for the
>> file in brick.
>>
>> So, it looks like fd opened by brick process to calculate signature.
>>
>> output of the file:
>>
>> -rw-r--r-- 2 root root 250M Sep 21 18:32
>> /media/disk1/brick1/data/G/
>>
>> getfattr: Removing leading '/' from absolute path names
>> # file: media/disk1/brick1/data/G/
>> trusted.bit-rot.signature=
>> trusted.bit-rot.version=
>> trusted.ec.config=
>> trusted.ec.size=
>> trusted.ec.version=
>> trusted.gfid=
>>
>> stat /media/disk1/brick1/data/G/
>> File: ‘/media/disk1/brick1/data/G/
>> Size: 262144000 Blocks: 512000 IO Block: 4096 regular file
>> Device: 811h/2065d Inode: 402653311 Links: 2
>> Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
>> Access: 2016-09-21 18:34:43.722712751 +0530
>> Modify: 2016-09-21 18:32:41.650712946 +0530
>> Change: 2016-09-21 19:14:41.698708914 +0530
>> Birth: -
>>
>>
>> In other 2 bricks in same set, still signature is not updated for the same
>> file.
>>
>>
>> On Wed, Sep 21, 2016 at 6:48 PM, Amudhan P <amudhan83@xxxxxxxxx> wrote:
>>
>> > Hi Kotresh,
>> >
>> > I am very sure, No read was going on from mount point.
>> >
>> > Again i did same test but after writing data to mount point. I have
>> > unmounted mount point.
>> >
>> > after 120 seconds i am seeing this file fd entry in brick 1 pid
>> >
>> > getfattr -m. -e hex -d test59-bs10
>> > # file: test59-bs10M-c100.nul
>> > trusted.bit-rot.version=
>> > trusted.ec.config=
>> > trusted.ec.size=
>> > trusted.ec.version=
>> > trusted.gfid=
>> >
>> >
>> > ls -l /proc/2280/fd
>> > lr-x------ 1 root root 64 Sep 21 13:08 19 -> /media/disk1/brick1/.
>> > glusterfs/4c/09/4c091145-4294-
>> >
>> > Volume is a EC - 4+1
>> >
>> > On Wed, Sep 21, 2016 at 6:17 PM, Kotresh Hiremath Ravishankar <
>> > khiremat@xxxxxxxxxx> wrote:
>> >
>> >> Hi Amudhan,
>> >>
>> >> If you see the ls output, some process has a fd opened in the backend.
>> >> That is the reason bitrot is not considering for the signing.
>> >> Could you please observe, after 120 secs of closure of
>> >> "/media/disk2/brick2/.
>> >> 85bf-f21f99fd8764"
>> >> the signing happens. If so we need to figure out who holds this fd for
>> >> such a long time.
>> >> And also we need to figure is this issue specific to EC volume.
>> >>
>> >> Thanks and Regards,
>> >> Kotresh H R
>> >>
>> >> ----- Original Message -----
>> >> > From: "Amudhan P" <amudhan83@xxxxxxxxx>
>> >> > To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
>> >> > Cc: "Gluster Users" <gluster-users@xxxxxxxxxxx>
>> >> > Sent: Wednesday, September 21, 2016 4:56:40 PM
>> >> > Subject: Re: [Gluster-users] 3.8.3 Bitrot signature process
>> >> >
>> >> > Hi Kotresh,
>> >> >
>> >> >
>> >> > Writing new file.
>> >> >
>> >> > getfattr -m. -e hex -d /media/disk2/brick2/data/G/
>> >> > getfattr: Removing leading '/' from absolute path names
>> >> > # file: media/disk2/brick2/data/G/
>> >> > trusted.bit-rot.version=
>> >> > trusted.ec.config=
>> >> > trusted.ec.size=
>> >> > trusted.ec.version=
>> >> > trusted.gfid=
>> >> >
>> >> >
>> >> > Running ls -l in brick 2 pid
>> >> >
>> >> > ls -l /proc/30162/fd
>> >> >
>> >> > lr-x------ 1 root root 64 Sep 21 16:22 59 ->
>> >> > /media/disk2/brick2/.
>> >> > lrwx------ 1 root root 64 Sep 21 16:22 6 ->
>> >> > /var/lib/glusterd/vols/
>> >> > lr-x------ 1 root root 64 Sep 21 16:25 60 ->
>> >> > /media/disk2/brick2/.
>> >> 85bf-f21f99fd8764
>> >> > lr-x------ 1 root root 64 Sep 21 16:22 61 ->
>> >> > /media/disk2/brick2/.
>> >> >
>> >> >
>> >> > find /media/disk2/ -samefile
>> >> > /media/disk2/brick2/.
>> >> 85bf-f21f99fd8764
>> >> > /media/disk2/brick2/.
>> >> 85bf-f21f99fd8764
>> >> > /media/disk2/brick2/data/G/
>> >> >
>> >> >
>> >> >
>> >> > On Wed, Sep 21, 2016 at 3:28 PM, Kotresh Hiremath Ravishankar <
>> >> > khiremat@xxxxxxxxxx> wrote:
>> >> >
>> >> > > Hi Amudhan,
>> >> > >
>> >> > > Don't grep for the filename, glusterfs maintains hardlink in
>> >> .glusterfs
>> >> > > directory
>> >> > > for each file. Just check 'ls -l /proc/<respective brick pid>/fd' for
>> >> any
>> >> > > fds opened
>> >> > > for a file in .glusterfs and check if it's the same file.
>> >> > >
>> >> > > Thanks and Regards,
>> >> > > Kotresh H R
>> >> > >
>> >> > > ----- Original Message -----
>> >> > > > From: "Amudhan P" <amudhan83@xxxxxxxxx>
>> >> > > > To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
>> >> > > > Cc: "Gluster Users" <gluster-users@xxxxxxxxxxx>
>> >> > > > Sent: Wednesday, September 21, 2016 1:33:10 PM
>> >> > > > Subject: Re: 3.8.3 Bitrot signature process
>> >> > > >
>> >> > > > Hi Kotresh,
>> >> > > >
>> >> > > > i have used below command to verify any open fd for file.
>> >> > > >
>> >> > > > "ls -l /proc/*/fd | grep filename".
>> >> > > >
>> >> > > > as soon as write completes there no open fd's, if there is any
>> >> alternate
>> >> > > > option. please let me know will also try that.
>> >> > > >
>> >> > > >
>> >> > > >
>> >> > > >
>> >> > > > Also, below is my scrub status in my test setup. number of skipped
>> >> files
>> >> > > > slow reducing day by day. I think files are skipped due to bitrot
>> >> > > signature
>> >> > > > process is not completed yet.
>> >> > > >
>> >> > > > where can i see scrub skipped files?
>> >> > > >
>> >> > > >
>> >> > > > Volume name : glsvol1
>> >> > > >
>> >> > > > State of scrub: Active (Idle)
>> >> > > >
>> >> > > > Scrub impact: normal
>> >> > > >
>> >> > > > Scrub frequency: daily
>> >> > > >
>> >> > > > Bitrot error log location: /var/log/glusterfs/bitd.log
>> >> > > >
>> >> > > > Scrubber error log location: /var/log/glusterfs/scrub.log
>> >> > > >
>> >> > > >
>> >> > > > ==============================
>> >> > > >
>> >> > > > Node: localhost
>> >> > > >
>> >> > > > Number of Scrubbed files: 1644
>> >> > > >
>> >> > > > Number of Skipped files: 1001
>> >> > > >
>> >> > > > Last completed scrub time: 2016-09-20 11:59:58
>> >> > > >
>> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:39:26
>> >> > > >
>> >> > > > Error count: 0
>> >> > > >
>> >> > > >
>> >> > > > ==============================
>> >> > > >
>> >> > > > Node: 10.1.2.3
>> >> > > >
>> >> > > > Number of Scrubbed files: 1644
>> >> > > >
>> >> > > > Number of Skipped files: 1001
>> >> > > >
>> >> > > > Last completed scrub time: 2016-09-20 10:50:00
>> >> > > >
>> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:38:17
>> >> > > >
>> >> > > > Error count: 0
>> >> > > >
>> >> > > >
>> >> > > > ==============================
>> >> > > >
>> >> > > > Node: 10.1.2.4
>> >> > > >
>> >> > > > Number of Scrubbed files: 981
>> >> > > >
>> >> > > > Number of Skipped files: 1664
>> >> > > >
>> >> > > > Last completed scrub time: 2016-09-20 12:38:01
>> >> > > >
>> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:35:19
>> >> > > >
>> >> > > > Error count: 0
>> >> > > >
>> >> > > >
>> >> > > > ==============================
>> >> > > >
>> >> > > > Node: 10.1.2.1
>> >> > > >
>> >> > > > Number of Scrubbed files: 1263
>> >> > > >
>> >> > > > Number of Skipped files: 1382
>> >> > > >
>> >> > > > Last completed scrub time: 2016-09-20 11:57:21
>> >> > > >
>> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:37:17
>> >> > > >
>> >> > > > Error count: 0
>> >> > > >
>> >> > > >
>> >> > > > ==============================
>> >> > > >
>> >> > > > Node: 10.1.2.2
>> >> > > >
>> >> > > > Number of Scrubbed files: 1644
>> >> > > >
>> >> > > > Number of Skipped files: 1001
>> >> > > >
>> >> > > > Last completed scrub time: 2016-09-20 11:59:25
>> >> > > >
>> >> > > > Duration of last scrub (D:M:H:M:S): 0:0:39:18
>> >> > > >
>> >> > > > Error count: 0
>> >> > > >
>> >> > > > ==============================
>> >> > > >
>> >> > > >
>> >> > > >
>> >> > > >
>> >> > > > Thanks
>> >> > > > Amudhan
>> >> > > >
>> >> > > >
>> >> > > > On Wed, Sep 21, 2016 at 11:45 AM, Kotresh Hiremath Ravishankar <
>> >> > > > khiremat@xxxxxxxxxx> wrote:
>> >> > > >
>> >> > > > > Hi Amudhan,
>> >> > > > >
>> >> > > > > I don't think it's a limitation on reading data from the brick.
>> >> > > > > To limit CPU usage, throttling is done using a token bucket
>> >> > > > > algorithm, and the log message you saw is related to it. But even
>> >> > > > > so, I think the checksum calculation should not take 12 minutes
>> >> > > > > unless there is an fd open (possibly an internal one). Could you
>> >> > > > > please cross-verify whether any fds are open on that file by
>> >> > > > > looking into /proc? I will also test it in the meantime and get
>> >> > > > > back to you.
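The /proc check suggested here can be scripted. Below is a minimal Python sketch, assuming a Linux host and root privileges (other processes' fd directories are not readable otherwise). find_open_fds is a hypothetical helper, not a Gluster tool. It compares device and inode numbers rather than path strings, so an fd opened through the file's other hard link is still reported (the stat output in this thread shows Links: 2, presumably the .glusterfs gfid handle on the brick).

```python
import os
import sys


def find_open_fds(target_path):
    """Return sorted (pid, fd) pairs whose open fd refers to target_path's inode.

    Matching on (st_dev, st_ino) instead of the link text also catches fds
    opened through another hard link to the same file. Processes we cannot
    inspect (no permission, or exited mid-scan) are skipped.
    """
    want = os.stat(target_path)
    hits = []
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # permission denied, or the process already exited
        for fd in fds:
            try:
                st = os.stat(os.path.join(fd_dir, fd))  # follows the fd symlink
            except OSError:
                continue  # fd closed between listdir() and stat()
            if (st.st_dev, st.st_ino) == (want.st_dev, want.st_ino):
                hits.append((int(pid), int(fd)))
    return sorted(hits)


if __name__ == "__main__" and len(sys.argv) > 1:
    for pid, fd in find_open_fds(sys.argv[1]):
        print(f"pid {pid} holds fd {fd} on {sys.argv[1]}")
```

Run it as root with the brick path of the file as its only argument; an empty result means no process currently holds the file open.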
>> >> > > > >
>> >> > > > > Thanks and Regards,
>> >> > > > > Kotresh H R
>> >> > > > >
>> >> > > > > ----- Original Message -----
>> >> > > > > > From: "Amudhan P" <amudhan83@xxxxxxxxx>
>> >> > > > > > To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
>> >> > > > > > Cc: "Gluster Users" <gluster-users@xxxxxxxxxxx>
>> >> > > > > > Sent: Tuesday, September 20, 2016 3:19:28 PM
>> >> > > > > > Subject: Re: 3.8.3 Bitrot signature process
>> >> > > > > >
>> >> > > > > > Hi Kotresh,
>> >> > > > > >
>> >> > > > > > Please correct me if I am wrong: once a file write completes and
>> >> > > > > > all fds are closed, bitrot waits 120 seconds, then starts hashing
>> >> > > > > > and updates the signature for the file in the brick.
>> >> > > > > >
>> >> > > > > > But it seems to me that bitrot takes too much time to complete
>> >> > > > > > hashing.
>> >> > > > > >
>> >> > > > > > Below are the test results I would like to share.
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > Writing data to the path below using dd:
>> >> > > > > >
>> >> > > > > > /mnt/gluster/data/G (mount point)
>> >> > > > > > -rw-r--r-- 1 root root 10M Sep 20 12:19 test53-bs10M-c1.nul
>> >> > > > > > -rw-r--r-- 1 root root 100M Sep 20 12:19 test54-bs10M-c10.nul
>> >> > > > > >
>> >> > > > > > No other write or read process is running.
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > Checking the file data on one of the bricks:
>> >> > > > > >
>> >> > > > > > -rw-r--r-- 2 root root 2.5M Sep 20 12:23 test53-bs10M-c1.nul
>> >> > > > > > -rw-r--r-- 2 root root 25M Sep 20 12:23 test54-bs10M-c10.nul
>> >> > > > > >
>> >> > > > > > File stat and getfattr info from the brick after the write
>> >> > > > > > process completed:
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/
>> >> test53-bs10M-c1.nul
>> >> > > > > > File: ‘test53-bs10M-c1.nul’
>> >> > > > > > Size: 2621440 Blocks: 5120 IO Block: 4096 regular file
>> >> > > > > > Device: 821h/2081d Inode: 536874168 Links: 2
>> >> > > > > > Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
>> >> > > > > > Access: 2016-09-20 12:23:28.798886647 +0530
>> >> > > > > > Modify: 2016-09-20 12:23:28.994886646 +0530
>> >> > > > > > Change: 2016-09-20 12:23:28.998886646 +0530
>> >> > > > > > Birth: -
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/
>> >> test54-bs10M-c10.nul
>> >> > > > > > File: ‘test54-bs10M-c10.nul’
>> >> > > > > > Size: 26214400 Blocks: 51200 IO Block: 4096 regular file
>> >> > > > > > Device: 821h/2081d Inode: 536874169 Links: 2
>> >> > > > > > Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
>> >> > > > > > Access: 2016-09-20 12:23:42.902886624 +0530
>> >> > > > > > Modify: 2016-09-20 12:23:44.378886622 +0530
>> >> > > > > > Change: 2016-09-20 12:23:44.378886622 +0530
>> >> > > > > > Birth: -
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/
>> >> hex -d
>> >> > > > > > test53-bs10M-c1.nul
>> >> > > > > > # file: test53-bs10M-c1.nul
>> >> > > > > > trusted.bit-rot.version=
>> >> > > > > > trusted.ec.config=
>> >> > > > > > trusted.ec.size=
>> >> > > > > > trusted.ec.version=
>> >> > > > > > trusted.gfid=
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/
>> >> hex -d
>> >> > > > > > test54-bs10M-c10.nul
>> >> > > > > > # file: test54-bs10M-c10.nul
>> >> > > > > > trusted.bit-rot.version=
>> >> > > > > > trusted.ec.config=
>> >> > > > > > trusted.ec.size=
>> >> > > > > > trusted.ec.version=
>> >> > > > > > trusted.gfid=
>> >> > > > > >
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > File stat and getfattr info from the brick after the bitrot
>> >> > > > > > signature was updated:
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/
>> >> test53-bs10M-c1.nul
>> >> > > > > > File: ‘test53-bs10M-c1.nul’
>> >> > > > > > Size: 2621440 Blocks: 5120 IO Block: 4096 regular file
>> >> > > > > > Device: 821h/2081d Inode: 536874168 Links: 2
>> >> > > > > > Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
>> >> > > > > > Access: 2016-09-20 12:25:31.494886450 +0530
>> >> > > > > > Modify: 2016-09-20 12:23:28.994886646 +0530
>> >> > > > > > Change: 2016-09-20 12:27:00.994886307 +0530
>> >> > > > > > Birth: -
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/
>> >> hex -d
>> >> > > > > > test53-bs10M-c1.nul
>> >> > > > > > # file: test53-bs10M-c1.nul
>> >> > > > > > trusted.bit-rot.signature=
>> >> > > > > 90f643357c268fbaaf461c1567e033
>> >> > > > > > trusted.bit-rot.version=
>> >> > > > > > trusted.ec.config=
>> >> > > > > > trusted.ec.size=
>> >> > > > > > trusted.ec.version=
>> >> > > > > > trusted.gfid=
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/
>> >> test54-bs10M-c10.nul
>> >> > > > > > File: ‘test54-bs10M-c10.nul’
>> >> > > > > > Size: 26214400 Blocks: 51200 IO Block: 4096 regular file
>> >> > > > > > Device: 821h/2081d Inode: 536874169 Links: 2
>> >> > > > > > Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
>> >> > > > > > Access: 2016-09-20 12:25:47.510886425 +0530
>> >> > > > > > Modify: 2016-09-20 12:23:44.378886622 +0530
>> >> > > > > > Change: 2016-09-20 12:38:05.954885243 +0530
>> >> > > > > > Birth: -
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/
>> >> hex -d
>> >> > > > > > test54-bs10M-c10.nul
>> >> > > > > > # file: test54-bs10M-c10.nul
>> >> > > > > > trusted.bit-rot.signature=
>> >> > > > > 0c63ee652627a62eed069244d35c4d
>> >> > > > > > trusted.bit-rot.version=
>> >> > > > > > trusted.ec.config=
>> >> > > > > > trusted.ec.size=
>> >> > > > > > trusted.ec.version=
>> >> > > > > > trusted.gfid=
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > (Actual time taken to read the file from the brick with md5sum)
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/
>> >> > > test53-bs10M-c1.nul
>> >> > > > > > 8354dcaa18a1ecb52d0895bf00888c
>> >> > > > > >
>> >> > > > > > real 0m0.045s
>> >> > > > > > user 0m0.007s
>> >> > > > > > sys 0m0.003s
>> >> > > > > >
>> >> > > > > > gfstst-node5:/media/disk2/
>> >> > > > > test54-bs10M-c10.nul
>> >> > > > > > bed3c0a4a1407f584989b4009e9ce3
>> >> > > > > >
>> >> > > > > > real 0m0.166s
>> >> > > > > > user 0m0.062s
>> >> > > > > > sys 0m0.011s
>> >> > > > > >
>> >> > > > > > As you can see, the file 'test54-bs10M-c10.nul' took around 12
>> >> > > > > > minutes to update its bitrot signature (please refer to the stat
>> >> > > > > > output for the file).
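The intervals can be read straight off the stat output above. A small Python sketch of that arithmetic follows. It assumes that the first read after the write is what moved atime (which relatime, the usual Linux default, does when atime is older than mtime, as it was here) and that writing the trusted.bit-rot.signature xattr is what bumped ctime. Both are inferences, not something bitd reports.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"


def parse_stat_time(s):
    """Parse a GNU stat timestamp such as '2016-09-20 12:23:44.378886622 +0530'.

    strptime's %f takes at most 6 digits, so nanoseconds are cut to
    microseconds; the +0530 offset is dropped since all samples share it.
    """
    date, clock, _tz = s.split()
    hms, frac = clock.split(".")
    return datetime.strptime(f"{date} {hms}.{frac[:6]}", FMT)


def signing_timeline(modify, access, change):
    """Seconds from last write to first read, and from first read to xattr update."""
    m, a, c = map(parse_stat_time, (modify, access, change))
    return (a - m).total_seconds(), (c - a).total_seconds()


# (size, Modify, Access, Change) copied from the post-signing stat output above.
samples = {
    "test53-bs10M-c1.nul": (2621440, "2016-09-20 12:23:28.994886646 +0530",
                            "2016-09-20 12:25:31.494886450 +0530",
                            "2016-09-20 12:27:00.994886307 +0530"),
    "test54-bs10M-c10.nul": (26214400, "2016-09-20 12:23:44.378886622 +0530",
                             "2016-09-20 12:25:47.510886425 +0530",
                             "2016-09-20 12:38:05.954885243 +0530"),
}

for name, (size, mod, acc, chg) in samples.items():
    wait, hashing = signing_timeline(mod, acc, chg)
    print(f"{name}: waited {wait:.1f}s, hashed in {hashing:.1f}s "
          f"(~{size / hashing / 1024:.1f} KiB/s)")
```

Run as-is, it reports waits of about 122.5 s and 123.1 s (close to the 120 s expiry) and hashing times of about 89.5 s and 738.4 s, i.e. under 35 KiB/s for both fragments.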
>> >> > > > > >
>> >> > > > > > What could cause such a slow read? Is there any limit on reading
>> >> > > > > > data from the brick?
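For a comparison closer to what the signer does than md5sum, the file can be hashed in chunks with timing. This Python sketch defaults to SHA-256 on the assumption (not stated in this thread) that bitd signs with SHA-256. The full signature value quoted earlier in the thread carries a 32-byte digest, which is consistent with that. The 128 KiB chunk size is an arbitrary choice.

```python
import hashlib
import sys
import time


def hash_file(path, algo="sha256", chunk_size=128 * 1024):
    """Hash a file in fixed-size chunks; return (hex digest, bytes read, seconds)."""
    h = hashlib.new(algo)
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break  # end of file
            h.update(block)
            total += len(block)
    return h.hexdigest(), total, time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    digest, size, secs = hash_file(sys.argv[1])
    print(f"{digest}  {size} bytes in {secs:.3f}s")
```

Pointed at a brick fragment, a result in the same sub-second range as the md5sum timings would suggest raw reading and hashing are not the bottleneck.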
>> >> > > > > >
>> >> > > > > > Also, I am seeing this line in bitd.log. What does it mean?
>> >> > > > > > [bit-rot.c:1784:br_rate_limit_
>> >> [Rate
>> >> > > Limit
>> >> > > > > > Info] "tokens/sec (rate): 131072, maxlimit: 524288
>> >> > > > > >
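Kotresh's reply (quoted earlier in this thread) says this line comes from CPU throttling with a token bucket. Below is a generic token-bucket sketch in Python using the two numbers from the log line. Treating one token as one byte to be hashed is an unconfirmed assumption, and TokenBucket, time_to_consume and the 128 KiB request size are hypothetical.

```python
class TokenBucket:
    """Generic token bucket: refills `rate` tokens/sec, holding at most `max_tokens`.

    Uses a virtual clock so it can be simulated without sleeping.
    """

    def __init__(self, rate, max_tokens):
        self.rate = rate
        self.max_tokens = max_tokens
        self.tokens = max_tokens  # assume the bucket starts full
        self.now = 0.0            # virtual time in seconds

    def _advance(self, t):
        """Move the clock to t, adding the tokens earned since the last update."""
        self.tokens = min(self.max_tokens, self.tokens + (t - self.now) * self.rate)
        self.now = t

    def acquire(self, n):
        """Wait (in virtual time) until n tokens are available, then spend them.

        Requests larger than the cap are split so each piece can be granted.
        """
        while n > 0:
            piece = min(n, self.max_tokens)
            if self.tokens < piece:
                # jump forward exactly far enough to cover the shortfall
                self._advance(self.now + (piece - self.tokens) / self.rate)
            self.tokens -= piece
            n -= piece
        return self.now


def time_to_consume(total, rate=131072, max_tokens=524288, request=128 * 1024):
    """Virtual seconds needed to push `total` tokens through the bucket."""
    bucket = TokenBucket(rate, max_tokens)
    done = 0
    while done < total:
        step = min(request, total - done)
        bucket.acquire(step)
        done += step
    return bucket.now
```

Under that assumption, time_to_consume(26214400) (the 25 MB fragment of test54) returns 196.0 seconds: the first 524288 tokens come from the full bucket and the remaining 25690112 arrive at 131072 per second. That is well below the ~12 minutes observed, which fits Kotresh's doubt that throttling alone explains the delay.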
>> >> > > > > >
>> >> > > > > > Thanks
>> >> > > > > > Amudhan P
>> >> > > > > >
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > On Mon, Sep 19, 2016 at 1:00 PM, Kotresh Hiremath Ravishankar <
>> >> > > > > > khiremat@xxxxxxxxxx> wrote:
>> >> > > > > >
>> >> > > > > > > Hi Amudhan,
>> >> > > > > > >
>> >> > > > > > > Thanks for testing out the bitrot feature and sorry for the
>> >> > > > > > > delayed response.
>> >> > > > > > > Please find the answers inline.
>> >> > > > > > >
>> >> > > > > > > Thanks and Regards,
>> >> > > > > > > Kotresh H R
>> >> > > > > > >
>> >> > > > > > > ----- Original Message -----
>> >> > > > > > > > From: "Amudhan P" <amudhan83@xxxxxxxxx>
>> >> > > > > > > > To: "Gluster Users" <gluster-users@xxxxxxxxxxx>
>> >> > > > > > > > Sent: Friday, September 16, 2016 4:14:10 PM
>> >> > > > > > > > Subject: Re: 3.8.3 Bitrot signature process
>> >> > > > > > > >
>> >> > > > > > > > Hi,
>> >> > > > > > > >
>> >> > > > > > > > Can anyone reply to this mail?
>> >> > > > > > > >
>> >> > > > > > > > On Tue, Sep 13, 2016 at 12:49 PM, Amudhan P <amudhan83@xxxxxxxxx>
>> >> > > > > > > > wrote:
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > > Hi,
>> >> > > > > > > >
>> >> > > > > > > > I am testing the bitrot feature in Gluster 3.8.3 with a 4+1
>> >> > > > > > > > disperse (EC) volume.
>> >> > > > > > > >
>> >> > > > > > > > When I write a single small file (< 10MB), I can see the bitrot
>> >> > > > > > > > signature for it in the bricks after 2 seconds, but when I write
>> >> > > > > > > > multiple files of different sizes (> 10MB), it takes a long time
>> >> > > > > > > > (> 24 hrs) for the bitrot signature to appear on all the files.
>> >> > > > > > >
>> >> > > > > > > The default timeout for signing is 120 seconds, so signing
>> >> > > > > > > happens 120 secs after the last fd on that file is closed. If
>> >> > > > > > > the file is being written continuously, it will not be signed
>> >> > > > > > > until 120 secs after its last fd is closed.
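The 120-second rule described here can be written down as a small model. This is a hypothetical Python sketch of the rule as stated in this reply, not bitd's actual code. The function name and the event format are assumptions.

```python
def expected_sign_time(events, expiry=120):
    """Return when signing should trigger for one file, or None if it won't yet.

    `events` is a chronologically ordered list of (timestamp_seconds, kind),
    with kind "open" or "close". The expiry timer starts only when a close
    leaves no fd open on the file; the result is the trigger time implied by
    the most recent such close. Hypothetical model, not bitd's code.
    """
    open_fds = 0
    last_close = None
    for t, kind in events:
        if kind == "open":
            open_fds += 1
        elif kind == "close":
            if open_fds == 0:
                raise ValueError(f"close without a matching open at t={t}")
            open_fds -= 1
            if open_fds == 0:
                last_close = t  # last fd gone: (re)start the expiry timer
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    if open_fds > 0 or last_close is None:
        return None  # still open (or never closed): nothing scheduled
    return last_close + expiry
```

For example, a file opened at t=0, closed at t=10, reopened at t=50 and closed again at t=200 would be due for signing at t=320, not t=130.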
>> >> > > > > > > >
>> >> > > > > > > > My questions are:
>> >> > > > > > > > 1. I have set the scrub schedule to hourly and the throttle to
>> >> > > > > > > > normal. Does this have any impact on delaying the bitrot
>> >> > > > > > > > signature?
>> >> > > > > > > No.
>> >> > > > > > > > 2. Other than "bitd.log", where else can I watch the current
>> >> > > > > > > > status of bitrot, such as the number of files queued for
>> >> > > > > > > > signing and the status of each file?
>> >> > > > > > > Signing happens 120 secs after the last fd is closed, as said
>> >> > > > > > > above. There is no status command that tracks the signing of
>> >> > > > > > > files, but there is a bitrot scrub status command that tracks
>> >> > > > > > > the number of files scrubbed.
>> >> > > > > > >
>> >> > > > > > > #gluster vol bitrot <volname> scrub status
>> >> > > > > > >
>> >> > > > > > >
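The scrub status output quoted earlier in this thread has a simple "Label: value" layout per node. A minimal Python sketch for pulling the counters out of it follows. The field labels are copied from that output and may differ in other Gluster versions. parse_scrub_status is a hypothetical helper.

```python
import re

# Labels as they appear in the quoted `scrub status` output.
FIELDS = {
    "Number of Scrubbed files": "scrubbed",
    "Number of Skipped files": "skipped",
    "Error count": "errors",
}


def parse_scrub_status(text):
    """Return {node: {"scrubbed": int, "skipped": int, "errors": int}}.

    Leading mail-quote characters ('>' and spaces) are ignored, so the
    output can be pasted straight from a quoted message.
    """
    nodes, current = {}, None
    for raw in text.splitlines():
        line = raw.lstrip("> ").strip()
        m = re.match(r"([^:]+):\s*(.*)$", line)
        if not m:
            continue  # separators, blank lines, sign-offs
        key, value = m.group(1).strip(), m.group(2).strip()
        if key == "Node":
            current = nodes.setdefault(value, {})
        elif key in FIELDS and current is not None:
            current[FIELDS[key]] = int(value)
    return nodes
```

In the node blocks quoted above, Scrubbed plus Skipped comes to 2645 on every node.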
>> >> > > > > > > > 3. Where can I confirm that all the files in the brick are
>> >> > > > > > > > bitrot signed?
>> >> > > > > > >
>> >> > > > > > > As said, the signing status of files is not tracked.
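Since signing status is not tracked, one manual check is to look for the trusted.bit-rot.signature xattr (as shown in the getfattr output above) on every file in a brick. Below is a minimal Python sketch. It assumes Linux, root privileges, and that the brick's internal .glusterfs directory should be skipped. unsigned_files is a hypothetical helper.

```python
import errno
import os
import sys

SIGNATURE_XATTR = "trusted.bit-rot.signature"  # name as shown by getfattr above


def unsigned_files(brick_root):
    """Yield regular files under brick_root lacking the bitrot signature xattr.

    Must run as root: without CAP_SYS_ADMIN, trusted.* xattrs are invisible and
    every file would be reported. The brick's internal .glusterfs directory
    (gfid hard links) is skipped so each file is checked once.
    """
    for dirpath, dirnames, filenames in os.walk(brick_root):
        if dirpath == brick_root and ".glusterfs" in dirnames:
            dirnames.remove(".glusterfs")  # prune: os.walk honours in-place edits
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # only regular files are signed
            try:
                os.getxattr(path, SIGNATURE_XATTR, follow_symlinks=False)
            except OSError as e:
                if e.errno != errno.ENODATA:
                    raise  # e.g. unsupported filesystem or I/O error
                yield path  # xattr absent: not signed (yet)


if __name__ == "__main__" and len(sys.argv) > 1:
    for path in unsigned_files(sys.argv[1]):
        print(path)
```

Files still inside their 120-second window after the last close will show up as unsigned, so a quiet period before running it avoids false alarms.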
>> >> > > > > > >
>> >> > > > > > > > 4. Is there any file read size limit in bitrot?
>> >> > > > > > >
>> >> > > > > > > I didn't understand. Could you please elaborate on this?
>> >> > > > > > >
>> >> > > > > > > > 5. Are there options for tuning bitrot to sign files faster?
>> >> > > > > > >
>> >> > > > > > > The bitrot feature is mainly meant to detect silent corruption
>> >> > > > > > > (bitflips) of files in long-term storage. Hence, by default,
>> >> > > > > > > signing happens 120 secs after the last fd is closed. There is a
>> >> > > > > > > tunable that can change the default of 120 secs, but it is only
>> >> > > > > > > for testing purposes and we don't recommend it.
>> >> > > > > > >
>> >> > > > > > > gluster vol get master features.expiry-time
>> >> > > > > > >
>> >> > > > > > > For testing purposes, you can change this default and test.
>> >> > > > > > > >
>> >> > > > > > > > Thanks
>> >> > > > > > > > Amudhan
>> >> > > > > > > >
>> >> > > > > > > >
>> >> > > > > > > > ______________________________
>> >> > > > > > > > Gluster-users mailing list
>> >> > > > > > > > Gluster-users@xxxxxxxxxxx
>> >> > > > > > > > http://www.gluster.org/
>> >> > > > > > >
>> >> > > > > >
>> >> > > > >
>> >> > > >
>> >> > >
>> >> >
>> >>
>> >
>> >
>>
>
_______________________________________________ Gluster-users mailing list Gluster-users@xxxxxxxxxxx http://www.gluster.org/mailman/listinfo/gluster-users