Hi Amudhan,

Don't grep for the filename; glusterfs maintains a hardlink in the
.glusterfs directory for each file. Just check
'ls -l /proc/<respective brick pid>/fd' for any fds opened on a file in
.glusterfs and check whether it's the same file.

Thanks and Regards,
Kotresh H R

----- Original Message -----
> From: "Amudhan P" <amudhan83@xxxxxxxxx>
> To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> Cc: "Gluster Users" <gluster-users@xxxxxxxxxxx>
> Sent: Wednesday, September 21, 2016 1:33:10 PM
> Subject: Re: 3.8.3 Bitrot signature process
>
> Hi Kotresh,
>
> I used the command below to check for any open fd on the file:
>
> "ls -l /proc/*/fd | grep filename"
>
> As soon as the write completes there are no open fds. If there is an
> alternate option, please let me know and I will try that too.
>
> Also, below is the scrub status in my test setup. The number of skipped
> files is slowly reducing day by day. I think files are skipped because
> the bitrot signature process has not completed yet.
>
> Where can I see the files the scrubber skipped?
>
> Volume name : glsvol1
> State of scrub: Active (Idle)
> Scrub impact: normal
> Scrub frequency: daily
> Bitrot error log location: /var/log/glusterfs/bitd.log
> Scrubber error log location: /var/log/glusterfs/scrub.log
>
> =========================================================
> Node: localhost
> Number of Scrubbed files: 1644
> Number of Skipped files: 1001
> Last completed scrub time: 2016-09-20 11:59:58
> Duration of last scrub (D:M:H:M:S): 0:0:39:26
> Error count: 0
>
> =========================================================
> Node: 10.1.2.3
> Number of Scrubbed files: 1644
> Number of Skipped files: 1001
> Last completed scrub time: 2016-09-20 10:50:00
> Duration of last scrub (D:M:H:M:S): 0:0:38:17
> Error count: 0
>
> =========================================================
> Node: 10.1.2.4
> Number of Scrubbed files: 981
> Number of Skipped files: 1664
> Last completed scrub time: 2016-09-20 12:38:01
> Duration of last scrub (D:M:H:M:S): 0:0:35:19
> Error count: 0
>
> =========================================================
> Node: 10.1.2.1
> Number of Scrubbed files: 1263
> Number of Skipped files: 1382
> Last completed scrub time: 2016-09-20 11:57:21
> Duration of last scrub (D:M:H:M:S): 0:0:37:17
> Error count: 0
>
> =========================================================
> Node: 10.1.2.2
> Number of Scrubbed files: 1644
> Number of Skipped files: 1001
> Last completed scrub time: 2016-09-20 11:59:25
> Duration of last scrub (D:M:H:M:S): 0:0:39:18
> Error count: 0
> =========================================================
>
> Thanks
> Amudhan
>
> On Wed, Sep 21, 2016 at 11:45 AM, Kotresh Hiremath Ravishankar <
> khiremat@xxxxxxxxxx> wrote:
>
> > Hi Amudhan,
> >
> > I don't think it's a limitation on reading data from the brick.
> > To limit CPU usage, throttling is done using a token bucket
> > algorithm. The log message shown is related to it. But even then
> > I think it should not take 12 minutes for checksum calculation unless
> > there is an fd open (might be internal). Could you please cross-verify
> > whether any fds are open on that file by looking into /proc? I will
> > also test it out in the meantime and get back to you.
> >
> > Thanks and Regards,
> > Kotresh H R
> >
> > ----- Original Message -----
> > > From: "Amudhan P" <amudhan83@xxxxxxxxx>
> > > To: "Kotresh Hiremath Ravishankar" <khiremat@xxxxxxxxxx>
> > > Cc: "Gluster Users" <gluster-users@xxxxxxxxxxx>
> > > Sent: Tuesday, September 20, 2016 3:19:28 PM
> > > Subject: Re: 3.8.3 Bitrot signature process
> > >
> > > Hi Kotresh,
> > >
> > > Please correct me if I am wrong: once a file write completes and its
> > > fds are closed, bitrot waits for 120 seconds and then starts hashing
> > > and updates the signature for the file on the brick.
> > >
> > > But my feeling is that bitrot takes too much time to complete
> > > hashing.
> > >
> > > Below is a test result I would like to share.
> > >
> > > Writing data to the path below using dd:
> > >
> > > /mnt/gluster/data/G (mount point)
> > > -rw-r--r-- 1 root root  10M Sep 20 12:19 test53-bs10M-c1.nul
> > > -rw-r--r-- 1 root root 100M Sep 20 12:19 test54-bs10M-c10.nul
> > >
> > > No other write or read process is running.
> > >
> > > Checking the file data on one of the bricks:
> > >
> > > -rw-r--r-- 2 root root 2.5M Sep 20 12:23 test53-bs10M-c1.nul
> > > -rw-r--r-- 2 root root  25M Sep 20 12:23 test54-bs10M-c10.nul
> > >
> > > file's stat and getfattr info from brick, after write process completed.
> > >
> > > gfstst-node5:/media/disk2/brick2/data/G$ stat test53-bs10M-c1.nul
> > >   File: ‘test53-bs10M-c1.nul’
> > >   Size: 2621440     Blocks: 5120       IO Block: 4096   regular file
> > > Device: 821h/2081d  Inode: 536874168   Links: 2
> > > Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> > > Access: 2016-09-20 12:23:28.798886647 +0530
> > > Modify: 2016-09-20 12:23:28.994886646 +0530
> > > Change: 2016-09-20 12:23:28.998886646 +0530
> > >  Birth: -
> > >
> > > gfstst-node5:/media/disk2/brick2/data/G$ stat test54-bs10M-c10.nul
> > >   File: ‘test54-bs10M-c10.nul’
> > >   Size: 26214400    Blocks: 51200      IO Block: 4096   regular file
> > > Device: 821h/2081d  Inode: 536874169   Links: 2
> > > Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> > > Access: 2016-09-20 12:23:42.902886624 +0530
> > > Modify: 2016-09-20 12:23:44.378886622 +0530
> > > Change: 2016-09-20 12:23:44.378886622 +0530
> > >  Birth: -
> > >
> > > gfstst-node5:/media/disk2/brick2/data/G$ sudo getfattr -m. -e hex -d test53-bs10M-c1.nul
> > > # file: test53-bs10M-c1.nul
> > > trusted.bit-rot.version=0x020000000000000057daa7b50002e5b4
> > > trusted.ec.config=0x0000080501000200
> > > trusted.ec.size=0x0000000000a00000
> > > trusted.ec.version=0x00000000000000500000000000000050
> > > trusted.gfid=0xe2416bd1aae4403c88f44286273bbe99
> > >
> > > gfstst-node5:/media/disk2/brick2/data/G$ sudo getfattr -m. -e hex -d test54-bs10M-c10.nul
> > > # file: test54-bs10M-c10.nul
> > > trusted.bit-rot.version=0x020000000000000057daa7b50002e5b4
> > > trusted.ec.config=0x0000080501000200
> > > trusted.ec.size=0x0000000006400000
> > > trusted.ec.version=0x00000000000003200000000000000320
> > > trusted.gfid=0x54e018dd8c5a4bd79e0317729d8a57c5
> > >
> > > file's stat and getfattr info from brick, after bitrot signature updated.
> > >
> > > gfstst-node5:/media/disk2/brick2/data/G$ stat test53-bs10M-c1.nul
> > >   File: ‘test53-bs10M-c1.nul’
> > >   Size: 2621440     Blocks: 5120       IO Block: 4096   regular file
> > > Device: 821h/2081d  Inode: 536874168   Links: 2
> > > Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> > > Access: 2016-09-20 12:25:31.494886450 +0530
> > > Modify: 2016-09-20 12:23:28.994886646 +0530
> > > Change: 2016-09-20 12:27:00.994886307 +0530
> > >  Birth: -
> > >
> > > gfstst-node5:/media/disk2/brick2/data/G$ sudo getfattr -m. -e hex -d test53-bs10M-c1.nul
> > > # file: test53-bs10M-c1.nul
> > > trusted.bit-rot.signature=0x0102000000000000006de7493c5c90f643357c268fbaaf461c1567e0334e4948023ce17268403aa37a
> > > trusted.bit-rot.version=0x020000000000000057daa7b50002e5b4
> > > trusted.ec.config=0x0000080501000200
> > > trusted.ec.size=0x0000000000a00000
> > > trusted.ec.version=0x00000000000000500000000000000050
> > > trusted.gfid=0xe2416bd1aae4403c88f44286273bbe99
> > >
> > > gfstst-node5:/media/disk2/brick2/data/G$ stat test54-bs10M-c10.nul
> > >   File: ‘test54-bs10M-c10.nul’
> > >   Size: 26214400    Blocks: 51200      IO Block: 4096   regular file
> > > Device: 821h/2081d  Inode: 536874169   Links: 2
> > > Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
> > > Access: 2016-09-20 12:25:47.510886425 +0530
> > > Modify: 2016-09-20 12:23:44.378886622 +0530
> > > Change: 2016-09-20 12:38:05.954885243 +0530
> > >  Birth: -
> > >
> > > gfstst-node5:/media/disk2/brick2/data/G$ sudo getfattr -m. -e hex -d test54-bs10M-c10.nul
> > > # file: test54-bs10M-c10.nul
> > > trusted.bit-rot.signature=0x010200000000000000394c345f0b0c63ee652627a62eed069244d35c4d5134e4f07d4eabb51afda47e
> > > trusted.bit-rot.version=0x020000000000000057daa7b50002e5b4
> > > trusted.ec.config=0x0000080501000200
> > > trusted.ec.size=0x0000000006400000
> > > trusted.ec.version=0x00000000000003200000000000000320
> > > trusted.gfid=0x54e018dd8c5a4bd79e0317729d8a57c5
> > >
> > > (Actual time taken for reading file from brick for md5sum)
> > >
> > > gfstst-node5:/media/disk2/brick2/data/G$ time md5sum test53-bs10M-c1.nul
> > > 8354dcaa18a1ecb52d0895bf00888c44  test53-bs10M-c1.nul
> > >
> > > real 0m0.045s
> > > user 0m0.007s
> > > sys  0m0.003s
> > >
> > > gfstst-node5:/media/disk2/brick2/data/G$ time md5sum test54-bs10M-c10.nul
> > > bed3c0a4a1407f584989b4009e9ce33f  test54-bs10M-c10.nul
> > >
> > > real 0m0.166s
> > > user 0m0.062s
> > > sys  0m0.011s
> > >
> > > As you can see, the 'test54-bs10M-c10.nul' file took around 12 minutes
> > > to get its bitrot signature updated (please refer to the stat output
> > > for the file: Access 12:25:47, Change 12:38:05).
> > >
> > > What would be the cause of such a slow read? Is there any limitation
> > > on reading data from the brick?
> > >
> > > Also, I am seeing this line in bitd.log. What does it mean?
> > >
> > > [bit-rot.c:1784:br_rate_limit_signer] 0-glsvol1-bit-rot-0: [Rate Limit Info] "tokens/sec (rate): 131072, maxlimit: 524288
> > >
> > > Thanks
> > > Amudhan P
> > >
> > > On Mon, Sep 19, 2016 at 1:00 PM, Kotresh Hiremath Ravishankar <
> > > khiremat@xxxxxxxxxx> wrote:
> > >
> > > > Hi Amudhan,
> > > >
> > > > Thanks for testing out the bitrot feature and sorry for the delayed
> > > > response.
> > > > Please find the answers inline.
> > > >
> > > > Thanks and Regards,
> > > > Kotresh H R
> > > >
> > > > ----- Original Message -----
> > > > > From: "Amudhan P" <amudhan83@xxxxxxxxx>
> > > > > To: "Gluster Users" <gluster-users@xxxxxxxxxxx>
> > > > > Sent: Friday, September 16, 2016 4:14:10 PM
> > > > > Subject: Re: 3.8.3 Bitrot signature process
> > > > >
> > > > > Hi,
> > > > >
> > > > > Can anyone reply to this mail?
> > > > >
> > > > > On Tue, Sep 13, 2016 at 12:49 PM, Amudhan P < amudhan83@xxxxxxxxx >
> > > > > wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > I am testing the bitrot feature in Gluster 3.8.3 with a disperse
> > > > > (EC) volume, 4+1.
> > > > >
> > > > > When I write a single small file (< 10MB), I can see the bitrot
> > > > > signature on the bricks for the file after 2 seconds, but when I
> > > > > write multiple files of different sizes (> 10MB) it takes a long
> > > > > time (> 24hrs) to see the bitrot signature on all the files.
> > > >
> > > > The default timeout for signing to happen is 120 seconds. So the
> > > > signing will happen 120 secs after the last fd gets closed on that
> > > > file. So if the file is being written continuously, it will not be
> > > > signed until 120 secs after its last fd is closed.
> > > >
> > > > > My questions are:
> > > > > 1. I have enabled the scrub schedule as hourly and throttle as
> > > > > normal. Does this have any impact in delaying the bitrot signature?
> > > >
> > > > No.
> > > >
> > > > > 2. Other than "bitd.log", where else can I watch the current status
> > > > > of bitrot, like the number of files added for signature and file
> > > > > status?
> > > >
> > > > Signing will happen 120 sec after the last fd closure, as said
> > > > above. There is no status command which tracks the signing of the
> > > > files. But there is a bitrot status command which tracks the number
> > > > of files scrubbed.
> > > >
> > > > # gluster vol bitrot <volname> scrub status
> > > >
> > > > > 3. Where can I confirm that all the files in the brick are bitrot
> > > > > signed?
> > > >
> > > > As said, signing information of all the files is not tracked.
> > > >
> > > > > 4. Is there any file read size limit in bitrot?
> > > >
> > > > I didn't get this. Could you please elaborate?
> > > >
> > > > > 5. Options for tuning bitrot for faster signing of files?
> > > >
> > > > The bitrot feature is mainly meant to detect silent corruption
> > > > (bitflips) of files due to long-term storage. Hence, by default,
> > > > signing happens 120 sec after the last fd closure. There is a
> > > > tunable which can change the default 120 sec, but that's only for
> > > > testing purposes and we don't recommend it.
> > > >
> > > > gluster vol get master features.expiry-time
> > > >
> > > > For testing purposes, you can change this default and test.
> > > >
> > > > > Thanks
> > > > > Amudhan
> > > > >
> > > > > _______________________________________________
> > > > > Gluster-users mailing list
> > > > > Gluster-users@xxxxxxxxxxx
> > > > > http://www.gluster.org/mailman/listinfo/gluster-users
> > > >
> > >
> >
>
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
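
Editor's sketch, following up on the advice at the top of this thread to
look for fds on the file's .glusterfs hardlink rather than grepping for
the filename. It assumes the usual brick layout, where the hardlink lives
at .glusterfs/<first two hex chars of gfid>/<next two>/<gfid as uuid>,
and takes the gfid from the trusted.gfid xattr printed by getfattr in the
thread. The helper names gfid_to_path and open_fds_for are hypothetical
and are not part of any Gluster tool; treat this as a starting point, not
a definitive procedure.

```shell
#!/bin/sh
# Map a trusted.gfid hex value (as printed by `getfattr -e hex`) to the
# relative path of the file's hardlink under the brick's .glusterfs
# directory. ASSUMPTION: layout is .glusterfs/<aa>/<bb>/<uuid>.
gfid_to_path() {
    hex=${1#0x}                       # drop an optional leading "0x"
    # Re-insert the dashes of the canonical 8-4-4-4-12 uuid form.
    uuid=$(printf '%s\n' "$hex" |
        sed 's/^\(.\{8\}\)\(.\{4\}\)\(.\{4\}\)\(.\{4\}\)\(.\{12\}\)$/\1-\2-\3-\4-\5/')
    d1=$(printf '%s\n' "$uuid" | cut -c1-2)
    d2=$(printf '%s\n' "$uuid" | cut -c3-4)
    printf '.glusterfs/%s/%s/%s\n' "$d1" "$d2" "$uuid"
}

# Print every fd of process $1 whose symlink target is exactly path $2.
# $1 is a placeholder for the brick process pid; $2 should be the absolute
# path of the hardlink (brick root + output of gfid_to_path).
open_fds_for() {
    for fd in /proc/"$1"/fd/*; do
        [ "$(readlink "$fd" 2>/dev/null)" = "$2" ] && echo "$fd"
    done
    return 0
}
```

With the gfid of test53-bs10M-c1.nul from the getfattr output in the
thread, usage would look roughly like this (run as root on the brick
node; the brick root is the one shown in the thread):

    rel=$(gfid_to_path 0xe2416bd1aae4403c88f44286273bbe99)
    # rel is .glusterfs/e2/41/e2416bd1-aae4-403c-88f4-4286273bbe99
    open_fds_for <brick pid> "/media/disk2/brick2/$rel"

No output means the brick process holds no fd on that hardlink path. A
file opened through its regular path would show up under that name
instead, so both paths are worth checking.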