Hi, Raghavendra, How can I confirm that the "stat" value is from glusterfs cache or kernel cache? Now the issue can easy reproduce without the stat-prefetch or write-behind on. Best Regards, George -----Original Message----- From: Raghavendra Gowdappa [mailto:rgowdapp@xxxxxxxxxx] Sent: Wednesday, October 19, 2016 2:54 PM To: Lian, George (Nokia - CN/Hangzhou) <george.lian@xxxxxxxxx> Cc: Gluster-devel@xxxxxxxxxxx; I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX@xxxxxxxxxxxxxxxx>; Zhang, Bingxuan (Nokia - CN/Hangzhou) <bingxuan.zhang@xxxxxxxxx>; Zizka, Jan (Nokia - CZ/Prague) <jan.zizka@xxxxxxxxx> Subject: Re: Issue about the size of fstat is less than the really size of the syslog file ----- Original Message ----- > From: "George Lian (Nokia - CN/Hangzhou)" <george.lian@xxxxxxxxx> > To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx> > Cc: Gluster-devel@xxxxxxxxxxx, "I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS" > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX@xxxxxxxxxxxxxxxx>, "Bingxuan Zhang (Nokia - CN/Hangzhou)" > <bingxuan.zhang@xxxxxxxxx>, "Jan Zizka (Nokia - CZ/Prague)" <jan.zizka@xxxxxxxxx> > Sent: Wednesday, October 19, 2016 12:05:01 PM > Subject: RE: Issue about the size of fstat is less than the really size of the syslog file > > Hi, Raghavendra, > > Thanks a lots for your quickly update! > In my case, there are so many process(write) is writing to the syslog file, > it do involve the writer is in the same host and writing in same mount point > while the tail(reader) is reading it. > > The bug I just guess is: > When a writer write the data with write-behind, it call the call-back > function " mdc_writev_cbk" and called "mdc_inode_iatt_set_validate" to > validate the "iatt" data, but with the code I mentioned last mail, it do > nothing. mdc_inode_iatt_set_validate has following code <snippet> if (!iatt || !iatt->ia_ctime) { mdc->ia_time = 0; goto unlock; } </snippet> Which means a NULL iatt sets mdc->ia_time to 0. This results in subsequent lookup/stat calls to be NOT served from md-cache. Instead, the stat is served from backend bricks. So, I don't see an issue here. However, one case where a NULL iatt is different from a valid iatt (which differs from the value stored in md-cache) is that the latter results in a call to inode_invalidate. This invalidation propagates to kernel and all dentry and page cache corresponding to file is purged. So, I am suspecting whether the stale stat you saw was served from kernel cache (not from glusterfs). If this is the case, having mount options "attribute-timeout=0" and "entry-timeout=0" should've helped. I am still at loss to point out the RCA for this issue. > And in same time, the reader(tail) read the "iatt" data, but in case of the > cache-time is not timeout, it will return the "iatt" data without the last > change. > > Do your think it is a possible bug? > > Thanks & Best Regards, > George > > -----Original Message----- > From: Raghavendra Gowdappa [mailto:rgowdapp@xxxxxxxxxx] > Sent: Wednesday, October 19, 2016 2:06 PM > To: Lian, George (Nokia - CN/Hangzhou) <george.lian@xxxxxxxxx> > Cc: Gluster-devel@xxxxxxxxxxx; I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX@xxxxxxxxxxxxxxxx>; Zhang, Bingxuan (Nokia - > CN/Hangzhou) <bingxuan.zhang@xxxxxxxxx>; Zizka, Jan (Nokia - CZ/Prague) > <jan.zizka@xxxxxxxxx> > Subject: Re: Issue about the size of fstat is less than the > really size of the syslog file > > > > ----- Original Message ----- > > From: "George Lian (Nokia - CN/Hangzhou)" <george.lian@xxxxxxxxx> > > To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx> > > Cc: Gluster-devel@xxxxxxxxxxx, "I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS" > > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX@xxxxxxxxxxxxxxxx>, "Bingxuan Zhang (Nokia > > - CN/Hangzhou)" > > <bingxuan.zhang@xxxxxxxxx>, "Jan Zizka (Nokia - CZ/Prague)" > > <jan.zizka@xxxxxxxxx> > > Sent: Wednesday, October 19, 2016 10:51:24 AM > > Subject: RE: Issue about the size of fstat is less than the > > really size of the syslog file > > > > Hi, Raghavendra, > > > > When we disable md-cache(gluster volume set log > > performance.md-cache-timeout > > 0), the issue seems gone. > > (we can't disable with " gluster volume set log performance.md-cache off" > > why?) > > Please use > #gluster volume set log performance.stat-prefetch off > > > > > So I double confuse that the code I abstract in last mail maybe have some > > issue for this case. > > Could you please share your comments? > > Please find my comments below. > > > > > Thanks & Best Regards, > > George > > > > -----Original Message----- > > From: Lian, George (Nokia - CN/Hangzhou) > > Sent: Friday, October 14, 2016 1:44 PM > > To: 'Raghavendra Gowdappa' <rgowdapp@xxxxxxxxxx> > > Cc: Gluster-devel@xxxxxxxxxxx; I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS > > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX@xxxxxxxxxxxxxxxx>; Zhang, Bingxuan (Nokia > > - > > CN/Hangzhou) <bingxuan.zhang@xxxxxxxxx>; Zizka, Jan (Nokia - CZ/Prague) > > <jan.zizka@xxxxxxxxx> > > Subject: RE: Issue about the size of fstat is less than the > > really size of the syslog file > > > > Hi, Raghavendra, > > > > Our version of GlusterFS is 3.6.9, and I also check the newest code of main > > branch, the function of " mdc_inode_iatt_set_validate" is almost same, from > > the following code of this function, > > We could see a "TODO" comments inline, does it mean if we enhance > > write-behind feature, the "iatt" field in callback will be NULL, so that > > inode_invalidate will not be called? So the size of file will not update > > since "write behind" enabled ? > > Is it the root cause for "tail" application failed with "file truncated" > > issue ? > > > > LOCK (&mdc->lock); > > { > > if (!iatt || !iatt->ia_ctime) { > > mdc->ia_time = 0; > > goto unlock; > > } > > > > /* > > * Invalidate the inode if the mtime or ctime has changed > > * and the prebuf doesn't match the value we have cached. > > * TODO: writev returns with a NULL iatt due to > > * performance/write-behind, causing invalidation on writes. > > */ > > The issue explained in this comment is hit only when writes are done. But, in > your use-case only "tail" is the application running on the mount (If I am > not wrong, the writer is running on a different mountpoint). So, I doubt > you are hitting this issue. But, you are saying that the issue goes away > when write-behind/md-cache is turned off pointing to some interaction > between md-cache and write-behind causing the issue. I need more time to > look into this issue. Can you file a bug on this? > > > if (IA_ISREG(inode->ia_type) && > > ((iatt->ia_mtime != mdc->md_mtime) || > > (iatt->ia_ctime != mdc->md_ctime))) > > if (!prebuf || (prebuf->ia_ctime != mdc->md_ctime) || > > (prebuf->ia_mtime != mdc->md_mtime)) > > inode_invalidate(inode); > > > > mdc_from_iatt (mdc, iatt); > > > > time (&mdc->ia_time); > > } > > > > Best Regards, > > George > > -----Original Message----- > > From: Raghavendra Gowdappa [mailto:rgowdapp@xxxxxxxxxx] > > Sent: Thursday, October 13, 2016 8:58 PM > > To: Lian, George (Nokia - CN/Hangzhou) <george.lian@xxxxxxxxx> > > Cc: Gluster-devel@xxxxxxxxxxx; I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS > > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX@xxxxxxxxxxxxxxxx>; Zhang, Bingxuan (Nokia > > - > > CN/Hangzhou) <bingxuan.zhang@xxxxxxxxx>; Zizka, Jan (Nokia - CZ/Prague) > > <jan.zizka@xxxxxxxxx> > > Subject: Re: Issue about the size of fstat is less than the > > really size of the syslog file > > > > > > > > ----- Original Message ----- > > > From: "George Lian (Nokia - CN/Hangzhou)" <george.lian@xxxxxxxxx> > > > To: Gluster-devel@xxxxxxxxxxx > > > Cc: "I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS" > > > <I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX@xxxxxxxxxxxxxxxx>, "Bingxuan Zhang > > > (Nokia > > > - CN/Hangzhou)" <bingxuan.zhang@xxxxxxxxx>, "Jan Zizka (Nokia - > > > CZ/Prague)" > > > <jan.zizka@xxxxxxxxx> > > > Sent: Thursday, October 13, 2016 2:33:53 PM > > > Subject: Issue about the size of fstat is less than the > > > really size of the syslog file > > > > > > Hi, Dear Expert, > > > We have use glusterfs as a network filesystem, and syslog store in there, > > > some clients on different host may write the syslog file via “glusterfs” > > > mount point. > > > Now we encounter an issue when we “tail” the syslog file, it will > > > occasional > > > failed with error “ file truncated ” > > > As we study and trace with the “tail” source code, it failed with the > > > following code: > > > if ( S_ISREG (mode) && stats.st_size < f[i].size ) > > > { > > > error (0, 0, _("%s: file truncated"), quotef (name)); > > > /* Assume the file was truncated to 0, > > > and therefore output all "new" data. */ > > > xlseek (fd, 0, SEEK_SET, name); > > > f[i].size = 0; > > > } > > > When stats.st_size < f[i].size, what mean the size report by fstat is > > > less > > > than “tail” had read, it lead to “file truncated”, we also use “strace” > > > tools to trace the tail application, the related tail strace log as the > > > below: > > > nanosleep({1, 0}, NULL) = 0 > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192543105, ...}) = 0 > > > nanosleep({1, 0}, NULL) = 0 > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192543105, ...}) = 0 > > > nanosleep({1, 0}, NULL) = 0 > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192543105, ...}) = 0 > > > nanosleep({1, 0}, NULL) = 0 > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192544549, ...}) = 0 > > > read(3, " Data … -"..., 8192) = 1444 > > > read(3, " Data.. "..., 8192) = 720 > > > read(3, "", 8192) = 0 > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192544789, ...}) = 0 > > > write(1, “DATA…..” ) = 2164 > > > write(2, "tail: ", 6tail: ) = 6 > > > write(2, "/mnt/log/master/syslog: file tru"..., 38/mnt/log/master/syslog: > > > file truncated) = 38 > > > as the above strace log, tail has read 1444+720=2164 bytes, > > > but fstat tell “tail” 192544789 – 192543105 = 1664 which less than 2164, > > > so > > > it lead to “tail” application “file truncated”. > > > And if we turn off “write-behind” feature, the issue will not be > > > reproduced > > > any more. > > > > That seems strange. There are no writes happening on the fd/inode through > > which tail is reading/stating from. So, it seems strange that write-behind > > is involved here. I suspect whether any of md-cache/read-ahead/io-cache is > > causing the issue. Can you, > > > > 1. Turn off md-cache, read-ahead, io-cache xlators > > 2. mount glusterfs with --attribute-timeout=0 > > 3. set write-behind on > > > > and rerun the tests? If you don't hit the issue, you can experiment by > > turning on/off of md-cache, read-ahead and io-cache translators and see > > what > > are the minimal number of xlators that need to be turned off to not hit the > > issue (with write-behind on)? > > > > regards, > > Raghavendra > > > > > So we think it may be related to cache consistence issue due to > > > performance > > > consider, but we still have concern that: > > > The syslog file is used only with “Append” mode, so the size of file > > > shouldn’t be reduced, when a client read the file, why “fstat” can’t > > > return > > > the really size match to the cache? > > > From current investigation, we doubt that the current implement of > > > “glusterfs” has a bug on “fstat” when cache is on. > > > Your comments is our highly appreciated! > > > Thanks & Best Regards > > > George > > > > > > _______________________________________________ > > > Gluster-devel mailing list > > > Gluster-devel@xxxxxxxxxxx > > > http://www.gluster.org/mailman/listinfo/gluster-devel > > > _______________________________________________ Gluster-devel mailing list Gluster-devel@xxxxxxxxxxx http://www.gluster.org/mailman/listinfo/gluster-devel