Re: Issue about the size of fstat is less than the really size of the syslog file

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

 

Today I add some log in glusterfs and use strace command to trace tail application, some log as the below:

 

31624 15:19:29.754261 fstat(3, {st_mode=S_IFREG|0644, st_size=31753498, ...}) = 0

31624 15:19:29.754292 write(1, "Nov  4 15:19:28 euca-10-254-195-"..., 369) = 369

31624 15:19:29.754336 nanosleep({1, 0}, NULL) = 0

31624 15:19:30.754609 fstat(3, {st_mode=S_IFREG|0644, st_size=31753678, ...}) = 0

31624 15:19:30.763434 read(3, "Nov  4 15:19:29 euca-10-254-195-"..., 8192) = 257

31624 15:19:30.763706 read(3, "", 8192) = 0

31624 15:19:30.771184 fstat(3, {st_mode=S_IFREG|0644, st_size=31753678, ...}) = 0

31624 15:19:30.771424 write(1, "Nov  4 15:19:29 euca-10-254-195-"..., 257) = 257

31624 15:19:30.771461 write(2, "tail: ", 6) = 6

31624 15:19:30.771524 write(2, "messages: file truncated", 24) = 24

 

[George/C] Truncated happen because 257 bytes read, but two fstat’s diff is 678-498=180 which less than 257.

 

[2016-11-04 15:19:30.770844] W [defaults.c:1007:default_readv_cbk] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x15c)[0x7faed40faebc] (--> /lib64/libglusterfs.so.0(default_readv_cbk+0x58)[0x7faed4103668] (--> /usr/lib64/glusterfs/3.6.9/xlator/cluster/distribute.so(dht_readv_cbk+0x12f)[0x7faec6e542ff] (--> /usr/lib64/glusterfs/3.6.9/xlator/cluster/replicate.so(afr_readv_cbk+0xeb)[0x7faec7083b9b] (--> /usr/lib64/glusterfs/3.6.9/xlator/protocol/client.so(client3_3_readv_cbk+0x465)[0x7faec72eec95] ))))) 0-debug: default_readv_cbk size[31753755]

[2016-11-04 15:19:30.770977] W [defaults.c:1007:default_readv_cbk] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x15c)[0x7faed40faebc] (--> /lib64/libglusterfs.so.0(default_readv_cbk+0x58)[0x7faed4103668] (--> /usr/lib64/glusterfs/3.6.9/xlator/performance/io-cache.so(ioc_frame_return+0x39f)[0x7faec67f4acf] (--> /usr/lib64/glusterfs/3.6.9/xlator/performance/io-cache.so(ioc_waitq_return+0x1f)[0x7faec67f4ccf] (--> /usr/lib64/glusterfs/3.6.9/xlator/performance/io-cache.so(ioc_fault_cbk+0x2ec)[0x7faec67f528c] ))))) 0-debug: default_readv_cbk size[0]

[2016-11-04 15:19:30.771091] W [defaults.c:1007:default_readv_cbk] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x15c)[0x7faed40faebc] (--> /lib64/libglusterfs.so.0(default_readv_cbk+0x58)[0x7faed4103668] (--> /lib64/libglusterfs.so.0(default_readv_cbk+0x10f)[0x7faed410371f] (--> /usr/lib64/glusterfs/3.6.9/xlator/performance/io-cache.so(ioc_frame_return+0x39f)[0x7faec67f4acf] (--> /usr/lib64/glusterfs/3.6.9/xlator/performance/io-cache.so(ioc_waitq_return+0x1f)[0x7faec67f4ccf] ))))) 0-debug: default_readv_cbk size[0]

[2016-11-04 15:19:30.771358] W [md-cache.c:961:mdc_fstat] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x15c)[0x7faed40faebc] (--> /usr/lib64/glusterfs/3.6.9/xlator/performance/md-cache.so(mdc_fstat+0x1b7)[0x7faec61d1977] (--> /usr/lib64/glusterfs/3.6.9/xlator/debug/io-stats.so(io_stats_fstat+0x149)[0x7faec5fb57c9] (--> /lib64/libglusterfs.so.0(default_fstat+0x5c)[0x7faed40ff15c] (--> /usr/lib64/glusterfs/3.6.9/xlator/meta.so(meta_fstat+0x2e)[0x7faec5d9f1ce] ))))) 0-debug: mdc_fstat from cache stat size[31753678]

 

[George/C] The log from mnt_log.log read_cbk do read 257 bytes(3175375531753498)

But in next fstat, the size is return from md-cache which is the old value 31753678.

 

I have one confuse, because I also add the debug log in mdc_readv_cbk, why it not called?(no this log from log file). And I suppose if mdc_readv_cbk is called, from the source code, it will update the stat size for mdc_cache.

 

Thanks & Best Regards,

George

From: Lian, George (Nokia - CN/Hangzhou)
Sent: Thursday, November 03, 2016 4:44 PM
To: 'Pranith Kumar Karampuri' <pkarampu@xxxxxxxxxx>; Raghavendra G <raghavendra@xxxxxxxxxxx>
Cc: Zhang, Bingxuan (Nokia - CN/Hangzhou) <bingxuan.zhang@xxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Zizka, Jan (Nokia - CZ/Prague) <jan.zizka@xxxxxxxxx>; Bao, Xiaohui (Nokia - CN/Hangzhou) <xiaohui.bao@xxxxxxxxx>; Li, Deqian (Nokia - CN/Hangzhou) <deqian.li@xxxxxxxxx>
Subject: RE: [Gluster-devel] Issue about the size of fstat is less than the really size of the syslog file

 

Hi,

 

The issue now is occurred in one test ENV in my company.

Generally, it is caused by some process random write the syslog to a syslog file with multiple host and multiple process, while the destination syslog file stored in GlusterFS mount point.

And then a “tail” application tail the syslog file continuously such like “tail –f syslog >abc.log”

And the issue occurred about ten minute also.

 

And now we try to reproduce the issue only with glusterfs and syslog service, if I reproduce it I will summary the detail reproduce step for you.

 

Thanks & Best Regards,

George

 

 

From: Pranith Kumar Karampuri [mailto:pkarampu@xxxxxxxxxx]
Sent: Thursday, November 03, 2016 4:32 PM
To: Raghavendra G <raghavendra@xxxxxxxxxxx>
Cc: Lian, George (Nokia - CN/Hangzhou) <george.lian@xxxxxxxxx>; Zhang, Bingxuan (Nokia - CN/Hangzhou) <bingxuan.zhang@xxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Zizka, Jan (Nokia - CZ/Prague) <jan.zizka@xxxxxxxxx>
Subject: Re: [Gluster-devel] Issue about the size of fstat is less than the really size of the syslog file

 

 

 

On Thu, Nov 3, 2016 at 1:10 PM, Raghavendra G <raghavendra@xxxxxxxxxxx> wrote:

 

 

On Thu, Nov 3, 2016 at 10:27 AM, Raghavendra G <raghavendra@xxxxxxxxxxx> wrote:

 

 

On Thu, Nov 3, 2016 at 7:16 AM, Lian, George (Nokia - CN/Hangzhou) <george.lian@xxxxxxxxx> wrote:

>Yes. I was assuming that the previous results were tested with:
>1. write-behind on with the fix
>2. quick-read and readdir-ahead off
# gluster volume info log

performance.quick-read: off
performance.readdir-ahead: off
performance.stat-prefetch: on
performance.write-behind: on


with the above configuration and write-behind.so with patch 2, the "tail truncated" issue still be there.

# tail -f syslog >/dev/null
tail: syslog: file truncated
tail: syslog: file truncated

FYI,

 

Thanks George. I'll take a look.

 

Can you please test with following configuration?

 

George,

Do you have any way we can re-create this issue? I mean by creating a similar workload as yours? It seems to catch good bugs in caching layer. It would help in automating this case. If you have the steps, we can also find the fix sooner because the iteration speed to test/verify would be faster. Let us know.

 

1. write-behind on with my fix

2. readdir-ahead and quick-read off

3. performance.stat-prefetch on

4. performance.force-readdirp off

5. dht.force-readdirp off

6. Also mount glusterfs with option "use-readdirp=no"

 

[root@booradley glusterfs]# mount -t glusterfs -o use-readdirp=no booradley:/newptop /mnt

 

[root@booradley glusterfs]# ps ax | grep -i mnt

14418 ?        Ssl    0:00 /usr/local/sbin/glusterfs --use-readdirp=no --volfile-server=booradley --volfile-id=/newptop /mnt

 

[root@booradley glusterfs]# gluster volume set newptop performance.write-behind on

volume set: success

 

[root@booradley glusterfs]# gluster volume set newptop performance.quick-read off

volume set: success

 

[root@booradley glusterfs]# gluster volume set newptop performance.stat-prefetch on

volume set: success

 

[root@booradley glusterfs]# gluster volume set newptop performance.force-readdirp off

volume set: success

 

[root@booradley glusterfs]# gluster volume set newptop dht.force-readdirp off

volume set: success

 

[root@booradley glusterfs]# gluster volume set newptop performance.readdir-ahead off

volume set: success

 

[root@booradley glusterfs]# gluster volume info newptop

 

Volume Name: newptop

Type: Distribute

Volume ID: 092756e1-e095-4e05-9f14-3e9a6aed908c

Status: Started

Snapshot Count: 0

Number of Bricks: 1

Transport-type: tcp

Bricks:

Brick1: booradley:/home/export/newptop

Options Reconfigured:

dht.force-readdirp: off

performance.force-readdirp: off

performance.stat-prefetch: on

performance.write-behind: on

performance.quick-read: off

transport.address-family: inet

performance.readdir-ahead: off

nfs.disable: on

 

 

 


Best Regards,
George


-----Original Message-----
From: Raghavendra Gowdappa [mailto:rgowdapp@xxxxxxxxxx]

Sent: Wednesday, November 02, 2016 5:41 PM
To: Lian, George (Nokia - CN/Hangzhou) <george.lian@xxxxxxxxx>
Cc: Raghavendra G <raghavendra@xxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx; Zizka, Jan (Nokia - CZ/Prague) <jan.zizka@xxxxxxxxx>; Zhang, Bingxuan (Nokia - CN/Hangzhou) <bingxuan.zhang@xxxxxxxxx>
Subject: Re: [Gluster-devel] Issue about the size of fstat is less than the really size of the syslog file



----- Original Message -----
> From: "George Lian (Nokia - CN/Hangzhou)" <george.lian@xxxxxxxxx>
> To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> Cc: "Raghavendra G" <raghavendra@xxxxxxxxxxx>, Gluster-devel@xxxxxxxxxxx, "Jan Zizka (Nokia - CZ/Prague)"
> <jan.zizka@xxxxxxxxx>, "Bingxuan Zhang (Nokia - CN/Hangzhou)" <bingxuan.zhang@xxxxxxxxx>
> Sent: Wednesday, November 2, 2016 1:38:44 PM
> Subject: RE: [Gluster-devel] Issue about the size of fstat is less than the really size of the syslog file
>
> Yes, I confirm use the Patch 2.
>
> One update: the issue is occurred when readdir-ahead off and write-behind on.
> Seems gone when write-behind and readdir-ahead and quick-read all off.
> Not verified with readdir-ahead and quick-read both off and write-behind on
> till now.
>
> Need I test it with write-behind on and readdir-ahead and quick-read both
> off?

Yes. I was assuming that the previous results were tested with:
1. write-behind on with the fix
2. quick-read and readdir-ahead off

If not, test results with this configuration will help.

>
> Best Regards,
> George
>
> -----Original Message-----
> From: Raghavendra Gowdappa [mailto:rgowdapp@xxxxxxxxxx]
> Sent: Wednesday, November 02, 2016 4:04 PM
> To: Lian, George (Nokia - CN/Hangzhou) <george.lian@xxxxxxxxx>
> Cc: Raghavendra G <raghavendra@xxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx;
> Zizka, Jan (Nokia - CZ/Prague) <jan.zizka@xxxxxxxxx>; Zhang, Bingxuan (Nokia
> - CN/Hangzhou) <bingxuan.zhang@xxxxxxxxx>
> Subject: Re: [Gluster-devel] Issue about the size of fstat is less than the
> really size of the syslog file
>
>
>
> ----- Original Message -----
> > From: "George Lian (Nokia - CN/Hangzhou)" <george.lian@xxxxxxxxx>
> > To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>
> > Cc: "Raghavendra G" <raghavendra@xxxxxxxxxxx>, Gluster-devel@xxxxxxxxxxx,
> > "Jan Zizka (Nokia - CZ/Prague)"
> > <jan.zizka@xxxxxxxxx>, "Bingxuan Zhang (Nokia - CN/Hangzhou)"
> > <bingxuan.zhang@xxxxxxxxx>
> > Sent: Wednesday, November 2, 2016 1:29:13 PM
> > Subject: RE: [Gluster-devel] Issue about the size of fstat is less than the
> > really size of the syslog file
> >
> > Hi,
> >
> > When those 3 options turn off, the issue seems gone in about 3 hours,
> > otherwise, the issue will be occurred about every 10 minutes.
>
> That's a good news. IIRC, you mentioned that you saw the issue with just
> write-behind on, with fix applied (readdir-ahead and quick-read off). Can
> you please confirm you had patcset 2 of http://review.gluster.org/15757?
> patchset 1 had some issues that I corrected in 2.
>
> >
> > Best Regards,
> > George
> >
> > -----Original Message-----
> > From: Raghavendra Gowdappa [mailto:rgowdapp@xxxxxxxxxx]
> > Sent: Wednesday, November 02, 2016 1:07 PM
> > To: Lian, George (Nokia - CN/Hangzhou) <george.lian@xxxxxxxxx>
> > Cc: Raghavendra G <raghavendra@xxxxxxxxxxx>; Gluster-devel@xxxxxxxxxxx;
> > Zizka, Jan (Nokia - CZ/Prague) <jan.zizka@xxxxxxxxx>; Zhang, Bingxuan
> > (Nokia
> > - CN/Hangzhou) <bingxuan.zhang@xxxxxxxxx>
> > Subject: Re: [Gluster-devel] Issue about the size of fstat is less than the
> > really size of the syslog file
> >
> > Can you try with following xlators turned off?
> >
> > 1. write-behind
> > 2. readdir-ahead
> > 3. quick-read
> >
> > regards,
> > Raghavendra
> >
> > ----- Original Message -----
> > > From: "George Lian (Nokia - CN/Hangzhou)" <george.lian@xxxxxxxxx>
> > > To: "Raghavendra Gowdappa" <rgowdapp@xxxxxxxxxx>, "Raghavendra G"
> > > <raghavendra@xxxxxxxxxxx>
> > > Cc: Gluster-devel@xxxxxxxxxxx, "Jan Zizka (Nokia - CZ/Prague)"
> > > <jan.zizka@xxxxxxxxx>, "Bingxuan Zhang (Nokia -
> > > CN/Hangzhou)" <bingxuan.zhang@xxxxxxxxx>
> > > Sent: Wednesday, November 2, 2016 10:26:08 AM
> > > Subject: RE: [Gluster-devel] Issue about the size of fstat is less than
> > > the
> > > really size of the syslog file
> > >
> > > Hi,
> > >
> > > Sorry, it seems the issue still there with readdir-ahead off.
> > >
> > > # tail -f syslog >syslog.2.log
> > > tail: syslog: file truncated
> > > tail: syslog: file truncated
> > > tail: syslog: file truncated
> > > tail: syslog: file truncated
> > > tail: syslog: file truncated
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > tail: syslog: file truncated
> > >
> > > The last line is new truncated error message after the following set with
> > > option readdir-ahead.
> > >
> > >
> > > # gluster volume set log performance.readdir-ahead off
> > > volume set: success
> > > [root@SN-0(RNC-124) /root]
> > > # gluster volume info log
> > >
> > > Volume Name: log
> > > Type: Replicate
> > > Volume ID: 9d987a23-1d64-4857-980a-548f1dd0d1e6
> > > Status: Started
> > > Number of Bricks: 1 x 2 = 2
> > > Transport-type: tcp
> > > Bricks:
> > > Brick1: 169.254.0.12:/mnt/bricks/log/brick
> > > Brick2: 169.254.0.13:/mnt/bricks/log/brick
> > > Options Reconfigured:
> > > performance.readdir-ahead: off
> > >
> > > BRs
> > > George
> > >
> > > -----Original Message-----
> > > From: gluster-devel-bounces@xxxxxxxxxxx
> > > [mailto:gluster-devel-bounces@xxxxxxxxxxx] On Behalf Of Raghavendra
> > > Gowdappa
> > > Sent: Tuesday, November 01, 2016 9:29 PM
> > > To: Raghavendra G <raghavendra@xxxxxxxxxxx>
> > > Cc: Gluster-devel@xxxxxxxxxxx; Zizka, Jan (Nokia - CZ/Prague)
> > > <jan.zizka@xxxxxxxxx>; Zhang, Bingxuan (Nokia - CN/Hangzhou)
> > > <bingxuan.zhang@xxxxxxxxx>
> > > Subject: Re: [Gluster-devel] Issue about the size of fstat is less than
> > > the
> > > really size of the syslog file
> > >
> > >
> > >
> > > ----- Original Message -----
> > > > From: "Raghavendra G" <raghavendra@xxxxxxxxxxx>
> > > > To: "Pranith Kumar Karampuri" <pkarampu@xxxxxxxxxx>
> > > > Cc: Gluster-devel@xxxxxxxxxxx, "Jan Zizka (Nokia - CZ/Prague)"
> > > > <jan.zizka@xxxxxxxxx>, "Bingxuan Zhang (Nokia -
> > > > CN/Hangzhou)" <bingxuan.zhang@xxxxxxxxx>
> > > > Sent: Tuesday, November 1, 2016 6:55:31 PM
> > > > Subject: Re: [Gluster-devel] Issue about the size of fstat is less than
> > > > the
> > > > really size of the syslog file
> > > >
> > > > Hi George,
> > > >
> > > > You can try your tests in two scenarios:
> > > >
> > > > 1. Switch off following performance translators:
> > > > a. write-behind
> > > > b. quick-read
> > > > c. readdir-ahead
> > > >
> > > > Once you see that the issue goes away in the above scenario, you can
> > > > apply
> > > > write-behind fix [1] and turn write-behind on.
> > > >
> > > > We have a fix in quick-read at [2].
> > > >
> > > > You can also try to test with the quick-read fix and turning quick-read
> > > > on.
> > > > However, since your issue is stale stat I doubt quick-read is not the
> > >
> > > s/quick-read is not the culprit/quick-read is the culprit/
> > >
> > > > culprit here (The issue with quick-read is stale data).
> > > >
> > > > We see that readdir-ahead is not updating/flushing the stats stored
> > > > along
> > > > with dentries when write or file modifications happen. That seem to be
> > > > the
> > > > issue with readdir-ahead. For now, during both of the above two
> > > > scenarios,
> > > > please keep readdir-ahead off.
> > > >
> > > > [1] http://review.gluster.org/15757/2
> > > > [2] http://review.gluster.org/#/c/15759/2
> > > >
> > > > regards,
> > > > Raghavendra
> > > >
> > > > On Tue, Nov 1, 2016 at 7:55 AM, Pranith Kumar Karampuri <
> > > > pkarampu@xxxxxxxxxx
> > > > > wrote:
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Nov 1, 2016 at 7:32 AM, Lian, George (Nokia - CN/Hangzhou) <
> > > > george.lian@xxxxxxxxx > wrote:
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Hi,
> > > >
> > > >
> > > >
> > > > I will test it with your patches and update to you when I have result.
> > > >
> > > > hi George,
> > > > Please use http://review.gluster.org/#/c/15757/2 i.e. second version of
> > > > Raghavendra's patch. I tested it and it worked fine. We are still
> > > > trying
> > > > to
> > > > figure out quick-read and readdir-ahead as I type this mail.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Thanks a lots
> > > >
> > > >
> > > >
> > > > Best Regards,
> > > >
> > > > George
> > > >
> > > >
> > > >
> > > > From: Pranith Kumar Karampuri [mailto: pkarampu@xxxxxxxxxx ]
> > > > Sent: Monday, October 31, 2016 11:23 AM
> > > > To: Lian, George (Nokia - CN/Hangzhou) < george.lian@xxxxxxxxx >
> > > > Cc: Raghavendra Gowdappa < rgowdapp@xxxxxxxxxx >; Zhang, Bingxuan
> > > > (Nokia
> > > > -
> > > > CN/Hangzhou) < bingxuan.zhang@xxxxxxxxx >; Gluster-devel@xxxxxxxxxxx ;
> > > > Zizka, Jan (Nokia - CZ/Prague) < jan.zizka@xxxxxxxxx >
> > > >
> > > > Subject: Re: [Gluster-devel] Issue about the size of fstat is less than
> > > > the
> > > > really size of the syslog file
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > Removing I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX@xxxxxxxxxxxxxxxx , it is
> > > > causing
> > > > mail delivery problems for me.
> > > >
> > > >
> > > > George,
> > > >
> > > >
> > > > Raghavendra and I made some progress on this issue. We were in parallel
> > > > working on another issue which is similar where elastic search indices
> > > > are
> > > > getting corrupted because of wrong stat sizes in our opinion. So I have
> > > > been
> > > > running different translator stacks in identifying the problematic
> > > > xlators
> > > > which are leading to indices corruption.
> > > >
> > > >
> > > > We found the list to be 1) Write-behind, 2) Quick-read, 3)
> > > > Readdir-ahead.
> > > > Raghavendra and I just had a chat and we are suspecting that lack of
> > > > lookup/readdirp implementation in write-behind could be the reason for
> > > > this
> > > > problem. Similar problems may exist in other two xlators too. But we
> > > > are
> > > > working on write-behind with priority.
> > > >
> > > >
> > > > Our theory is this:
> > > >
> > > >
> > > > If we do a 4KB write for example and it is cached in write-behind and
> > > > we
> > > > do
> > > > a
> > > > lookup on the file/do a readdirp on the directory with this file we
> > > > send
> > > > out
> > > > wrong stat value to the kernel. There are different caches between
> > > > kernel
> > > > and gluster which may lead to fstat never coming till write-behind. So
> > > > we
> > > > need to make sure that we don't get into this situation.
> > > >
> > > >
> > > > Action items:
> > > >
> > > >
> > > > At the moment Raghavendra is working on a patch to implement
> > > > lookup/readdirp
> > > > in write-behind. I am going to test the same for elastic search. Will
> > > > it
> > > > be
> > > > possible for you to test your application against the same patch and
> > > > confirm
> > > > that the patch fixes the problem?
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Fri, Oct 28, 2016 at 12:08 PM, Pranith Kumar Karampuri <
> > > > pkarampu@xxxxxxxxxx > wrote:
> > > >
> > > >
> > > >
> > > >
> > > > hi George,
> > > >
> > > >
> > > > It would help if we can identify the bare minimum xlators which are
> > > > contributing to the issue like Raghavendra was mentioning earlier. We
> > > > were
> > > > wondering if it is possible for you to help us in identifying the issue
> > > > by
> > > > running the workload on a modified setup? We can suggest testing out
> > > > using
> > > > custom volfiles so that we can slowly build the graph which could be
> > > > causing
> > > > this issue. We would like you guys to try out this problem with just
> > > > posix-xlator and fuse and nothing else.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Oct 27, 2016 at 1:40 PM, Lian, George (Nokia - CN/Hangzhou) <
> > > > george.lian@xxxxxxxxx > wrote:
> > > >
> > > >
> > > >
> > > > Hi, Raghavendra,
> > > >
> > > > Could you please give some suggestion for this issue? we try to find
> > > > the
> > > > clue
> > > > for this issue for a long time, but it has no progress:(
> > > >
> > > > Thanks & Best Regards,
> > > > George
> > > >
> > > > -----Original Message-----
> > > > From: Lian, George (Nokia - CN/Hangzhou)
> > > > Sent: Wednesday, October 19, 2016 4:40 PM
> > > > To: 'Raghavendra Gowdappa' < rgowdapp@xxxxxxxxxx >
> > > > Cc: Gluster-devel@xxxxxxxxxxx ; I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS <
> > > > I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX@xxxxxxxxxxxxxxxx >; Zhang, Bingxuan
> > > > (Nokia
> > > > -
> > > > CN/Hangzhou) < bingxuan.zhang@xxxxxxxxx >; Zizka, Jan (Nokia -
> > > > CZ/Prague)
> > > > <
> > > > jan.zizka@xxxxxxxxx >
> > > > Subject: RE: [Gluster-devel] Issue about the size of fstat is less than
> > > > the
> > > > really size of the syslog file
> > > >
> > > > Hi, Raghavendra
> > > >
> > > > Just now, we test it with glusterfs log with debug-level "TRACE", and
> > > > let
> > > > some application trigger "glusterfs" produce large log, in that case,
> > > > when
> > > > we set write-behind and stat-prefetch both OFF,
> > > > Tail the glusterfs log such like mnt-{VOLUME-NAME}.log, it still failed
> > > > with
> > > > "file truncated",
> > > >
> > > > So that means if file's IO in huge amount, the issue will still be
> > > > there
> > > > even
> > > > write-behind and stat-prefetch both OFF.
> > > >
> > > > Best Regards,
> > > > George
> > > >
> > > > -----Original Message-----
> > > > From: Raghavendra Gowdappa [mailto: rgowdapp@xxxxxxxxxx ]
> > > >
> > > >
> > > > Sent: Wednesday, October 19, 2016 2:54 PM
> > > > To: Lian, George (Nokia - CN/Hangzhou) < george.lian@xxxxxxxxx >
> > > > Cc: Gluster-devel@xxxxxxxxxxx ; I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS <
> > > > I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX@xxxxxxxxxxxxxxxx >; Zhang, Bingxuan
> > > > (Nokia
> > > > -
> > > > CN/Hangzhou) < bingxuan.zhang@xxxxxxxxx >; Zizka, Jan (Nokia -
> > > > CZ/Prague)
> > > > <
> > > > jan.zizka@xxxxxxxxx >
> > > > Subject: Re: [Gluster-devel] Issue about the size of fstat is less than
> > > > the
> > > > really size of the syslog file
> > > >
> > > >
> > > >
> > > > ----- Original Message -----
> > > > > From: "George Lian (Nokia - CN/Hangzhou)" < george.lian@xxxxxxxxx >
> > > > > To: "Raghavendra Gowdappa" < rgowdapp@xxxxxxxxxx >
> > > > > Cc: Gluster-devel@xxxxxxxxxxx , "I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS"
> > > > > < I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX@xxxxxxxxxxxxxxxx >, "Bingxuan Zhang
> > > > > (Nokia - CN/Hangzhou)"
> > > > > < bingxuan.zhang@xxxxxxxxx >, "Jan Zizka (Nokia - CZ/Prague)" <
> > > > > jan.zizka@xxxxxxxxx >
> > > > > Sent: Wednesday, October 19, 2016 12:05:01 PM
> > > > > Subject: RE: [Gluster-devel] Issue about the size of fstat is less
> > > > > than
> > > > > the
> > > > > really size of the syslog file
> > > > >
> > > > > Hi, Raghavendra,
> > > > >
> > > > > Thanks a lots for your quickly update!
> > > > > In my case, there are so many process(write) is writing to the syslog
> > > > > file,
> > > > > it do involve the writer is in the same host and writing in same
> > > > > mount
> > > > > point
> > > > > while the tail(reader) is reading it.
> > > > >
> > > > > The bug I just guess is:
> > > > > When a writer write the data with write-behind, it call the call-back
> > > > > function " mdc_writev_cbk" and called "mdc_inode_iatt_set_validate"
> > > > > to
> > > > > validate the "iatt" data, but with the code I mentioned last mail, it
> > > > > do
> > > > > nothing.
> > > >
> > > > mdc_inode_iatt_set_validate has following code
> > > >
> > > > <snippet>
> > > > if (!iatt || !iatt->ia_ctime) {
> > > > mdc->ia_time = 0;
> > > > goto unlock;
> > > > }
> > > > </snippet>
> > > >
> > > > Which means a NULL iatt sets mdc->ia_time to 0. This results in
> > > > subsequent
> > > > lookup/stat calls to be NOT served from md-cache. Instead, the stat is
> > > > served from backend bricks. So, I don't see an issue here.
> > > >
> > > > However, one case where a NULL iatt is different from a valid iatt
> > > > (which
> > > > differs from the value stored in md-cache) is that the latter results
> > > > in
> > > > a
> > > > call to inode_invalidate. This invalidation propagates to kernel and
> > > > all
> > > > dentry and page cache corresponding to file is purged. So, I am
> > > > suspecting
> > > > whether the stale stat you saw was served from kernel cache (not from
> > > > glusterfs). If this is the case, having mount options
> > > > "attribute-timeout=0"
> > > > and "entry-timeout=0" should've helped.
> > > >
> > > > I am still at loss to point out the RCA for this issue.
> > > >
> > > >
> > > > > And in same time, the reader(tail) read the "iatt" data, but in case
> > > > > of
> > > > > the
> > > > > cache-time is not timeout, it will return the "iatt" data without the
> > > > > last
> > > > > change.
> > > > >
> > > > > Do your think it is a possible bug?
> > > > >
> > > > > Thanks & Best Regards,
> > > > > George
> > > > >
> > > > > -----Original Message-----
> > > > > From: Raghavendra Gowdappa [mailto: rgowdapp@xxxxxxxxxx ]
> > > > > Sent: Wednesday, October 19, 2016 2:06 PM
> > > > > To: Lian, George (Nokia - CN/Hangzhou) < george.lian@xxxxxxxxx >
> > > > > Cc: Gluster-devel@xxxxxxxxxxx ; I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS
> > > > > < I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX@xxxxxxxxxxxxxxxx >; Zhang, Bingxuan
> > > > > (Nokia -
> > > > > CN/Hangzhou) < bingxuan.zhang@xxxxxxxxx >; Zizka, Jan (Nokia -
> > > > > CZ/Prague)
> > > > > < jan.zizka@xxxxxxxxx >
> > > > > Subject: Re: [Gluster-devel] Issue about the size of fstat is less
> > > > > than
> > > > > the
> > > > > really size of the syslog file
> > > > >
> > > > >
> > > > >
> > > > > ----- Original Message -----
> > > > > > From: "George Lian (Nokia - CN/Hangzhou)" < george.lian@xxxxxxxxx >
> > > > > > To: "Raghavendra Gowdappa" < rgowdapp@xxxxxxxxxx >
> > > > > > Cc: Gluster-devel@xxxxxxxxxxx ,
> > > > > > "I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS"
> > > > > > < I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX@xxxxxxxxxxxxxxxx >, "Bingxuan
> > > > > > Zhang
> > > > > > (Nokia
> > > > > > - CN/Hangzhou)"
> > > > > > < bingxuan.zhang@xxxxxxxxx >, "Jan Zizka (Nokia - CZ/Prague)"
> > > > > > < jan.zizka@xxxxxxxxx >
> > > > > > Sent: Wednesday, October 19, 2016 10:51:24 AM
> > > > > > Subject: RE: [Gluster-devel] Issue about the size of fstat is less
> > > > > > than
> > > > > > the
> > > > > > really size of the syslog file
> > > > > >
> > > > > > Hi, Raghavendra,
> > > > > >
> > > > > > When we disable md-cache(gluster volume set log
> > > > > > performance.md-cache-timeout
> > > > > > 0), the issue seems gone.
> > > > > > (we can't disable with " gluster volume set log
> > > > > > performance.md-cache
> > > > > > off"
> > > > > > why?)
> > > > >
> > > > > Please use
> > > > > #gluster volume set log performance.stat-prefetch off
> > > > >
> > > > > >
> > > > > > So I double confuse that the code I abstract in last mail maybe
> > > > > > have
> > > > > > some
> > > > > > issue for this case.
> > > > > > Could you please share your comments?
> > > > >
> > > > > Please find my comments below.
> > > > >
> > > > > >
> > > > > > Thanks & Best Regards,
> > > > > > George
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Lian, George (Nokia - CN/Hangzhou)
> > > > > > Sent: Friday, October 14, 2016 1:44 PM
> > > > > > To: 'Raghavendra Gowdappa' < rgowdapp@xxxxxxxxxx >
> > > > > > Cc: Gluster-devel@xxxxxxxxxxx ; I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS
> > > > > > < I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX@xxxxxxxxxxxxxxxx >; Zhang,
> > > > > > Bingxuan
> > > > > > (Nokia
> > > > > > -
> > > > > > CN/Hangzhou) < bingxuan.zhang@xxxxxxxxx >; Zizka, Jan (Nokia -
> > > > > > CZ/Prague)
> > > > > > < jan.zizka@xxxxxxxxx >
> > > > > > Subject: RE: [Gluster-devel] Issue about the size of fstat is less
> > > > > > than
> > > > > > the
> > > > > > really size of the syslog file
> > > > > >
> > > > > > Hi, Raghavendra,
> > > > > >
> > > > > > Our version of GlusterFS is 3.6.9, and I also check the newest code
> > > > > > of
> > > > > > main
> > > > > > branch, the function of " mdc_inode_iatt_set_validate" is almost
> > > > > > same,
> > > > > > from
> > > > > > the following code of this function,
> > > > > > We could see a "TODO" comments inline, does it mean if we enhance
> > > > > > write-behind feature, the "iatt" field in callback will be NULL, so
> > > > > > that
> > > > > > inode_invalidate will not be called? So the size of file will not
> > > > > > update
> > > > > > since "write behind" enabled ?
> > > > > > Is it the root cause for "tail" application failed with "file
> > > > > > truncated"
> > > > > > issue ?
> > > > > >
> > > > > > LOCK (&mdc->lock);
> > > > > > {
> > > > > > if (!iatt || !iatt->ia_ctime) {
> > > > > > mdc->ia_time = 0;
> > > > > > goto unlock;
> > > > > > }
> > > > > >
> > > > > > /*
> > > > > > * Invalidate the inode if the mtime or ctime has changed
> > > > > > * and the prebuf doesn't match the value we have cached.
> > > > > > * TODO: writev returns with a NULL iatt due to
> > > > > > * performance/write-behind, causing invalidation on writes.
> > > > > > */
> > > > >
> > > > > The issue explained in this comment is hit only when writes are done.
> > > > > But,
> > > > > in
> > > > > your use-case only "tail" is the application running on the mount (If
> > > > > I
> > > > > am
> > > > > not wrong, the writer is running on a different mountpoint). So, I
> > > > > doubt
> > > > > you are hitting this issue. But, you are saying that the issue goes
> > > > > away
> > > > > when write-behind/md-cache is turned off pointing to some interaction
> > > > > between md-cache and write-behind causing the issue. I need more time
> > > > > to
> > > > > look into this issue. Can you file a bug on this?
> > > > >
> > > > > > if (IA_ISREG(inode->ia_type) &&
> > > > > > ((iatt->ia_mtime != mdc->md_mtime) ||
> > > > > > (iatt->ia_ctime != mdc->md_ctime)))
> > > > > > if (!prebuf || (prebuf->ia_ctime != mdc->md_ctime) ||
> > > > > > (prebuf->ia_mtime != mdc->md_mtime))
> > > > > > inode_invalidate(inode);
> > > > > >
> > > > > > mdc_from_iatt (mdc, iatt);
> > > > > >
> > > > > > time (&mdc->ia_time);
> > > > > > }
> > > > > >
> > > > > > Best Regards,
> > > > > > George
> > > > > > -----Original Message-----
> > > > > > From: Raghavendra Gowdappa [mailto: rgowdapp@xxxxxxxxxx ]
> > > > > > Sent: Thursday, October 13, 2016 8:58 PM
> > > > > > To: Lian, George (Nokia - CN/Hangzhou) < george.lian@xxxxxxxxx >
> > > > > > Cc: Gluster-devel@xxxxxxxxxxx ; I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS
> > > > > > < I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX@xxxxxxxxxxxxxxxx >; Zhang,
> > > > > > Bingxuan
> > > > > > (Nokia
> > > > > > -
> > > > > > CN/Hangzhou) < bingxuan.zhang@xxxxxxxxx >; Zizka, Jan (Nokia -
> > > > > > CZ/Prague)
> > > > > > < jan.zizka@xxxxxxxxx >
> > > > > > Subject: Re: [Gluster-devel] Issue about the size of fstat is less
> > > > > > than
> > > > > > the
> > > > > > really size of the syslog file
> > > > > >
> > > > > >
> > > > > >
> > > > > > ----- Original Message -----
> > > > > > > From: "George Lian (Nokia - CN/Hangzhou)" < george.lian@xxxxxxxxx
> > > > > > > >
> > > > > > > To: Gluster-devel@xxxxxxxxxxx
> > > > > > > Cc: "I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX_GMS"
> > > > > > > < I_EXT_MBB_WCDMA_SWD3_DA1_MATRIX@xxxxxxxxxxxxxxxx >, "Bingxuan
> > > > > > > Zhang
> > > > > > > (Nokia
> > > > > > > - CN/Hangzhou)" < bingxuan.zhang@xxxxxxxxx >, "Jan Zizka (Nokia -
> > > > > > > CZ/Prague)"
> > > > > > > < jan.zizka@xxxxxxxxx >
> > > > > > > Sent: Thursday, October 13, 2016 2:33:53 PM
> > > > > > > Subject: [Gluster-devel] Issue about the size of fstat is less
> > > > > > > than
> > > > > > > the
> > > > > > > really size of the syslog file
> > > > > > >
> > > > > > > Hi, Dear Expert,
> > > > > > > We have use glusterfs as a network filesystem, and syslog store
> > > > > > > in
> > > > > > > there,
> > > > > > > some clients on different host may write the syslog file via
> > > > > > >
glusterfs
> > > > > > > mount point.
> > > > > > > Now we encounter an issue when we
tail the syslog file, it will
> > > > > > > occasional
> > > > > > > failed with error
file truncated
> > > > > > > As we study and trace with the
tail source code, it failed with
> > > > > > > the
> > > > > > > following code:
> > > > > > > if ( S_ISREG (mode) && stats.st_size < f[i].size )
> > > > > > > {
> > > > > > > error (0, 0, _("%s: file truncated"), quotef (name));
> > > > > > > /* Assume the file was truncated to 0,
> > > > > > > and therefore output all "new" data. */
> > > > > > > xlseek (fd, 0, SEEK_SET, name);
> > > > > > > f[i].size = 0;
> > > > > > > }
> > > > > > > When stats.st_size < f[i].size, what mean the size report by
> > > > > > > fstat
> > > > > > > is
> > > > > > > less
> > > > > > > than
tail had read, it lead to file truncated, we also use
> > > > > > >
strace
> > > > > > > tools to trace the tail application, the related tail strace log
> > > > > > > as
> > > > > > > the
> > > > > > > below:
> > > > > > > nanosleep({1, 0}, NULL) = 0
> > > > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192543105, ...}) = 0
> > > > > > > nanosleep({1, 0}, NULL) = 0
> > > > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192543105, ...}) = 0
> > > > > > > nanosleep({1, 0}, NULL) = 0
> > > > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192543105, ...}) = 0
> > > > > > > nanosleep({1, 0}, NULL) = 0
> > > > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192544549, ...}) = 0
> > > > > > > read(3, " Data
-"..., 8192) = 1444
> > > > > > > read(3, " Data.. "..., 8192) = 720
> > > > > > > read(3, "", 8192) = 0
> > > > > > > fstat(3, {st_mode=S_IFREG|0644, st_size=192544789, ...}) = 0
> > > > > > > write(1,
DATA.. ) = 2164
> > > > > > > write(2, "tail: ", 6tail: ) = 6
> > > > > > > write(2, "/mnt/log/master/syslog: file tru"...,
> > > > > > > 38/mnt/log/master/syslog:
> > > > > > > file truncated) = 38
> > > > > > > as the above strace log, tail has read 1444+720=2164 bytes,
> > > > > > > but fstat tell
tail 192544789 192543105 = 1664 which less
> > > > > > > than
> > > > > > > 2164,
> > > > > > > so
> > > > > > > it lead to
tail application file truncated.
> > > > > > > And if we turn off
write-behind feature, the issue will not be
> > > > > > > reproduced
> > > > > > > any more.
> > > > > >
> > > > > > That seems strange. There are no writes happening on the fd/inode
> > > > > > through
> > > > > > which tail is reading/stating from. So, it seems strange that
> > > > > > write-behind
> > > > > > is involved here. I suspect whether any of
> > > > > > md-cache/read-ahead/io-cache
> > > > > > is
> > > > > > causing the issue. Can you,
> > > > > >
> > > > > > 1. Turn off md-cache, read-ahead, io-cache xlators
> > > > > > 2. mount glusterfs with --attribute-timeout=0
> > > > > > 3. set write-behind on
> > > > > >
> > > > > > and rerun the tests? If you don't hit the issue, you can experiment
> > > > > > by
> > > > > > turning on/off of md-cache, read-ahead and io-cache translators and
> > > > > > see
> > > > > > what
> > > > > > are the minimal number of xlators that need to be turned off to not
> > > > > > hit
> > > > > > the
> > > > > > issue (with write-behind on)?
> > > > > >
> > > > > > regards,
> > > > > > Raghavendra
> > > > > >
> > > > > > > So we think it may be related to cache consistence issue due to
> > > > > > > performance
> > > > > > > consider, but we still have concern that:
> > > > > > > The syslog file is used only with
Append mode, so the size of
> > > > > > > file
> > > > > > > shouldn
t be reduced, when a client read the file, why fstat
> > > > > > > can
t
> > > > > > > return
> > > > > > > the really size match to the cache?
> > > > > > > From current investigation, we doubt that the current implement
> > > > > > > of
> > > > > > >
glusterfs has a bug on fstat when cache is on.
> > > > > > > Your comments is our highly appreciated!
> > > > > > > Thanks & Best Regards
> > > > > > > George
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > Gluster-devel mailing list
> > > > > > > Gluster-devel@xxxxxxxxxxx
> > > > > > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > > > > >
> > > > >
> > > > _______________________________________________
> > > > Gluster-devel mailing list
> > > > Gluster-devel@xxxxxxxxxxx
> > > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > >
> > > > Pranith
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > >
> > > >
> > > > Pranith
> > > >
> > > >
> > > >
> > > > --
> > > > Pranith
> > > >
> > > > _______________________________________________
> > > > Gluster-devel mailing list
> > > > Gluster-devel@xxxxxxxxxxx
> > > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > > >
> > > >
> > > >
> > > > --
> > > > Raghavendra G
> > > >
> > > > _______________________________________________
> > > > Gluster-devel mailing list
> > > > Gluster-devel@xxxxxxxxxxx
> > > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > > _______________________________________________
> > > Gluster-devel mailing list
> > > Gluster-devel@xxxxxxxxxxx
> > > http://www.gluster.org/mailman/listinfo/gluster-devel
> > >
> >
>
_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel




--

Raghavendra G



 

--

Raghavendra G


_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel




--

Pranith

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-devel

[Index of Archives]     [Gluster Users]     [Ceph Users]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux