Re: cephfs page cache

I have checked all the servers in scope by running 'dmesg | grep -i stale' and it does not yield any results.

As a test I have rebooted the servers in scope and I can still replicate the behavior 100% of the time.

On Fri, Sep 2, 2016 at 4:37 AM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
I thought about this again. This issue could be caused by a stale session.
Could you check the kernel logs of your servers? Are there any Ceph-related
kernel messages (such as "ceph: mds0 caps stale")?

Regards
Yan, Zheng


On Thu, Sep 1, 2016 at 11:02 PM, Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
> Hi,
>
> It seems to be using the mmap() syscall; from what I read, this indicates it
> is using memory-mapped IO.
>
> Please see a strace here: http://pastebin.com/6wjhSNrP
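For reference, a more direct way to check whether a reader memory-maps the file is to trace just the relevant syscalls. A sketch, assuming strace is installed (the path is the one from this thread):

```shell
# Trace open/mmap/read while summing the file; mmap() calls against the
# file's descriptor indicate memory-mapped IO, read() calls ordinary IO.
strace -f -e trace=openat,mmap,read \
    /usr/bin/sum /cephfs/webdata/static/456/JHL/66448H-755h.jpg 2>&1 |
    grep -E 'openat|mmap'
```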
>
> Thanks
>
> On Wed, Aug 31, 2016 at 5:51 PM, Sean Redmond <sean.redmond1@xxxxxxxxx>
> wrote:
>>
>> I am not sure how to tell?
>>
>> Server1 and Server2 mount the ceph file system using kernel client 4.7.2,
>> and I can replicate the problem using '/usr/bin/sum' to read the file or an
>> HTTP GET request via a web server (apache).
>>
>> On Wed, Aug 31, 2016 at 2:38 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>>
>>> On Wed, Aug 31, 2016 at 12:49 AM, Sean Redmond <sean.redmond1@xxxxxxxxx>
>>> wrote:
>>> > Hi,
>>> >
>>> > I have been able to pick through the process a little further and
>>> > replicate it via the command line. The flow looks like this:
>>> >
>>> > 1) The user uploads an image to web server 'uploader01'; it gets written
>>> > to a path such as '/cephfs/webdata/static/456/JHL/66448H-755h.jpg' on
>>> > cephfs.
>>> >
>>> > 2) The MDS immediately makes the metadata for this new file available to
>>> > all clients.
>>> >
>>> > 3) The 'uploader01' server commits the file contents to disk
>>> > asynchronously, as sync is not explicitly called during the upload.
>>> >
>>> > 4) Before step 3 is done, the visitor requests the file via one of the
>>> > two web servers, server1 or server2 - the MDS provides the metadata, but
>>> > the contents of the file are not committed to disk yet, so the data read
>>> > returns zeros - this is then cached by the file system page cache until
>>> > it expires or is flushed manually.
>>>
>>> do server1 or server2 use memory-mapped IO to read the file?
>>>
>>> Regards
>>> Yan, Zheng
>>>
>>> >
>>> > 5) As step 4 typically only happens on one of the two web servers before
>>> > step 3 is complete, we get the mismatch between the server1 and server2
>>> > file system page caches.
>>> >
>>> > The below demonstrates how to reproduce this issue
>>> >
>>> > http://pastebin.com/QK8AemAb
>>> >
>>> > As we can see, the checksum of the file returned by the web server is 0,
>>> > as the file contents have not been flushed to disk from server uploader01.
>>> >
>>> > If however we call ‘sync’ as shown below the checksum is correct:
>>> >
>>> > http://pastebin.com/p4CfhEFt
>>> >
>>> > If we instead wait 10 seconds for the kernel to flush the dirty pages,
>>> > we can also see the checksum is valid:
>>> >
>>> > http://pastebin.com/1w6UZzNQ
>>> >
>>> > It looks like it may be a race between the time it takes the uploader01
>>> > server to commit the file to the file system and the fast incoming read
>>> > request from the visiting user to server1 or server2.
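The race described in steps 1-5 can be condensed into a two-host sketch (host names and the test path are hypothetical; this only runs against a real cephfs mount shared by two kernel clients):

```shell
# On uploader01: write a new file without fsync; its contents sit in
# uploader01's page cache as dirty pages until writeback kicks in.
dd if=/dev/urandom of=/cephfs/webdata/race-test.bin bs=1k count=53 2>/dev/null

# On server1, immediately afterwards: the MDS already exposes the new
# size, but the data has not reached the OSDs, so the read can return
# zeros, which then stick in server1's page cache.
sum /cephfs/webdata/race-test.bin

# Back on uploader01: force the flush; a re-read on server1 after
# dropping its cache should then produce the matching checksum.
sync
```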
>>> >
>>> > Thanks
>>> >
>>> >
>>> > On Tue, Aug 30, 2016 at 10:21 AM, Sean Redmond
>>> > <sean.redmond1@xxxxxxxxx>
>>> > wrote:
>>> >>
>>> >> You are correct it only seems to impact recently modified files.
>>> >>
>>> >> On Tue, Aug 30, 2016 at 3:36 AM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>> >>>
>>> >>> On Tue, Aug 30, 2016 at 2:11 AM, Gregory Farnum <gfarnum@xxxxxxxxxx>
>>> >>> wrote:
>>> >>> > On Mon, Aug 29, 2016 at 7:14 AM, Sean Redmond
>>> >>> > <sean.redmond1@xxxxxxxxx>
>>> >>> > wrote:
>>> >>> >> Hi,
>>> >>> >>
>>> >>> >> I am running cephfs (10.2.2) with kernel 4.7.0-1. I have noticed
>>> >>> >> that static files frequently show as empty when served via a web
>>> >>> >> server (apache). I have tracked this down further and can see that,
>>> >>> >> when running a checksum against the file on the cephfs file system on
>>> >>> >> the node serving the empty http response, the checksum is '00000'.
>>> >>> >>
>>> >>> >> The below shows the checksum on a defective node.
>>> >>> >>
>>> >>> >> [root@server2]# ls -al
>>> >>> >> /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >> -rw-r--r-- 1 apache apache 53317 Aug 28 23:46
>>> >>> >> /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>>
>>> >>> It seems this file was modified recently. Maybe the web server
>>> >>> silently modifies the files. Please check if this issue happens on
>>> >>> older files.
>>> >>>
>>> >>> Regards
>>> >>> Yan, Zheng
>>> >>>
>>> >>> >>
>>> >>> >> [root@server2]# sum /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >> 00000    53
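As a cross-check on that output: 'sum' defaults to the BSD checksum, which is 00000 for all-zero content, and 53317 bytes round up to 53 1 KiB blocks, so this really is a file-sized run of zeros. That can be confirmed locally (the temp file name is arbitrary):

```shell
# Build a 53317-byte all-zero file (the size of the jpg in this thread)
# and checksum it; the BSD checksum of zero bytes stays 00000, and
# ceil(53317/1024) = 53 blocks.
dd if=/dev/zero of=/tmp/allzeros.bin bs=1 count=53317 2>/dev/null
sum /tmp/allzeros.bin    # checksum 00000, 53 blocks
```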
>>> >>> >
>>> >>> > So can we presume there are no file contents, and it's just 53
>>> >>> > blocks
>>> >>> > of zeros?
>>> >>> >
>>> >>> > This doesn't sound familiar to me; Zheng, do you have any ideas?
>>> >>> > Anyway, ceph-fuse shouldn't be susceptible to this bug even with
>>> >>> > the
>>> >>> > page cache enabled; if you're just serving stuff via the web it's
>>> >>> > probably a better idea anyway (harder to break, easier to update,
>>> >>> > etc).
>>> >>> > -Greg
>>> >>> >
>>> >>> >>
>>> >>> >> The below shows the checksum on a working node.
>>> >>> >>
>>> >>> >> [root@server1]# ls -al
>>> >>> >> /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >> -rw-r--r-- 1 apache apache 53317 Aug 28 23:46
>>> >>> >> /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >>
>>> >>> >> [root@server1]# sum /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >> 03620    53
>>> >>> >> [root@server1]#
>>> >>> >>
>>> >>> >> If I flush the cache as shown below the checksum returns as
>>> >>> >> expected
>>> >>> >> and the
>>> >>> >> web server serves up valid content.
>>> >>> >>
>>> >>> >> [root@server2]# echo 3 > /proc/sys/vm/drop_caches
>>> >>> >> [root@server2]# sum /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >> 03620    53
>>> >>> >>
>>> >>> >> After some time, typically less than 1 hour, the issue repeats. It
>>> >>> >> seems not to repeat if I take any one of the servers out of the LB
>>> >>> >> and only serve requests from one of the servers.
>>> >>> >>
>>> >>> >> I may try the FUSE client, which has a mount option 'direct_io'
>>> >>> >> that looks to disable the page cache.
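For reference, such a mount might look like the below (monitor address, client id, and mount point are placeholders; 'direct_io' is the generic FUSE option, assuming ceph-fuse passes it through):

```shell
# Hypothetical sketch: mount cephfs via ceph-fuse with FUSE's direct_io
# option, bypassing the kernel page cache for reads and writes.
ceph-fuse -m mon01:6789 --id admin -o direct_io /cephfs
```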
>>> >>> >>
>>> >>> >> I have been hunting in the ML and tracker but could not see
>>> >>> >> anything really close to this issue. Any input or feedback on
>>> >>> >> similar experiences is welcome.
>>> >>> >>
>>> >>> >> Thanks
>>> >>> >>
>>> >>> >>
>>> >>> >> _______________________________________________
>>> >>> >> ceph-users mailing list
>>> >>> >> ceph-users@xxxxxxxxxxxxxx
>>> >>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >>> >>
>>> >>
>>> >>
>>> >
>>
>>
>

