Re: cephfs page cache

I finally reproduced this issue. Adding the following lines to httpd.conf
works around this issue:

EnableMMAP off
EnableSendfile off
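
These directives can be set globally in httpd.conf or scoped to the
CephFS-backed static path from this thread, and Apache needs a reload to pick
them up. A minimal sketch (the service name assumes a systemd-managed Apache;
adjust for your distribution):

<Directory "/cephfs/webdata/static">
    EnableMMAP off
    EnableSendfile off
</Directory>

# then check the config and reload
apachectl configtest && systemctl reload httpd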




On Sat, Sep 3, 2016 at 11:07 AM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> On Fri, Sep 2, 2016 at 5:10 PM, Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
>> I have checked all the servers in scope by running 'dmesg | grep -i stale'
>> and it yields no results.
>>
>> As a test I have rebooted the servers in scope and I can still replicate the
>> behavior 100% of the time.
>>
>
> Can you reproduce this bug manually (updating a file on one server and
> reading the file on another server)? If you can, please enable
> debug_mds=10, repeat the steps that reproduce this, and send the log to
> us.
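
(For reference, a minimal sketch of bumping MDS logging on a Jewel cluster;
the daemon id below is a placeholder, substitute your active MDS:)

# on the MDS host, raise the debug level via the admin socket
ceph daemon mds.<id> config set debug_mds 10
# reproduce the issue, then collect /var/log/ceph/ceph-mds.<id>.log
# and drop the level back to the default afterwards
ceph daemon mds.<id> config set debug_mds 1/5
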
>
> Regards
> Yan, Zheng
>
>
>> On Fri, Sep 2, 2016 at 4:37 AM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>>
>>> I thought about this again. This issue could be caused by a stale session.
>>> Could you check the kernel logs of your servers? Are there any Ceph-related
>>> kernel messages (such as "ceph: mds0 caps stale")?
>>>
>>> Regards
>>> Yan, Zheng
>>>
>>>
>>> On Thu, Sep 1, 2016 at 11:02 PM, Sean Redmond <sean.redmond1@xxxxxxxxx>
>>> wrote:
>>> > Hi,
>>> >
>>> > It seems to be using the mmap() syscall; from what I read, this indicates
>>> > it is using memory-mapped IO.
>>> >
>>> > Please see a strace here: http://pastebin.com/6wjhSNrP
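
(For reference, one way to confirm whether the serving httpd worker mmap()s
the file rather than read()ing it; which worker handles the request is not
known in advance, so this is only a sketch:)

# attach to an Apache worker and watch how it accesses the served file
strace -f -e trace=open,mmap,read,sendfile -p <httpd worker pid> 2>&1 \
    | grep 66448H-755h.jpg
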
>>> >
>>> > Thanks
>>> >
>>> > On Wed, Aug 31, 2016 at 5:51 PM, Sean Redmond <sean.redmond1@xxxxxxxxx>
>>> > wrote:
>>> >>
>>> >> I am not sure how to tell?
>>> >>
>>> >> Server1 and Server2 mount the Ceph file system using kernel client
>>> >> 4.7.2, and I can replicate the problem using '/usr/bin/sum' to read the
>>> >> file or an HTTP GET request via a web server (Apache).
>>> >>
>>> >> On Wed, Aug 31, 2016 at 2:38 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>> >>>
>>> >>> On Wed, Aug 31, 2016 at 12:49 AM, Sean Redmond
>>> >>> <sean.redmond1@xxxxxxxxx>
>>> >>> wrote:
>>> >>> > Hi,
>>> >>> >
>>> >>> > I have been able to pick through the process a little further and
>>> >>> > replicate it via the command line. The flow looks like this:
>>> >>> >
>>> >>> > 1) The user uploads an image to the webserver 'uploader01'; it gets
>>> >>> > written to a path such as
>>> >>> > '/cephfs/webdata/static/456/JHL/66448H-755h.jpg' on cephfs.
>>> >>> >
>>> >>> > 2) The MDS makes the metadata for this new file available to all
>>> >>> > clients immediately.
>>> >>> >
>>> >>> > 3) The 'uploader01' server commits the file contents to disk
>>> >>> > asynchronously, as sync is not explicitly called during the upload.
>>> >>> >
>>> >>> > 4) Before step 3 is done, the visitor requests the file via one of the
>>> >>> > two web servers (server1 or server2). The MDS provides the metadata,
>>> >>> > but the contents of the file are not committed to disk yet, so the
>>> >>> > data read returns zeros. This is then cached by the file system page
>>> >>> > cache until it expires or is flushed manually.
>>> >>>
>>> >>> Does server1 or server2 use memory-mapped IO to read the file?
>>> >>>
>>> >>> Regards
>>> >>> Yan, Zheng
>>> >>>
>>> >>> >
>>> >>> > 5) As step 4 typically happens on only one of the two web servers
>>> >>> > before step 3 is complete, we get the mismatch between the server1 and
>>> >>> > server2 file system page caches.
>>> >>> >
>>> >>> > The below demonstrates how to reproduce this issue:
>>> >>> >
>>> >>> > http://pastebin.com/QK8AemAb
>>> >>> >
>>> >>> > As we can see, the checksum of the file returned by the web server is
>>> >>> > 0, as the file contents have not been flushed to disk from the
>>> >>> > uploader01 server.
>>> >>> >
>>> >>> > If, however, we call ‘sync’ as shown below, the checksum is correct:
>>> >>> >
>>> >>> > http://pastebin.com/p4CfhEFt
>>> >>> >
>>> >>> > If we instead wait 10 seconds for the kernel to flush the dirty pages,
>>> >>> > we can also see that the checksum is valid:
>>> >>> >
>>> >>> > http://pastebin.com/1w6UZzNQ
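
(For reference, the reproduction in outline; the test file names below are
hypothetical and the pastes above may differ in detail, so treat this as a
sketch of the steps described:)

# run 1, on uploader01: write a new file and do not sync
dd if=/dev/urandom of=/cephfs/webdata/static/test1.jpg bs=64k count=1
# immediately on server2: read it before uploader01 flushes its dirty pages
sum /cephfs/webdata/static/test1.jpg    # checksum comes back as 00000

# run 2, on uploader01: same write, but flush before the remote read
dd if=/dev/urandom of=/cephfs/webdata/static/test2.jpg bs=64k count=1
sync                                    # or wait ~10 seconds for writeback
# on server2:
sum /cephfs/webdata/static/test2.jpg    # checksum now matches uploader01
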
>>> >>> >
>>> >>> > It looks like it may be a race between the time it takes the
>>> >>> > uploader01 server to commit the file to the file system and the fast
>>> >>> > incoming read request from the visiting user to server1 or server2.
>>> >>> >
>>> >>> > Thanks
>>> >>> >
>>> >>> >
>>> >>> > On Tue, Aug 30, 2016 at 10:21 AM, Sean Redmond
>>> >>> > <sean.redmond1@xxxxxxxxx>
>>> >>> > wrote:
>>> >>> >>
>>> >>> >> You are correct it only seems to impact recently modified files.
>>> >>> >>
>>> >>> >> On Tue, Aug 30, 2016 at 3:36 AM, Yan, Zheng <ukernel@xxxxxxxxx>
>>> >>> >> wrote:
>>> >>> >>>
>>> >>> >>> On Tue, Aug 30, 2016 at 2:11 AM, Gregory Farnum
>>> >>> >>> <gfarnum@xxxxxxxxxx>
>>> >>> >>> wrote:
>>> >>> >>> > On Mon, Aug 29, 2016 at 7:14 AM, Sean Redmond
>>> >>> >>> > <sean.redmond1@xxxxxxxxx>
>>> >>> >>> > wrote:
>>> >>> >>> >> Hi,
>>> >>> >>> >>
>>> >>> >>> >> I am running cephfs (10.2.2) with kernel 4.7.0-1. I have noticed
>>> >>> >>> >> that static files frequently show up empty when served via a web
>>> >>> >>> >> server (Apache). I have tracked this down further and can see
>>> >>> >>> >> that, when running a checksum against the file on the cephfs file
>>> >>> >>> >> system on the node serving the empty HTTP response, the checksum
>>> >>> >>> >> is '00000'.
>>> >>> >>> >>
>>> >>> >>> >> The below shows the checksum on a defective node.
>>> >>> >>> >>
>>> >>> >>> >> [root@server2]# ls -al
>>> >>> >>> >> /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >>> >> -rw-r--r-- 1 apache apache 53317 Aug 28 23:46
>>> >>> >>> >> /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >>>
>>> >>> >>> It seems this file was modified recently. Maybe the web server
>>> >>> >>> silently modifies the files. Please check if this issue happens on
>>> >>> >>> older files.
>>> >>> >>>
>>> >>> >>> Regards
>>> >>> >>> Yan, Zheng
>>> >>> >>>
>>> >>> >>> >>
>>> >>> >>> >> [root@server2]# sum
>>> >>> >>> >> /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >>> >> 00000    53
>>> >>> >>> >
>>> >>> >>> > So can we presume there are no file contents, and it's just 53
>>> >>> >>> > blocks
>>> >>> >>> > of zeros?
>>> >>> >>> >
>>> >>> >>> > This doesn't sound familiar to me; Zheng, do you have any ideas?
>>> >>> >>> > Anyway, ceph-fuse shouldn't be susceptible to this bug even with
>>> >>> >>> > the
>>> >>> >>> > page cache enabled; if you're just serving stuff via the web
>>> >>> >>> > it's
>>> >>> >>> > probably a better idea anyway (harder to break, easier to
>>> >>> >>> > update,
>>> >>> >>> > etc).
>>> >>> >>> > -Greg
>>> >>> >>> >
>>> >>> >>> >>
>>> >>> >>> >> The below shows the checksum on a working node.
>>> >>> >>> >>
>>> >>> >>> >> [root@server1]# ls -al
>>> >>> >>> >> /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >>> >> -rw-r--r-- 1 apache apache 53317 Aug 28 23:46
>>> >>> >>> >> /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >>> >>
>>> >>> >>> >> [root@server1]# sum
>>> >>> >>> >> /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >>> >> 03620    53
>>> >>> >>> >> [root@server1]#
>>> >>> >>> >>
>>> >>> >>> >> If I flush the cache as shown below, the checksum returns as
>>> >>> >>> >> expected and the web server serves up valid content.
>>> >>> >>> >>
>>> >>> >>> >> [root@server2]# echo 3 > /proc/sys/vm/drop_caches
>>> >>> >>> >> [root@server2]# sum
>>> >>> >>> >> /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >>> >> 03620    53
>>> >>> >>> >>
>>> >>> >>> >> After some time, typically less than an hour, the issue repeats.
>>> >>> >>> >> It does not seem to repeat if I take any one of the servers out of
>>> >>> >>> >> the LB and only serve requests from one of the servers.
>>> >>> >>> >>
>>> >>> >>> >> I may try the FUSE client, which has a mount option, direct_io,
>>> >>> >>> >> that looks to disable the page cache.
>>> >>> >>> >>
>>> >>> >>> >> I have been hunting in the mailing list and the tracker but could
>>> >>> >>> >> not see anything really close to this issue. Any input or feedback
>>> >>> >>> >> on similar experiences is welcome.
>>> >>> >>> >>
>>> >>> >>> >> Thanks
>>> >>> >>> >>
>>> >>> >>> >>
>>> >>> >>
>>> >>> >>
>>> >>> >
>>> >>
>>> >>
>>> >
>>
>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



