I finally reproduced this issue. Adding the following lines to httpd.conf can
work around it:

EnableMMAP off
EnableSendfile off

On Sat, Sep 3, 2016 at 11:07 AM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
> On Fri, Sep 2, 2016 at 5:10 PM, Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
>> I have checked all the servers in scope running 'dmesg | grep -i stale' and
>> it does not yield any results.
>>
>> As a test I have rebooted the servers in scope and I can still replicate the
>> behavior 100% of the time.
>>
>
> Can you reproduce this bug manually (updating the file on one server and
> reading it on another server)? If you can, please enable debug_mds=10,
> repeat the steps that reproduce it and send the log to us.
>
> Regards
> Yan, Zheng
>
>
>> On Fri, Sep 2, 2016 at 4:37 AM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>>
>>> I thought about this again. This issue could be caused by a stale session.
>>> Could you check the kernel logs of your servers? Are there any Ceph-related
>>> kernel messages (such as "ceph: mds0 caps stale")?
>>>
>>> Regards
>>> Yan, Zheng
>>>
>>>
>>> On Thu, Sep 1, 2016 at 11:02 PM, Sean Redmond <sean.redmond1@xxxxxxxxx>
>>> wrote:
>>> > Hi,
>>> >
>>> > It seems to be using the mmap() syscall; from what I read this indicates
>>> > it is using memory-mapped IO.
>>> >
>>> > Please see an strace here: http://pastebin.com/6wjhSNrP
>>> >
>>> > Thanks
>>> >
>>> > On Wed, Aug 31, 2016 at 5:51 PM, Sean Redmond <sean.redmond1@xxxxxxxxx>
>>> > wrote:
>>> >>
>>> >> I am not sure how to tell.
>>> >>
>>> >> Server1 and Server2 mount the Ceph file system using kernel client
>>> >> 4.7.2, and I can replicate the problem using '/usr/bin/sum' to read the
>>> >> file or an HTTP GET request via a web server (Apache).
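For anyone hitting the same thing, the workaround at the top of this message amounts to a small httpd.conf (or included conf.d) fragment like the following; the comment is my own gloss on why it helps, based on the discussion in this thread:

```apache
# Workaround from this thread: stop httpd from serving static files via
# mmap()/sendfile(), so reads go through ordinary read(2), which behaves
# consistently with the CephFS kernel client's page-cache handling.
EnableMMAP off
EnableSendfile off
```

Run `apachectl configtest` and reload Apache after adding the directives.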
>>> >>
>>> >> On Wed, Aug 31, 2016 at 2:38 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>> >>>
>>> >>> On Wed, Aug 31, 2016 at 12:49 AM, Sean Redmond
>>> >>> <sean.redmond1@xxxxxxxxx> wrote:
>>> >>> > Hi,
>>> >>> >
>>> >>> > I have been able to pick through the process a little further and
>>> >>> > replicate it via the command line. The flow looks like this:
>>> >>> >
>>> >>> > 1) The user uploads an image to web server 'uploader01'; it gets
>>> >>> > written to a path such as
>>> >>> > '/cephfs/webdata/static/456/JHL/66448H-755h.jpg' on CephFS.
>>> >>> >
>>> >>> > 2) The MDS immediately makes the metadata for this new file
>>> >>> > available to all clients.
>>> >>> >
>>> >>> > 3) The 'uploader01' server commits the file contents to disk
>>> >>> > asynchronously, as sync is not explicitly called during the upload.
>>> >>> >
>>> >>> > 4) Before step 3 is done, the visitor requests the file via one of
>>> >>> > the two web servers, server1 or server2 - the MDS provides the
>>> >>> > metadata, but the contents of the file are not committed to disk
>>> >>> > yet, so the data read returns 0's. This is then cached by the file
>>> >>> > system page cache until it expires or is flushed manually.
>>> >>>
>>> >>> Do server1 or server2 use memory-mapped IO to read the file?
>>> >>>
>>> >>> Regards
>>> >>> Yan, Zheng
>>> >>>
>>> >>> >
>>> >>> > 5) As step 4 typically happens on only one of the two web servers
>>> >>> > before step 3 is complete, we get the mismatch between the server1
>>> >>> > and server2 file system page caches.
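The five steps above can be sketched with local stand-ins. This is a hedged illustration only: on a single host the page cache is coherent, so both reads agree here; across two CephFS kernel clients, the pre-flush read in step 4 is what can return zeros.

```shell
# Stand-in for the shared CephFS mount; in the real setup uploader01,
# server1 and server2 are separate hosts all mounting /cephfs.
mnt=$(mktemp -d)
f="$mnt/66448H-755h.jpg"

# Step 1: the "upload" - a buffered write, no explicit sync.
head -c 53317 /dev/urandom > "$f"

# Step 4: a reader checksums the file before the writer's data is flushed.
# (Locally this already sees real data; on another CephFS client it could
# see zeros until step 3 completes.)
before=$(sum "$f")

# Step 3: commit the contents. A global `sync` works; with coreutils 8.24+,
# `sync "$f"` would fsync just this one file instead.
sync

after=$(sum "$f")
echo "before: $before"
echo "after:  $after"
```

Locally the two checksums always match; in the thread's scenario it is the read on the *other* client that disagreed until the flush happened.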
>>> >>> >
>>> >>> > The below demonstrates how to reproduce this issue:
>>> >>> >
>>> >>> > http://pastebin.com/QK8AemAb
>>> >>> >
>>> >>> > As we can see, the checksum of the file returned by the web server
>>> >>> > is 0, as the file contents have not been flushed to disk from
>>> >>> > server uploader01.
>>> >>> >
>>> >>> > If however we call 'sync' as shown below, the checksum is correct:
>>> >>> >
>>> >>> > http://pastebin.com/p4CfhEFt
>>> >>> >
>>> >>> > If we instead wait 10 seconds for the kernel to flush the dirty
>>> >>> > pages, we can also see the checksum is valid:
>>> >>> >
>>> >>> > http://pastebin.com/1w6UZzNQ
>>> >>> >
>>> >>> > It looks like it may be a race between the time it takes the
>>> >>> > uploader01 server to commit the file to the file system and the
>>> >>> > fast incoming read request from the visiting user to server1 or
>>> >>> > server2.
>>> >>> >
>>> >>> > Thanks
>>> >>> >
>>> >>> >
>>> >>> > On Tue, Aug 30, 2016 at 10:21 AM, Sean Redmond
>>> >>> > <sean.redmond1@xxxxxxxxx> wrote:
>>> >>> >>
>>> >>> >> You are correct, it only seems to impact recently modified files.
>>> >>> >>
>>> >>> >> On Tue, Aug 30, 2016 at 3:36 AM, Yan, Zheng <ukernel@xxxxxxxxx>
>>> >>> >> wrote:
>>> >>> >>>
>>> >>> >>> On Tue, Aug 30, 2016 at 2:11 AM, Gregory Farnum
>>> >>> >>> <gfarnum@xxxxxxxxxx> wrote:
>>> >>> >>> > On Mon, Aug 29, 2016 at 7:14 AM, Sean Redmond
>>> >>> >>> > <sean.redmond1@xxxxxxxxx> wrote:
>>> >>> >>> >> Hi,
>>> >>> >>> >>
>>> >>> >>> >> I am running CephFS (10.2.2) with kernel 4.7.0-1. I have
>>> >>> >>> >> noticed that frequently static files are showing empty when
>>> >>> >>> >> served via a web server (Apache).
>>> >>> >>> >> I have tracked this down further and can see that, when
>>> >>> >>> >> running a checksum against the file on the CephFS file system
>>> >>> >>> >> on the node serving the empty HTTP response, the checksum is
>>> >>> >>> >> '00000'.
>>> >>> >>> >>
>>> >>> >>> >> The below shows the checksum on a defective node.
>>> >>> >>> >>
>>> >>> >>> >> [root@server2]# ls -al /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >>> >> -rw-r--r-- 1 apache apache 53317 Aug 28 23:46 /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >>>
>>> >>> >>> It seems this file was modified recently. Maybe the web server
>>> >>> >>> silently modifies the files. Please check whether this issue
>>> >>> >>> happens on older files.
>>> >>> >>>
>>> >>> >>> Regards
>>> >>> >>> Yan, Zheng
>>> >>> >>>
>>> >>> >>> >>
>>> >>> >>> >> [root@server2]# sum /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >>> >> 00000 53
>>> >>> >>> >
>>> >>> >>> > So can we presume there are no file contents, and it's just 53
>>> >>> >>> > blocks of zeros?
>>> >>> >>> >
>>> >>> >>> > This doesn't sound familiar to me; Zheng, do you have any ideas?
>>> >>> >>> > Anyway, ceph-fuse shouldn't be susceptible to this bug even with
>>> >>> >>> > the page cache enabled; if you're just serving stuff via the web
>>> >>> >>> > it's probably a better idea anyway (harder to break, easier to
>>> >>> >>> > update, etc).
>>> >>> >>> > -Greg
>>> >>> >>> >
>>> >>> >>> >>
>>> >>> >>> >> The below shows the checksum on a working node.
>>> >>> >>> >>
>>> >>> >>> >> [root@server1]# ls -al /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >>> >> -rw-r--r-- 1 apache apache 53317 Aug 28 23:46 /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >>> >>
>>> >>> >>> >> [root@server1]# sum /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >>> >> 03620 53
>>> >>> >>> >> [root@server1]#
>>> >>> >>> >>
>>> >>> >>> >> If I flush the cache as shown below, the checksum returns as
>>> >>> >>> >> expected and the web server serves up valid content.
>>> >>> >>> >>
>>> >>> >>> >> [root@server2]# echo 3 > /proc/sys/vm/drop_caches
>>> >>> >>> >> [root@server2]# sum /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>>> >>> >>> >> 03620 53
>>> >>> >>> >>
>>> >>> >>> >> After some time, typically less than 1 hour, the issue repeats.
>>> >>> >>> >> It seems not to repeat if I take any one of the servers out of
>>> >>> >>> >> the LB and only serve requests from one of the servers.
>>> >>> >>> >>
>>> >>> >>> >> I may try the FUSE client, which has a mount option direct_io
>>> >>> >>> >> that looks to disable the page cache.
>>> >>> >>> >>
>>> >>> >>> >> I have been hunting in the ML and tracker but could not see
>>> >>> >>> >> anything really close to this issue. Any input or feedback on
>>> >>> >>> >> similar experiences is welcome.
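Greg's reading of the defective output can be checked locally: `sum`'s default (BSD) algorithm prints a 16-bit checksum and a 1 KiB block count, and a run of zero bytes checksums to 0, so a 53317-byte all-zero file reproduces exactly the "00000 53" seen on server2.

```shell
# Build a file of 53317 zero bytes - the size ls reports for the jpg -
# and checksum it the same way the thread does.
f=$(mktemp)
head -c 53317 /dev/zero > "$f"
sum "$f"   # checksum 0 (printed as 00000), ceil(53317/1024) = 53 blocks
```

So "00000 53" really is a file of the right length containing nothing but zeros, which matches the metadata-visible-before-data explanation earlier in the thread.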
>>> >>> >>> >>
>>> >>> >>> >> Thanks

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com