On Fri, Sep 2, 2016 at 11:35 AM, Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
> Hi,
>
> That makes sense. I have worked around this by forcing the sync within the
> application running under apache, and it is working very well now without the
> need for the 'sync' mount option.
>
> What is interesting is that the pastebin provided below shows a way to
> replicate this. I was just using wget to download a file to the ceph file
> system instead of using apache to do the upload, just to simplify it, but
> maybe wget is also using memory-mapped IO.

That appears to be the case, yeah:
https://lists.gnu.org/archive/html/bug-wget/2013-09/msg00004.html
I'm starting to feel a little better. Glad you found a workaround. :)
-Greg

>
> http://pastebin.com/QK8AemAb
>
> Thanks
>
> On Fri, Sep 2, 2016 at 6:32 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>
>> On Thu, Sep 1, 2016 at 8:02 AM, Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
>> > Hi,
>> >
>> > It seems to be using the mmap() syscall; from what I read, this indicates it
>> > is using memory-mapped IO.
>> >
>> > Please see a strace here: http://pastebin.com/6wjhSNrP
>>
>> Zheng meant: is Apache using memory-mapped IO? From a quick google it
>> does in some configurations, but I'm not sure how common it is.
>>
>> We ask because Ceph does not synchronize mmap IO for you and Apache
>> probably isn't doing it either; that would fit the symptoms you're
>> seeing. Regular buffered IO should not be exhibiting any of these
>> issues, although obviously we can't guarantee there are no bugs.
>> -Greg
>>
>> >
>> > Thanks
>> >
>> > On Wed, Aug 31, 2016 at 5:51 PM, Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
>> >>
>> >> I am not sure how to tell?
>> >>
>> >> Server1 and Server2 mount the ceph file system using kernel client 4.7.2,
>> >> and I can replicate the problem using '/usr/bin/sum' to read the file or an
>> >> HTTP GET request via a web server (apache).
>> >>
>> >> On Wed, Aug 31, 2016 at 2:38 PM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>> >>>
>> >>> On Wed, Aug 31, 2016 at 12:49 AM, Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
>> >>> > Hi,
>> >>> >
>> >>> > I have been able to pick through the process a little further and replicate
>> >>> > it via the command line. The flow seems to look like this:
>> >>> >
>> >>> > 1) The user uploads an image to the web server 'uploader01'; it gets
>> >>> > written to a path such as
>> >>> > '/cephfs/webdata/static/456/JHL/66448H-755h.jpg' on cephfs.
>> >>> >
>> >>> > 2) The MDS makes the file metadata available for this new file immediately
>> >>> > to all clients.
>> >>> >
>> >>> > 3) The 'uploader01' server asynchronously commits the file contents to disk,
>> >>> > as sync is not explicitly called during the upload.
>> >>> >
>> >>> > 4) Before step 3 is done, the visitor requests the file via one of two web
>> >>> > servers, server1 or server2. The MDS provides the metadata, but the contents
>> >>> > of the file are not committed to disk yet, so the data read returns 0's. This
>> >>> > is then cached by the file system page cache until it expires or is flushed
>> >>> > manually.
>> >>>
>> >>> Do server1 or server2 use memory-mapped IO to read the file?
>> >>>
>> >>> Regards
>> >>> Yan, Zheng
>> >>>
>> >>> >
>> >>> > 5) As step 4 typically only happens on one of the two web servers before
>> >>> > step 3 is complete, we get the mismatch between the server1 and server2 file
>> >>> > system page caches.
>> >>> >
>> >>> > The below demonstrates how to reproduce this issue:
>> >>> >
>> >>> > http://pastebin.com/QK8AemAb
>> >>> >
>> >>> > As we can see, the checksum of the file returned by the web server is 0, as
>> >>> > the file contents have not been flushed to disk from server uploader01.
>> >>> >
>> >>> > If, however, we call ‘sync’ as shown below, the checksum is correct:
>> >>> >
>> >>> > http://pastebin.com/p4CfhEFt
>> >>> >
>> >>> > If we wait 10 seconds for the kernel to flush the dirty pages, we
>> >>> > can also see the checksum is valid:
>> >>> >
>> >>> > http://pastebin.com/1w6UZzNQ
>> >>> >
>> >>> > It looks like it may be a race between the time it takes the uploader01
>> >>> > server to commit the file to the file system and the fast incoming read
>> >>> > request from the visiting user to server1 or server2.
>> >>> >
>> >>> > Thanks
>> >>> >
>> >>> >
>> >>> > On Tue, Aug 30, 2016 at 10:21 AM, Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
>> >>> >>
>> >>> >> You are correct, it only seems to impact recently modified files.
>> >>> >>
>> >>> >> On Tue, Aug 30, 2016 at 3:36 AM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>> >>> >>>
>> >>> >>> On Tue, Aug 30, 2016 at 2:11 AM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> >>> >>> > On Mon, Aug 29, 2016 at 7:14 AM, Sean Redmond <sean.redmond1@xxxxxxxxx> wrote:
>> >>> >>> >> Hi,
>> >>> >>> >>
>> >>> >>> >> I am running cephfs (10.2.2) with kernel 4.7.0-1. I have noticed that
>> >>> >>> >> static files are frequently showing empty when served via a web server
>> >>> >>> >> (apache).
>> >>> >>> >> I have tracked this down further and can see that, when running a
>> >>> >>> >> checksum against the file on the cephfs file system on the node serving the
>> >>> >>> >> empty HTTP response, the checksum is '00000'.
>> >>> >>> >>
>> >>> >>> >> The below shows the checksum on a defective node.
>> >>> >>> >>
>> >>> >>> >> [root@server2]# ls -al /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>> >>> >>> >> -rw-r--r-- 1 apache apache 53317 Aug 28 23:46 /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>> >>> >>>
>> >>> >>> It seems this file was modified recently. Maybe the web server
>> >>> >>> silently modifies the files. Please check if this issue happens on
>> >>> >>> older files.
>> >>> >>>
>> >>> >>> Regards
>> >>> >>> Yan, Zheng
>> >>> >>>
>> >>> >>> >>
>> >>> >>> >> [root@server2]# sum /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>> >>> >>> >> 00000 53
>> >>> >>> >
>> >>> >>> > So can we presume there are no file contents, and it's just 53
>> >>> >>> > blocks of zeros?
>> >>> >>> >
>> >>> >>> > This doesn't sound familiar to me; Zheng, do you have any ideas?
>> >>> >>> > Anyway, ceph-fuse shouldn't be susceptible to this bug even with the
>> >>> >>> > page cache enabled; if you're just serving stuff via the web it's
>> >>> >>> > probably a better idea anyway (harder to break, easier to update,
>> >>> >>> > etc).
>> >>> >>> > -Greg
>> >>> >>> >
>> >>> >>> >>
>> >>> >>> >> The below shows the checksum on a working node.
>> >>> >>> >>
>> >>> >>> >> [root@server1]# ls -al /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>> >>> >>> >> -rw-r--r-- 1 apache apache 53317 Aug 28 23:46 /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>> >>> >>> >>
>> >>> >>> >> [root@server1]# sum /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>> >>> >>> >> 03620 53
>> >>> >>> >> [root@server1]#
>> >>> >>> >>
>> >>> >>> >> If I flush the cache as shown below, the checksum returns as expected and the
>> >>> >>> >> web server serves up valid content.
>> >>> >>> >>
>> >>> >>> >> [root@server2]# echo 3 > /proc/sys/vm/drop_caches
>> >>> >>> >> [root@server2]# sum /cephfs/webdata/static/456/JHL/66448H-755h.jpg
>> >>> >>> >> 03620 53
>> >>> >>> >>
>> >>> >>> >> After some time, typically less than 1hr, the issue repeats. It seems to not
>> >>> >>> >> repeat if I take any one of the servers out of the LB and only serve
>> >>> >>> >> requests from one of the servers.
>> >>> >>> >>
>> >>> >>> >> I may try the FUSE client, as it has a mount option, direct_io, that
>> >>> >>> >> looks to disable the page cache.
>> >>> >>> >>
>> >>> >>> >> I have been hunting in the ML and tracker but could not see anything really
>> >>> >>> >> close to this issue. Any input or feedback on similar experiences is
>> >>> >>> >> welcome.
>> >>> >>> >>
>> >>> >>> >> Thanks
>> >>> >>> >>
>> >>> >>> >>
>> >>> >>> >> _______________________________________________
>> >>> >>> >> ceph-users mailing list
>> >>> >>> >> ceph-users@xxxxxxxxxxxxxx
>> >>> >>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
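
For readers landing on this thread later, here is a minimal sketch of the kind of workaround discussed above: a writer that goes through a memory mapping and then explicitly forces the data out before other clients read the file. This is not the code from Sean's application, which was not posted. The function names write_via_mmap_and_sync and is_all_zeros are hypothetical, and the sketch assumes a POSIX system, where Python's mmap.flush() is implemented with msync().

```python
import mmap
import os


def write_via_mmap_and_sync(path: str, payload: bytes) -> None:
    """Write payload through a shared memory mapping, then force it to storage.

    Hypothetical helper for illustration; an upload handler would call
    something like this before reporting the upload as complete.
    """
    if not payload:
        raise ValueError("cannot memory-map a zero-length file")
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Size the file first; the mapping cannot extend past end of file.
        os.ftruncate(fd, len(payload))
        with mmap.mmap(fd, len(payload), access=mmap.ACCESS_WRITE) as mm:
            mm[:] = payload
            # Push the dirty mapped pages out (msync() on POSIX).
            mm.flush()
        # Also flush the file itself, as an explicit per-file sync.
        os.fsync(fd)
    finally:
        os.close(fd)


def is_all_zeros(path: str) -> bool:
    """Return True if the file is non-empty and contains only zero bytes,
    which is the symptom reported in this thread."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data) > 0 and data.count(0) == len(data)
```

A reader-side check such as is_all_zeros() only detects the symptom; the fix discussed in the thread is on the writer side, where the application performs the sync itself instead of relying on the 'sync' mount option.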