Hi,
I am running cephfs (10.2.2) with kernel 4.7.0-1. I have noticed that static files frequently show up empty when served by a web server (Apache). I have tracked this down further: on the node serving the empty HTTP response, running a checksum against the file on the cephfs file system returns '00000'.
The below shows the checksum on a defective node.
[root@server2]# ls -al /cephfs/webdata/static/456/JHL/66448H-755h.jpg
-rw-r--r-- 1 apache apache 53317 Aug 28 23:46 /cephfs/webdata/static/456/JHL/66448H-755h.jpg
[root@server2]# sum /cephfs/webdata/static/456/JHL/66448H-755h.jpg
00000 53
The below shows the checksum on a working node.
[root@server1]# ls -al /cephfs/webdata/static/456/JHL/66448H-755h.jpg
-rw-r--r-- 1 apache apache 53317 Aug 28 23:46 /cephfs/webdata/static/456/JHL/66448H-755h.jpg
[root@server1]# sum /cephfs/webdata/static/456/JHL/66448H-755h.jpg
03620 53
[root@server1]#
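A sum of 00000 with a non-zero block count is consistent with the file being read back as all zero bytes, since the BSD checksum that sum uses by default is 0 for an all-zero stream. A minimal sketch for checking that directly, assuming a cmp that supports -n (GNU and BSD both do); reads_as_zeros is just a name made up for illustration:

```shell
# Hypothetical helper: succeeds (exit 0) only if the file is non-empty
# and every byte read back from it is zero.
reads_as_zeros() {
    f=$1
    # Size in bytes; the arithmetic expansion strips any padding from wc.
    size=$(( $(wc -c < "$f") )) || return 2
    # An empty file is not the failure mode described here.
    [ "$size" -gt 0 ] || return 1
    # Compare the first $size bytes against an endless stream of zeros.
    cmp -s -n "$size" "$f" /dev/zero
}

# Example use on a node (path taken from the transcript above):
#   reads_as_zeros /cephfs/webdata/static/456/JHL/66448H-755h.jpg && echo "reads as zeros"
```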
If I flush the cache as shown below, the checksum returns to the expected value and the web server serves valid content.
[root@server2]# echo 3 > /proc/sys/vm/drop_caches
[root@server2]# sum /cephfs/webdata/static/456/JHL/66448H-755h.jpg
03620 53
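Dropping every cache on the box is heavy-handed. GNU dd can advise the kernel to discard the cached pages of a single file instead. Whether that is enough to clear this particular cephfs state is untested, so treat the following as a sketch only; drop_file_cache is a hypothetical helper name and it assumes GNU coreutils dd:

```shell
# Hypothetical helper: ask the kernel to drop cached pages for one file.
# count=0 copies nothing; iflag=nocache makes GNU dd issue the
# "discard cache" advice for the whole input file.
drop_file_cache() {
    dd if="$1" iflag=nocache count=0 2>/dev/null
}

# Example use (path taken from the transcript above):
#   drop_file_cache /cephfs/webdata/static/456/JHL/66448H-755h.jpg
```

The file contents are not modified; only the cached copy is affected, and the kernel is free to ignore the advice.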
After some time, typically less than 1 hour, the issue repeats. It does not seem to repeat if I take any one of the servers out of the LB and serve requests from only one of the servers.
I may try the FUSE client, which has a mount option, direct_io, that looks like it disables the page cache.
I have been hunting through the ML and the tracker but could not find anything really close to this issue. Any input or feedback on similar experiences is welcome.
Thanks
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com