cephfs 'lag' / hang

Don Waterloo <don.waterloo@xxxxxxxxx> · Fri, 18 Dec 2015 15:34:40 -0500

I have 3 systems w/ a cephfs mounted on them, and I am seeing material 'lag'. By 'lag' I mean it hangs for short periods of time (1s, sometimes 5s).
But it is not reliably reproducible.

If I run
time find . -type f -print0 | xargs -0 stat > /dev/null
it might take ~130ms.
But it might take 10s. Once I've done it, it tends to stay at ~130ms, suggesting the relevant data is now in cache. In the cases where it hangs, if I remove the stat, it is hanging in the find on a single file. It might hiccup 1 or 2 times in the find across 10k files.
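Because the stall is intermittent, a single timed run says little. A minimal sketch of a repeat-and-time loop follows; the function name time_walk and its arguments are made up for illustration, and it assumes GNU date (for %N), GNU find and GNU xargs.

```shell
#!/bin/sh
# Hypothetical helper: run the find/stat walk several times and print the
# wall-clock time of each run in milliseconds, so outliers stand out.
# Assumes GNU date (nanosecond %N), GNU find and GNU xargs (-r).
time_walk() {
    dir=$1            # directory to walk, e.g. the cephfs mount point
    runs=${2:-5}      # number of repetitions, default 5
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s%N)
        find "$dir" -type f -print0 | xargs -0 -r stat > /dev/null
        end=$(date +%s%N)
        echo "run $i: $(( (end - start) / 1000000 )) ms"
        i=$((i + 1))
    done
}
```

Something like `time_walk /cephfs 20` would then print one line per run. Later runs will mostly hit the cache, so the slow cases are likely to show up only occasionally.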

This lag might affect e.g. 'cwd', writing a file, basically all operations.

Does anyone have any suggestions? It's a very irritating problem. I do not see errors in dmesg.
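When dmesg is quiet, one place that may show what a hung operation is waiting on is the kernel client's debugfs directory. The sketch below prints the mdsc (MDS requests) and osdc (OSD requests) files for each mounted client. It assumes debugfs is mounted at /sys/kernel/debug and that it is run as root; the function name pending_ceph_requests is hypothetical, and the file format varies between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: dump the in-flight request lists that the ceph kernel
# client exposes in debugfs (one directory per mounted client).
# Assumption: debugfs is mounted at /sys/kernel/debug; reading it needs root.
pending_ceph_requests() {
    root=${1:-/sys/kernel/debug/ceph}   # overridable, mainly for testing
    for client in "$root"/*; do
        [ -d "$client" ] || continue
        for f in mdsc osdc; do
            [ -r "$client/$f" ] || continue
            echo "== $(basename "$client")/$f =="
            cat "$client/$f"
        done
    done
}
```

Running `pending_ceph_requests` in a loop while reproducing the hang would show whether a request is sitting in mdsc or osdc during the stall.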

The 3 systems w/ the filesystem mounted are running Ubuntu 15.10 w/ 4.3.0-040300-generic kernel. They are running cephfs from the kernel driver, mounted in /etc/fstab as:

10.100.10.60,10.100.10.61,10.100.10.62:/ /cephfs ceph _netdev,noauto,noatime,x-systemd.requires=network-online.target,x-systemd.automount,x-systemd.device-timeout=10,name=admin,secret=XXXX== 0 2

I have 3 mds: 1 active, 2 standby. The 3 machines that are the mons {nubo-1/-2/-3} are also the ones that have the cephfs mounted.

They have a 9K MTU between the systems, and I have checked with ping -s ### -M do <dest> that there are no blackholes in size: up to 8954 works, and 8955 gives 'would fragment'.
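For reference, the relation between the ping -s payload and the packet size can be worked out as below. This is only arithmetic, assuming IPv4 with a 20-byte header (no IP options) and the 8-byte ICMP header; the function names are made up for illustration.

```shell
#!/bin/sh
# Assumption: IPv4 without options (20-byte header) plus an 8-byte ICMP header.

# Largest "ping -s" payload that fits in one unfragmented packet of a given MTU.
max_ping_payload() {
    echo $(( $1 - 20 - 8 ))
}

# Inverse: the packet size on the wire implied by a given "ping -s" payload.
implied_packet_size() {
    echo $(( $1 + 20 + 8 ))
}

max_ping_payload 9000      # prints 8972
implied_packet_size 8954   # prints 8982
```

Under these assumptions a full 9000-byte MTU would pass payloads up to 8972, while a largest working payload of 8954 corresponds to an 8982-byte packet.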

All the storage devices are 1TB Samsung SSDs, and all are on SATA. There is no material load on the system while this is occurring (a bit of background fs usage I guess, but it's otherwise idle, just me).

$ ceph status
    cluster b23abffc-71c4-4464-9449-3f2c9fbe1ded
     health HEALTH_OK
     monmap e1: 3 mons at {nubo-1=10.100.10.60:6789/0,nubo-2=10.100.10.61:6789/0,nubo-3=10.100.10.62:6789/0}
            election epoch 1070, quorum 0,1,2 nubo-1,nubo-2,nubo-3
     mdsmap e587: 1/1/1 up {0=nubo-2=up:active}, 2 up:standby
     osdmap e2346: 6 osds: 6 up, 6 in
      pgmap v113350: 840 pgs, 6 pools, 143 GB data, 104 kobjects
            288 GB used, 5334 GB / 5622 GB avail
                 840 active+clean

I've checked and the network between them is perfect: no loss, ~no latency (<< 1ms, they are adjacent on an L2 segment), and the same holds for all the osd hosts [there are 6 osd].

$ ceph osd tree
ID WEIGHT  TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 5.48996 root default                                       
-2 0.89999     host nubo-1                                    
 0 0.89999         osd.0         up  1.00000          1.00000 
-3 0.89999     host nubo-2                                    
 1 0.89999         osd.1         up  1.00000          1.00000 
-4 0.89999     host nubo-3                                    
 2 0.89999         osd.2         up  1.00000          1.00000 
-5 0.92999     host nubo-19                                   
 3 0.92999         osd.3         up  1.00000          1.00000 
-6 0.92999     host nubo-20                                   
 4 0.92999         osd.4         up  1.00000          1.00000 
-7 0.92999     host nubo-21                                   
 5 0.92999         osd.5         up  1.00000          1.00000 
