Hi,
I'm running a small CephFS cluster (21 TB, 16 OSDs of different sizes between 400 GB and 3.5 TB) that is used as a file warehouse (both small and big files).
Every day there are times when many processes running on the client servers (using either the FUSE or the kernel client) become stuck in D state, and when I strace them I see them waiting in the FUTEX_WAIT syscall.
I can see the same issue on all OSD daemons.
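For reference, this is roughly how I check them (the PID is just a placeholder, and reading /proc/<PID>/stack needs root):

    ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'    # list tasks stuck in uninterruptible sleep
    strace -f -p <PID>                                 # the trace sits blocked in futex(..., FUTEX_WAIT, ...)
    cat /proc/<PID>/stack                              # kernel-side view of where the task is blocked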
The Ceph version I'm running is Firefly 0.80.10, on both the clients and the server daemons.
I use ext4 as the OSD filesystem.
Operating system on servers: Ubuntu 14.04 with kernel 3.13.
Operating system on clients: Ubuntu 12.04 LTS with the HWE kernel 3.13.
The OSD daemons are using RAID5 virtual disks (6 x 300 GB 10K RPM disks on a Dell PERC H700 RAID controller with a 512 MB BBU in write-back mode).
The servers the Ceph daemons are running on also host KVM VMs (OpenStack Nova).
Because of this unfortunate setup the performance is really bad, but at least I shouldn't be seeing this many locking issues (or should I?).
The only thing that temporarily improves the performance is restarting every OSD. After such a restart some processes on the client machines resume I/O, but only for a couple of hours, and then the whole procedure must be repeated.
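(For what it's worth, the restart itself is nothing special, just the stock upstart jobs on Ubuntu 14.04; the OSD id below is only an example.)

    sudo restart ceph-osd id=3                          # restart a single OSD
    sudo stop ceph-osd-all && sudo start ceph-osd-all   # or bounce all OSDs on one host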
I cannot afford to run a setup without RAID because there isn't enough RAM left for a couple more OSD daemons.
The ceph.conf settings I use:
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
filestore xattr use omap = true
osd pool default size = 2
osd pool default min size = 1
osd pool default pg num = 128
osd pool default pgp num = 128
public network = 10.71.13.0/24
cluster network = 10.71.12.0/24
Has anyone else experienced this kind of behaviour (processes stuck in the FUTEX_WAIT syscall) when running the Firefly release on Ubuntu 14.04?
Thanks,
Simion Rad.