Hi,

The output of ceph -s:

    cluster 50961297-815c-4598-8efe-5e08203f9fea
     health HEALTH_OK
     monmap e5: 5 mons at {pshn05=10.71.13.5:6789/0,pshn06=10.71.13.6:6789/0,pshn13=10.71.13.13:6789/0,psosctl111=10.71.13.111:6789/0,psosctl112=10.71.13.112:6789/0}, election epoch 258, quorum 0,1,2,3,4 pshn05,pshn06,pshn13,psosctl111,psosctl112
     mdsmap e173: 1/1/1 up {0=pshn17=up:active}, 4 up:standby
     osdmap e21319: 16 osds: 16 up, 16 in
      pgmap v3301189: 384 pgs, 3 pools, 4906 GB data, 3794 kobjects
            9940 GB used, 10170 GB / 21187 GB avail
                 384 active+clean

I don't use any Ceph client (kernel or FUSE) on the same nodes that run the OSD/MON/MDS daemons.

Yes, I see slow-operation warnings from time to time when I'm watching ceph -w. The number of IOPS on the servers isn't that high, and I think the write-back cache of the RAID controller should be able to help with the journal ops.

Simion Rad.

________________________________________
From: Gregory Farnum [greg@xxxxxxxxxxx]
Sent: Tuesday, July 14, 2015 12:38
To: Simion Rad
Cc: ceph-users@xxxxxxxx
Subject: Re: ceph daemons stucked in FUTEX_WAIT syscall

On Mon, Jul 13, 2015 at 11:00 PM, Simion Rad <Simion.Rad@xxxxxxxxx> wrote:
> Hi,
>
> I'm running a small CephFS cluster (21 TB, 16 OSDs of different sizes
> between 400 GB and 3.5 TB) that is used as a file warehouse (both small
> and big files).
> Every day there are times when a lot of processes running on the client
> servers (using either the FUSE or the kernel client) become stuck in D
> state, and when I strace them I see them waiting in the FUTEX_WAIT
> syscall.
> I can see the same issue on all OSD daemons.
> The Ceph version I'm running is Firefly 0.80.10, both on the clients and
> on the server daemons.
> I use ext4 as the OSD filesystem.
> Operating system on the servers: Ubuntu 14.04, kernel 3.13.
> Operating system on the clients: Ubuntu 12.04 LTS with the HWE option,
> kernel 3.13.
> The OSD daemons use RAID5 virtual disks (6 x 300 GB 10K RPM disks on a
> Dell PERC H700 RAID controller with a 512 MB BBU in write-back mode).
> The servers the Ceph daemons run on are also hosting KVM VMs (OpenStack
> Nova).
> Because of this unfortunate setup the performance is really bad, but at
> least I shouldn't see this many locking issues (or should I?).
> The only thing that temporarily improves the performance is restarting
> every OSD. After such a restart I see some processes on the client
> machines resume I/O, but only for a couple of hours; then the whole
> procedure must be repeated.
> I cannot afford to run a setup without RAID because there isn't enough
> RAM left for a couple of OSD daemons.
>
> The ceph.conf settings I use:
>
> auth cluster required = cephx
> auth service required = cephx
> auth client required = cephx
> filestore xattr use omap = true
> osd pool default size = 2
> osd pool default min size = 1
> osd pool default pg num = 128
> osd pool default pgp num = 128
> public network = 10.71.13.0/24
> cluster network = 10.71.12.0/24
>
> Has anyone else experienced this kind of behaviour (processes stuck in
> the FUTEX_WAIT syscall) when running the Firefly release on Ubuntu 14.04?

What's the output of "ceph -s" on your cluster? When your clients get
stuck, is the cluster complaining about stuck ops on the OSDs? Are you
running kernel clients on the same boxes as your OSDs?
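For example, something like the following would show whether the cluster is
reporting blocked requests, and would let you peek at the in-flight ops on one
OSD (the admin-socket path is the stock default and osd.0 is just a
placeholder; adjust both for your install):

    ceph health detail | grep -i blocked
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_ops_in_flight
    ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok dump_historic_ops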
If I were to guess, I'd imagine that you might just have overloaded your
cluster and the FUTEX_WAIT is the clients waiting for writes to get
acknowledged, but if restarting the OSDs brings everything back up for a few
hours, that might not be the case.
-Greg

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
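As a side note, one rough way to list the client processes stuck in
uninterruptible sleep and confirm they really are parked in futex() (this
assumes a Linux client with ps and strace installed; <PID> is a placeholder
for a process id from the first command) would be:

    ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'
    strace -f -p <PID> -e trace=futex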