When those processes become blocked are the drives busy or idle?
Can you post the output of "ps -awexo pid,tt,user,fname,tmout,f,wchan" for those processes when that happens?
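A simpler variant that should show just the blocked ones (a rough sketch using standard Linux ps fields, adjust to taste):

    # list only processes in uninterruptible sleep (D state) together with their kernel wait channel
    ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /^D/'

The wchan column is usually the interesting part - it tells you which kernel function the process is sleeping in.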
My guess would be that they really are waiting for the disk array for some reason - can you check whether you can read from and write to the OSD partitions when this happens? What does iostat show? I'm not sure what your HBA is, but sometimes very bad things happen when you saturate the drives and cache completely, like the LUN queue depth dropping to 1 (which completely kills IO) or even commands being dropped (though that would likely show up in dmesg). This situation is often completely masked by the driver. And Ceph is really good at saturating drives.
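Something along these lines, run while the OSDs are stuck, should tell you whether the array is still answering at all (a sketch - the device names are placeholders, substitute your OSD LUNs):

    iostat -xm 2 /dev/sdb /dev/sdc      # ~100% util with almost no IOPS usually means the array/HBA is wedged
    dd if=/dev/sdb of=/dev/null bs=1M count=100 iflag=direct    # does a direct read even complete?
    dmesg | tail -50                    # any SCSI aborts, resets or queue-full messages?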
Also, how much memory does your machine have? vm.min_free_kbytes = 2640322 looks pretty high to me, and it could block anything at any time if kswapd kicks in and starts reclaiming pages. (But then the whole system would be unusable, so you'd likely notice that.)
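A quick way to sanity-check that (a sketch, assuming the sysstat package is installed for sar):

    grep MemTotal /proc/meminfo
    cat /proc/sys/vm/min_free_kbytes    # 2640322 means roughly 2.5 GB is kept free at all times
    sar -B 2 5                          # pgscank/s and pgsteal/s show whether kswapd is busy reclaiming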
Jan
Hello all,
When I try to add more than one OSD to a host and the backfilling process starts, all the OSD daemons except one become stuck in D state. When this happens they are shown as out and down (when running ceph osd tree).
The only way I can kill the processes is to remove the OSDs from the crushmap, then run kill -9 on them and wait a couple of minutes. There are no exception messages in the OSD logs and dmesg looks fine too (nothing out of the ordinary). I run Ceph Firefly 0.80.10 on Ubuntu 14.04 (Linux 3.13). The OSDs are running on RAID0 LUNs (2 drives per disk group) created on a Dell MD3000 array with Hitachi hard drives (450 GB, 15K RPM). The issue happens even with only 2 or 3 OSDs active on the host. I have only a 1 Gb/s link to the host. Could the network bandwidth be the issue?
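For what it's worth, something like the following run during backfill would show whether the link is the bottleneck (a rough sketch; eth0 is a placeholder for the actual interface name, and sar comes from the sysstat package):

    sar -n DEV 2 | grep eth0    # rxkB/s + txkB/s around 110000-120000 means the 1 Gb/s link is saturated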
The settings from sysctl.conf:
net.core.netdev_max_backlog = 250000
net.core.optmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_mem = 16777216 16777216 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 87380 16777216
net.ipv4.tcp_low_latency = 1
net.ipv4.tcp_sack = 0
net.ipv4.tcp_timestamps = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.all.rp_filter = 0
net.ipv4.ip_forward = 1
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_window_scaling = 0
net.ipv4.route.flush = 1
vm.min_free_kbytes = 2640322
vm.swappiness = 0
vm.overcommit_memory = 1
vm.oom_kill_allocating_task = 0
vm.dirty_expire_centisecs = 360000
vm.dirty_writeback_centisecs = 360000
kernel.pid_max = 4194303
fs.file-max = 16815744
vm.dirty_ratio = 99
vm.dirty_background_ratio = 99
vm.vfs_cache_pressure = 100
Thanks, Simion Rad.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com