Hello all,
When I try to add more than one OSD to a host and the backfilling process starts, all the OSD daemons except one of them become stuck in D state. When this happens they are shown as out and down (when running ceph osd tree). The only way I can kill the processes is to remove the OSDs from the crush map, then run kill -9 on them and wait for a couple of minutes. There are no exception messages in the OSD logs, and dmesg looks fine too (nothing out of the ordinary).

I run Ceph Firefly 0.80.10 on Ubuntu 14.04 (Linux 3.13). The OSDs are running on RAID0 LUNs (2 drives per disk group) created on a Dell MD3000 array with Hitachi hard drives (450 GB, 15K RPM). The issue happens even with only 2 or 3 OSDs active on the host. I have only a 1 Gb/s link to the host. Could the network bandwidth be the issue?

These are the settings from sysctl.conf:

net.core.netdev_max_backlog = 250000
net.core.optmem_max = 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_mem = 16777216 16777216 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 87380 16777216
net.ipv4.tcp_low_latency = 1
net.ipv4.tcp_sack = 0
net.ipv4.tcp_timestamps = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.all.rp_filter = 0
net.ipv4.ip_forward = 1
net.ipv4.tcp_tw_recycle = 0
net.ipv4.tcp_tw_reuse = 0
net.ipv4.tcp_window_scaling = 0
net.ipv4.route.flush = 1
vm.min_free_kbytes = 2640322
vm.swappiness = 0
vm.overcommit_memory = 1
vm.oom_kill_allocating_task = 0
vm.dirty_expire_centisecs = 360000
vm.dirty_writeback_centisecs = 360000
kernel.pid_max = 4194303
fs.file-max = 16815744
vm.dirty_ratio = 99
vm.dirty_background_ratio = 99
vm.vfs_cache_pressure = 100

Thanks,
Simion Rad.
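P.S. For clarity, this is roughly the sequence I end up running for each stuck daemon (osd.12 and the pid are just placeholders, not the actual IDs):

    # remove the OSD from the crush map
    ceph osd crush remove osd.12
    # the daemon is stuck in D state, so a normal stop does nothing; force-kill it
    kill -9 <pid of ceph-osd -i 12>
    # then wait a couple of minutes until the process finally disappears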