On Wed, 19 Dec 2012, Michael Chapman wrote:
> Hi all,
>
> I apologise if this list is only for dev issues and not for operators,
> I didn't see a more general list on the ceph website.
>
> I have 5 OSD processes per host, and an FC uplink port failure caused
> kernel panics in two hosts - 0404 and 0401. The mon log looks like
> this:
>
> 2012-12-19 13:30:38.634865 7f9a0f167700 10 mon.3@0(leader).osd e2184
> preprocess_query osd_failure(osd.404 172.22.4.4:6812/12835 for 8832
> e2184 v2184) v3 from osd.602 172.22.4.6:6806/5152
> 2012-12-19 13:30:38.634875 7f9a0f167700  5 mon.3@0(leader).osd e2184
> can_mark_down current up_ratio 0.298429 < min 0.3, will not mark
> osd.404 down

This probably means that there are too many in osds and not enough of
them are up.  Can you attach a 'ceph osd dump'?

It may also be that it's because your osd ids are too sparse.. if that's
the case, this is a bug.  But just a heads up that you don't get much
control over the osd id that is assigned, so trying to keep them in sync
with the host may be a losing battle.  :/

sage

> 2012-12-19 13:30:38.634880 7f9a0f167700 5 mon.3@0(leader).osd e2184 preprocess_
>
> The cluster appears healthy
>
> root@os-0405:~# ceph -s
>    health HEALTH_OK
>    monmap e3: 1 mons at {3=172.22.4.5:6789/0}, election epoch 1, quorum 0 3
>    osdmap e2184: 191 osds: 57 up, 57 in
>     pgmap v205386: 121952 pgs: 121951 active+clean, 1
> active+clean+scrubbing; 4437 MB data, 49497 MB used, 103 TB / 103 TB
> avail
>    mdsmap e1: 0/0/1 up
>
> root@os-0405:~# ceph osd tree
>
> # id    weight  type name       up/down reweight
> -1      30      pool default
> -3      30              rack unknownrack
> -2      6                       host os-0401
> 100     1                               osd.100 up      1
> 101     1                               osd.101 up      1
> 102     1                               osd.102 up      1
> 103     1                               osd.103 up      1
> 104     1                               osd.104 up      1
> 112     1                               osd.112 up      1
> -4      6                       host os-0402
> 200     1                               osd.200 up      1
> 201     1                               osd.201 up      1
> 202     1                               osd.202 up      1
> 203     1                               osd.203 up      1
> 204     1                               osd.204 up      1
> 212     1                               osd.212 up      1
> -5      6                       host os-0403
> 300     1                               osd.300 up      1
> 301     1                               osd.301 up      1
> 302     1                               osd.302 up      1
> 303     1                               osd.303 up      1
> 304     1                               osd.304 up      1
> 312     1                               osd.312 up      1
> -6      6                       host os-0404
> 400     1                               osd.400 up      1
> 401     1                               osd.401 up      1
> 402     1                               osd.402 up      1
> 403     1                               osd.403 up      1
> 404     1                               osd.404 up      1
> 412     1                               osd.412 up      1
> -7      0                       host os-0405
> -8      6                       host os-0406
> 600     1                               osd.600 up      1
> 601     1                               osd.601 up      1
> 602     1                               osd.602 up      1
> 603     1                               osd.603 up      1
> 604     1                               osd.604 up      1
> 612     1                               osd.612 up      1
>
> but os-0404 has no osd processes running anymore.
>
> root@os-0404:~# ps aux | grep ceph
> root      4964  0.0  0.0   9628   920 pts/1   S+   13:31   0:00 grep
> --color=auto ceph
>
> and even if it did, it can't access the luns in order to mount the xfs
> filesystems with all the osd data.
>
> What is preventing the mon from marking the osds on 0404 down?
>
> A second issue I have been having is that my reads+writes are very
> bursty, going from 8MB/s to 200MB/s when doing a dd from a physical
> client over 10GbE. It seems to be waiting on the mon most of the time,
> and iostat shows long io wait times for the disk the mon is using. I
> can also see it writing ~40MB/s constantly to disk in iotop, though I
> don't know if this is random or sequential. I see a lot of waiting for
> sub ops which I thought might be a result of the io wait.
>
> Is that a normal amount of activity for a mon process? Should I be
> running the mon processes off more than just a single sata disk to
> keep up with ~30 OSD processes?
>
> Thanks for your time.
>
> - Michael Chapman
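
For reference, here is a rough sketch of the guard behind the
"can_mark_down ... will not mark osd.404 down" message above. It is only an
illustration (the names are made up, not the actual Ceph identifiers), but
the numbers come straight from the 'ceph -s' output: 57 osds up out of the
191 the osdmap knows about, with the 0.3 floor the log prints being the
mon's minimum-up-ratio setting.

    # Illustrative only -- not the real Ceph code. The monitor refuses to
    # mark an osd down if the fraction of up osds is already below a floor.

    MIN_UP_RATIO = 0.3      # "min 0.3" in the mon log
    NUM_OSDS = 191          # "191 osds" in the ceph -s output
    NUM_UP = 57             # "57 up" in the ceph -s output

    def can_mark_down(num_up, num_osds, min_up_ratio=MIN_UP_RATIO):
        up_ratio = float(num_up) / num_osds
        if up_ratio < min_up_ratio:
            # corresponds to "current up_ratio %f < min %f, will not mark ... down"
            return False
        return True

    print(float(NUM_UP) / NUM_OSDS)          # 0.29842..., the log's 0.298429
    print(can_mark_down(NUM_UP, NUM_OSDS))   # False -> osd.404 stays "up"

Note the gap between the 30 osds listed in the tree above and the 191 the
osdmap reports; it is that larger count in the denominator that keeps the
ratio under 0.3, which fits the remark about sparse osd ids, and the
requested 'ceph osd dump' output would show where the extra entries come
from.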