Re: recovering from 95% full osd

Hello again!

I left the system in a working state overnight and found it in a weird
state this morning:

chef@ceph-node02:/var/log/ceph$ ceph -s
   health HEALTH_OK
   monmap e4: 3 mons at {a=192.168.7.11:6789/0,b=192.168.7.12:6789/0,c=192.168.7.13:6789/0}, election epoch 254, quorum 0,1,2 a,b,c
   osdmap e348: 3 osds: 3 up, 3 in
    pgmap v114606: 384 pgs: 384 active+clean; 161 GB data, 326 GB used, 429 GB / 755 GB avail
   mdsmap e4623: 1/1/1 up {0=b=up:active}, 1 up:standby

So, at first glance it looks OK; however, I am not able to mount ceph
from any of the nodes:
be01:~# mount /var/www/jroger.org/data
mount: 192.168.7.11:/: can't read superblock

On the nodes which had ceph mounted yesterday I am able to browse the
filesystem, but any data read causes the client to hang.

I made a trace on the active mds with debug ms/mds = 20
(http://wh.of.kz/ceph_logs.tar.gz).
Could you please help identify what's going on?
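
For anyone trying to reproduce the logging: I believe the levels can be
bumped at runtime along these lines, or set as "debug mds = 20" and
"debug ms = 20" under [mds] in ceph.conf (the mds rank "0" below is just
an example; use whichever mds is active):

ceph mds tell 0 injectargs '--debug_mds 20 --debug_ms 20'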

2013/1/9 Roman Hlynovskiy <roman.hlynovskiy@xxxxxxxxx>:
>>> How many pgs do you have? ('ceph osd dump | grep ^pool').
>>
>> I believe this is it. 384 PGs, but three pools of which only one (or maybe a second one, sort of) is in use. Automatically setting the right PG counts is coming some day, but until then being able to set up pools of the right size is a big gotcha. :(
>> Depending on how mutable the data is, recreate with larger PG counts on the pools in use. Otherwise we can do something more detailed.
>> -Greg
>
> hm... what would be the recommended PG count per pool?
>
> chef@cephgw:~$ ceph osd lspools
> 0 data,1 metadata,2 rbd,
> chef@cephgw:~$ ceph osd pool get data pg_num
> PG_NUM: 128
> chef@cephgw:~$ ceph osd pool get metadata pg_num
> PG_NUM: 128
> chef@cephgw:~$ ceph osd pool get rbd pg_num
> PG_NUM: 128
>
> According to http://ceph.com/docs/master/rados/operations/placement-groups/:
>
>             (OSDs * 100)
> Total PGs = ------------
>               Replicas
>
> I have 3 OSDs and 2 replicas for each object, which gives a
> recommended total of 150 PGs.
>
> Will it make much difference to set 150 instead of 128, or should I
> base it on different values?
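>
> If I understand Greg correctly, the recreate step would be something
> like the sketch below (the pool name "data2" and the rounded-up pg
> count of 256 are just placeholders):
>
> # (3 OSDs * 100) / 2 replicas = 150, rounded up to a power of two
> ceph osd pool create data2 256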
>
> Btw, just one more off-topic question:
>
> chef@ceph-node03:~$ ceph pg dump | egrep -v '^(0\.|1\.|2\.)' | column -t
> dumped all in format plain
> version            113906
> last_osdmap_epoch  323
> last_pg_scan       1
> full_ratio         0.95
> nearfull_ratio     0.85
> pg_stat  objects  mip  degr  unf  bytes         log       disklog  state  state_stamp  v  reported  up  acting  last_scrub  scrub_stamp  last_deep_scrub  deep_scrub_stamp
> pool 0   74748    0    0     0    286157692336  17668034  17668034
> pool 1   618      0    0     0    131846062     6414518   6414518
> pool 2   0        0    0     0    0             0         0
> sum      75366    0    0     0    286289538398  24082552  24082552
> osdstat  kbused     kbavail    kb         hb in  hb out
> 0        157999220  106227596  264226816  [1,2]  []
> 1        185604948  78621868   264226816  [0,2]  []
> 2        219475396  44751420   264226816  [0,1]  []
> sum      563079564  229600884  792680448
>
> pool 0 (data) is used for data storage
> pool 1 (metadata) is used for metadata storage
>
> What is pool 2 (rbd) for? It looks like it's completely empty.
>
>
>>
>>>
>>> You might also adjust the crush tunables, see
>>>
>>> http://ceph.com/docs/master/rados/operations/crush-map/?highlight=tunable#tunables
>>>
>>> sage
>>>
>
> Thanks for the link, Sage. I set the tunable values according to the doc.
> Btw, the online document is missing the magic param for the crushmap
> which allows those scary tunables :)
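>
> In case it helps anyone else, the decompile/edit/recompile flow from
> that page goes roughly like this (file names are just examples, and
> the missing param I mentioned above still has to be added):
>
> ceph osd getcrushmap -o /tmp/crush
> crushtool -d /tmp/crush -o /tmp/crush.txt
> # add the tunable lines from the doc at the top of crush.txt, e.g.:
> #   tunable choose_local_tries 0
> #   tunable choose_local_fallback_tries 0
> #   tunable choose_total_tries 50
> crushtool -c /tmp/crush.txt -o /tmp/crush.new
> ceph osd setcrushmap -i /tmp/crush.new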
>
>
>
> --
> ...WBR, Roman Hlynovskiy



-- 
...WBR, Roman Hlynovskiy