Relation between PGs, OSDs, and disk space

Hi guys,

First of all, hello everyone - this is my first post to this group.

I have some general and pretty basic Ceph questions (basic for you, probably). I found that the official documentation is not really the best on this subject, and a lot of things are quite confusing - some of them turn out completely different in the real world.

I'm just starting with Ceph and everything is brand new to me, so I have a few questions based on my initial findings so far.

I have a cluster with 3 monitors and 9 OSDs (1 TB each) across 3 nodes (so 3 OSDs per node).

I created a test pool (which has since been destroyed). While testing things I rebooted one of the monitors to see what would happen; everything was fine, but I got the message "HEALTH_WARN pool test has too few pgs". I increased pg_num and pgp_num to 300 for that pool (the exact commands are sketched below, after the first ceph df output) and everything started resyncing. Fair enough. However, before I did that I had something like:

cluster 8b7e94df-3a7f-4d45-8a7c-d61c0ad25478
     health HEALTH_WARN pool test has too few pgs
     monmap e1: 3 mons at {ceph-test-mon-01=172.17.12.11:6789/0,ceph-test-mon-02=172.17.12.12:6789/0,ceph-test-mon-03=172.17.12.13:6789/0}, election epoch 6, quorum 0,1,2 ceph-test-mon-01,ceph-test-mon-02,ceph-test-mon-03                                                                                                                                                                             
     osdmap e109: 9 osds: 9 up, 9 in                                                                                                                                                                         
      pgmap v1580: 200 pgs, 4 pools, 40912 MB data, 10232 objects                                                                                                                                            
            121 GB used, 7283 GB / 7404 GB avail                                                                                                                                                             
                 200 active+clean

So the interesting part is actually here:
121 GB used, 7283 GB / 7404 GB avail

# ceph df                                                                
GLOBAL:                                                                                            
    SIZE      AVAIL     RAW USED     %RAW USED                                                     
    8329G     8209G     120G         1.45                                                          

POOLS:
    NAME         ID     USED       %USED     OBJECTS
    data         0      0          0         0      
    metadata     1      0          0         0      
    rbd          2      139        0         2      
    test        3      40912M     0.48      10230  
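
For completeness, I bumped the PG counts with something roughly like this (from memory, so the exact invocation may be slightly off):

# ceph osd pool set test pg_num 300
# ceph osd pool set test pgp_num 300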

and after setting pg_num and pgp_num to 300 I had this:

cluster 8b7e94df-3a7f-4d45-8a7c-d61c0ad25478                                                                                
     health HEALTH_OK                                                                                                           
     monmap e1: 3 mons at {ceph-test-mon-01=172.17.12.11:6789/0,ceph-test-mon-02=172.17.12.12:6789/0,ceph-test-mon-03=172.17.12.13:6789/0}, election epoch 12, quorum 0,1,2 ceph-test-mon-01,ceph-test-mon-02,ceph-test-mon-03                                                                                                                                                                             
     osdmap e643: 9 osds: 9 up, 9 in                                                                                                                                                                          
      pgmap v3399: 492 pgs, 4 pools, 1497 GB data, 374 kobjects                                                                                                                                               
            120 GB used, 8209 GB / 8329 GB avail                                                                                                                                                              
                 492 active+clean

# ceph df
GLOBAL:                              
    SIZE      AVAIL     RAW USED     %RAW USED
    8329G     8209G     120G         1.45     

POOLS:
    NAME         ID     USED      %USED     OBJECTS
    data         0      0         0         0      
    metadata     1      0         0         0      
    rbd          2      139       0         2      
    test        3      1497G     17.98     383525

So in the first example I don't get these allocation numbers. If 120 GB of raw space is used, why does the pool show only about 40 GB?
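
My only guess so far is the replication factor: if the pool has size = 3 (I haven't actually checked that), the numbers roughly line up:

40912 MB data x 3 replicas = ~120 GB raw used

Is that the right way to read it?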

Also, in the second case, after I increased pg_num/pgp_num, the pool's reported size grew to almost 1.5 TB.

So what is the relation between these numbers, and why did the reported pool size grow so much?

Also, is there an easy way to see how the space is distributed and what contains what? I found that I can log in to the OSD boxes and df shows me the utilization, but that gets pretty tedious with more boxes - say, more than 10.
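
Is something like the following a sensible way to do it instead? I'm not sure all of these exist in my version (ceph osd df in particular might be newer than what I'm running), so please correct me:

# ceph osd df
# rados df
# ceph pg dump

If I understand the docs correctly, ceph osd df should show per-OSD utilization, rados df per-pool usage, and ceph pg dump the per-PG details including which OSDs hold each PG.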

Thank you for your help.

Regards,
TH
