Re: Reg: PG

If you delete and recreate the pools you will indeed lose data.  Your cephfs_metadata pool will have almost no data in it; I have a 9TB cephfs_data pool and only 40MB in the cephfs_metadata pool, so it shouldn't need anywhere near 128 PGs for a cluster this size.  When you increase your cluster size you will want to keep track of how many PGs you have per OSD to maintain your desired ratio.
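
As a rough sketch (commands assumed available on a Jewel-era cluster like the one shown below; the pool names are the ones from your earlier message), you can keep an eye on the PG counts and distribution with the standard CLI:

ceph osd pool get cephfs_data pg_num        # PG count of the data pool
ceph osd pool get cephfs_metadata pg_num    # PG count of the metadata pool
ceph df                                     # per-pool data usage
ceph osd df                                 # per-OSD usage, including a PGS column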

Like I mentioned earlier, this warning isn't a critical issue as long as you have enough memory to handle this many PGs per OSD daemon.  Assume at least 3x memory usage during recovery over what it uses while it's healthy.  As long as you have the system resources to handle this, you can increase the warning threshold so your cluster is health_ok again.
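
For example (a sketch only; mon_pg_warn_max_per_osd is the Jewel-era option behind the "max 300" in the warning below, so double-check the name on your version), the threshold can be raised at runtime or persisted in ceph.conf:

ceph tell mon.* injectargs '--mon_pg_warn_max_per_osd 400'

# or in ceph.conf, under [global] or [mon]:
mon pg warn max per osd = 400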

On Thu, May 4, 2017 at 11:33 AM psuresh <psuresh@xxxxxxxxxxxx> wrote:
Hi David,

Thanks for your explanation.  I ran the following commands to create the pools:

ceph osd pool create cephfs_data 128
ceph osd pool create cephfs_metadata 128
ceph fs new dev-ceph-setup cephfs_metadata cephfs_data

Is this the proper way for 3 OSDs?

Will deleting and recreating the pools cause data loss on the Ceph cluster?

If I increase the OSD count in the future, do I need to change the pools' PG counts?

Regards,
Suresh

---- On Thu, 04 May 2017 20:11:45 +0530 David Turner <drakonstein@xxxxxxxxx> wrote ----

I'm guessing you have more than just the 1 pool with 128 PGs in your cluster (seeing as you have 320 PGs total, I would guess 2 pools with 128 PGs and 1 pool with 64 PGs).  The combined total number of PGs for all of your pools is 320 and with only 3 OSDs and most likely replica size 3... that leaves you with too many (320) PGs per OSD.  This will not likely affect your testing, but if you want to fix the problem you will need to delete and recreate your pools with a combined lower total number of PGs.
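
To spell out the arithmetic (assuming replica size 3, as above): every PG is stored on 3 OSDs, and with only 3 OSDs in the cluster each OSD holds a copy of every PG, so

PGs per OSD = (total PGs x replica size) / number of OSDs
            = (320 x 3) / 3
            = 320, which is over the default warning threshold of 300.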

The number of PGs is supposed to reflect how much data each pool is going to have.  If you have 1 pool that will have 75% of your cluster's data, another pool with 20%, and a third pool with 5%, then the number of PGs they have should reflect that.  Based on aiming for somewhere between 100-200 PGs per OSD, and the above estimation for data distribution, you should have 128 PGs in the first pool, 32 PGs in the second, and 8 PGs in the third.  Each OSD would have 168 PGs and each PG would be roughly the same size across pools.  If you were to add more OSDs, then you would need to increase those numbers to maintain the same distribution.  The above math is only for 3 OSDs.  If you had 6 OSDs, then the goal would be to have somewhere between 200-400 PGs total to maintain the same 100-200 PGs per OSD.
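
As an illustrative sketch (the pool names here are hypothetical; only the 128/32/8 split comes from the estimate above):

ceph osd pool create big_pool 128      # ~75% of the data
ceph osd pool create medium_pool 32    # ~20% of the data
ceph osd pool create small_pool 8      # ~5% of the data

# If more OSDs are added later, pg_num can be increased (it cannot be
# decreased on Jewel), followed by pgp_num to actually rebalance:
ceph osd pool set big_pool pg_num 256
ceph osd pool set big_pool pgp_num 256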

On Thu, May 4, 2017 at 10:24 AM psuresh <psuresh@xxxxxxxxxxxx> wrote:


Hi,

I'm running 3 OSDs in my test setup.  I created a pool with 128 PGs as per the Ceph documentation, but I'm getting a "too many PGs" warning.  Can anyone clarify why I'm getting this warning?

Each OSD contains a 240GB disk.

    cluster 9d325da2-3d87-4b6b-8cca-e52a4b65aa08
     health HEALTH_WARN
            too many PGs per OSD (320 > max 300)
     monmap e2: 3 mons at {dev-ceph-mon1:6789/0,dev-ceph-mon2:6789/0,dev-ceph-mon3:6789/0}
            election epoch 6, quorum 0,1,2 dev-ceph-mon1,dev-ceph-mon2,dev-ceph-mon3
      fsmap e40: 1/1/1 up {0=dev-ceph-mds-active=up:active}
     osdmap e356: 3 osds: 3 up, 3 in
            flags sortbitwise,require_jewel_osds
      pgmap v32407: 320 pgs, 3 pools, 27456 MB data, 220 kobjects
            100843 MB used, 735 GB / 833 GB avail
                 320 active+clean

Regards,
Suresh
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
