This is the formatted pg dump result:
You can see the pg distribution of each pool on each osd is fine.
2018-06-26
shadow_lin
From: David Turner <drakonstein@xxxxxxxxx>
Sent: 2018-06-26 10:32
Subject: Re: Re: Re: [ceph-users] Uneven data distribution with even pg distribution after rebalancing
To: "shadow_lin" <shadow_lin@xxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
If you look at ceph pg dump, you'll see the size ceph believes each PG
is. From your ceph df, your PGs for the rbd_pool will be almost zero. So if
you have an osd with 6 of those PGs and another with none of them, but both
osds have the same number of PGs overall... The osd with none of them will be
more full than the other. I bet that the osd you had that was really full just
had fewer of those PGs than the rest.
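One way to check that directly is to count how many PGs of each pool land on each OSD. A rough sketch (mine, not from the thread), assuming the Luminous 'ceph pg dump pgs_brief' layout where the ACTING set is the 5th column (verify against the header on your version):

ceph pg dump pgs_brief 2>/dev/null | awk '$1 ~ /^[0-9]+\./ {
    pool = $1; sub(/\..*/, "", pool)      # pool id is the part before the dot
    gsub(/[\[\]]/, "", $5)                # ACTING set, e.g. [21,19,3,7,12,15]
    n = split($5, osds, ",")
    for (i = 1; i <= n; i++) cnt["osd." osds[i] " pool " pool]++
}
END { for (k in cnt) print k ": " cnt[k] " PGs" }' | sort -V

An OSD whose 135 PGs include fewer of the near-empty rbd_pool PGs than its neighbours' would be expected to be fuller at the same total PG count.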
Hi David,
I am sure most (if not all) of the data is in one pool. rbd_pool is only used for omap for the EC rbd.
ceph df:

GLOBAL:
    SIZE    AVAIL    RAW USED    %RAW USED
POOLS:
    NAME          ID    USED    %USED    MAX AVAIL    OBJECTS
    ec_rbd_pool   3     219T    81.40    50172G       57441718
    rbd_pool      4     144     0        37629G       19
2018-06-26
shadow_lin
Sent: 2018-06-26 10:21
Subject: Re: Re: [ceph-users] Uneven data distribution with even pg distribution after rebalancing
You have 2 different pools. PGs in each pool are going to be a
different size. It's like saying 12x + 13y should equal 2x + 23y
because they each have 25 X's and Y's. Having equal PG counts on each osd
is only balanced if you have a single pool or have a case where all PGs
are identical in size. The latter is not likely.
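To put rough numbers on that analogy (my arithmetic from the ceph df output quoted above and the k=4 EC profile, not figures from the thread):

# Back-of-the-envelope: ec_rbd_pool stores 219 TiB of data in 1024 PGs,
# split into k=4 data shards, so each OSD in a PG's acting set holds
# roughly this much for every ec_rbd_pool PG mapped to it:
echo "scale=1; 219 * 1024 / 1024 / 4" | bc    # ~54.7 GiB per ec_rbd_pool PG per OSD
# rbd_pool stores 144 bytes across 128 PGs, so its PGs are effectively empty.

So trading even a handful of ~55 GiB EC PGs for near-empty rbd_pool PGs can shift an OSD's usage by hundreds of GiB while its total PG count stays the same.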
Hi David,
I am afraid I can't run the command you provided now, because I tried removing another osd on that host to see if it would make the data distribution even, and it did.
The pg numbers of my pools are powers of 2. Below is from my notes before I removed the other osd:

pool 3 'ec_rbd_pool' erasure size 6 min_size 5 crush_rule 2 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 3248 flags hashpspool,ec_overwrites,nearfull stripe_width 16384 application rbd
pool 4 'rbd_pool' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 3248 flags hashpspool,nearfull stripe_width 0 application rbd

PG distribution of the osds across all pools: https://pasteboard.co/HrBZv3s.png

What I don't understand is why the data distribution is uneven when the pg distribution is even.
2018-06-26
shadow_lin
From: David Turner <drakonstein@xxxxxxxxx>
Sent: 2018-06-26 01:24
Subject: Re: [ceph-users] Uneven data distribution with even pg distribution after rebalancing
To: "shadow_lin" <shadow_lin@xxxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxxxxxxxxx>
I should be able to answer this question for you if you can supply the output of the following commands. It will print out all of your pool names along with how many PGs are in that pool. My guess is that you don't have a power of 2 number of PGs in your pool. Alternatively you might have multiple pools and the PGs from the various pools are just different sizes.

ceph osd lspools | tr ',' '\n' | awk '/^[0-9]/ {print $2}' | while read pool; do
    echo $pool: $(ceph osd pool get $pool pg_num | cut -d' ' -f2)
done
ceph df
For me the output looks like this:
rbd: 64
cephfs_metadata: 64
cephfs_data: 256
rbd-ssd: 32
GLOBAL:
    SIZE     AVAIL    RAW USED    %RAW USED
    46053G   26751G   19301G      41.91
POOLS:
    NAME              ID    USED      %USED    MAX AVAIL    OBJECTS
    rbd-replica       4     897G      11.36    7006G        263000
    cephfs_metadata   6     141M      0.05     268G         11945
    cephfs_data       7     10746G    43.41    14012G       2795782
    rbd-replica-ssd   9     241G      47.30    268G         75061
On Sun, Jun 24, 2018 at 9:48 PM shadow_lin <shadow_lin@xxxxxxx> wrote:
Hi List,
The environment is:
Ceph 12.2.4
Balancer module on and in upmap mode
Failure domain is per host, 2 OSDs per host
EC k=4 m=2
PG distribution is almost even before and after the rebalancing.
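(For reference, enabling that balancer setup on Luminous typically looks something like the commands below. This is my sketch of the usual steps, not taken from the original post; upmap mode also requires all clients to be Luminous-capable.)

ceph osd set-require-min-compat-client luminous   # upmap needs Luminous-capable clients
ceph mgr module enable balancer
ceph balancer mode upmap
ceph balancer on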
After marking out one of the osds, I noticed a lot of the data was moving onto the other osd on the same host.
The ceph osd df result is (osd.20 and osd.21 are on the same host, and osd.20 was marked out):
ID   CLASS   WEIGHT    REWEIGHT   SIZE    USE     AVAIL   %USE    VAR    PGS
19   hdd     9.09560   1.00000    9313G   7079G   2233G   76.01   1.00   135
21   hdd     9.09560   1.00000    9313G   8123G   1190G   87.21   1.15   135
22   hdd     9.09560   1.00000    9313G   7026G   2287G   75.44   1.00   133
23   hdd     9.09560   1.00000    9313G   7026G   2286G   75.45   1.00   134
I am using RBD only, so the objects should all be 4M. I don't understand why osd.21 got significantly more data with the same number of PGs as the other osds.
Is this behavior expected, or did I misconfigure something, or is it some kind of bug?
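One way to sanity-check whether that movement is what the CRUSH rule dictates (a sketch of mine, not from the thread; rule id 2 and size 6 come from the pool dump quoted earlier, and crushtool knows nothing about the balancer's upmap entries) is to replay the EC rule with osd.20's weight overridden to 0, which roughly approximates marking it out:

ceph osd getcrushmap -o crushmap.bin
# Map test inputs through the EC rule with osd.20 forced to weight 0 and
# look at how the displaced shards redistribute across the remaining OSDs.
crushtool -i crushmap.bin --test --rule 2 --num-rep 6 --weight 20 0 --show-utilization

With a per-host failure domain and two OSDs per host, shards leaving osd.20 would generally have nowhere to land on that host other than osd.21, which matches the movement described above.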
Thanks
2018-06-25 shadow_lin
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com