Re: ceph osd df

On Sat, 10 Jan 2015, Mykola Golub wrote:
> On Mon, Jan 05, 2015 at 11:03:40AM -0800, Sage Weil wrote:
> > We see a fair number of issues and confusion with OSD utilization and 
> > unfortunately there is no easy way to see a summary of the current OSD 
> > utilization state.  'ceph pg dump' includes raw data but it is not very 
> > friendly.  'ceph osd tree' shows weights but not actual utilization.  
> > 'ceph health detail' tells you the nearfull osds but only when they reach 
> > the warning threshold.
> > 
> > Opened a ticket for a new command that summarizes just the relevant info:
> > 
> > 	http://tracker.ceph.com/issues/10452
> > 
> > Suggestions welcome.  It's a pretty simple implementation (the mon has 
> > all the info; just need to add the command to present it) so I'm hoping it 
> > can get into hammer.  If anyone is interested in doing the 
> > implementation that would be great too!
> 
> I am interested in implementing this.
> 
> Here is my approach, for preliminary review and discussion.
>
> https://github.com/ceph/ceph/pull/3347

Awesome!  I made a few comments on the pull request.

> Only plain text format is available currently. As both "osd only" and
> "tree" outputs look useful I implemented both and added "tree" option
> to tell which to choose.

This sounds fine to me.  We will want to include the formatted output 
before merging, though!

> In http://tracker.ceph.com/issues/10452#note-2 Travis Rhoden suggested
> to extend 'ceph osd tree' command to provide this data instead, but
> I prefer to have many small specialized commands instead of one with
> large output. But if other people also think that it is better to add
> a '--detail' option to osd tree instead of a new command, I will change
> this.

Works for me.
 
> Also, I am not sure I understand how the standard deviation should be
> calculated. Sage's note in 10452:
> 
>  - standard deviation (of normalized
>    actual_osd_utilization/crush_weight/reweight value)
>    
> I don't see why utilization should be normalized by the
> reweight/crush_weight ratio. As I understand it, the goal is to have
> the same utilization on all devices (and thus as small a deviation as
> possible), no matter what reweight values we have?

Yeah, I think you're right.  If I'm reading the code correctly you're still 
including reweight in there, but I think it can be safely dropped.
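
For reference, a minimal Python sketch of the unnormalized calculation, using the %UTIL values from the five-OSD example quoted further down.  The function name, the plain arithmetic mean (which assumes equally sized OSDs), and the choice of a population standard deviation over the raw percentages are all assumptions for illustration.  The pull request may well compute DEV differently, so this is not meant to reproduce the DEV figure in that output.

```python
import math

def summarize_utilization(utils):
    """Return (average, per-OSD VAR ratios, standard deviation).

    VAR is taken as each OSD's utilization divided by the average,
    with no normalization by reweight or crush weight.
    """
    avg = sum(utils) / len(utils)
    var = [u / avg for u in utils]
    # Population standard deviation of the raw %UTIL values (an assumption).
    dev = math.sqrt(sum((u - avg) ** 2 for u in utils) / len(utils))
    return avg, var, dev

utils = [38.15, 44.15, 45.66, 44.15, 36.82]
avg, var, dev = summarize_utilization(utils)
print("VAR:", " ".join(f"{v:.2f}" for v in var))
# prints: VAR: 0.91 1.06 1.09 1.06 0.88
```

The VAR column of the quoted output matches these ratios, which suggests VAR there is simply utilization over the average.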

> Some examples of command output for my dev environments:
> 
>  % ceph osd df
>  ID WEIGHT REWEIGHT %UTIL VAR  
>  0    1.00     1.00 18.12 1.00 
>  1    1.00     1.00 18.14 1.00 
>  2    1.00     1.00 18.13 1.00 

I wonder if we should try to standardize the table formats.  'ceph osd 
tree' currently looks like

# id	weight	type name	up/down	reweight
-1	3	root default
-2	3		host maetl
0	1			osd.0	up	1	
1	1			osd.1	up	1	
2	1			osd.2	up	1	

That is, lowercase headers (with a # header prefix).  It's also not using 
TableFormatter (which it predates).

It's also pretty sloppy with the precision and formatting:

$ ./ceph osd crush reweight osd.1 .0001
reweighted item id 1 name 'osd.1' to 0.0001 in crush map
$ ./ceph osd tree
# id	weight	type name	up/down	reweight
-1	2	root default
-2	2		host maetl
0	1			osd.0	up	1	
1	9.155e-05			osd.1	up	1	
2	1			osd.2	up	1	
$ ./ceph osd crush reweight osd.1 .001
reweighted item id 1 name 'osd.1' to 0.001 in crush map
$ ./ceph osd tree
# id	weight	type name	up/down	reweight
-1	2.001	root default
-2	2.001		host maetl
0	1			osd.0	up	1	
1	0.0009918			osd.1	up	1	
2	1			osd.2	up	1	

Given that the *actual* precision of these weights is 16.16-bit 
fixed-point, the smallest representable step is 2^-16 (about .000015), so 
5 decimal places is roughly the limit.  I'm not sure we want to print 
1.00000 all the time, though?  Although I suppose it's better than

      1
      2
 .00001
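
The odd-looking values in the transcripts above fall out of the fixed-point representation.  A small Python sketch, assuming the weight is stored as an integer count of 1/65536 units and that the conversion truncates (the function name is made up for illustration):

```python
FIXED_ONE = 0x10000  # 1.0 in 16.16 fixed-point

def quantize_weight(w):
    """Round-trip a float weight through 16.16 fixed-point, truncating."""
    return int(w * FIXED_ONE) / FIXED_ONE

for w in (0.0001, 0.001, 1.0):
    print(f"{w} -> {quantize_weight(w):.4g}")
# prints:
# 0.0001 -> 9.155e-05
# 0.001 -> 0.0009918
# 1.0 -> 1
```

Under that assumption .0001 becomes 6/65536 and .001 becomes 65/65536, which are the values 'ceph osd tree' printed.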

In a perfect world I suppose TableFormatter (or whatever) would adjust the 
precision of all printed values to the highest precision needed by any 
item in the list, but maybe just sticking to 5 digits for 
everything is best for simplicity.

Anyway, any interest in making a single stringify_weight() helper and 
fixing up 'ceph osd tree' to also use it and TableFormatter too?  :)
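
Roughly what I have in mind, sketched in Python just to show the behavior.  The real helper would be C++ in the tree, format_column() is a hypothetical stand-in, and TableFormatter itself is not modeled here:

```python
def stringify_weight(w, digits=5):
    """Format a weight with a fixed number of decimal places."""
    return f"{w:.{digits}f}"

def format_column(weights):
    """Right-align the formatted weights to a common width."""
    cells = [stringify_weight(w) for w in weights]
    width = max(len(c) for c in cells)
    return [c.rjust(width) for c in cells]

for cell in format_column([1, 2.001, 65 / 65536]):
    print(cell)
# prints:
# 1.00000
# 2.00100
# 0.00099
```

With a fixed digit count every cell has the same number of decimals, so the decimal points line up without the formatter having to scan the column first.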

sage


>  --
>  AVG %UTIL: 18.13  MIN/MAX VAR: 1.00/1.00  DEV: 0
>  
>  % ceph osd df tree
>  ID WEIGHT REWEIGHT %UTIL VAR  NAME            
>  -1   3.00        - 18.13 1.00 root default    
>  -2   3.00        - 18.13 1.00     host zhuzha 
>  0    1.00     1.00 18.12 1.00         osd.0   
>  1    1.00     1.00 18.14 1.00         osd.1   
>  2    1.00     1.00 18.13 1.00         osd.2   
>  --
>  AVG %UTIL: 18.13  MIN/MAX VAR: 1.00/1.00  DEV: 0
>  
>  % ceph osd df
>  ID WEIGHT REWEIGHT %UTIL VAR  
>  0    1.00     1.00 38.15 0.91 
>  1    1.00     1.00 44.15 1.06 
>  2    1.00     1.00 45.66 1.09 
>  3    1.00     1.00 44.15 1.06 
>  4    1.00     0.80 36.82 0.88 
>  --
>  AVG %UTIL: 41.78  MIN/MAX VAR: 0.88/1.09  DEV: 6.19
>  
>  % ceph osd df tree
>  ID WEIGHT REWEIGHT %UTIL VAR  NAME          
>  -1   5.00        - 41.78 1.00 root default  
>  -2   1.00        - 38.15 0.91     host osd1 
>  0    1.00     1.00 38.15 0.91         osd.0 
>  -3   1.00        - 44.15 1.06     host osd2 
>  1    1.00     1.00 44.15 1.06         osd.1 
>  -4   1.00        - 45.66 1.09     host osd3 
>  2    1.00     1.00 45.66 1.09         osd.2 
>  -5   1.00        - 44.15 1.06     host osd4 
>  3    1.00     1.00 44.15 1.06         osd.3 
>  -6   1.00        - 36.82 0.88     host osd5 
>  4    1.00     0.80 36.82 0.88         osd.4 
>  --
>  AVG %UTIL: 41.78  MIN/MAX VAR: 0.88/1.09  DEV: 6.19
> 
> -- 
> Mykola Golub
> 
> 