Re: Multiple kernel RBD clients failures

Hi Travis,

Both you and Yan saw the same thing: the drives in my test system range from 300GB to 4TB. I used ceph-deploy to create all the OSDs, which I assume chose the weights of 0.26 for my 300GB drives and 3.63 for my 4TB drives. All of the OSDs that are reporting nearly full are the 300GB drives. Several of my OSDs reached 94% full, which I assume is when the krbd driver started reporting no more space. It would be nice if "ceph health detail" reported those OSDs as full rather than just near full:

# ceph health detail
HEALTH_WARN 9 near full osd(s)
osd.9 is near full at 85%
osd.29 is near full at 85%
osd.43 is near full at 91%
osd.45 is near full at 88%
osd.47 is near full at 88%
osd.55 is near full at 94%
osd.59 is near full at 94%
osd.67 is near full at 94%
osd.83 is near full at 94%
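
For reference, this is how I have been checking per-OSD utilization against the CRUSH weights (the data path below is just the default location on my hosts, and "ceph osd df" may not exist on older releases, in which case checking the mount point directly still works):

# ceph osd df                        # utilization, size and weight per OSD (newer releases)
# df -h /var/lib/ceph/osd/ceph-55    # check a single OSD's data partition directly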

I have added a note to my development notes that mixing drives with a 10x size difference in the same pool is a bad idea. In my next test I will run with just the 3TB and 4TB drives.
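
Before that test I plan to pull the 300GB OSDs out of the cluster entirely. In case it is useful to anyone else, this is the removal sequence I intend to use (untested as yet, osd.7 is just an example ID, and the ceph-osd daemon on the host has to be stopped once the data has drained off):

# ceph osd out 7               # stop new data going to the OSD and let it drain
# ceph osd crush remove osd.7  # remove it from the CRUSH map
# ceph auth del osd.7          # delete its authentication key
# ceph osd rm 7                # remove the OSD from the cluster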

Thank you for confirming what was going on.

Eric

Eric,

Yeah, your OSD weights are a little crazy...

For example, looking at one host from your output of "ceph osd
tree"...

-3      31.5            host tca23
1       3.63                    osd.1   up      1
7       0.26                    osd.7   up      1
13      2.72                    osd.13  up      1
19      2.72                    osd.19  up      1
25      0.26                    osd.25  up      1
31      3.63                    osd.31  up      1
37      2.72                    osd.37  up      1
43      0.26                    osd.43  up      1
49      3.63                    osd.49  up      1
55      0.26                    osd.55  up      1
61      3.63                    osd.61  up      1
67      0.26                    osd.67  up      1
73      3.63                    osd.73  up      1
79      0.26                    osd.79  up      1
85      3.63                    osd.85  up      1

osd.7 is set to 0.26, while the others are set to 2.72 or 3.63.
Under normal circumstances, the rule of thumb is to set each weight
equal to the disk size in TB.  So, a 2TB disk would have a weight of
2, a 1.5TB disk a weight of 1.5, and so on.
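
If one of those weights really is wrong, it can be fixed on the fly
with a CRUSH reweight (the value below is just an example for a 2TB
disk); be aware that Ceph will start moving data as soon as the
weight changes:

# ceph osd crush reweight osd.7 2.0   # set osd.7's CRUSH weight as if it were a 2TB disk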

These weights control what proportion of the data is directed to each
OSD.  I'm guessing you really do have very different disk sizes,
though, since the disks that are reporting near full all have
relatively small weights (osd.43 is at 91%, weight = 0.26).  Is this
really a 260GB disk?  A mix of HDDs and SSDs, or maybe just a small
partition?  Either way, you probably have something wrong with the
weights, and I'd look into that.  Having a single pool made of disks
of such varied sizes may not be a good option, but I'm not sure if
that's your setup or not.
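
If it turns out you want a pool that only lives on the big disks, one
rough approach (untested here; the bucket, rule and pool names are
just placeholders, and on newer releases the pool option is
crush_rule rather than crush_ruleset) is to put those OSDs under
their own CRUSH root and point the pool at a rule that draws from it:

# ceph osd crush add-bucket big root                                # new CRUSH root for the large disks
# ceph osd crush create-or-move osd.1 3.63 root=big host=tca23-big  # repeat for each large OSD
# ceph osd crush rule create-simple big-rule big host               # rule that only selects from "big"
# ceph osd crush rule dump                                          # note the new rule's id
# ceph osd pool set rbd crush_ruleset <rule-id>                     # point the pool at the new rule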

To the best of my knowledge, Ceph only warns when a disk crosses the
near full ratio (85% by default) and halts write IO once any OSD
crosses the full ratio (95% by default).  I'm not 100% certain on
that one, but I believe that is true.
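
If you just need breathing room to clean things up, the thresholds
can be raised temporarily; the exact commands depend on the release
(newer ones use "ceph osd set-nearfull-ratio" and
"ceph osd set-full-ratio"), and the values below are only examples.
Don't leave them raised, since an OSD that actually fills its disk is
painful to recover:

# ceph pg set_nearfull_ratio 0.90   # raise the near full warning threshold
# ceph pg set_full_ratio 0.97       # raise the threshold at which writes are blocked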

Hope that helps,

- Travis

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



