On Sun, Nov 19, 2017 at 9:34 PM, Tracy Reed <treed@xxxxxxxxxxxxxxx> wrote:
> On Sun, Nov 19, 2017 at 02:20:41AM PST, Gregory Farnum spake thusly:
>> Oh, I meant one of the things that will show how the OSDs are arranged in
>> the crush map. I think there's a more complete crush dump, or the osd tree.
>> -Greg
>
> Ah, ok. Here's an OSD tree. The 5 down OSDs have been down for quite a
> while. I need to clean those up. They should not be related to the
> current situation.

Okay, so the hosts look okay (although the number of OSDs per host is very
uneven). But the sizes are pretty wonky. Are the disks really that mismatched
in size? I note that many of them in host10 are set to 1.0, but most of the
others are some fraction less than that.

On reboot, unless you've disabled it, each OSD checks its size and host and
submits that to the monitors. If that records a different size for the OSD
than was previously stored (e.g., from "1" to "the fraction of a terabyte I
really am"), it will set a new weight and rebalance data accordingly. I
imagine this is the problem, one way or another. Search for the
crush_location_hook option and the related scripts.
-Greg

>
> $ ceph osd tree
> ID  WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
>  -1 89.57471 root default
>  -2  1.98000     host ceph01
>  50  1.98000         osd.50 up 0.63564 1.00000
>  -3 12.24998     host ceph02
>   0  4.09999         osd.0 up 0.73538 1.00000
>   1  4.14999         osd.1 up 0.73756 1.00000
>  61  4.00000         osd.61 up 0.72095 1.00000
>  -4  5.64998     host ceph03
>  68  1.79999         osd.68 up 0.49768 1.00000
>  69  2.00000         osd.69 up 0.51645 1.00000
>  51  1.84999         osd.51 up 0.55830 1.00000
>  -5  7.07480     host ceph04
>   6  0.13300         osd.6 up 0.43202 1.00000
>   7  0.07999         osd.7 up 0.38002 1.00000
>   8  0.13300         osd.8 up 0.62971 1.00000
>   9  0.14400         osd.9 up 0.37433 1.00000
>  10  0.12000         osd.10 up 0.48959 1.00000
>  11  0.12000         osd.11 up 0.80687 1.00000
>  12  0.09999         osd.12 up 0.54256 1.00000
>  13  0.10999         osd.13 up 0.50609 1.00000
>  14  0.12000         osd.14 up 0.47690 1.00000
>  15  0.13300         osd.15 up 0.38451 1.00000
>  17  0.09999         osd.17 up 0.43924 1.00000
>  18  0.10999         osd.18 up 0.61641 1.00000
>  19  0.09999         osd.19 up 0.28282 1.00000
>  20  0.13300         osd.20 up 0.51439 1.00000
>  21  0.07999         osd.21 up 0.59743 1.00000
>  22  0.15999         osd.22 up 0.56779 1.00000
>  23  0.10999         osd.23 up 0.52554 1.00000
>  24  0.09999         osd.24 up 0.38129 1.00000
>  25  0.13300         osd.25 up 0.74908 1.00000
>  26  0.09999         osd.26 up 0.46111 1.00000
>  27  0.12999         osd.27 up 0.77921 1.00000
>  28  0.98999         osd.28 up 0.45566 1.00000
>  29  1.81799         osd.29 up 0.49480 1.00000
>  30  1.81799         osd.30 up 0.55083 1.00000
>  -6 18.09805     host ceph05
>  39  0.12999         osd.39 up 0.62727 1.00000
>  40  0.10999         osd.40 up 0.55324 1.00000
>  41  0.09999         osd.41 up 0.35399 1.00000
>  42  0.12000         osd.42 up 0.33769 1.00000
>  44  0.10999         osd.44 up 0.77229 1.00000
>  45  0.14999         osd.45 up 0.33379 1.00000
>  32  0.12000         osd.32 up 0.64165 1.00000
>  33  0.10999         osd.33 up 0.43105 1.00000
>  34  0.12000         osd.34 up 0.39639 1.00000
>  35  0.15999         osd.35 up 0.48267 1.00000
>  36  0.12999         osd.36 up 0.43410 1.00000
>  37  0.09999         osd.37 up 0.30211 1.00000
>  53  0.12000         osd.53 up 0.42612 1.00000
>  54  0.27199         osd.54 up 0.52895 1.00000
>  55  0.21999         osd.55 up 0.41858 1.00000
>  56  0.27199         osd.56 up 0.53149 1.00000
>  57  0.17999         osd.57 up 0.44426 1.00000
>  58  0.23499         osd.58 up 0.42625 1.00000
>  59  0.27199         osd.59 up 0.46089 1.00000
>  60  0.27199         osd.60 up 0.58600 1.00000
>  31  0.12000         osd.31 up 0.42482 1.00000
>   3  0.89999         osd.3 up 0.52879 1.00000
>  47  1.98000         osd.47 up 0.57500 1.00000
>  46  4.14999         osd.46 up 0.76709 1.00000
>  38  0.04880         osd.38 down 0 1.00000
>  48  1.98000         osd.48 up 0.61130 1.00000
>  49  1.98000         osd.49 up 0.62679 1.00000
>   2  1.81850         osd.2 down 0 1.00000
>  43  1.81799         osd.43 up 0.51196 1.00000
>  -7  7.27399     host ceph06
>  52  1.81799         osd.52 up 0.53760 1.00000
>  62  1.81799         osd.62 up 0.59447 1.00000
>  64  1.81799         osd.64 up 0.55638 1.00000
>  63  1.81799         osd.63 up 0.60434 1.00000
>  -8  5.45398     host ceph07
>  65  1.81799         osd.65 up 0.52611 1.00000
>  67  1.81799         osd.67 up 0.61052 1.00000
>  70  1.81799         osd.70 up 0.56075 1.00000
>  -9  2.69798     host ceph08
>   4  0.90900         osd.4 up 0.45261 1.00000
>   5  0.87999         osd.5 up 0.46480 1.00000
>  16  0.90900         osd.16 up 0.48987 1.00000
> -10 29.09595     host ceph10
>  66  1.81850         osd.66 down 0 1.00000
>  71  1.81850         osd.71 down 0 1.00000
>  72  1.81850         osd.72 down 0 1.00000
>  73  1.81850         osd.73 up 0.89394 1.00000
>  74  1.81850         osd.74 up 1.00000 1.00000
>  75  1.81850         osd.75 up 0.99260 1.00000
>  76  1.81850         osd.76 up 1.00000 1.00000
>  77  1.81850         osd.77 up 0.94002 1.00000
>  78  1.81850         osd.78 up 0.96278 1.00000
>  79  1.81850         osd.79 up 1.00000 1.00000
>  80  1.81850         osd.80 up 0.99380 1.00000
>  81  1.81850         osd.81 up 1.00000 1.00000
>  82  1.81850         osd.82 up 1.00000 1.00000
>  83  1.81850         osd.83 up 1.00000 1.00000
>  84  1.81850         osd.84 up 1.00000 1.00000
>  85  1.81850         osd.85 up 1.00000 1.00000
>
>
> --
> Tracy Reed
> http://tracyreed.org
> Digital signature attached for your safety.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
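
For anyone tracing the same behavior: the weight changes Greg describes come
from OSDs re-declaring their CRUSH location and weight to the monitors when
they start. A minimal sketch of the relevant knobs, assuming a
ceph.conf-managed cluster of that era (the hook path below is only a
placeholder; check the documentation for your release):

    [osd]
    # don't let OSDs reset their CRUSH weight/location when they start
    osd crush update on start = false

    # or keep the updates but compute location with your own script
    # (the path is hypothetical -- point it at whatever hook you maintain)
    osd crush location hook = /usr/local/bin/my-crush-location-hook

And to pin an individual OSD back at a specific CRUSH weight by hand, for
example the 1.98000 that osd.50 shows in the tree above:

    ceph osd crush reweight osd.50 1.98000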