On 8/18/19 6:43 PM, Brad Hubbard wrote:
That's this code.
switch (alg) {
case CRUSH_BUCKET_UNIFORM:
  size = sizeof(crush_bucket_uniform);
  break;
case CRUSH_BUCKET_LIST:
  size = sizeof(crush_bucket_list);
  break;
case CRUSH_BUCKET_TREE:
  size = sizeof(crush_bucket_tree);
  break;
case CRUSH_BUCKET_STRAW:
  size = sizeof(crush_bucket_straw);
  break;
case CRUSH_BUCKET_STRAW2:
  size = sizeof(crush_bucket_straw2);
  break;
default:
  {
    char str[128];
    snprintf(str, sizeof(str), "unsupported bucket algorithm: %d", alg);
    throw buffer::malformed_input(str);
  }
}
CRUSH_BUCKET_UNIFORM = 1
CRUSH_BUCKET_LIST = 2
CRUSH_BUCKET_TREE = 3
CRUSH_BUCKET_STRAW = 4
CRUSH_BUCKET_STRAW2 = 5
So valid values for bucket algorithms are 1 through 5, but for whatever
reason at least one of yours is being interpreted as "-1". This doesn't
seem like something that would just happen spontaneously with no changes
to the cluster.
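For what it's worth, here is a minimal standalone sketch of that check,
purely as an illustration and not the actual Ceph decode path: assuming
the algorithm field ends up in a signed 32-bit int, an all-ones bit
pattern prints through %d as -1 and lands in the default branch, which
matches the error you're seeing. std::runtime_error stands in for
buffer::malformed_input here.

// Standalone sketch only (not the real Ceph decode code): it reuses the
// constants and default-branch error from the code quoted above, with
// std::runtime_error as a stand-in for buffer::malformed_input.
#include <cstdio>
#include <stdexcept>

enum {
  CRUSH_BUCKET_UNIFORM = 1,
  CRUSH_BUCKET_LIST    = 2,
  CRUSH_BUCKET_TREE    = 3,
  CRUSH_BUCKET_STRAW   = 4,
  CRUSH_BUCKET_STRAW2  = 5,
};

static void check_bucket_alg(int alg) {
  switch (alg) {
  case CRUSH_BUCKET_UNIFORM:
  case CRUSH_BUCKET_LIST:
  case CRUSH_BUCKET_TREE:
  case CRUSH_BUCKET_STRAW:
  case CRUSH_BUCKET_STRAW2:
    return;  // recognized algorithm, decode would continue
  default: {
    char str[128];
    snprintf(str, sizeof(str), "unsupported bucket algorithm: %d", alg);
    throw std::runtime_error(str);
  }
  }
}

int main() {
  check_bucket_alg(CRUSH_BUCKET_STRAW2);  // valid value, returns quietly
  try {
    // A corrupted field read back as all-ones bits prints as -1 via %d.
    check_bucket_alg(static_cast<int>(0xffffffffu));
  } catch (const std::exception &e) {
    std::printf("%s\n", e.what());  // "unsupported bucket algorithm: -1"
  }
  return 0;
}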
What recent changes have you made to the osdmap? What recent changes
have you made to the crushmap? Have you recently upgraded?
Brad,
There were no recent changes to the cluster/osd config to my knowledge.
The only person who would have made any such changes is me. A
few weeks ago, we added 90 new HDD OSDs all at once and the cluster was
still backfilling onto those, but none of the pools on the now-affected
OSDs were involved in that.
It seems likely that all of the SSDs are in this same state, but I
haven't checked every single one.
I sent a complete image of one of the 1TB OSDs (compressed to about
41GB) via ceph-post-file. I put the id in the tracker issue I opened
for this, https://tracker.ceph.com/issues/41240
I don't know if you or any other devs could use that for further
insight, but I'm hopeful.
Thanks,
-Troy