After some creative PG surgery, everything is coming back online cleanly.

On the least-filled OSD (the new osd.5), I went through the 80-90 PGs one at a time and export-remove'd each PG that caused an assertion failure, test-starting the OSD after each removal. Running

# tail -f /var/log/ceph/ceph-osd.5.log | grep -A1 "unlocked"

helped identify which PG was being loaded right before each assertion failure. I've kept the exports of the flawed PGs with the striping issue in case they're needed for anything later on. This finally allowed the OSD to start with only the clean PGs from that pool left.

Then I started the same process on the other OSD (osd.0), which is going to take forever because it holds pre-existing data. I paused that, identified the incomplete/inactive PGs, exported those from the downed osd.0, and imported them into osd.5, which was able to come online. Some of the imports turned out to contain split PGs, i.e. they held objects belonging to other still-missing PGs. Running the import again while specifying the split pgid let those additional objects import, which satisfied all of the missing shards for objects I hadn't yet found a source PG for.

5/6 OSDs are up and running, all of the PGs are now active, and all of the data is accessible again. Things are still undersized/backfilling/moving, but it looks like there is no data loss.
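In case it's useful to anyone else hitting this, the ceph-objectstore-tool invocations looked roughly like the sketch below. The data path, pool id, and pgids are placeholders rather than my real ones, the OSD has to be stopped first, and FileStore OSDs would also need --journal-path.

List the PGs present in the offline OSD's object store:
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 --op list-pgs

Export a problem PG to a file and remove it from the store in one step:
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 --pgid 1.2bs0 --op export-remove --file /root/pg-exports/1.2bs0.export

Import an exported PG into the OSD that is able to start:
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-5 --op import --file /root/pg-exports/1.2bs0.export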
Now I can either keep going through osd.0 one PG at a time, removing the erroneous PGs, or just blow it away and start with a fresh OSD. Is there a recommended path here?
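If rebuilding is the recommended route, I'm assuming the procedure would be roughly the following, once I'm sure nothing else needs to be exported off the old disk (this assumes a ceph-volume deployment, and the device name is a placeholder for osd.0's disk).

Remove osd.0 from the cluster entirely:
# ceph osd purge 0 --yes-i-really-mean-it

Wipe the old device and create a fresh OSD on it:
# ceph-volume lvm zap --destroy /dev/sdX
# ceph-volume lvm create --data /dev/sdX

Please correct me if that isn't the clean way to do it.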
Second question: if I bring the original OSD back up after pruning all of the flawed PG copies with the stripe issue, is it important to remove the leftover PG copies whose data was already imported into osd.5? I'm thinking I would want to, and I can leave the export files around just in case. Once data starts changing (new writes), I would imagine the exports would no longer be usable (or could they potentially screw something up?).
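For the pruning itself, what I have in mind is just the remove op against the offline osd.0's store, while keeping the export files around as a fallback (the pgid below is again a placeholder):

# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 1.2bs1 --op remove --force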
After all of this, I'm going to create a new CephFS filesystem with new metadata/data pools using the newer EC settings and fresh PGs, and copy all of the data over into it. I might consider moving to k=4,m=2 instead ;)
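Roughly what I have in mind for that, if I do go with k=4,m=2 (profile name, pool names, PG counts, and failure domain are placeholders, and I believe multiple filesystems would need to be enabled while the old filesystem still exists):

New 4+2 profile and pools:
# ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=osd
# ceph osd pool create cephfs2_metadata 32 32 replicated
# ceph osd pool create cephfs2_data 64 64 erasure ec-4-2
# ceph osd pool set cephfs2_data allow_ec_overwrites true

New filesystem on those pools (--force because, as far as I understand, it's needed when the default data pool is erasure coded):
# ceph fs new cephfs2 cephfs2_metadata cephfs2_data --force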
On Wed, Jul 3, 2019 at 2:28 PM Austin Workman <soilflames@xxxxxxxxx> wrote:
> That makes more sense.
>
> Setting min_size = 4 on the EC pool allows data to flow again (kind of, not really, because of the 22 still-missing PGs), outside of the 21 unknown and 1 down PGs, which probably depend on the two down OSDs. Maybe min_size was automatically raised to 5 when I adjusted the EC pool originally? Those are probably the 22 PGs that actually got fully moved around (maybe even converted to k=5/m=1?). It would be great if I can find a way to start those other two OSDs and just deal with whatever state is causing them to crash.
>
> On Wed, Jul 3, 2019 at 2:18 PM Janne Johansson <icepic.dz@xxxxxxxxx> wrote:
>> On Wed, Jul 3, 2019 at 20:51, Austin Workman <soilflames@xxxxxxxxx> wrote:
>>> But a very strange number shows up in the active sections of the PGs, roughly the same number as 2147483648..... This seems very odd, and maybe the value got lodged somewhere it doesn't belong, which is causing an issue.
>>
>> That PG number is "-1" or something similar for a signed 32-bit int, which means "I don't know which one it was anymore", which you can get in PG lists when OSDs are gone.
>>
>> --
>> May the most significant bit of your life be positive.