> The majority of the pools have 'replicated size 3 min_size 2'.

Groovy.

> I do see a few pools such as .rgw.control and a few others have 'replicated size 3 min_size 1'.

Not a good way to run.  Set min_size to 2 after you get healthy.

> I am not using erasure coding and none of the pools are set to 'replicated size 3 min_size 3'.

Odd that you're in this situation.  You might increase the retries in your CRUSH rules.  You might also set min_size temporarily to 1 on pool #0, which may let these PGs activate and recover, then immediately set it back to 2, then check whether all PGs now have a full acting set.  NB: there is some risk here.

> Thank you,
>
> Shain
>
> From: Anthony D'Atri <anthony.datri@xxxxxxxxx>
> Date: Sunday, October 13, 2024 at 11:29 AM
> To: Shain Miley <SMiley@xxxxxxx>
> Cc: ceph-users@xxxxxxx <ceph-users@xxxxxxx>
> Subject: Re: Reduced data availability: 3 pgs inactive, 3 pgs down
>
> When you get the cluster healthy, redeploy those Filestore OSDs as BlueStore.  Not before.
>
> Does your pool have size=3, min_size=3?  Is this a replicated pool?  Or EC 2,1?
>
> Don't mark lost; there are things we can do.  I don't want to suggest anything until you share the above info.
>
>> On Oct 13, 2024, at 10:00 AM, Shain Miley <SMiley@xxxxxxx> wrote:
>>
>> Hello,
>>
>> I am seeing the following information after reviewing 'ceph health detail':
>>
>> [WRN] PG_AVAILABILITY: Reduced data availability: 3 pgs inactive, 3 pgs down
>>     pg 0.1a is down, acting [234,35]
>>     pg 0.20 is down, acting [226,267]
>>     pg 0.2f is down, acting [227,161]
>>
>> When I query each of those pgs I see the following message on each of them:
>>
>> "peering_blocked_by": [
>>     {
>>         "osd": 233,
>>         "current_lost_at": 0,
>>         "comment": "starting or marking this osd lost may let us proceed"
>>     }
>>
>> Osd.233 crashed a while ago and when I try to start it the log shows some sort of issue with the filesystem:
>>
>> ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
>> 1: (()+0x12980) [0x7f2779617980]
>> 2: (gsignal()+0xc7) [0x7f27782c9fb7]
>> 3: (abort()+0x141) [0x7f27782cb921]
>> 4: (ceph::__ceph_abort(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x1b2) [0x556ebe773ddf]
>> 5: (FileStore::_do_transaction(ceph::os::Transaction&, unsigned long, int, ThreadPool::TPHandle*, char const*)+0x62b3) [0x556ebebe2753]
>> 6: (FileStore::_do_transactions(std::vector<ceph::os::Transaction, std::allocator<ceph::os::Transaction> >&, unsigned long, ThreadPool::TPHandle*, char const*)+0x48) [0x556ebebe3f38]
>> 7: (JournalingObjectStore::journal_replay(unsigned long)+0x105a) [0x556ebebfc56a]
>> 8: (FileStore::mount()+0x438a) [0x556ebebda82a]
>> 9: (OSD::init()+0x4d1) [0x556ebe80fdc1]
>> 10: (main()+0x3f8c) [0x556ebe77ad2c]
>> 11: (__libc_start_main()+0xe7) [0x7f27782acbf7]
>> 12: (_start()+0x2a) [0x556ebe78fc4a]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> At this point I am thinking about either running an xfs repair on osd.233 and seeing if I can get it back up (once the pgs are healthy again I would likely zap/re-add or replace the drive).
>>
>> Another option, it sounds like, is to mark the osd as lost.
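
If the xfs route is tried, a read-only check is the safer opening move.  A minimal sketch, assuming a package-based deployment (a ceph-osd@233 systemd unit), the default FileStore mount point, and /dev/sdX1 as a placeholder for osd.233's data partition:

    systemctl stop ceph-osd@233            # make sure the OSD is not running
    umount /var/lib/ceph/osd/ceph-233      # if still mounted; xfs_repair needs the fs unmounted
    xfs_repair -n /dev/sdX1                # -n: no-modify mode, only reports problems
    # only if the dry run looks sane, run the real repair and retry the OSD:
    # xfs_repair /dev/sdX1
    # mount /dev/sdX1 /var/lib/ceph/osd/ceph-233
    # systemctl start ceph-osd@233

Note that the abort in the backtrace above happens during journal replay, so a clean xfs check alone may not be enough to get osd.233 started.
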
>>
>> I am just looking for advice on what exactly I should do next to try to minimize the chances of any data loss.
>>
>> Here is the query output for each of those pgs:
>> https://pastebin.com/YbfnpZGC
>>
>> Thank you,
>>
>> Shain
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
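
For completeness, the temporary min_size change described at the top of the thread would go roughly like this; <pool0-name> is a placeholder for whatever pool #0 is called (confirm it with the first command), and while min_size is 1 the pool will accept writes with only a single surviving copy:

    ceph osd pool ls detail                     # find the name of pool #0 and note current min_size values
    ceph osd pool set <pool0-name> min_size 1   # temporary; risky, set back as soon as possible
    ceph pg 0.1a query                          # repeat for 0.20 and 0.2f; watch peering_blocked_by clear
    ceph osd pool set <pool0-name> min_size 2   # restore immediately once the PGs activate and recover
    ceph pg dump_stuck inactive                 # confirm nothing is left without a full acting set
    ceph osd pool set .rgw.control min_size 2   # once healthy, bring the pools still at min_size 1 up to 2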