On Sat, 26 Aug 2017, Linux Chips wrote:
> On 08/26/2017 01:46 AM, Sage Weil wrote:
> > On Sat, 26 Aug 2017, Linux Chips wrote:
> > > On 08/24/2017 06:58 AM, Sage Weil wrote:
> > > > Okay, so I think the combination of (1) removing empty PGs and
> > > > (2) pruning past_intervals will help. (1) can be scripted by
> > > > looking in the current/$pg_HEAD directories and picking out the
> > > > ones with 0 or 1 objects in them, doing a ceph-objectstore-tool
> > > > export to make a backup (just in case), and then removing them
> > > > (with ceph-objectstore-tool). Be careful of PGs for empty pools,
> > > > since those will be naturally empty (and you want to keep them).
> > > >
> > > > For (2), see the wip-prune-past-intervals-jewel branch in
> > > > ceph-ci.git; if that is applied to the kraken branch it ought to
> > > > work (although it's untested). Alternatively, you can just upgrade
> > > > to luminous, as it implements a more sophisticated version of the
> > > > same thing. You need to upgrade the mons, mark all OSDs down,
> > > > upgrade the OSDs and start at least one of them, and then set
> > > > 'ceph osd require-osd-release luminous' before it'll switch to the
> > > > new past intervals representation. Definitely test it on your test
> > > > cluster to ensure it reduces the memory usage!
> > > >
> > > > If that doesn't sort things out, we'll need to see a heap profile
> > > > for an OOMing OSD to make sure we know what is using all of the
> > > > RAM...
> > > >
> > > > sage
> > >
> > > Well, a big thank you, Sage. We tested the upgrade on the test
> > > cluster, and it worked in an awesome way; we did not even need to
> > > remove any PG with the tool.
> >
> > OMG what a relief! This is great news for a Friday night! :)
> >
> > > I read about the prune thing in the release notes, so when our
> > > attempts to start the cluster failed, we tried upgrading, but it
> > > did not help. It turned out that we had missed the
> > > 'ceph osd require-osd-release luminous' step. I mean, we were
> > > looking at the command in the release notes upgrade section, said
> > > to each other "it does not matter, it would only restrict the old
> > > OSDs from joining", and moved on. Damn, we would have been up a
> > > week ago. Having said that, I think the release notes should
> > > highlight this in the future.
> >
> > Good point--I'll update the wording to make it clear that a lot of
> > new behavior does not kick in until the switch is flipped.
> >
> > > Now we have upgraded the production cluster, and it is up and
> > > running; the memory footprint is down to about a tenth of what it
> > > was. The largest RAM usage I saw on an OSD was about 6.5GB. But we
> > > faced some issues, particularly OSDs crashing with
> > > "FAILED assert(interval.last > last)".
> >
> > Just to clarify: this production cluster never ran the hacky kraken
> > patch I posted that prunes past intervals, right? If so, then yes,
> > please open a ticket with any/all osd.299 bugs you still have. If it
> > ran the kraken patch, then let's not bother--I don't want to confuse
> > the situation with logs from that weird code.
>
> No, we did not run any patched versions; this cluster only ran the
> packages from the official repo. Any more info we need to get into the
> bug report before I clean the OSDs up?

Hmm, if you can dump the osdmaps from the mon around the failure epoch,
1050351 (say, +/- 100 epochs to be safe?), that would help.
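Something along these lines would do it -- a minimal bash sketch only,
where the epoch range (failure epoch 1050351 +/- 100) and the output
directory are examples and not anything you have to use:

  #!/bin/bash
  # Dump a range of osdmap epochs from the mon into one directory.
  # Epoch range and output path are illustrative only -- adjust as needed.
  OUT=/tmp/osdmaps-osd299
  mkdir -p "$OUT"
  for epoch in $(seq 1050251 1050451); do
      # 'ceph osd getmap <epoch> -o <file>' writes one full osdmap epoch;
      # epochs the mon has already trimmed will simply fail.
      ceph osd getmap "$epoch" -o "$OUT/osdmap.$epoch" || break
  done
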
You can stick them in a directory with the logs and use ceph-post-file to
upload the whole thing.

> > Anyway, I'm delighted you're back up (and I'm sorry the release notes
> > wording didn't help lead you there a week ago)!
> >
> > FWIW the 6.5GB still sounds very high; I'd confirm that after the
> > cluster stabilizes a restarted OSD shrinks back down to a more typical
> > size (I suspect the allocator isn't releasing memory back to the OS
> > due to fragmentation etc).
>
> To be clear here, the 6.5GB was during peering. The usage got lower
> after a few minutes; they are now at about 2-2.5GB, and sometimes they
> spike to 4GB, I think when starting a new recovery or something.
>
> Another thing we noticed: in kraken (and jewel), if noup is set, a
> starting OSD will consume about 3GB of RAM by the time it is waiting
> for the noup flag to be removed, while in luminous it was about
> 700-800 MB.

This is probably mostly due to the smaller default osdmap cache size, but
it's good to see that it's effective.

Thanks!
sage

> >
> > Thanks, and enjoy the weekend!
> > sage
> >
> > > logs:
> > >    -34> 2017-08-26 00:38:00.505114 7f14556b4d00  0 osd.299 1085665 load_pgs opened 455 pgs
> > >    -33> 2017-08-26 00:38:00.505787 7f14556b4d00 10 osd.299 1085665 19.f1e needs 1050342-1085230
> > >    -32> 2017-08-26 00:38:00.505814 7f14556b4d00  1 osd.299 1085665 build_past_intervals_parallel over 1050342-1085230
> > >    -31> 2017-08-26 00:38:00.505818 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050342
> > >    -30> 2017-08-26 00:38:00.505824 7f14556b4d00 20 osd.299 0 get_map 1050342 - loading and decoding 0x7f14b3dfb0c0
> > >    -29> 2017-08-26 00:38:00.506245 7f14556b4d00 10 osd.299 0 add_map_bl 1050342 780781 bytes
> > >    -28> 2017-08-26 00:38:00.508539 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050342 pg 19.f1e first map, acting [80] up [80], same_interval_since = 1050342
> > >    -27> 2017-08-26 00:38:00.508547 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050343
> > >    -26> 2017-08-26 00:38:00.508550 7f14556b4d00 20 osd.299 0 get_map 1050343 - loading and decoding 0x7f14b3dfad80
> > >    -25> 2017-08-26 00:38:00.508997 7f14556b4d00 10 osd.299 0 add_map_bl 1050343 781371 bytes
> > >    -24> 2017-08-26 00:38:00.511176 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050344
> > >    -23> 2017-08-26 00:38:00.511196 7f14556b4d00 20 osd.299 0 get_map 1050344 - loading and decoding 0x7f14b3dfb740
> > >    -22> 2017-08-26 00:38:00.511625 7f14556b4d00 10 osd.299 0 add_map_bl 1050344 782446 bytes
> > >    -21> 2017-08-26 00:38:00.513813 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050345
> > >    -20> 2017-08-26 00:38:00.513820 7f14556b4d00 20 osd.299 0 get_map 1050345 - loading and decoding 0x7f14b3dfba80
> > >    -19> 2017-08-26 00:38:00.514260 7f14556b4d00 10 osd.299 0 add_map_bl 1050345 782071 bytes
> > >    -18> 2017-08-26 00:38:00.516463 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050346
> > >    -17> 2017-08-26 00:38:00.516488 7f14556b4d00 20 osd.299 0 get_map 1050346 - loading and decoding 0x7f14b79c4000
> > >    -16> 2017-08-26 00:38:00.516927 7f14556b4d00 10 osd.299 0 add_map_bl 1050346 781955 bytes
> > >    -15> 2017-08-26 00:38:00.519047 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050347
> > >    -14> 2017-08-26 00:38:00.519054 7f14556b4d00 20 osd.299 0 get_map 1050347 - loading and decoding 0x7f14b79c4340
> > >    -13> 2017-08-26 00:38:00.519500 7f14556b4d00 10 osd.299 0 add_map_bl 1050347 781930 bytes
> > >    -12> 2017-08-26 00:38:00.521612 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050348
> > >    -11> 2017-08-26 00:38:00.521619 7f14556b4d00 20 osd.299 0 get_map 1050348 - loading and decoding 0x7f14b79c4680
> > >    -10> 2017-08-26 00:38:00.522074 7f14556b4d00 10 osd.299 0 add_map_bl 1050348 784883 bytes
> > >     -9> 2017-08-26 00:38:00.524245 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050349
> > >     -8> 2017-08-26 00:38:00.524252 7f14556b4d00 20 osd.299 0 get_map 1050349 - loading and decoding 0x7f14b79c49c0
> > >     -7> 2017-08-26 00:38:00.524706 7f14556b4d00 10 osd.299 0 add_map_bl 1050349 785081 bytes
> > >     -6> 2017-08-26 00:38:00.526854 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050350
> > >     -5> 2017-08-26 00:38:00.526861 7f14556b4d00 20 osd.299 0 get_map 1050350 - loading and decoding 0x7f14b79c4d00
> > >     -4> 2017-08-26 00:38:00.527330 7f14556b4d00 10 osd.299 0 add_map_bl 1050350 785948 bytes
> > >     -3> 2017-08-26 00:38:00.529505 7f14556b4d00 10 osd.299 1085665 build_past_intervals_parallel epoch 1050351
> > >     -2> 2017-08-26 00:38:00.529512 7f14556b4d00 20 osd.299 0 get_map 1050351 - loading and decoding 0x7f14b79c5040
> > >     -1> 2017-08-26 00:38:00.529979 7f14556b4d00 10 osd.299 0 add_map_bl 1050351 788650 bytes
> > >      0> 2017-08-26 00:38:00.534373 7f14556b4d00 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.4/rpm/el7/BUILD/ceph-12.1.4/src/osd/osd_types.cc: In function 'virtual void pi_compact_rep::add_interval(bool, const PastIntervals::pg_interval_t&)' thread 7f14556b4d00 time 2017-08-26 00:38:00.532119
> > > /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.1.4/rpm/el7/BUILD/ceph-12.1.4/src/osd/osd_types.cc: 3205: FAILED assert(interval.last > last)
> > >
> > >  ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc)
> > >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x7f145612f420]
> > >  2: (pi_compact_rep::add_interval(bool, PastIntervals::pg_interval_t const&)+0x3b2) [0x7f1455e030b2]
> > >  3: (PastIntervals::check_new_interval(int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, unsigned int, unsigned int, std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, pg_t, IsPGRecoverablePredicate*, PastIntervals*, std::ostream*)+0x380) [0x7f1455de8ab0]
> > >  4: (OSD::build_past_intervals_parallel()+0xa8f) [0x7f1455bbc71f]
> > >  5: (OSD::load_pgs()+0x503) [0x7f1455bbef13]
> > >  6: (OSD::init()+0x2179) [0x7f1455bd7779]
> > >  7: (main()+0x2def) [0x7f1455add56f]
> > >  8: (__libc_start_main()+0xf5) [0x7f1451d14b35]
> > >  9: (()+0x4ac8a6) [0x7f1455b7b8a6]
> > >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> > >
> > > --- logging levels ---
> > >    0/ 5 none
> > >    0/ 1 lockdep
> > >    0/ 1 context
> > >    1/ 1 crush
> > >    1/ 5 mds
> > >    1/ 5 mds_balancer
> > >    1/ 5 mds_locker
> > >    1/ 5 mds_log
> > >    1/ 5 mds_log_expire
> > >    1/ 5 mds_migrator
> > >    0/ 1 buffer
> > >    0/ 1 timer
> > >    0/ 1 filer
> > >    0/ 1 striper
> > >    0/ 1 objecter
> > >    0/ 5 rados
> > >    0/ 5 rbd
> > >    0/ 5 rbd_mirror
> > >    0/ 5 rbd_replay
> > >    0/ 5 journaler
> > >    0/ 5 objectcacher
> > >    0/ 5 client
> > >   20/20 osd
> > >    0/ 5 optracker
> > >    0/ 5 objclass
> > >    1/ 3 filestore
> > >    1/ 3 journal
> > >    0/ 5 ms
> > >    1/ 5 mon
> > >    0/10 monc
> > >    1/ 5 paxos
> > >    0/ 5 tp
> > >    1/ 5 auth
> > >    1/ 5 crypto
> > >    1/ 1 finisher
> > >    1/ 5 heartbeatmap
> > >    1/ 5 perfcounter
> > >    1/ 5 rgw
> > >    1/10 civetweb
> > >    1/ 5 javaclient
> > >    1/ 5 asok
> > >    1/ 1 throttle
> > >    0/ 0 refs
> > >    1/ 5 xio
> > >    1/ 5 compressor
> > >    1/ 5 bluestore
> > >    1/ 5 bluefs
> > >    1/ 3 bdev
> > >    1/ 5 kstore
> > >    4/ 5 rocksdb
> > >    4/ 5 leveldb
> > >    4/ 5 memdb
> > >    1/ 5 kinetic
> > >    1/ 5 fuse
> > >    1/ 5 mgr
> > >    1/ 5 mgrc
> > >    1/ 5 dpdk
> > >    1/ 5 eventtrace
> > >   -2/-2 (syslog threshold)
> > >   -1/-1 (stderr threshold)
> > >   max_recent 10000
> > >   max_new 1000
> > >   log_file /var/log/ceph/ceph-osd.299.log
> > > --- end dump of recent events ---
> > > 2017-08-26 00:38:00.572479 7f14556b4d00 -1 *** Caught signal (Aborted) **
> > >  in thread 7f14556b4d00 thread_name:ceph-osd
> > >
> > >  ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc)
> > >  1: (()+0xa21a01) [0x7f14560f0a01]
> > >  2: (()+0xf370) [0x7f1452cfe370]
> > >  3: (gsignal()+0x37) [0x7f1451d281d7]
> > >  4: (abort()+0x148) [0x7f1451d298c8]
> > >  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x7f145612f594]
> > >  6: (pi_compact_rep::add_interval(bool, PastIntervals::pg_interval_t const&)+0x3b2) [0x7f1455e030b2]
> > >  7: (PastIntervals::check_new_interval(int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, unsigned int, unsigned int, std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, pg_t, IsPGRecoverablePredicate*, PastIntervals*, std::ostream*)+0x380) [0x7f1455de8ab0]
> > >  8: (OSD::build_past_intervals_parallel()+0xa8f) [0x7f1455bbc71f]
> > >  9: (OSD::load_pgs()+0x503) [0x7f1455bbef13]
> > >  10: (OSD::init()+0x2179) [0x7f1455bd7779]
> > >  11: (main()+0x2def) [0x7f1455add56f]
> > >  12: (__libc_start_main()+0xf5) [0x7f1451d14b35]
> > >  13: (()+0x4ac8a6) [0x7f1455b7b8a6]
> > >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> > >
> > > --- begin dump of recent events ---
> > >      0> 2017-08-26 00:38:00.572479 7f14556b4d00 -1 *** Caught signal (Aborted) **
> > >  in thread 7f14556b4d00 thread_name:ceph-osd
> > >
> > >  ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc)
> > >  1: (()+0xa21a01) [0x7f14560f0a01]
> > >  2: (()+0xf370) [0x7f1452cfe370]
> > >  3: (gsignal()+0x37) [0x7f1451d281d7]
> > >  4: (abort()+0x148) [0x7f1451d298c8]
> > >  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x7f145612f594]
> > >  6: (pi_compact_rep::add_interval(bool, PastIntervals::pg_interval_t const&)+0x3b2) [0x7f1455e030b2]
> > >  7: (PastIntervals::check_new_interval(int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, int, int, std::vector<int, std::allocator<int> > const&, std::vector<int, std::allocator<int> > const&, unsigned int, unsigned int, std::shared_ptr<OSDMap const>, std::shared_ptr<OSDMap const>, pg_t, IsPGRecoverablePredicate*, PastIntervals*, std::ostream*)+0x380) [0x7f1455de8ab0]
> > >  8: (OSD::build_past_intervals_parallel()+0xa8f) [0x7f1455bbc71f]
> > >  9: (OSD::load_pgs()+0x503) [0x7f1455bbef13]
> > >  10: (OSD::init()+0x2179) [0x7f1455bd7779]
> > >  11: (main()+0x2def) [0x7f1455add56f]
> > >  12: (__libc_start_main()+0xf5) [0x7f1451d14b35]
> > >  13: (()+0x4ac8a6) [0x7f1455b7b8a6]
> > >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> > >
> > > --- logging levels ---
> > >    0/ 5 none
> > >    0/ 1 lockdep
> > >    0/ 1 context
> > >    1/ 1 crush
> > >    1/ 5 mds
> > >    1/ 5 mds_balancer
> > >    1/ 5 mds_locker
> > >    1/ 5 mds_log
> > >    1/ 5 mds_log_expire
> > >    1/ 5 mds_migrator
> > >    0/ 1 buffer
> > >    0/ 1 timer
> > >    0/ 1 filer
> > >    0/ 1 striper
> > >    0/ 1 objecter
> > >    0/ 5 rados
> > >    0/ 5 rbd
> > >    0/ 5 rbd_mirror
> > >    0/ 5 rbd_replay
> > >    0/ 5 journaler
> > >    0/ 5 objectcacher
> > >    0/ 5 client
> > >   20/20 osd
> > >    0/ 5 optracker
> > >    0/ 5 objclass
> > >    1/ 3 filestore
> > >    1/ 3 journal
> > >    0/ 5 ms
> > >    1/ 5 mon
> > >    0/10 monc
> > >    1/ 5 paxos
> > >    0/ 5 tp
> > >    1/ 5 auth
> > >    1/ 5 crypto
> > >    1/ 1 finisher
> > >    1/ 5 heartbeatmap
> > >    1/ 5 perfcounter
> > >    1/ 5 rgw
> > >    1/10 civetweb
> > >    1/ 5 javaclient
> > >    1/ 5 asok
> > >    1/ 1 throttle
> > >    0/ 0 refs
> > >    1/ 5 xio
> > >    1/ 5 compressor
> > >    1/ 5 bluestore
> > >    1/ 5 bluefs
> > >    1/ 3 bdev
> > >    1/ 5 kstore
> > >    4/ 5 rocksdb
> > >    4/ 5 leveldb
> > >    4/ 5 memdb
> > >    1/ 5 kinetic
> > >    1/ 5 fuse
> > >    1/ 5 mgr
> > >    1/ 5 mgrc
> > >    1/ 5 dpdk
> > >    1/ 5 eventtrace
> > >   -2/-2 (syslog threshold)
> > >   -1/-1 (stderr threshold)
> > >   max_recent 10000
> > >   max_new 1000
> > >   log_file /var/log/ceph/ceph-osd.299.log
> > > --- end dump of recent events ---
> > >
> > >
> > > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-299/ --op info --pgid 19.f1e
> > > {
> > >     "pgid": "19.f1e",
> > >     "last_update": "0'0",
> > >     "last_complete": "0'0",
> > >     "log_tail": "0'0",
> > >     "last_user_version": 0,
> > >     "last_backfill": "MAX",
> > >     "last_backfill_bitwise": 0,
> > >     "purged_snaps": [],
> > >     "history": {
> > >         "epoch_created": 1084817,
> > >         "epoch_pool_created": 1084817,
> > >         "last_epoch_started": 1085232,
> > >         "last_interval_started": 1085230,
> > >         "last_epoch_clean": 1050342,
> > >         "last_interval_clean": 1050342,
> > >         "last_epoch_split": 0,
> > >         "last_epoch_marked_full": 1061015,
> > >         "same_up_since": 1085230,
> > >         "same_interval_since": 1085230,
> > >         "same_primary_since": 1085230,
> > >         "last_scrub": "960114'865853",
> > >         "last_scrub_stamp": "2017-08-25 17:32:06.181006",
> > >         "last_deep_scrub": "952725'861179",
> > >         "last_deep_scrub_stamp": "2017-08-25 17:32:06.181006",
> > >         "last_clean_scrub_stamp": "2017-08-25 17:32:06.181006"
> > >     },
> > >     "stats": {
> > >         "version": "0'0",
> > >         "reported_seq": "424",
> > >         "reported_epoch": "1085650",
> > >         "state": "active+undersized+degraded",
> > >         "last_fresh": "2017-08-25 18:52:46.520078",
> > >         "last_change": "2017-08-25 18:38:16.356266",
> > >         "last_active": "2017-08-25 18:52:46.520078",
> > >         "last_peered": "2017-08-25 18:52:46.520078",
> > >         "last_clean": "2017-08-25 17:32:06.181006",
> > >         "last_became_active": "2017-08-25 18:38:16.356266",
> > >         "last_became_peered": "2017-08-25 18:38:16.356266",
> > >         "last_unstale": "2017-08-25 18:52:46.520078",
> > >         "last_undegraded": "2017-08-25 18:38:16.304877",
> > >         "last_fullsized": "2017-08-25 18:38:16.304877",
> > >         "mapping_epoch": 1085230,
> > >         "log_start": "0'0",
> > >         "ondisk_log_start": "0'0",
> > >         "created": 1084817,
> > >         "last_epoch_clean": 1050342,
> > >         "parent": "0.0",
> > >         "parent_split_bits": 0,
> > >         "last_scrub": "960114'865853",
> > >         "last_scrub_stamp": "2017-08-25 17:32:06.181006",
> > >         "last_deep_scrub": "952725'861179",
> > >         "last_deep_scrub_stamp": "2017-08-25 17:32:06.181006",
> > >         "last_clean_scrub_stamp": "2017-08-25 17:32:06.181006",
> > >         "log_size": 0,
> > >         "ondisk_log_size": 0,
> > >         "stats_invalid": false,
> > >         "dirty_stats_invalid": false,
> > >         "omap_stats_invalid": false,
> > >         "hitset_stats_invalid": false,
> > >         "hitset_bytes_stats_invalid": false,
> > >         "pin_stats_invalid": false,
> > >         "stat_sum": {
> > >             "num_bytes": 0,
> > >             "num_objects": 0,
> > >             "num_object_clones": 0,
> > >             "num_object_copies": 0,
> > >             "num_objects_missing_on_primary": 0,
> > >             "num_objects_missing": 0,
> > >             "num_objects_degraded": 0,
> > >             "num_objects_misplaced": 0,
> > >             "num_objects_unfound": 0,
> > >             "num_objects_dirty": 0,
> > >             "num_whiteouts": 0,
> > >             "num_read": 0,
> > >             "num_read_kb": 0,
> > >             "num_write": 0,
> > >             "num_write_kb": 0,
> > >             "num_scrub_errors": 0,
> > >             "num_shallow_scrub_errors": 0,
> > >             "num_deep_scrub_errors": 0,
> > >             "num_objects_recovered": 0,
> > >             "num_bytes_recovered": 0,
> > >             "num_keys_recovered": 0,
> > >             "num_objects_omap": 0,
> > >             "num_objects_hit_set_archive": 0,
> > >             "num_bytes_hit_set_archive": 0,
> > >             "num_flush": 0,
> > >             "num_flush_kb": 0,
> > >             "num_evict": 0,
> > >             "num_evict_kb": 0,
> > >             "num_promote": 0,
> > >             "num_flush_mode_high": 0,
> > >             "num_flush_mode_low": 0,
> > >             "num_evict_mode_some": 0,
> > >             "num_evict_mode_full": 0,
> > >             "num_objects_pinned": 0,
> > >             "num_legacy_snapsets": 0
> > >         },
> > >         "up": [
> > >             299
> > >         ],
> > >         "acting": [
> > >             299
> > >         ],
> > >         "blocked_by": [],
> > >         "up_primary": 299,
> > >         "acting_primary": 299
> > >     },
> > >     "empty": 1,
> > >     "dne": 0,
> > >     "incomplete": 0,
> > >     "last_epoch_started": 1085232,
> > >     "hit_set_history": {
> > >         "current_last_update": "0'0",
> > >         "history": []
> > >     }
> > > }
> > >
> > > ll /var/lib/ceph/osd/ceph-299/current/19.f1e_head/
> > > total 0
> > > -rw-r--r-- 1 root root 0 Aug 25 18:38 __head_00000F1E__13
> > >
> > > ceph pg 19.f1e query
> > > .
> > > .
> > >                 "blocked": "peering is blocked due to down osds",
> > >                 "down_osds_we_would_probe": [
> > >                     299
> > >                 ],
> > >                 "peering_blocked_by": [
> > >                     {
> > >                         "osd": 299,
> > >                         "current_lost_at": 0,
> > >                         "comment": "starting or marking this osd lost may let us proceed"
> > >                     }
> > >                 ]
> > > .
> > > .
> > > .
> > >
> > > Removing the PG with ceph-objectstore-tool did the trick, but I am
> > > not sure whether that will happen to a PG with real data in it or
> > > not. Should I report this in the bug tracker?
> > >
> > > thanks
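For reference, the export-then-remove step mentioned above looks roughly
like this -- a minimal sketch only, with the OSD id, pgid, and backup path
as examples; the OSD must be stopped first, and some versions may also
want --force on the remove:

  #!/bin/bash
  # Back up a PG with ceph-objectstore-tool before removing it.
  # OSD id, pgid, and backup directory below are examples only.
  OSD=299
  PGID=19.f1e
  BACKUP=/root/pg-backups
  mkdir -p "$BACKUP"

  # Export the PG first, just in case it is ever needed again.
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$OSD \
      --pgid "$PGID" --op export --file "$BACKUP/$PGID.export"

  # Then remove the (empty) PG from the OSD's store.
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$OSD \
      --pgid "$PGID" --op remove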