So after much trial and tribulation, I ended up solving both problems on my own. For the first, I had to rejigger the pools to balance out the objects-per-pg average. This isn't my preferred method, because it could happen again once the cluster gets loaded. For the second, I ended up removing the previous optimizations, which seemed to clear up the issue of OSDs going down. I'm still at a loss as to why those options were causing OSDs to go down, and I'm wondering if someone who knows this better could explain it.
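For reference, a rough sketch of that kind of rebalancing, with a purely hypothetical target pg count (pg_num can only be increased, and pgp_num should be raised to match so data actually moves):

# check the current pg count on one of the skewed pools, then raise it so
# its objects-per-pg figure drops back toward the cluster average
ceph osd pool get images pg_num
ceph osd pool set images pg_num 128
ceph osd pool set images pgp_num 128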
Here are the removed options:

[root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx ~]# diff ceph.conf.old ceph-conf/ceph.conf
12,13d11
< mon_osd_down_out_interval = 30
< mon_osd_report_timeout = 300
18d15
< osd_journal_size = 10000
24,28d20
< max_open_files = 131072
< osd_max_backfills = 2
< osd_recovery_max_active = 2
< osd_recovery_op_priority = 1
< osd_client_op_priority = 63
30,32d21
< ms_dispatch_throttle_bytes = 1048576000
< objecter_inflight_op_bytes = 1048576000
< osd_deep_scrub_stride=5242880
36c25
< mon_pg_warn_max_object_skew = 10
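As a side note, most of these can also be flipped at runtime with injectargs instead of editing ceph.conf and restarting every daemon, which makes it easier to narrow down which one is actually the culprit (the values below are only examples; some options are not applied until a restart, and injectargs will say so):

ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
ceph tell mon.* injectargs '--mon_osd_down_out_interval 600'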
Thanks,
Matthew Stroud

From: Matthew Stroud <mattstroud@xxxxxxxxxxxxx>

The first, and hopefully easy, one: I have a situation where I have two pools that are rarely used (the third will be in use once I get through these issues), but they need to be present at the whims of our cloud team. Is there a way I can turn off the '2 pools have many more objects per pg than average' warning? What I have done to this point is play with 'mon_pg_warn_max_object_skew', but that didn't remove the message. After googling and looking through the docs, nothing stood out to me as a way to resolve the issue. Technical info:

[root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx ~]# ceph health detail
HEALTH_WARN 2 pools have many more objects per pg than average
MANY_OBJECTS_PER_PG 2 pools have many more objects per pg than average
    pool images objects per pg (480) is more than 60 times cluster average (8)
    pool metrics objects per pg (336) is more than 42 times cluster average (8)
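For context, this is roughly the kind of change I have been playing with (my understanding is that 0 disables the skew check entirely, and I am not sure whether the value also needs to reach ceph-mgr on 12.2.x rather than just the mons):

# ceph.conf on the mon (and possibly mgr) hosts; 0 should disable the
# skew warning, any positive value just changes the threshold
[global]
        mon_pg_warn_max_object_skew = 0

# restart so the new value is picked up
systemctl restart ceph-mon.target ceph-mgr.target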
The second: I'm seeing OSDs randomly getting marked down, but they then report that they are still running. This issue wasn't present before the upgrade. This is a multipath setup, but the paths appear healthy and the cluster isn't really being utilized at the moment. Please let me know if you want more information.

Ceph.log:

2018-01-25 14:56:29.011831 mon.mon01 mon.0 10.20.57.10:6789/0 823 : cluster [INF] osd.12 marked down after no beacon for 300.775605 seconds
2018-01-25 14:56:29.013280 mon.mon01 mon.0 10.20.57.10:6789/0 824 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
2018-01-25 14:56:32.034002 mon.mon01 mon.0 10.20.57.10:6789/0 830 : cluster [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2018-01-25 14:56:31.322228 osd.12 osd.12 10.20.57.14:6804/4163 1 : cluster [WRN] Monitor daemon marked osd.12 down, but it is still running

Ceph-osd.12.log:

2018-01-25 14:56:00.606493 7facfde03700 4 rocksdb: (Original Log Time 2018/01/25-14:56:00.602100) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/memtable_list.cc:360] [default] Level-0 commit table #213 started
2018-01-25 14:56:00.606498 7facfde03700 4 rocksdb: (Original Log Time 2018/01/25-14:56:00.606406) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/memtable_list.cc:383] [default] Level-0 commit table #213: memtable #1 done
2018-01-25 14:56:00.606517 7facfde03700 4 rocksdb: (Original Log Time 2018/01/25-14:56:00.606437) EVENT_LOG_v1 {"time_micros": 1516917360606429, "job": 29, "event": "flush_finished", "lsm_state": [2, 1, 1, 0, 0, 0, 0], "immutable_memtables": 0}
2018-01-25 14:56:00.606529 7facfde03700 4 rocksdb: (Original Log Time 2018/01/25-14:56:00.606466) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:132] [default] Level summary: base level 1 max bytes base 268435456 files[2 1 1 0 0 0 0] max score 0.50
2018-01-25 14:56:00.606538 7facfde03700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_files.cc:388] [JOB 29] Try to delete WAL files size 252104127, prev total WAL file size 253684537, number of live WAL files 2.
2018-01-25 14:56:31.322223 7fad1262c700 0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.12 down, but it is still running
2018-01-25 14:56:31.322233 7fad1262c700 0 log_channel(cluster) log [DBG] : map e18531 wrongly marked me down at e18530
2018-01-25 14:56:31.322236 7fad1262c700 1 osd.12 18531 start_waiting_for_healthy
2018-01-25 14:56:31.327816 7fad0c620700 1 osd.12 pg_epoch: 18530 pg[14.8f( v 18432'17 (0'0,18432'17] local-lis/les=18521/18522 n=1 ec=18405/18405 lis/c 18521/18521 les/c/f 18522/18522/0 18530/18530/18530) [3,19] r=-1 lpr=18530 pi=[18521,18530)/1 luod=0'0 crt=18432'17 lcod 0'0 active] start_peering_interval up [12,3,19] -> [3,19], acting [12,3,19] -> [3,19], acting_primary 12 -> 3, up_primary 12 -> 3, role 0 -> -1, features acting 2305244844532236283 upacting 2305244844532236283
2018-01-25 14:56:31.327851 7fad0be1f700 1 osd.12 pg_epoch: 18530 pg[14.9e( empty local-lis/les=18522/18523 n=0 ec=18405/18405 lis/c 18522/18522 les/c/f 18523/18523/0 18530/18530/18530) [15,10] r=-1 lpr=18530 pi=[18522,18530)/1 crt=0'0 active] start_peering_interval up [12,15,10] -> [15,10], acting [12,15,10] -> [15,10], acting_primary 12 -> 15, up_primary 12 -> 15, role 0 -> -1, features acting 2305244844532236283 upacting 2305244844532236283
2018-01-25 14:56:31.327918 7fad0c620700 1 osd.12 pg_epoch: 18531 pg[14.8f( v 18432'17 (0'0,18432'17] local-lis/les=18521/18522 n=1 ec=18405/18405 lis/c 18521/18521 les/c/f 18522/18522/0 18530/18530/18530) [3,19] r=-1 lpr=18530 pi=[18521,18530)/1 crt=18432'17 lcod 0'0 unknown NOTIFY] state<Start>: transitioning to Stray
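(Side note: since the monitor log blames missing beacons, the beacon and report-timeout values the daemons are actually running with can be dumped over the admin socket, the same way as the 'down' settings below; osd.12 and mon01 are simply the daemons from the logs above.)

ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok config show | grep -i beacon
ceph --admin-daemon /var/run/ceph/ceph-mon.mon01.asok config show | grep -i report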
Ceph osd tree:

[root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx ceph-conf]# ceph osd tree
ID CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       31.99658 root default
-2        7.99915     host osd01
 0   ssd  0.99989         osd.0      up  1.00000 1.00000
 1   ssd  0.99989         osd.1      up  1.00000 1.00000
 5   ssd  0.99989         osd.5      up  1.00000 1.00000
 6   ssd  0.99989         osd.6      up  1.00000 1.00000
 7   ssd  0.99989         osd.7      up  1.00000 1.00000
11   ssd  0.99989         osd.11     up  1.00000 1.00000
20   ssd  0.99989         osd.20     up  1.00000 1.00000
22   ssd  0.99989         osd.22     up  1.00000 1.00000
-3        7.99915     host osd02
12   ssd  0.99989         osd.12     up  1.00000 1.00000
18   ssd  0.99989         osd.18     up  1.00000 1.00000
23   ssd  0.99989         osd.23     up  1.00000 1.00000
26   ssd  0.99989         osd.26     up  1.00000 1.00000
27   ssd  0.99989         osd.27     up  1.00000 1.00000
28   ssd  0.99989         osd.28     up  1.00000 1.00000
29   ssd  0.99989         osd.29     up  1.00000 1.00000
30   ssd  0.99989         osd.30     up  1.00000 1.00000
-4        7.99915     host osd03
13   ssd  0.99989         osd.13     up  1.00000 1.00000
15   ssd  0.99989         osd.15     up  1.00000 1.00000
16   ssd  0.99989         osd.16     up  1.00000 1.00000
17   ssd  0.99989         osd.17     up  1.00000 1.00000
19   ssd  0.99989         osd.19     up  1.00000 1.00000
21   ssd  0.99989         osd.21     up  1.00000 1.00000
24   ssd  0.99989         osd.24     up  1.00000 1.00000
25   ssd  0.99989         osd.25     up  1.00000 1.00000
-5        7.99915     host osd04
 2   ssd  0.99989         osd.2      up  1.00000 1.00000
 3   ssd  0.99989         osd.3      up  1.00000 1.00000
 4   ssd  0.99989         osd.4      up  1.00000 1.00000
 8   ssd  0.99989         osd.8      up  1.00000 1.00000
 9   ssd  0.99989         osd.9      up  1.00000 1.00000
10   ssd  0.99989         osd.10     up  1.00000 1.00000
14   ssd  0.99989         osd.14     up  1.00000 1.00000
31   ssd  0.99989         osd.31     up  1.00000 1.00000

Mon settings for down:

[root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx ceph-conf]# ceph --admin-daemon /var/run/ceph/ceph-mon.mon01.asok config show | grep -i down
    "mds_mon_shutdown_timeout": "5.000000",
    "mds_shutdown_check": "0",
    "mon_osd_adjust_down_out_interval": "true",
    "mon_osd_down_out_interval": "30",
    "mon_osd_down_out_subtree_limit": "rack",
    "mon_osd_min_down_reporters": "2",
    "mon_pg_check_down_all_threshold": "0.500000",
    "mon_warn_on_osd_down_out_interval_zero": "true",
    "osd_backoff_on_down": "true",
    "osd_debug_shutdown": "false",
    "osd_journal_flush_on_shutdown": "true",
    "osd_max_markdown_count": "5",
    "osd_max_markdown_period": "600",
    "osd_mon_shutdown_timeout": "5.000000",
    "osd_shutdown_pgref_assert": "false",
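One thing I may try while chasing this is to set the cluster-wide flags so the monitors stop reacting to the flapping while I gather logs, clearing them again afterwards; a sketch:

ceph osd set nodown      # monitors will not mark OSDs down
ceph osd set noout       # down OSDs will not be marked out, so no rebalancing starts
# ...collect logs / investigate...
ceph osd unset nodown
ceph osd unset noout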
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com