So after much trial and tribulation, I ended up solving both problems on my own. For the first, I had to rejigger the pools to balance out the objects-per-pg average. This isn't my preferred method, because it could happen again once the cluster gets loaded. For the second, I ended up removing the previous optimizations, which seemed to clear up the issue of OSDs going down. I'm still at a loss as to why those options were causing OSDs to go down, and I'm wondering if someone who knows this better could explain it.
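For reference, a rough sketch of that kind of rebalancing, with a purely hypothetical target pg count (pg_num can only be increased, and pgp_num should be raised to match so data actually moves):

# check the current pg count on one of the skewed pools, then raise it so
# its objects-per-pg figure drops back toward the cluster average
ceph osd pool get images pg_num
ceph osd pool set images pg_num 128
ceph osd pool set images pgp_num 128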
Here are the removed options:

[root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx ~]# diff ceph.conf.old ceph-conf/ceph.conf
12,13d11
< mon_osd_down_out_interval = 30
< mon_osd_report_timeout = 300
18d15
< osd_journal_size = 10000
24,28d20
< max_open_files = 131072
< osd_max_backfills = 2
< osd_recovery_max_active = 2
< osd_recovery_op_priority = 1
< osd_client_op_priority = 63
30,32d21
< ms_dispatch_throttle_bytes = 1048576000
< objecter_inflight_op_bytes = 1048576000
< osd_deep_scrub_stride=5242880
36c25
< mon_pg_warn_max_object_skew = 10
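As a side note, most of these can also be flipped at runtime with injectargs instead of editing ceph.conf and restarting every daemon, which makes it easier to narrow down which one is actually the culprit (the values below are only examples; some options are not applied until a restart, and injectargs will say so):

ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
ceph tell mon.* injectargs '--mon_osd_down_out_interval 600'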
Thanks,
Matthew Stroud

From: Matthew Stroud <mattstroud@xxxxxxxxxxxxx>

The first, and hopefully easy, one: I have a situation where I have two pools that are rarely used (the third will be in use once I get through these issues), but they need to be present at the whims of our cloud team. Is there a way I can turn off the '2 pools have many more objects per pg than average' warning? What I have done to this point is play with 'mon_pg_warn_max_object_skew', but that didn't remove the message. After googling and looking through the docs, nothing stood out to me as a way to resolve the issue. Technical info:

[root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx ~]# ceph health detail
HEALTH_WARN 2 pools have many more objects per pg than average
MANY_OBJECTS_PER_PG 2 pools have many more objects per pg than average
    pool images objects per pg (480) is more than 60 times cluster average (8)
    pool metrics objects per pg (336) is more than 42 times cluster average (8)
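For context, this is roughly the kind of change I have been playing with (my understanding is that 0 disables the skew check entirely, and I am not sure whether the value also needs to reach ceph-mgr on 12.2.x rather than just the mons):

# ceph.conf on the mon (and possibly mgr) hosts; 0 should disable the
# skew warning, any positive value just changes the threshold
[global]
        mon_pg_warn_max_object_skew = 0

# restart so the new value is picked up
systemctl restart ceph-mon.target ceph-mgr.target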
The second: I'm seeing OSDs randomly getting marked down, but they then report that they are still running. This issue wasn't present before the upgrade. This is a multipath setup, but the paths appear healthy and the cluster isn't really being utilized at the moment. Please let me know if you want more information.

Ceph.log:

2018-01-25 14:56:29.011831 mon.mon01 mon.0 10.20.57.10:6789/0 823 : cluster [INF] osd.12 marked down after no beacon for 300.775605 seconds
2018-01-25 14:56:29.013280 mon.mon01 mon.0 10.20.57.10:6789/0 824 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
2018-01-25 14:56:32.034002 mon.mon01 mon.0 10.20.57.10:6789/0 830 : cluster [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
2018-01-25 14:56:31.322228 osd.12 osd.12 10.20.57.14:6804/4163 1 : cluster [WRN] Monitor daemon marked osd.12 down, but it is still running

Ceph-osd.12.log:

2018-01-25 14:56:00.606493 7facfde03700 4 rocksdb: (Original Log Time 2018/01/25-14:56:00.602100) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/memtable_list.cc:360] [default] Level-0 commit table #213 started
2018-01-25 14:56:00.606498 7facfde03700 4 rocksdb: (Original Log Time 2018/01/25-14:56:00.606406) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/memtable_list.cc:383] [default] Level-0 commit table #213: memtable #1 done
2018-01-25 14:56:00.606517 7facfde03700 4 rocksdb: (Original Log Time 2018/01/25-14:56:00.606437) EVENT_LOG_v1 {"time_micros": 1516917360606429, "job": 29, "event": "flush_finished", "lsm_state": [2, 1, 1, 0, 0, 0, 0], "immutable_memtables": 0}
2018-01-25 14:56:00.606529 7facfde03700 4 rocksdb: (Original Log Time 2018/01/25-14:56:00.606466) [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_compaction_flush.cc:132] [default] Level summary: base level 1 max bytes base 268435456 files[2 1 1 0 0 0 0] max score 0.50
2018-01-25 14:56:00.606538 7facfde03700 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.2/rpm/el7/BUILD/ceph-12.2.2/src/rocksdb/db/db_impl_files.cc:388] [JOB 29] Try to delete WAL files size 252104127, prev total WAL file size 253684537, number of live WAL files 2.
2018-01-25 14:56:31.322223 7fad1262c700 0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.12 down, but it is still running
2018-01-25 14:56:31.322233 7fad1262c700 0 log_channel(cluster) log [DBG] : map e18531 wrongly marked me down at e18530
2018-01-25 14:56:31.322236 7fad1262c700 1 osd.12 18531 start_waiting_for_healthy
2018-01-25 14:56:31.327816 7fad0c620700 1 osd.12 pg_epoch: 18530 pg[14.8f( v 18432'17 (0'0,18432'17] local-lis/les=18521/18522 n=1 ec=18405/18405 lis/c 18521/18521 les/c/f 18522/18522/0 18530/18530/18530) [3,19] r=-1 lpr=18530 pi=[18521,18530)/1 luod=0'0 crt=18432'17 lcod 0'0 active] start_peering_interval up [12,3,19] -> [3,19], acting [12,3,19] -> [3,19], acting_primary 12 -> 3, up_primary 12 -> 3, role 0 -> -1, features acting 2305244844532236283 upacting 2305244844532236283
2018-01-25 14:56:31.327851 7fad0be1f700 1 osd.12 pg_epoch: 18530 pg[14.9e( empty local-lis/les=18522/18523 n=0 ec=18405/18405 lis/c 18522/18522 les/c/f 18523/18523/0 18530/18530/18530) [15,10] r=-1 lpr=18530 pi=[18522,18530)/1 crt=0'0 active] start_peering_interval up [12,15,10] -> [15,10], acting [12,15,10] -> [15,10], acting_primary 12 -> 15, up_primary 12 -> 15, role 0 -> -1, features acting 2305244844532236283 upacting 2305244844532236283
2018-01-25 14:56:31.327918 7fad0c620700 1 osd.12 pg_epoch: 18531 pg[14.8f( v 18432'17 (0'0,18432'17] local-lis/les=18521/18522 n=1 ec=18405/18405 lis/c 18521/18521 les/c/f 18522/18522/0 18530/18530/18530) [3,19] r=-1 lpr=18530 pi=[18521,18530)/1 crt=18432'17 lcod 0'0 unknown NOTIFY] state<Start>: transitioning to Stray
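(Side note: since the monitor log blames missing beacons, the beacon and report-timeout values the daemons are actually running with can be dumped over the admin socket, the same way as the 'down' settings below; osd.12 and mon01 are simply the daemons from the logs above.)

ceph --admin-daemon /var/run/ceph/ceph-osd.12.asok config show | grep -i beacon
ceph --admin-daemon /var/run/ceph/ceph-mon.mon01.asok config show | grep -i report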
Ceph osd tree:

[root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx ceph-conf]# ceph osd tree
ID CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       31.99658 root default
-2        7.99915     host osd01
 0   ssd  0.99989         osd.0      up  1.00000 1.00000
 1   ssd  0.99989         osd.1      up  1.00000 1.00000
 5   ssd  0.99989         osd.5      up  1.00000 1.00000
 6   ssd  0.99989         osd.6      up  1.00000 1.00000
 7   ssd  0.99989         osd.7      up  1.00000 1.00000
11   ssd  0.99989         osd.11     up  1.00000 1.00000
20   ssd  0.99989         osd.20     up  1.00000 1.00000
22   ssd  0.99989         osd.22     up  1.00000 1.00000
-3        7.99915     host osd02
12   ssd  0.99989         osd.12     up  1.00000 1.00000
18   ssd  0.99989         osd.18     up  1.00000 1.00000
23   ssd  0.99989         osd.23     up  1.00000 1.00000
26   ssd  0.99989         osd.26     up  1.00000 1.00000
27   ssd  0.99989         osd.27     up  1.00000 1.00000
28   ssd  0.99989         osd.28     up  1.00000 1.00000
29   ssd  0.99989         osd.29     up  1.00000 1.00000
30   ssd  0.99989         osd.30     up  1.00000 1.00000
-4        7.99915     host osd03
13   ssd  0.99989         osd.13     up  1.00000 1.00000
15   ssd  0.99989         osd.15     up  1.00000 1.00000
16   ssd  0.99989         osd.16     up  1.00000 1.00000
17   ssd  0.99989         osd.17     up  1.00000 1.00000
19   ssd  0.99989         osd.19     up  1.00000 1.00000
21   ssd  0.99989         osd.21     up  1.00000 1.00000
24   ssd  0.99989         osd.24     up  1.00000 1.00000
25   ssd  0.99989         osd.25     up  1.00000 1.00000
-5        7.99915     host osd04
 2   ssd  0.99989         osd.2      up  1.00000 1.00000
 3   ssd  0.99989         osd.3      up  1.00000 1.00000
 4   ssd  0.99989         osd.4      up  1.00000 1.00000
 8   ssd  0.99989         osd.8      up  1.00000 1.00000
 9   ssd  0.99989         osd.9      up  1.00000 1.00000
10   ssd  0.99989         osd.10     up  1.00000 1.00000
14   ssd  0.99989         osd.14     up  1.00000 1.00000
31   ssd  0.99989         osd.31     up  1.00000 1.00000

Mon settings for down:

[root@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx ceph-conf]# ceph --admin-daemon /var/run/ceph/ceph-mon.mon01.asok config show | grep -i down
    "mds_mon_shutdown_timeout": "5.000000",
    "mds_shutdown_check": "0",
    "mon_osd_adjust_down_out_interval": "true",
    "mon_osd_down_out_interval": "30",
    "mon_osd_down_out_subtree_limit": "rack",
    "mon_osd_min_down_reporters": "2",
    "mon_pg_check_down_all_threshold": "0.500000",
    "mon_warn_on_osd_down_out_interval_zero": "true",
    "osd_backoff_on_down": "true",
    "osd_debug_shutdown": "false",
    "osd_journal_flush_on_shutdown": "true",
    "osd_max_markdown_count": "5",
    "osd_max_markdown_period": "600",
    "osd_mon_shutdown_timeout": "5.000000",
    "osd_shutdown_pgref_assert": "false",
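One thing I may try while chasing this is to set the cluster-wide flags so the monitors stop reacting to the flapping while I gather logs, clearing them again afterwards; a sketch:

ceph osd set nodown      # monitors will not mark OSDs down
ceph osd set noout       # down OSDs will not be marked out, so no rebalancing starts
# ...collect logs / investigate...
ceph osd unset nodown
ceph osd unset noout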
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com