Hello Sage,

On Thu, 8 Aug 2019 02:23:15 +0000 (UTC) Sage Weil wrote:

> On Thu, 8 Aug 2019, Christian Balzer wrote:
> >
> > Hello again,
> >
> > Getting back to this:
> > On Sun, 4 Aug 2019 10:47:27 +0900 Christian Balzer wrote:
> >
> > > Hello,
> > >
> > > preparing the first production bluestore, nautilus (latest) based cluster
> > > I've run into the same things other people and myself ran into before.
> > >
> > > Firstly HW, 3 nodes with 12 SATA HDDs each, IT mode LSI 3008, wal/db on
> > > 40GB SSD partitions. (boy do I hate the inability of ceph-volume to deal
> > > with raw partitions).
> > > SSDs aren't a bottleneck in any scenario.
> > > Single E5-1650 v3 @ 3.50GHz, cpu isn't a bottleneck in any scenario, less
> > > than 15% of a core per OSD.
> > >
> > > Connection is via 40GB/s infiniband, IPoIB, no issues here as numbers later
> > > will show.
> > >
> > > Clients are KVMs on Epyc based compute nodes, maybe some more speed could
> > > be squeezed out here with different VM configs, but the cpu isn't an issue
> > > in the problem cases.
> > >
> > >
> > >
> > > 1. 4k random I/O can cause degraded PGs
> > > I've run into the same/similar issue as Nathan Fish here:
> > > https://www.spinics.net/lists/ceph-users/msg526
> > > During the first 2 tests with 4k random I/O I got shortly degraded PGs as
> > > well, with no indication in CPU or SSD utilization accounting for this.
> > > HDDs were of course busy at that time.
> > > Wasn't able to reproduce this so far, but it leaves me less than
> > > confident.
> > >
> > >
> > This happened again yesterday when rsyncing 260GB of average 4MB files
> > into a Ceph image backed VM.
> > Given the nature of this rsync nothing on the ceph nodes was the least bit
> > busy, the HDDs were all below 15% utilization, CPU bored, etc.
> >
> > Still we got:
> > ---
> > 2019-08-07 15:38:23.452580 osd.21 (osd.21) 651 : cluster [DBG] 1.125 starting backfill to osd.9 from (0'0,0'0] MAX to 1297'21584
> > 2019-08-07 15:38:24.454942 mon.ceph-05 (mon.0) 182756 : cluster [WRN] Health check failed: Reduced data availability: 2 pgs peering (PG_AVAILABILITY)
> > 2019-08-07 15:38:25.396756 mon.ceph-05 (mon.0) 182757 : cluster [DBG] osdmap e1302: 36 total, 36 up, 36 in
> > 2019-08-07 15:38:23.452026 osd.12 (osd.12) 767 : cluster [DBG] 1.105 starting backfill to osd.25 from (0'0,0'0] MAX to 1297'6782
> > ---
>
> Is the balancer enabled?  Maybe it is adjusting the PG distribution a bit.
>

It is indeed, and that would explain things. I did run it manually a few
times, though, and the PGs are all within one of each other, so I didn't
really expect any further adjustments to be needed, as there is only a
single pool, RBD.

It would be nice if it spoke up somewhere other than just the audit.log:
---
2019-08-07 15:38:21.092104 mon.ceph-05 (mon.0) 182680 : audit [INF] from='mgr.196195 10.0.8.25:0/960' entity='mgr.ceph-05' cmd=[{"item": "osd.0", "prefix": "osd crush weight-set reweight-compat", "weight": [2.504257053831929], "format": "json"}]: dispatch
---

I have turned it off now, as I don't expect significant variances going
forward.
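
For anybody else wanting to check whether the balancer was active around
such an event, here is a rough Python sketch that pulls the mgr-issued
"osd crush weight-set reweight-compat" commands out of audit log lines.
The line layout is inferred solely from the single entry quoted above. The
function name balancer_reweights, the regular expression and the optional
": finished" suffix are my own assumptions, not anything Ceph provides, so
treat it as a starting point only.

```python
import json
import re

# Assumption: the command payload is a JSON array after "cmd=", followed by
# ": dispatch" (as in the quoted entry) or ": finished" (assumed variant).
CMD_RE = re.compile(r"cmd=(\[.*\]): (?:dispatch|finished)\s*$")

# The audit log entry quoted in this mail, used as sample input.
SAMPLE = """2019-08-07 15:38:21.092104 mon.ceph-05 (mon.0) 182680 : audit [INF] from='mgr.196195 10.0.8.25:0/960' entity='mgr.ceph-05' cmd=[{"item": "osd.0", "prefix": "osd crush weight-set reweight-compat", "weight": [2.504257053831929], "format": "json"}]: dispatch"""


def balancer_reweights(lines):
    """Yield (timestamp, osd, weight) for each mgr-issued compat reweight."""
    for line in lines:
        # Only audit entries that originate from a mgr daemon are of interest.
        if " audit " not in line or "entity='mgr." not in line:
            continue
        match = CMD_RE.search(line)
        if not match:
            continue
        try:
            commands = json.loads(match.group(1))
        except ValueError:
            continue  # payload was not valid JSON, skip the line
        for cmd in commands:
            if cmd.get("prefix") != "osd crush weight-set reweight-compat":
                continue
            timestamp = " ".join(line.split()[:2])  # date and time fields
            weights = cmd.get("weight") or [None]
            yield timestamp, cmd.get("item"), weights[0]


if __name__ == "__main__":
    for timestamp, osd, weight in balancer_reweights([SAMPLE]):
        print(timestamp, osd, weight)
```

To use it on a real cluster, pass it the lines of the monitor's audit log
instead of SAMPLE. Where that file lives depends on the deployment.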

Thanks,

Christian

> > Unfortunately all I have in the OSD log is this:
> > ---
> > 2019-08-07 15:38:23.461 7f155e71b700 1 osd.9 pg_epoch: 1299 pg[1.125( empty local-lis/les=0/0 n=0 ec=189/189 lis/c 1286/1286 les/c/f 1287/1287/0 1298/1299/189) [21,9,28]/[21,28,3] r=-1 lpr=1299 pi=[1286,1299)/1 crt=0'0 unknown mbc={}] state<Start>: transitioning to Stray
> > 2019-08-07 15:38:24.353 7f155e71b700 1 osd.9 pg_epoch: 1301 pg[1.125( v 1297'21584 (1246'18584,1297'21584] local-lis/les=1299/1300 n=5 ec=189/189 lis/c 1299/1299 les/c/f 1300/1300/0 1298/1301/189) [21,9,28] r=1 lpr=1301 pi=[1299,1301)/1 luod=0'0 crt=1297'21584 active mbc={}] start_peering_interval up [21,9,28] -> [21,9,28], acting [21,28,3] -> [21,9,28], acting_primary 21 -> 21, up_primary 21 -> 21, role -1 -> 1, features acting 4611087854031667199 upacting 4611087854031667199
> > 2019-08-07 15:38:24.353 7f155e71b700 1 osd.9 pg_epoch: 1301 pg[1.125( v 1297'21584 (1246'18584,1297'21584] local-lis/les=1299/1300 n=5 ec=189/189 lis/c 1299/1299 les/c/f 1300/1300/0 1298/1301/189) [21,9,28] r=1 lpr=1301 pi=[1299,1301)/1 crt=1297'21584 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
> > ---
> >
> > How can I find out what happened here, given that it might not happen
> > again anytime soon cranking up debug levels now is a tad late.
>
> In the past we had "problems" where the degraded count would increase in
> cases where we were migrated PGs, even though there aren't actually any
> objects with too few replicas.  I think David Zafman ironed most/all
> of these out, but perhaps they weren't all in Nautilus?  I can't quite
> remember.
>
> s
>

-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Rakuten Mobile Inc.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx