Re: Bluestore caching oddities, again

On Thu, 8 Aug 2019, Christian Balzer wrote:
> 
> Hello again,
> 
> Getting back to this:
> On Sun, 4 Aug 2019 10:47:27 +0900 Christian Balzer wrote:
> 
> > Hello,
> > 
> > While preparing the first production BlueStore, Nautilus (latest) based cluster,
> > I've run into the same things other people and I have run into before.
> > 
> > First, the HW: 3 nodes with 12 SATA HDDs each, IT-mode LSI 3008, WAL/DB on
> > 40GB SSD partitions (boy, do I hate the inability of ceph-volume to deal
> > with raw partitions).
> > SSDs aren't a bottleneck in any scenario.
> > A single E5-1650 v3 @ 3.50GHz; the CPU isn't a bottleneck in any scenario, less
> > than 15% of a core per OSD.
> > 
> > Connection is via 40Gb/s InfiniBand (IPoIB); no issues here, as the numbers
> > later on will show.
> > 
> > Clients are KVMs on Epyc-based compute nodes; maybe some more speed could
> > be squeezed out here with different VM configs, but the CPU isn't an issue
> > in the problem cases.
> > 
> > 
> > 
> > 1. 4k random I/O can cause degraded PGs
> > I've run into the same/similar issue as Nathan Fish here:
> > https://www.spinics.net/lists/ceph-users/msg526
> > During the first 2 tests with 4k random I/O I briefly got degraded PGs as
> > well, with no indication in CPU or SSD utilization accounting for this.
> > The HDDs were of course busy at that time.
> > I haven't been able to reproduce this so far, but it leaves me less than
> > confident. 
> > 
> > 
> This happened again yesterday when rsyncing 260GB of files averaging 4MB
> into a Ceph-image-backed VM.
> Given the nature of this rsync, nothing on the Ceph nodes was the least bit
> busy: the HDDs were all below 15% utilization, the CPUs bored, etc.
> 
> Still we got:
> ---
> 2019-08-07 15:38:23.452580 osd.21 (osd.21) 651 : cluster [DBG] 1.125 starting backfill to osd.9 from (0'0,0'0] MAX to 1297'21584
> 2019-08-07 15:38:24.454942 mon.ceph-05 (mon.0) 182756 : cluster [WRN] Health check failed: Reduced data availability: 2 pgs peering (PG_AVAILABILITY)
> 2019-08-07 15:38:25.396756 mon.ceph-05 (mon.0) 182757 : cluster [DBG] osdmap e1302: 36 total, 36 up, 36 in
> 2019-08-07 15:38:23.452026 osd.12 (osd.12) 767 : cluster [DBG] 1.105 starting backfill to osd.25 from (0'0,0'0] MAX to 1297'6782
> ---

Is the balancer enabled?  Maybe it is adjusting the PG distribution a bit.
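
A quick way to check (a sketch, assuming the standard mgr balancer module 
commands) would be:
---
# show whether the balancer is on and which mode/plan it is using
ceph balancer status
# temporarily disable it to rule it out as the source of the data movement
ceph balancer off
---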

> Unfortunately all I have in the OSD log is this:
> ---
> 2019-08-07 15:38:23.461 7f155e71b700  1 osd.9 pg_epoch: 1299 pg[1.125( empty local-lis/les=0/0 n=0 ec=189/189 lis/c 1286/1286 les/c/f 1287/1287/0 1298/1299/189) [21,9,28]/[21,28,3] r=-1 lpr=1299 pi=[1286,1299)/1 crt=0'0 unknown mbc={}] state<Start>: transitioning to Stray
> 2019-08-07 15:38:24.353 7f155e71b700  1 osd.9 pg_epoch: 1301 pg[1.125( v 1297'21584 (1246'18584,1297'21584] local-lis/les=1299/1300 n=5 ec=189/189 lis/c 1299/1299 les/c/f 1300/1300/0 1298/1301/189) [21,9,28] r=1 lpr=1301 pi=[1299,1301)/1 luod=0'0 crt=1297'21584 active mbc={}] start_peering_interval up [21,9,28] -> [21,9,28], acting [21,28,3] -> [21,9,28], acting_primary 21 -> 21, up_primary 21 -> 21, role -1 -> 1, features acting 4611087854031667199 upacting 4611087854031667199
> 2019-08-07 15:38:24.353 7f155e71b700  1 osd.9 pg_epoch: 1301 pg[1.125( v 1297'21584 (1246'18584,1297'21584] local-lis/les=1299/1300 n=5 ec=189/189 lis/c 1299/1299 les/c/f 1300/1300/0 1298/1301/189) [21,9,28] r=1 lpr=1301 pi=[1299,1301)/1 crt=1297'21584 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
> ---
> 
> How can I find out what happened here? Given that it might not happen
> again anytime soon, cranking up debug levels now is a tad late.

In the past we had "problems" where the degraded count would increase in 
cases where we were migrating PGs, even though there weren't actually any 
objects with too few replicas.  I think David Zafman ironed most/all 
of these out, but perhaps not all of those fixes made it into Nautilus? I 
can't quite remember.
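
If it does recur, one option (a sketch, assuming the Nautilus centralized 
config store) is to raise the OSD debug level ahead of time so the next 
occurrence gets captured, then drop the override again afterwards:
---
# raise osd debug logging cluster-wide, persistently
ceph config set osd debug_osd 10
# ...after the next occurrence, grab the OSD logs, then remove the override
ceph config rm osd debug_osd
---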

s
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


