Re: Bluestore caching oddities, again

Christian Balzer <chibi@xxxxxxx> · Thu, 8 Aug 2019 11:09:00 +0900

Hello again,

Getting back to this:
On Sun, 4 Aug 2019 10:47:27 +0900 Christian Balzer wrote:

> Hello,
> 
> while preparing the first production Bluestore cluster, based on the
> latest Nautilus, I've run into the same things that other people and I
> have run into before.
> 
> Firstly HW, 3 nodes with 12 SATA HDDs each, IT mode LSI 3008, wal/db on
> 40GB SSD partitions. (boy do I hate the inability of ceph-volume to deal
> with raw partitions).
> SSDs aren't a bottleneck in any scenario.
> Single E5-1650 v3 @ 3.50GHz, cpu isn't a bottleneck in any scenario, less
> than 15% of a core per OSD.
> 
> Connection is via 40Gb/s InfiniBand, IPoIB, no issues here as numbers later
> will show.
> 
> Clients are KVMs on Epyc based compute nodes, maybe some more speed could
> be squeezed out here with different VM configs, but the cpu isn't an issue
> in the problem cases.
> 
> 
> 
> 1. 4k random I/O can cause degraded PGs
> I've run into the same/similar issue as Nathan Fish here:
> https://www.spinics.net/lists/ceph-users/msg526
> During the first 2 tests with 4k random I/O I briefly got degraded PGs as
> well, with nothing in the CPU or SSD utilization accounting for this.
> HDDs were of course busy at that time.
> I haven't been able to reproduce this so far, but it leaves me less than
> confident.
> 
> 
This happened again yesterday when rsyncing 260GB of files averaging 4MB
each into a VM backed by a Ceph image.
Given the nature of this rsync nothing on the ceph nodes was the least bit
busy, the HDDs were all below 15% utilization, CPU bored, etc.

Still we got:
---
2019-08-07 15:38:23.452580 osd.21 (osd.21) 651 : cluster [DBG] 1.125 starting backfill to osd.9 from (0'0,0'0] MAX to 1297'21584
2019-08-07 15:38:24.454942 mon.ceph-05 (mon.0) 182756 : cluster [WRN] Health check failed: Reduced data availability: 2 pgs peering (PG_AVAILABILITY)
2019-08-07 15:38:25.396756 mon.ceph-05 (mon.0) 182757 : cluster [DBG] osdmap e1302: 36 total, 36 up, 36 in
2019-08-07 15:38:23.452026 osd.12 (osd.12) 767 : cluster [DBG] 1.105 starting backfill to osd.25 from (0'0,0'0] MAX to 1297'6782
---
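
Note that the excerpt above is not in strict time order (the osd.12 entry
carries the earliest timestamp but shows up last). A minimal Python sketch
for re-sorting such lines by their own timestamps follows. The function
names and the regex are my own illustration, not part of any Ceph tooling,
and the pattern only assumes the line layout visible in the four lines above.

```python
import re
from datetime import datetime

# Layout assumed from the excerpt above:
# <timestamp> <daemon> (<entity>) <seq> : <channel> [<level>] <message>
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<daemon>\S+) \(\S+\) (?P<seq>\d+) : "
    r"(?P<channel>\S+) \[(?P<level>\w+)\] (?P<msg>.*)$"
)


def parse_cluster_log_line(line):
    """Return a dict for one cluster log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%Y-%m-%d %H:%M:%S.%f")
    entry["seq"] = int(entry["seq"])
    return entry


def sort_by_timestamp(lines):
    """Parse all matching lines and order them by their logged timestamp."""
    parsed = [e for e in map(parse_cluster_log_line, lines) if e is not None]
    return sorted(parsed, key=lambda e: e["ts"])


# The four lines quoted above, in the order they appeared in the log.
SAMPLE = [
    "2019-08-07 15:38:23.452580 osd.21 (osd.21) 651 : cluster [DBG] 1.125 starting backfill to osd.9 from (0'0,0'0] MAX to 1297'21584",
    "2019-08-07 15:38:24.454942 mon.ceph-05 (mon.0) 182756 : cluster [WRN] Health check failed: Reduced data availability: 2 pgs peering (PG_AVAILABILITY)",
    "2019-08-07 15:38:25.396756 mon.ceph-05 (mon.0) 182757 : cluster [DBG] osdmap e1302: 36 total, 36 up, 36 in",
    "2019-08-07 15:38:23.452026 osd.12 (osd.12) 767 : cluster [DBG] 1.105 starting backfill to osd.25 from (0'0,0'0] MAX to 1297'6782",
]

ordered = sort_by_timestamp(SAMPLE)
for e in ordered:
    print(e["ts"], e["daemon"], e["level"], e["msg"])
```

Sorted this way, both backfill starts (PGs 1.105 and 1.125) land within the
same millisecond, about a second before the monitor reports the 2 PGs as
peering.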

Unfortunately all I have in the OSD log is this:
---
2019-08-07 15:38:23.461 7f155e71b700  1 osd.9 pg_epoch: 1299 pg[1.125( empty local-lis/les=0/0 n=0 ec=189/189 lis/c 1286/1286 les/c/f 1287/1287/0 1298/1299/189) [21,9,28]/[21,28,3] r=-1 lpr=1299 pi=[1286,1299)/1 crt=0'0 unknown mbc={}] state<Start>: transitioning to Stray
2019-08-07 15:38:24.353 7f155e71b700  1 osd.9 pg_epoch: 1301 pg[1.125( v 1297'21584 (1246'18584,1297'21584] local-lis/les=1299/1300 n=5 ec=189/189 lis/c 1299/1299 les/c/f 1300/1300/0 1298/1301/189) [21,9,28] r=1 lpr=1301 pi=[1299,1301)/1 luod=0'0 crt=1297'21584 active mbc={}] start_peering_interval up [21,9,28] -> [21,9,28], acting [21,28,3] -> [21,9,28], acting_primary 21 -> 21, up_primary 21 -> 21, role -1 -> 1, features acting 4611087854031667199 upacting 4611087854031667199
2019-08-07 15:38:24.353 7f155e71b700  1 osd.9 pg_epoch: 1301 pg[1.125( v 1297'21584 (1246'18584,1297'21584] local-lis/les=1299/1300 n=5 ec=189/189 lis/c 1299/1299 les/c/f 1300/1300/0 1298/1301/189) [21,9,28] r=1 lpr=1301 pi=[1299,1301)/1 crt=1297'21584 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
---
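
To make the start_peering_interval line above easier to read, here is a
small Python sketch that pulls the up and acting sets out of it and reports
which OSDs joined or left the acting set. The helper names are hypothetical
and the regex relies only on the wording of that one log line; it is an
illustration, not a Ceph utility.

```python
import re

# Matches the "up [..] -> [..], acting [..] -> [..]" part of the line above.
PEERING_RE = re.compile(
    r"start_peering_interval "
    r"up \[(?P<up_old>[\d,]*)\] -> \[(?P<up_new>[\d,]*)\], "
    r"acting \[(?P<act_old>[\d,]*)\] -> \[(?P<act_new>[\d,]*)\]"
)


def osd_list(text):
    """Turn '21,9,28' into [21, 9, 28]; an empty string gives []."""
    return [int(x) for x in text.split(",") if x]


def acting_set_change(line):
    """Return old/new acting sets and the OSDs that joined or left, or None."""
    m = PEERING_RE.search(line)
    if m is None:
        return None
    old = osd_list(m.group("act_old"))
    new = osd_list(m.group("act_new"))
    return {
        "up": osd_list(m.group("up_new")),
        "acting_old": old,
        "acting_new": new,
        "joined": sorted(set(new) - set(old)),
        "left": sorted(set(old) - set(new)),
    }


# Relevant fragment of the osd.9 log line for pg 1.125 quoted above.
SAMPLE_LINE = (
    "start_peering_interval up [21,9,28] -> [21,9,28], "
    "acting [21,28,3] -> [21,9,28], acting_primary 21 -> 21, "
    "up_primary 21 -> 21, role -1 -> 1"
)

change = acting_set_change(SAMPLE_LINE)
print(change)
```

For pg 1.125 this shows osd.9 joining the acting set and osd.3 leaving it,
while the up set stays at [21,9,28].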

How can I find out what happened here? Given that it might not happen
again anytime soon, cranking up debug levels now is a tad late.

Thanks,

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Rakuten Mobile Inc.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx