Hello Mark,

after reading
http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/
again, I'm really confused about how exactly the memory behaviour differs
between 12.2.8 and 12.2.10. I also stumbled upon the sentence "When
tcmalloc and cache autotuning is enabled" - we're compiling against and
using jemalloc. What happens in that case?

I also see now that 12.2.10 uses at most 1GB of memory, while 12.2.8 uses
6-7GB (with bluestore_cache_size = 1073741824).

Greets,
Stefan
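For reference, a minimal sketch of the two settings discussed in this
thread (the values are the ones quoted below; osd.0 is just an example ID,
and whether the autotuner engages at all under jemalloc is exactly the
open question above). Note that osd_memory_target is meant to cover the
whole OSD process, not just the cache, which may explain part of the 1GB
vs 6-7GB difference:

    # 12.2.8: fixed BlueStore cache size, set in the [osd] section
    #   bluestore_cache_size = 1073741824
    # 12.2.10: overall per-process memory target instead
    #   osd_memory_target = 1073741824

    # what a running OSD actually has configured
    # (run on the node hosting the OSD):
    ceph daemon osd.0 config get bluestore_cache_size
    ceph daemon osd.0 config get osd_memory_target
    # per-pool memory accounting, useful to see where the RSS goes:
    ceph daemon osd.0 dump_mempools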
On 17.01.19 at 22:59, Stefan Priebe - Profihost AG wrote:
> Hello Mark,
>
> For whatever reason I didn't get your mails - most probably you dropped
> me from CC/TO and only sent to the list? I'm only subscribed to a daily
> digest. (changed that for now)
>
> So I'm very sorry for answering so late.
>
> My messages might sound a bit confusing, as the problem isn't easily
> reproduced and we have tried a lot of things to find out what's going on.
>
> As 12.2.10 does not contain the PG hard limit, I don't suspect the
> problem is related to it.
>
> What I can tell right now is:
>
> 1.) Under 12.2.8 we had set bluestore_cache_size = 1073741824
>
> 2.) While upgrading to 12.2.10 we replaced it with
> osd_memory_target = 1073741824
>
> 3.) I also tried 12.2.10 without setting osd_memory_target or
> bluestore_cache_size at all
>
> 4.) It's not kernel related - for some unknown reason it worked for some
> hours with a newer kernel, but gave problems again later
>
> 5.) A backfill of 6x 2TB SSDs took about 14 hours using 12.2.10, while
> it took 2 hours with 12.2.8
>
> 6.) With 12.2.10 I have a constant rate of 100% read I/O (400-500MB/s)
> on most of my bluestore OSDs, while on 12.2.8 reads peak at
> 100KB - 2MB/s
>
> 7.) Upgrades on small clusters or fresh installs seem to work fine.
> (no idea why, or whether it is related to cluster size)
>
> That's currently all I know.
>
> Thanks a lot!
>
> Greets,
> Stefan
>
> On 16.01.19 at 20:56, Stefan Priebe - Profihost AG wrote:
>> I reverted the whole cluster back to 12.2.8 - recovery speed had also
>> dropped from 300-400MB/s to 20MB/s on 12.2.10. So something is really
>> broken.
>>
>> Greets,
>> Stefan
>>
>> On 16.01.19 at 16:00, Stefan Priebe - Profihost AG wrote:
>>> This is not the case with 12.2.8, but it happens with 12.2.9 as well.
>>> On 12.2.8 all PGs are instantly active after boot - no inactive PGs,
>>> at least none noticeable in ceph -s.
>>>
>>> With 12.2.9, 12.2.10 or even the current upstream/luminous branch it
>>> takes minutes until all PGs are active again.
>>>
>>> Greets,
>>> Stefan
>>>
>>> On 16.01.19 at 15:22, Stefan Priebe - Profihost AG wrote:
>>>> Hello,
>>>>
>>>> while digging into this further I saw that it takes ages until all
>>>> PGs are active. After starting the OSD, 3% of all PGs are inactive,
>>>> and it takes minutes until they're active again.
>>>>
>>>> The log of the OSD is full of:
>>>>
>>>> 2019-01-16 15:19:13.568527 7fecbf7da700  0 osd.33 pg_epoch: 1318479 pg[5.563( v 1318474'61584855 lc 1318356'61576253 (1318287'61574721,1318474'61584855] local-lis/les=1318472/1318473 n=1912 ec=133405/133405 lis/c 1318472/1278145 les/c/f 1318473/1278148/1211861 1318472/1318472/1318472) [33,3,22] r=0 lpr=1318472 pi=[1278145,1318472)/1 rops=4 crt=1318474'61584855 mlcod 1318356'61576253 active+recovering+degraded m=184 snaptrimq=[ec1a0~1,ec808~1] mbc={255={(2+0)=185,(3+0)=2}}] _update_calc_stats ml 185 upset size 3 up 2
>>>> 2019-01-16 15:19:13.568637 7fecbf7da700  0 osd.33 pg_epoch: 1318479 pg[5.563( v 1318474'61584855 lc 1318356'61576253 (1318287'61574721,1318474'61584855] local-lis/les=1318472/1318473 n=1912 ec=133405/133405 lis/c 1318472/1278145 les/c/f 1318473/1278148/1211861 1318472/1318472/1318472) [33,3,22] r=0 lpr=1318472 pi=[1278145,1318472)/1 rops=4 crt=1318474'61584855 mlcod 1318356'61576253 active+recovering+degraded m=184 snaptrimq=[ec1a0~1,ec808~1] mbc={255={(2+0)=185,(3+0)=2}}] _update_calc_stats ml 2 upset size 3 up 3
>>>> 2019-01-16 15:19:15.909327 7fecbf7da700  0 osd.33 pg_epoch: 1318479 pg[5.563( v 1318474'61584855 lc 1318356'61576253 (1318287'61574721,1318474'61584855] local-lis/les=1318472/1318473 n=1912 ec=133405/133405 lis/c 1318472/1278145 les/c/f 1318473/1278148/1211861 1318472/1318472/1318472) [33,3,22] r=0 lpr=1318472 pi=[1278145,1318472)/1 rops=4 crt=1318474'61584855 mlcod 1318356'61576253 active+recovering+degraded m=183 snaptrimq=[ec1a0~1,ec808~1] mbc={255={(2+0)=184,(3+0)=3}}] _update_calc_stats ml 184 upset size 3 up 2
>>>> 2019-01-16 15:19:15.909446 7fecbf7da700  0 osd.33 pg_epoch: 1318479 pg[5.563( v 1318474'61584855 lc 1318356'61576253 (1318287'61574721,1318474'61584855] local-lis/les=1318472/1318473 n=1912 ec=133405/133405 lis/c 1318472/1278145 les/c/f 1318473/1278148/1211861 1318472/1318472/1318472) [33,3,22] r=0 lpr=1318472 pi=[1278145,1318472)/1 rops=4 crt=1318474'61584855 mlcod 1318356'61576253 active+recovering+degraded m=183 snaptrimq=[ec1a0~1,ec808~1] mbc={255={(2+0)=184,(3+0)=3}}] _update_calc_stats ml 3 upset size 3 up 3
>>>> 2019-01-16 15:19:23.503231 7fecb97ff700  0 osd.33 pg_epoch: 1318479 pg[5.563( v 1318474'61584855 lc 1318356'61576253 (1318287'61574721,1318474'61584855] local-lis/les=1318472/1318473 n=1912 ec=133405/133405 lis/c 1318472/1278145 les/c/f 1318473/1278148/1211861 1318472/1318472/1318472) [33,3,22] r=0 lpr=1318472 pi=[1278145,1318472)/1 rops=4 crt=1318474'61584855 mlcod 1318356'61576253 active+recovering+degraded m=183 snaptrimq=[ec1a0~1,ec808~1] mbc={255={(2+0)=183,(3+0)=3}}] _update_calc_stats ml 183 upset size 3 up 2
>>>>
>>>> Greets,
>>>> Stefan
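A minimal sketch for quantifying the "takes minutes until active"
observation above - run on a monitor node while restarting an OSD (the
10-second interval is an arbitrary choice):

    while sleep 10; do
        echo "--- $(date)"
        ceph pg stat                           # one-line PG state summary
        ceph health detail | grep -i inactive  # lists inactive PGs, if any
    done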
>>>> On 16.01.19 at 09:12, Stefan Priebe - Profihost AG wrote:
>>>>> Hi,
>>>>>
>>>>> no, it was not. The bug is still present. It only worked because the
>>>>> osdmap was so far behind that it started a backfill instead of a
>>>>> recovery.
>>>>>
>>>>> So it happens only in the recovery case.
>>>>>
>>>>> Greets,
>>>>> Stefan
>>>>>
>>>>> On 15.01.19 at 16:02, Stefan Priebe - Profihost AG wrote:
>>>>>>
>>>>>> On 15.01.19 at 12:45, Marc Roos wrote:
>>>>>>>
>>>>>>> I upgraded this weekend from 12.2.8 to 12.2.10 without such issues
>>>>>>> (osd's are idle)
>>>>>>
>>>>>> It turns out this was a kernel bug. Updating to a newer kernel has
>>>>>> solved this issue.
>>>>>>
>>>>>> Greets,
>>>>>> Stefan
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Stefan Priebe - Profihost AG [mailto:s.priebe@xxxxxxxxxxxx]
>>>>>>> Sent: 15 January 2019 10:26
>>>>>>> To: ceph-users@xxxxxxxxxxxxxx
>>>>>>> Cc: n.fahldieck@xxxxxxxxxxxx
>>>>>>> Subject: Re: slow requests and high i/o / read rate on
>>>>>>> bluestore osds after upgrade 12.2.8 -> 12.2.10
>>>>>>>
>>>>>>> Hello list,
>>>>>>>
>>>>>>> I also tested the current upstream/luminous branch and it happens
>>>>>>> there as well. A clean install works fine. It only happens on
>>>>>>> upgraded bluestore OSDs.
>>>>>>>
>>>>>>> Greets,
>>>>>>> Stefan
>>>>>>>
>>>>>>> On 14.01.19 at 20:35, Stefan Priebe - Profihost AG wrote:
>>>>>>>> While trying to upgrade a cluster from 12.2.8 to 12.2.10 I'm
>>>>>>>> experiencing issues with bluestore OSDs - so I cancelled the
>>>>>>>> upgrade, and all bluestore OSDs are stopped now.
>>>>>>>>
>>>>>>>> After starting a bluestore OSD I'm seeing a lot of slow requests
>>>>>>>> caused by very high read rates:
>>>>>>>>
>>>>>>>> Device:  rrqm/s  wrqm/s    r/s   w/s      rkB/s    wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>>>>>>>> sda       45,00  187,00  767,00  39,00  482040,00  8660,00  1217,62     58,16  74,60    73,85    89,23   1,24  100,00
>>>>>>>>
>>>>>>>> It reads permanently at 500MB/s from the disk and can't service
>>>>>>>> client requests. The overall client read rate is at 10.9MiB/s rd.
>>>>>>>>
>>>>>>>> I can't reproduce this with 12.2.8. Is this a known bug /
>>>>>>>> regression?
>>>>>>>>
>>>>>>>> Greets,
>>>>>>>> Stefan
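A small sketch for narrowing down the read stream described in the
original report - assuming sda is the OSD's BlueStore data device, as in
the iostat sample above (pidstat is from the sysstat package):

    # confirm rate and average request size on the device:
    iostat -x 1 sda
    # per-process disk I/O, to verify the ceph-osd process is the reader:
    pidstat -d 1 | grep ceph-osd

As a rough worked check on the sample quoted above: 482040 rkB/s over
767 r/s works out to roughly 628KB per read request (consistent with the
reported avgrq-sz of ~1218 sectors), i.e. large, sequential-looking reads
rather than small random client I/O.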