Re: osd daemons still reading disks at full speed while there is no pool activity

Hi,

I don't have an explanation, but I remember having a similar issue a year or so ago. IIRC a simple OSD restart fixed it, so I never got to the bottom of it. Have you tried restarting the OSD daemons?
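
On a systemd-based deployment that would be something along these lines (osd.0 just as an example; one OSD at a time so the cluster can settle in between):

  # restart a single OSD daemon (replace 0 with the actual OSD id)
  systemctl restart ceph-osd@0
  # wait for peering/recovery to settle before moving to the next one
  ceph -s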


Quoting Nikola Ciprich <nikola.ciprich@xxxxxxxxxxx>:

Hello fellow ceph users,

I'm trying to catch a ghost here. On one of our clusters (6 nodes, 14.2.15,
EC pool 4+2, 6*32 SATA BlueStore OSDs) we got into a very strange state.

The cluster is clean (except for the "pgs not deep-scrubbed in time" warning,
since we've disabled scrubbing while investigating) and there is absolutely
no activity on the EC pool, but according to atop all OSDs are still reading
furiously, without any apparent reason. Even when increasing the OSD log level
I don't see anything interesting, except for the occasional
2021-11-03 12:04:52.664 7fb8652e3700 5 osd.0 9347 heartbeat osd_stat(store_statfs(0xb80056c0000/0x26b570000/0xe8d7fc00000, data 0x2f0ddd813e8/0x30b0ee60000, compress 0x0/0x0/0x0, omap 0x98b706, meta 0x26abe48fa), peers [1,26,27,34,36,40,44,49,52,55,57,65,69,75,76,78,82,83,87,93,96,97,104,105,107,108,111,112,114,120,121,122,123,135,136,137,143,147,154,156,157,169,171,187,192,196,200,204,208,212,217,218,220,222,224,226,227] op hist [])
and compaction stats.
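
For completeness, this is roughly how I've been raising the log level and checking for in-flight ops (osd.0 just as an example, the debug values are arbitrary):

  # temporarily raise OSD/BlueStore debug levels at runtime
  ceph tell osd.0 injectargs '--debug-osd 10 --debug-bluestore 10'
  # check whether any client/recovery ops are actually hitting the OSD
  ceph daemon osd.0 dump_ops_in_flight
  ceph daemon osd.0 dump_historic_ops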

Trying to sequentially read data from the pool leads to very poor performance (i.e. ~8 MB/s).
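
In case someone wants to reproduce the read test, a rados bench along these lines should give the same ballpark (the pool name is just a placeholder, and the write phase creates benchmark objects, so ideally use a test pool):

  # write some benchmark objects first and keep them for the read phase
  rados bench -p testpool 60 write --no-cleanup
  # sequential read of the objects written above
  rados bench -p testpool 60 seq
  # remove the benchmark objects afterwards
  rados -p testpool cleanup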

We've had a very similar problem on a different cluster (replicated, no EC) when
osdmaps were not being pruned correctly, but I checked and those seem to be OK here;
the OSDs are just still reading something and I'm unable to find out what.
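
This is roughly what I mean by checking the osdmaps (osd.0 as an example):

  # oldest/newest osdmap epoch the OSD still keeps around
  ceph daemon osd.0 status
  # first/last committed osdmap epoch as reported by the mons
  ceph report | grep -E 'osdmap_(first|last)_committed'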

Here's the OSD df tree output for one node; the others are pretty similar:

ID  CLASS WEIGHT     REWEIGHT SIZE    RAW USE DATA    OMAP    META    AVAIL   %USE  VAR  PGS STATUS TYPE NAME
-1        2803.19824        - 2.7 PiB 609 TiB 607 TiB 1.9 GiB 1.9 TiB 2.1 PiB 21.78 1.01   -        root sata
-2         467.19971        - 466 TiB 102 TiB 101 TiB 320 MiB 328 GiB 364 TiB 21.83 1.01   -        host spbstdv1a-sata
 0  hdd     14.59999  1.00000  15 TiB 3.1 TiB 3.0 TiB 9.5 MiB 9.7 GiB  12 TiB 20.98 0.97  51     up osd.0
 1  hdd     14.59999  1.00000  15 TiB 2.4 TiB 2.4 TiB 7.4 MiB 7.7 GiB  12 TiB 16.34 0.76  50     up osd.1
 2  hdd     14.59999  1.00000  15 TiB 3.5 TiB 3.5 TiB  11 MiB  11 GiB  11 TiB 24.33 1.13  51     up osd.2
 3  hdd     14.59999  1.00000  15 TiB 2.9 TiB 2.8 TiB 9.3 MiB 9.1 GiB  12 TiB 19.58 0.91  48     up osd.3
 4  hdd     14.59999  1.00000  15 TiB 3.3 TiB 3.3 TiB  11 MiB  11 GiB  11 TiB 22.94 1.06  51     up osd.4
 5  hdd     14.59999  1.00000  15 TiB 3.5 TiB 3.5 TiB  12 MiB  12 GiB  11 TiB 23.94 1.11  50     up osd.5
 6  hdd     14.59999  1.00000  15 TiB 2.8 TiB 2.8 TiB 9.6 MiB 9.6 GiB  12 TiB 19.11 0.89  49     up osd.6
 7  hdd     14.59999  1.00000  15 TiB 3.4 TiB 3.4 TiB 4.9 MiB  11 GiB  11 TiB 23.68 1.10  50     up osd.7
 8  hdd     14.59998  1.00000  15 TiB 3.2 TiB 3.2 TiB  10 MiB  10 GiB  11 TiB 22.18 1.03  51     up osd.8
 9  hdd     14.59999  1.00000  15 TiB 3.4 TiB 3.4 TiB 4.9 MiB  11 GiB  11 TiB 23.52 1.09  50     up osd.9
10  hdd     14.59999  1.00000  15 TiB 2.7 TiB 2.6 TiB 8.5 MiB 8.5 GiB  12 TiB 18.25 0.85  50     up osd.10
11  hdd     14.59999  1.00000  15 TiB 3.4 TiB 3.3 TiB  10 MiB  11 GiB  11 TiB 23.02 1.07  51     up osd.11
12  hdd     14.59999  1.00000  15 TiB 2.8 TiB 2.8 TiB  10 MiB 9.7 GiB  12 TiB 19.53 0.91  49     up osd.12
13  hdd     14.59999  1.00000  15 TiB 3.7 TiB 3.7 TiB  11 MiB  12 GiB  11 TiB 25.62 1.19  49     up osd.13
14  hdd     14.59999  1.00000  15 TiB 2.6 TiB 2.6 TiB 8.2 MiB 8.3 GiB  12 TiB 17.65 0.82  53     up osd.14
15  hdd     14.59999  1.00000  15 TiB 2.5 TiB 2.5 TiB 7.6 MiB 7.8 GiB  12 TiB 17.42 0.81  50     up osd.15
16  hdd     14.59999  1.00000  15 TiB 3.5 TiB 3.5 TiB  11 MiB  11 GiB  11 TiB 24.37 1.13  50     up osd.16
17  hdd     14.59999  1.00000  15 TiB 3.5 TiB 3.5 TiB  12 MiB  12 GiB  11 TiB 24.09 1.12  52     up osd.17
18  hdd     14.59999  1.00000  15 TiB 2.4 TiB 2.4 TiB 6.9 MiB 7.5 GiB  12 TiB 16.79 0.78  49     up osd.18
19  hdd     14.59999  1.00000  15 TiB 3.3 TiB 3.3 TiB 9.9 MiB  10 GiB  11 TiB 22.91 1.06  50     up osd.19
20  hdd     14.59999  1.00000  15 TiB 3.6 TiB 3.6 TiB  12 MiB  12 GiB  11 TiB 25.02 1.16  49     up osd.20
21  hdd     14.59999  1.00000  15 TiB 3.4 TiB 3.4 TiB  14 MiB  12 GiB  11 TiB 23.45 1.09  51     up osd.21
22  hdd     14.59999  1.00000  15 TiB 3.3 TiB 3.3 TiB  12 MiB  11 GiB  11 TiB 22.64 1.05  51     up osd.22
23  hdd     14.59999  1.00000  15 TiB 2.9 TiB 2.8 TiB 9.2 MiB 9.3 GiB  12 TiB 19.59 0.91  51     up osd.23
24  hdd     14.59999  1.00000  15 TiB 3.4 TiB 3.3 TiB  12 MiB  11 GiB  11 TiB 23.04 1.07  50     up osd.24
25  hdd     14.59999  1.00000  15 TiB 3.1 TiB 3.1 TiB  10 MiB 9.9 GiB  11 TiB 21.61 1.00  50     up osd.25
162 hdd     14.59999  1.00000  15 TiB 3.2 TiB 3.2 TiB  10 MiB  10 GiB  11 TiB 21.76 1.01  50     up osd.162
163 hdd     14.59999  1.00000  15 TiB 3.4 TiB 3.4 TiB  11 MiB  11 GiB  11 TiB 23.60 1.09  50     up osd.163
164 hdd     14.59999  1.00000  15 TiB 3.5 TiB 3.5 TiB  12 MiB  11 GiB  11 TiB 24.38 1.13  51     up osd.164
165 hdd     14.59999  1.00000  15 TiB 2.9 TiB 2.9 TiB 9.1 MiB 9.5 GiB  12 TiB 20.18 0.94  50     up osd.165
166 hdd     14.59999  1.00000  15 TiB 3.3 TiB 3.3 TiB  11 MiB  11 GiB  11 TiB 22.62 1.05  50     up osd.166
167 hdd     14.59999  1.00000  15 TiB 3.5 TiB 3.5 TiB  12 MiB  12 GiB  11 TiB 24.36 1.13  52     up osd.167

Most of the OSD settings are defaults: cache autotune enabled, osd_memory_target of 4 GB, etc.
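
For reference, the running values can be checked on the admin socket, e.g. (osd.0 just as an example):

  # values the running OSD actually uses
  ceph daemon osd.0 config get osd_memory_target
  ceph daemon osd.0 config get bluestore_cache_autotune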

There is absolutely no activity on this (or any related) pool; only one replicated pool, on different drives, sees about 30 MB/s of writes. All boxes are almost idle and have enough RAM. Unfortunately the
OSDs do not use any fast storage for the WAL or DB.
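
For reference, per-pool client I/O and per-disk load can be checked with something like:

  # per-pool client I/O rates
  ceph osd pool stats
  # per-disk utilisation on the OSD nodes
  iostat -x 5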

Has anyone met a similar problem? Or does somebody have a hint on how to debug what the OSDs are reading all the time?
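
Is looking at the per-daemon perf counters and OS-level I/O tools the right direction, i.e. something like (osd.0 just as an example):

  # rocksdb/bluefs sections may hint whether the reads come from compaction
  ceph daemon osd.0 perf dump
  # which processes/threads are actually doing the disk reads
  pidstat -d 5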

I'd be very grateful

with best regards

nikola ciprich


--
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28.rijna 168, 709 00 Ostrava

tel.:   +420 591 166 214
fax:    +420 596 621 273
mobil:  +420 777 093 799
www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@xxxxxxxxxxx
-------------------------------------
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


