HDD OSD 100% busy reading OMAP keys RGW

Hi,

On a cluster running only RGW I'm seeing BlueStore 12.2.11 OSDs
become 100% busy at times.

This cluster has 85k stale indexes (stale-instances list) and I've been
slowly trying to remove them.

I noticed that the OSDs regularly read heavily from their HDDs, and
iostat then shows the device at 100% busy.

$ radosgw-admin reshard stale-instances list > stale.json
$ cat stale.json | jq -r '.[]' | wc -l
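For reference, the same count can be done without jq. A minimal Python sketch, assuming stale.json is the output of `reshard stale-instances list` (a JSON array of bucket instance names; the sample names below are made up):

```python
import json

def count_stale(text: str) -> int:
    """Count entries in the JSON array produced by
    `radosgw-admin reshard stale-instances list`."""
    return len(json.loads(text))

# Example with two made-up bucket instance names:
sample = '["mybucket:ams02.36062237.821", "other:ams02.39023047.682"]'
print(count_stale(sample))  # -> 2
```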

I increased debug_bluefs and debug_bluestore to 10 and I found:

2019-02-14 05:11:18.417097 7f627732d700 10
bluestore(/var/lib/ceph/osd/ceph-266) omap_get_header 13.205_head oid
#13:a05231a1:::.dir.ams02.36062237.821.79:head# = 0
2019-02-14 05:11:18.417127 7f627732d700 10
bluestore(/var/lib/ceph/osd/ceph-266) get_omap_iterator 13.205_head
#13:a05231a1:::.dir.ams02.36062237.821.79:head#
2019-02-14 05:11:18.417133 7f627732d700 10
bluestore(/var/lib/ceph/osd/ceph-266) get_omap_iterator has_omap = 1

2019-02-14 05:11:18.417169 7f627732d700 10 bluefs _read_random h
0x560cb77c5080 0x17a8cd0~fba from file(ino 71562 size 0x2d96a43 mtime
2019-02-14 02:52:16.370746 bdev 1 allocated 2e00000 extents
[1:0x3228f00000+2e00000])
2019-02-14 05:11:23.129645 7f627732d700 10 bluefs _read_random h
0x560c14167780 0x17bb6b7~f52 from file(ino 68900 size 0x41919ef mtime
2019-02-01 01:19:59.216218 bdev 1 allocated 4200000 extents
[1:0x8b31a00000+200000,1:0x8b31e00000+e00000,1:0x8b32d00000+1700000,1:0x8b3ce00000+1b00000])
2019-02-14 05:11:23.144550 7f627732d700 10 bluefs _read_random h
0x560c14c86b80 0x96d020~ef3 from file(ino 67189 size 0x419b603 mtime
2019-02-01 00:45:12.743836 bdev 1 allocated 4200000 extents
[1:0x53da9a00000+4200000])

2019-02-14 05:11:23.149958 7f627732d700 10
bluestore(/var/lib/ceph/osd/ceph-266) omap_get_header 13.e8_head oid
#13:171bcbd3:::.dir.ams02.39023047.682.114:head# = 0
2019-02-14 05:11:23.149975 7f627732d700 10
bluestore(/var/lib/ceph/osd/ceph-266) get_omap_iterator 13.e8_head
#13:171bcbd3:::.dir.ams02.39023047.682.114:head#
2019-02-14 05:11:23.149981 7f627732d700 10
bluestore(/var/lib/ceph/osd/ceph-266) get_omap_iterator has_omap = 1

2019-02-14 05:11:23.150012 7f627732d700 10 bluefs _read_random h
0x560c14e42500 0x1a18670~ff0 from file(ino 71519 size 0x417a60f mtime
2019-02-14 02:51:35.125629 bdev 1 allocated 4200000 extents
[1:0x1c30d00000+4200000])
2019-02-14 05:11:23.155679 7f627732d700 10 bluefs _read_random h
0x560c1c1ab980 0xedad4c~fde from file(ino 71391 size 0x25d4a89 mtime
2019-02-13 22:25:22.801676 bdev 1 allocated 2600000 extents
[1:0x38b00000+2600000])
2019-02-14 05:11:23.158995 7f627732d700 10 bluefs _read_random h
0x560c1c1ab980 0xedbd2a~fba from file(ino 71391 size 0x25d4a89 mtime
2019-02-13 22:25:22.801676 bdev 1 allocated 2600000 extents
[1:0x38b00000+2600000])
2019-02-14 05:11:23.159233 7f627732d700 10 bluefs _read_random h
0x560c14c86b80 0x96df13~fca from file(ino 67189 size 0x419b603 mtime
2019-02-01 00:45:12.743836 bdev 1 allocated 4200000 extents
[1:0x53da9a00000+4200000])
2019-02-14 05:11:23.159456 7f627732d700 10 bluefs _read_random h
0x560c14c86b80 0x96eedd~f1b from file(ino 67189 size 0x419b603 mtime
2019-02-01 00:45:12.743836 bdev 1 allocated 4200000 extents
[1:0x53da9a00000+4200000])
2019-02-14 05:11:23.159639 7f627732d700 10 bluefs _read_random h
0x560c14c86b80 0x96fdf8~eba from file(ino 67189 size 0x419b603 mtime
2019-02-01 00:45:12.743836 bdev 1 allocated 4200000 extents
[1:0x53da9a00000+4200000])

After this the _read_random calls just continue for thousands of lines
and the OSD becomes very slow: slow requests, heartbeat timeouts and
even OSDs being marked down.
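To see which RocksDB SST files the OSD keeps rereading, the _read_random lines can be aggregated per inode. A rough sketch, assuming the log format shown in the excerpt above (offset~length are hex, length without the 0x prefix):

```python
import re
from collections import defaultdict

# bluefs lines look like:
#   _read_random h <handle> <offset>~<length> from file(ino <inode> ...)
PAT = re.compile(r"_read_random h \S+ 0x([0-9a-f]+)~([0-9a-f]+) from file\(ino (\d+)")

def reads_per_inode(lines):
    """Aggregate [read count, total bytes] per BlueFS file inode."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        m = PAT.search(line)
        if m:
            _offset, length, ino = m.groups()
            stats[int(ino)][0] += 1
            stats[int(ino)][1] += int(length, 16)
    return dict(stats)

# Two of the lines from the debug log above (truncated):
sample = [
    "2019-02-14 05:11:23.159456 7f627732d700 10 bluefs _read_random h "
    "0x560c14c86b80 0x96eedd~f1b from file(ino 67189 size 0x419b603 mtime ...",
    "2019-02-14 05:11:23.159639 7f627732d700 10 bluefs _read_random h "
    "0x560c14c86b80 0x96fdf8~eba from file(ino 67189 size 0x419b603 mtime ...",
]
for ino, (count, nbytes) in reads_per_inode(sample).items():
    print(f"ino {ino}: {count} reads, {nbytes} bytes")
```

Running this over the full debug log should show whether the reads concentrate on a few large SST files.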

So I tried to list the omap keys:

$ rados -p .rgw.buckets.index listomapkeys .dir.ams02.36062237.821.79
$ rados -p .rgw.buckets.index listomapkeys .dir.ams02.39023047.682.114

.dir.ams02.36062237.821.79: <1s
.dir.ams02.39023047.682.114: ~30s

Both objects are on the same OSD (266), but when I list the omap keys
for the second object the disk jumps to 100% busy and stays there for
about a minute.
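To find which of the remaining index objects trigger this, one could time listomapkeys for every stale instance. A sketch that just generates the command lines; the pool name and the mapping from instance name to a ".dir.<instance>" object are assumptions based on the two objects above:

```python
def listomapkeys_cmds(instances, pool=".rgw.buckets.index"):
    """Emit one `time rados listomapkeys` command per bucket index object.
    Assumes each stale instance maps to a ".dir.<instance>" object, as
    with .dir.ams02.39023047.682.114 above."""
    return [f"time rados -p {pool} listomapkeys .dir.{i}" for i in instances]

for cmd in listomapkeys_cmds(["ams02.36062237.821.79",
                              "ams02.39023047.682.114"]):
    print(cmd)
```

Piping the output into a shell would then show per-object listing times, similar to the <1s vs ~30s difference above.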

I've seen this before with RBD on SSD-backed OSDs, where listing omap
keys would make the disks jump to 100% busy and cause slow requests.

This case seems very similar to that one.

I can keep triggering it in this case by listing the omap keys of
object .dir.ams02.39023047.682.114, even though it doesn't have any keys.

Has anybody seen this before?

Some information:

- Ceph 12.2.11
- BlueStore (default memory target of 4G)
- RGW-only use-case
- WAL+DB+DATA on HDD

Thanks,

Wido

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


