Re: Slow requests blocked. No rebalancing

Hi,

To reduce the impact on clients during the migration, I would set the OSD's primary affinity to 0 beforehand. This should prevent the slow requests; at least this setting has helped us a lot with problematic OSDs.
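A minimal sketch of that sequence (osd.16 is just an example id; note that on older releases the monitors needed "mon osd allow primary affinity = true" before this setting took effect):

```shell
# Drop the primary affinity so clients stop being served reads by this
# OSD's PGs before it is taken out (osd.16 is an example).
ceph osd primary-affinity osd.16 0

# Then take the OSD out to start the data migration.
ceph osd out 16

# Once the replacement OSD is back in, restore the default affinity.
ceph osd primary-affinity osd.16 1
```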

Regards
Eugen


Quoting Jaime Ibar <jaime@xxxxxxxxxxxx>:

Hi all,

we recently upgraded from Jewel 10.2.10 to Luminous 12.2.7 and are now trying to migrate the OSDs to BlueStore following this document[0]. However, when I mark an OSD as out, I get warnings similar to these:

2018-09-20 09:32:46.079630 mon.dri-ceph01 [WRN] Health check failed: 2 slow requests are blocked > 32 sec. Implicated osds 16,28 (REQUEST_SLOW)
2018-09-20 09:32:52.841123 mon.dri-ceph01 [WRN] Health check update: 7 slow requests are blocked > 32 sec. Implicated osds 10,16,28,32,59 (REQUEST_SLOW)
2018-09-20 09:32:57.842230 mon.dri-ceph01 [WRN] Health check update: 15 slow requests are blocked > 32 sec. Implicated osds 10,16,28,31,32,59,78,80 (REQUEST_SLOW)

2018-09-20 09:32:58.851142 mon.dri-ceph01 [WRN] Health check update: 244944/40100780 objects misplaced (0.611%) (OBJECT_MISPLACED)
2018-09-20 09:32:58.851160 mon.dri-ceph01 [WRN] Health check update: 249 PGs pending on creation (PENDING_CREATING_PGS)

These prevent Ceph from starting the rebalancing, the VMs running on Ceph start hanging, and we have to mark the OSD back in.

I tried reweighting the OSD to 0.90 to minimize the impact on the cluster, but the warnings are the same.
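For reference, a sketch of that reweight, plus the gradual variant that is often used to soften the impact (osd.16 is an example id; the step values and the wait loop are assumptions, not from the original mail):

```shell
# Temporary weight override (0.0 - 1.0) for one OSD; osd id 16 is an example.
ceph osd reweight 16 0.90

# Lowering the weight in small steps, waiting for the cluster to settle
# between steps, tends to move less data at once:
for w in 0.95 0.90 0.85; do
    ceph osd reweight 16 "$w"
    # crude wait-for-settle loop (assumption, adjust to taste)
    while ! ceph health | grep -q HEALTH_OK; do sleep 60; done
done
```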

I tried increasing these settings:

mds cache memory limit = 2147483648
rocksdb cache size = 2147483648

but with no luck; the warnings are the same.
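For completeness, this is roughly how those values (2 GiB each) would look in ceph.conf; the section placement is an assumption ([global] applies to all daemons, though mds cache memory limit only matters to the MDS):

```ini
# Sketch of a ceph.conf fragment with the values above (2 GiB = 2147483648 bytes).
[global]
mds cache memory limit = 2147483648
rocksdb cache size = 2147483648
```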

We also have CephFS for storing files from different projects (no directory fragmentation enabled).

The problem here is that if one OSD dies, all the services will be blocked, as Ceph won't be able to start rebalancing.

The cluster is

- 3 mons
- 3 MDSs (running on the same hosts as the mons): 2 active and 1 standby
- 3 mgrs (running on the same hosts as the mons)
- 6 servers, 12 OSDs each
- 1GB private network


Does anyone know how to fix this, or where the problem could be?

Thanks a lot in advance.

Jaime


[0] http://docs.ceph.com/docs/luminous/rados/operations/bluestore-migration/

--

Jaime Ibar
High Performance & Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/  |jaime@xxxxxxxxxxxx
Tel: +353-1-896-3725



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


