OSD's load_pgs takes a lot of time

Michał Szymański <michalszymanski91@xxxxxxxxx> · Tue, 26 Aug 2014 09:43:15 +0200

I have noticed that an OSD sometimes takes a long time to come back
up and in. From what I can see in the logs, it is stuck in load_pgs
for a while:
2014-08-21 15:32:04.711048 7fba11569780  0 osd.1 139 load_pgs
2014-08-21 15:32:04.712512 7fba11569780 10 osd.1 139 load_pgs 3.165_TEMP clearing temp
2014-08-21 15:32:19.648610 7fba11569780 10 osd.1 139 load_pgs 3.13b_TEMP clearing temp
2014-08-21 15:32:34.674773 7fba11569780 10 osd.1 139 load_pgs 3.36b_TEMP clearing temp
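
The timestamps show roughly 15 seconds between consecutive "clearing
temp" lines. A minimal Python sketch to measure those gaps follows. It
assumes each log entry is on a single line and that the temp collection
name is the token right after "load_pgs"; the function name
temp_clear_gaps is made up for illustration and is not part of Ceph.

```python
from datetime import datetime

# Sample lines copied from the log excerpt (wrapped lines rejoined).
LOG = """\
2014-08-21 15:32:04.711048 7fba11569780  0 osd.1 139 load_pgs
2014-08-21 15:32:04.712512 7fba11569780 10 osd.1 139 load_pgs 3.165_TEMP clearing temp
2014-08-21 15:32:19.648610 7fba11569780 10 osd.1 139 load_pgs 3.13b_TEMP clearing temp
2014-08-21 15:32:34.674773 7fba11569780 10 osd.1 139 load_pgs 3.36b_TEMP clearing temp
"""


def temp_clear_gaps(log_text):
    """Return (temp_name, seconds since the previous 'clearing temp' line)."""
    entries = []
    for line in log_text.splitlines():
        if "clearing temp" not in line:
            continue
        fields = line.split()
        # The first two fields are the date and the time with microseconds.
        stamp = datetime.strptime(
            fields[0] + " " + fields[1], "%Y-%m-%d %H:%M:%S.%f"
        )
        # Assumption: the collection name follows the "load_pgs" token.
        name = fields[fields.index("load_pgs") + 1]
        entries.append((name, stamp))
    # Pair each entry with its predecessor and take the time difference.
    return [
        (name, (stamp - prev).total_seconds())
        for (_, prev), (name, stamp) in zip(entries, entries[1:])
    ]


gaps = temp_clear_gaps(LOG)
for name, seconds in gaps:
    print(f"{name}: {seconds:.3f} s after the previous 'clearing temp' line")
```

To run it against a real OSD log, read the file contents and pass them
to temp_clear_gaps in place of LOG.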

This happens when an OSD is restarted while recovery is ongoing in the
cluster. The process is neither IO- nor CPU-heavy at that time, and
judging by strace output it mostly makes futex calls and does a little
IO on PGs. I am using Ceph 0.80.5.

Has anybody else noticed this behavior? Is it possible to clear temp faster?

-- 
Regards,
Michał Szymański