Reducing backfilling/recovery long tail

Hi Sam & Sage,

In the context of http://tracker.ceph.com/issues/9566, I'm inclined to think the best solution would be for the AsyncReserver to choose a PG when a free slot opens up, instead of just picking the next one in the list. It would always choose a PG that must move to/from the OSD that has more PGs waiting in the AsyncReserver than any other OSD. The sort involved does not seem too expensive.
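
To make the idea concrete, here is a rough sketch of the selection I have in mind when a slot frees up (WaitingPG, waiting_per_osd and choose_next are invented for the illustration, this is not the actual AsyncReserver code):

  #include <algorithm>
  #include <list>
  #include <map>
  #include <vector>

  using osd_id_t = int;

  struct WaitingPG {
    int pgid;                    // stand-in for pg_t
    std::vector<osd_id_t> osds;  // OSDs this PG must move to/from
  };

  // Count, for each OSD, how many waiting PGs reference it.
  static std::map<osd_id_t, int> waiting_per_osd(const std::list<WaitingPG>& queue) {
    std::map<osd_id_t, int> counts;
    for (const auto& pg : queue)
      for (osd_id_t osd : pg.osds)
        ++counts[osd];
    return counts;
  }

  // When a slot frees up, grant the waiting PG that touches the OSD with the
  // most PGs waiting, instead of simply taking the head of the list.
  static std::list<WaitingPG>::iterator choose_next(std::list<WaitingPG>& queue) {
    std::map<osd_id_t, int> counts = waiting_per_osd(queue);
    auto score = [&](const WaitingPG& pg) {
      int best = 0;
      for (osd_id_t osd : pg.osds)
        best = std::max(best, counts[osd]);
      return best;
    };
    return std::max_element(queue.begin(), queue.end(),
                            [&](const WaitingPG& a, const WaitingPG& b) {
                              return score(a) < score(b);
                            });
  }

The point is only that the choice is made at grant time, over whatever happens to be waiting at that moment.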

Calculating priorities before adding the PG to the AsyncReserver seems wrong, because the state of the system will change significantly while the PG is waiting to be processed. For instance, the first PGs to be added get a low priority while later ones get ever higher priorities as they accumulate. If reservations are canceled because the OSD map changed again (maybe another OSD is decommissioned before recovery of the first one completes), you may end up with high priorities for PGs that are no longer associated with busy OSDs. That could backfire and create even more frequent long tails.
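
A contrived example of how a priority frozen at enqueue time ends up pointing at the wrong PG after such a cancellation (again, the names and numbers are made up):

  #include <algorithm>
  #include <cassert>
  #include <map>
  #include <vector>

  struct QueuedPG {
    int pgid;
    int osd;                  // simplified: the one OSD this PG must move to/from
    int priority_at_enqueue;  // waiting count for that OSD when the PG was queued
  };

  int main() {
    // At enqueue time OSD 5 was the busiest, so its PG got a high priority.
    std::vector<QueuedPG> queue = {
      {1, 5, 40},  // queued while OSD 5 had 40 waiters
      {2, 7, 12},  // queued while OSD 7 had 12 waiters
    };

    // The map changes again: everything on OSD 5 except pgid 1 is cancelled,
    // and OSD 7 is now the busy one.
    std::map<int, int> live_count = {{5, 1}, {7, 30}};

    // Selecting on the stale snapshot still favours pgid 1 ...
    auto by_snapshot = std::max_element(queue.begin(), queue.end(),
        [](const QueuedPG& a, const QueuedPG& b) {
          return a.priority_at_enqueue < b.priority_at_enqueue;
        });
    assert(by_snapshot->pgid == 1);

    // ... while selecting on the live counts picks pgid 2, the one that
    // actually relieves a busy OSD.
    auto by_live = std::max_element(queue.begin(), queue.end(),
        [&](const QueuedPG& a, const QueuedPG& b) {
          return live_count[a.osd] < live_count[b.osd];
        });
    assert(by_live->pgid == 2);
    return 0;
  }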

What do you think?

Cheers

-- 
Loïc Dachary, Artisan Logiciel Libre
