Hi all,
On one server, with a cache tier on Samsung PM983 SSDs in front of an EC
base tier on HDDs, I find that the cache tier stops flushing and evicting
once it is nearly full. After quite a bit of debugging with gdb, I suspect
the problem lies in the throttling mechanism. When write traffic is high,
the cache tier OSD quickly reaches its maximum number of in-flight
requests and throttles any further ones. Flushing then stalls, because
the copy-from requests it depends on are themselves throttled by the
cache tier OSD. Ironically, the 256 requests the cache tier has already
accepted cannot proceed either, because the cache tier is full and cannot
flush or evict.
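To make the cycle concrete, here is a toy Python model of the situation
described above. It is a hypothetical sketch, not Ceph code: `CacheOSD`,
`simulate` and `reserved_for_flush` are invented names, one object stands
for one unit of cache space, and a flush is simplified to a single
copy-from read that must take a slot in the cache OSD's request throttle
before the object can be evicted.

```python
from dataclasses import dataclass


@dataclass
class CacheOSD:
    """Toy model of one cache-tier OSD (hypothetical, not Ceph's code)."""
    max_inflight: int            # size of the request throttle (256 in the report)
    capacity: int                # objects the cache tier can hold
    used: int                    # objects currently stored
    reserved_for_flush: int = 0  # hypothetical slots only flush reads may use
    inflight: int = 0            # requests currently holding a throttle slot

    def admit_client_write(self) -> bool:
        # Client writes may only use the unreserved part of the throttle.
        if self.inflight < self.max_inflight - self.reserved_for_flush:
            self.inflight += 1
            return True
        return False

    def admit_flush_read(self) -> bool:
        # The copy-from read issued for a flush also needs a throttle slot.
        if self.inflight < self.max_inflight:
            self.inflight += 1
            return True
        return False


def simulate(osd: CacheOSD, incoming_writes: int, max_steps: int = 10_000) -> str:
    pending = incoming_writes  # writes not yet admitted by the throttle
    blocked = 0                # admitted writes waiting for free cache space
    for _ in range(max_steps):
        # 1. Admit as many client writes as the throttle allows.
        while pending and osd.admit_client_write():
            pending -= 1
            blocked += 1
        # 2. An admitted write completes (releasing its slot) only if space is free.
        while blocked and osd.used < osd.capacity:
            osd.used += 1
            blocked -= 1
            osd.inflight -= 1
        if not pending and not blocked:
            return "done"
        # 3. The cache is full: try to flush and evict one object.
        if osd.used >= osd.capacity:
            if osd.admit_flush_read():
                osd.inflight -= 1  # copy-from read finishes, slot released
                osd.used -= 1      # object written to base tier and evicted
            else:
                # Every slot is held by a write waiting for space, and space
                # can only be freed by a flush that cannot get a slot.
                return "deadlock"
    return "timeout"


print(simulate(CacheOSD(max_inflight=256, capacity=100, used=100), 1000))
print(simulate(CacheOSD(max_inflight=256, capacity=100, used=100,
                        reserved_for_flush=1), 1000))
```

In this model the first call prints `deadlock`: all 256 slots are taken
by writes waiting for space, so the flush read is never admitted. The
second call, with one hypothetical slot kept aside for flush traffic,
prints `done`, which shows that the cycle comes from client writes and
flush reads competing for the same throttle.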
While we could advise that a cache tier should never be allowed to fill
up, I do not fully understand this deadlock, because a full cache can
normally still flush and evict as long as the base tier has space.
I wonder whether there is a specific reason for this behavior.
My test environment runs version 15.2.17, but the code in 17.2.2
appears to handle this part of the logic in the same way.
Cheers,
lin
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx