Downstream IO circuit-breaker in RGW?

(Please retain the CC list in your replies)

Our Ceph deployment is an S3 service with an SSD index pool and an HDD
data pool. We often see service outages caused by requests blocked on
latent (high-latency) OSDs, mostly in the index pool.

I have been looking at code changes in the RGW IO path that fence off
latent OSDs, or fast-fail IOs targeted at such OSDs, i.e. something like
a circuit-breaker pattern. A "Retry-After" header is inserted in the
responses to user requests that fail this way.
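
To make the idea concrete, here is a minimal sketch in Python. It is not
the RGW code (which is C++), and every name in it (LatentOsdBreaker,
CircuitOpen, handle_request, the thresholds) is hypothetical. It assumes
the breaker opens after a number of consecutive slow ops against one OSD,
fast-fails for a cooldown period, and then lets a probe through.

```python
import math
import time


class CircuitOpen(Exception):
    """Raised when IO to an OSD is fast-failed instead of being queued."""

    def __init__(self, osd_id, retry_after):
        super().__init__(f"osd.{osd_id} is fenced off")
        self.osd_id = osd_id
        self.retry_after = retry_after  # whole seconds until the next probe


class LatentOsdBreaker:
    """Per-gateway breaker keyed by OSD id (hypothetical, local state only)."""

    def __init__(self, latency_threshold=1.0, max_strikes=3, cooldown=30.0,
                 clock=time.monotonic):
        self.latency_threshold = latency_threshold  # seconds; slower ops count as strikes
        self.max_strikes = max_strikes              # consecutive strikes before opening
        self.cooldown = cooldown                    # seconds to fast-fail once open
        self.clock = clock
        self._strikes = {}    # osd_id -> consecutive slow ops
        self._opened_at = {}  # osd_id -> time the breaker opened

    def check(self, osd_id):
        """Raise CircuitOpen if the OSD is fenced off, otherwise return."""
        opened = self._opened_at.get(osd_id)
        if opened is None:
            return
        remaining = self.cooldown - (self.clock() - opened)
        if remaining > 0:
            raise CircuitOpen(osd_id, math.ceil(remaining))
        # Cooldown elapsed: allow traffic again, but keep the OSD one
        # strike away from re-opening so a single slow probe fences it off.
        del self._opened_at[osd_id]
        self._strikes[osd_id] = self.max_strikes - 1

    def record(self, osd_id, latency):
        """Feed back the observed latency of a completed op."""
        if latency > self.latency_threshold:
            strikes = self._strikes.get(osd_id, 0) + 1
            self._strikes[osd_id] = strikes
            if strikes >= self.max_strikes:
                self._opened_at[osd_id] = self.clock()
        else:
            self._strikes[osd_id] = 0


def handle_request(breaker, osd_id):
    """Return (status, headers); 503 plus Retry-After when fast-failing."""
    try:
        breaker.check(osd_id)
    except CircuitOpen as e:
        return 503, {"Retry-After": str(e.retry_after)}
    return 200, {}
```

The caller would invoke record() with the measured latency after each op
completes. Requests that map to other OSDs are unaffected, so only the
slice of the index that lives on the slow OSD is failed fast.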

The above circuit breaker uses only local knowledge at each RGW, i.e.
there is no central state about latent OSDs at the MON or elsewhere.
Maybe this is something that could be piggy-backed on the OSD map
maintained by the MON, or pushed to ceph-mgr.
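
As a rough illustration of what such central state might look like, the
sketch below aggregates per-gateway reports and treats an OSD as latent
only when enough distinct gateways agree within a time window. This is
purely hypothetical: LatentOsdView, the quorum rule and the TTL are my
assumptions, and nothing here corresponds to an existing OSD map field
or ceph-mgr interface.

```python
class LatentOsdView:
    """Hypothetical cluster-wide view built from per-gateway reports."""

    def __init__(self, quorum=2, report_ttl=60.0):
        self.quorum = quorum          # distinct gateways that must agree
        self.report_ttl = report_ttl  # seconds before a report is ignored
        self._reports = {}            # osd_id -> {gateway_id: report time}

    def report(self, gateway_id, osd_id, now):
        """Record that a gateway currently considers this OSD latent."""
        self._reports.setdefault(osd_id, {})[gateway_id] = now

    def latent_osds(self, now):
        """Return the OSD ids that a quorum of gateways reported recently."""
        result = set()
        for osd_id, by_gateway in self._reports.items():
            fresh = [t for t in by_gateway.values()
                     if now - t <= self.report_ttl]
            if len(fresh) >= self.quorum:
                result.add(osd_id)
        return result
```

Requiring agreement from more than one gateway is one way to keep a
single RGW with a bad network path from fencing off a healthy OSD for
everyone.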

Any thoughts or suggestions on the above?

(I was not sure whom to address this mail to; please redirect as
appropriate.)

-- 
Rolland