I think Greg (who appears to be a ceph committer) basically said he was interested in looking at it, if only you had the pool that failed this way.

Why not try to reproduce it, and make a log of your procedure so he can reproduce it too? What caused the slow requests... copy on write from snapshots? A bad disk? Maybe exclusive-lock with 2 clients writing at the same time?

I'd be interested in a solution too... like, why can't idle disks (non-full disk queue) mean that the osd op queue (or whatever queue it is) can still accept requests unrelated to the blocked pg/objects? I would love for ceph to handle this better. I suspect some issues I have are related to this (slow requests on one VM can freeze others [likely blame the osd], even requiring kill -9 [likely blame client librbd]).

On 03/22/17 16:18, Alejandro Comisario wrote:
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com