On Fri, Nov 18, 2016 at 11:53 AM, Iain Buclaw <ibuclaw@xxxxxxxxx> wrote:
> Hi,
>
> Following up on the suggestion to use any of the following options:
>
> - client_mount_timeout
> - rados_mon_op_timeout
> - rados_osd_op_timeout
>
> to mitigate the time clients spend blocked on requests. Is there
> really no other way around this?
>
> If two OSDs go down that between them hold both copies of an
> object, it would be nice to have clients fail *immediately*. I've
> tried reducing the rados_osd_op_timeout setting to 0.5, but when
> things go wrong, it still results in the collapse of the cluster and
> all reads from it.

Can you be more specific about what is happening when you set
rados_osd_op_timeout? Are you not seeing timeouts at all, with
operations blocking instead?

If you can provide a short librados program that demonstrates an op
blocking indefinitely even when a timeout is set, that would be useful.

John

>
> Reducing rados_osd_op_timeout down to 0.05 seems like a sure way
> to cause more false positives. But in reality, if an OSD operation
> can't be served in 150ms, then it's missed the train by over an hour.
>
> --
> Iain Buclaw
>
> *(p < e ? p++ : p) = (c & 0x0f) + '0';
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
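A minimal sketch of the kind of short librados program requested in the reply, written against the Python librados bindings (the `rados` module). It sets rados_osd_op_timeout before connecting, then reads one object in a loop and reports how long each read took and whether it failed with a timeout. The helper names `timed_op` and `run_reproducer`, the default config path, and the pool/object arguments are hypothetical placeholders. It also assumes the bindings raise an exception class named `TimedOut` when an op exceeds the timeout. This is not a tested reproducer, just a starting point.

```python
import time


def timed_op(op, *args):
    """Run op(*args) and return (outcome, elapsed_seconds).

    outcome is "ok" on success, "timeout" if the op raised an exception
    whose class is named TimedOut (assumed to be what the rados bindings
    raise on ETIMEDOUT), or the exception's class name otherwise.
    """
    start = time.monotonic()
    try:
        op(*args)
        outcome = "ok"
    except Exception as exc:  # classify the failure instead of propagating it
        name = type(exc).__name__
        outcome = "timeout" if name == "TimedOut" else name
    return outcome, time.monotonic() - start


def run_reproducer(pool, obj, conffile="/etc/ceph/ceph.conf", timeout="0.5"):
    """Hypothetical reproducer: read `obj` from `pool` once per second forever."""
    # Imported here so timed_op() above is usable without the Ceph bindings.
    import rados

    cluster = rados.Rados(conffile=conffile)
    # Client options must be set before connect() to take effect.
    cluster.conf_set("rados_osd_op_timeout", timeout)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            while True:  # stop with Ctrl-C; the finally blocks clean up
                outcome, elapsed = timed_op(ioctx.read, obj)
                print(f"{outcome} after {elapsed:.3f}s")
                time.sleep(1)
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()
```

To use it, call something like `run_reproducer("rbd", "testobj")`. The pool name `rbd` and object `testobj` are only examples, and the object should already exist. While it runs, stop the OSDs holding that object's replicas. If the timeout is honoured, each read should print `timeout` after roughly 0.5 s. If the output stalls instead, that is the indefinite blocking described above. Keeping the timing helper separate from the rados calls makes it easy to reuse for other ops, such as `ioctx.write_full`.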