On 16 August 2016 at 17:13, Wido den Hollander <wido@xxxxxxxx> wrote:
>
>> On 16 August 2016 at 15:59, Iain Buclaw <ibuclaw@xxxxxxxxx> wrote:
>>
>>
>> The desired behaviour for me would be for the client to get an instant
>> "not found" response from stat() operations. For write() to recreate
>> unfound objects. And for missing placement groups to be recreated on
>> an OSD that isn't overloaded. Halting the entire cluster when 96% of
>> it can still be accessed is just not workable, I'm afraid.
>>
>
> Well, you can't make Ceph do that, but you can make librados do such a thing.
>
> I'm using the OSD and MON timeout settings in libvirt for example:
> http://libvirt.org/git/?p=libvirt.git;a=blob;f=src/storage/storage_backend_rbd.c;h=9665fbca3a18fbfc7e4caec3ee8e991e13513275;hb=HEAD#l157
>
> You can set these options:
> - client_mount_timeout
> - rados_mon_op_timeout
> - rados_osd_op_timeout
>
> I think the last two alone should be sufficient in your case.
>
> You will get ETIMEDOUT back as an error when an operation times out.
>
> Wido
>

This seems to work fine. Now, what to do when a DR situation happens?

    pgmap v592589: 4096 pgs, 1 pools, 1889 GB data, 244 Mobjects
          2485 GB used, 10691 GB / 13263 GB avail
              3902 active+clean
               128 creating
                66 incomplete

These PGs just never seem to finish creating.

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';
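
For reference, the client-side timeout options Wido lists could be written out as a ceph.conf fragment along these lines. This is only a sketch: the helper name `render_client_timeouts`, the choice of the `[client]` section, and the 30-second values are assumptions for illustration, not something taken from the thread.

```python
def render_client_timeouts(mon_op_timeout, osd_op_timeout, mount_timeout=None):
    """Render a ceph.conf [client] section with librados timeouts, in seconds.

    Hypothetical helper: only the option names come from the thread; the
    section name and any values passed in are assumptions.
    """
    options = {
        "rados_mon_op_timeout": mon_op_timeout,
        "rados_osd_op_timeout": osd_op_timeout,
    }
    # client_mount_timeout is optional; per the thread, the two options
    # above should be enough on their own.
    if mount_timeout is not None:
        options["client_mount_timeout"] = mount_timeout

    for name, value in options.items():
        if value < 0:
            raise ValueError(f"{name} must not be negative, got {value}")

    lines = ["[client]"]
    lines += [f"{name} = {value}" for name, value in options.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 30 seconds is an arbitrary example value.
    print(render_client_timeouts(30, 30), end="")
```

The same option names can also be set programmatically on the cluster handle before connecting (librados provides rados_conf_set() for this in its C API), which appears to be the approach the linked libvirt code takes. With the timeouts in place, the caller still has to handle ETIMEDOUT itself.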