Trent Lloyd writes:

> Jens-Christian Fischer <jens-christian.fischer@...> writes:
>>
>> I think we (i.e. Christian) found the problem:
>> We created a test VM with 9 mounted RBD volumes (no NFS server). As soon
>> as he hit all disks, we started to experience these 120 second timeouts.
>> We realized that the QEMU process on the hypervisor is opening a TCP
>> connection to every OSD for every mounted volume - exceeding the 1024 FD
>> limit.
>>
>> So no deep scrubbing etc, but simply too many connections…
>
> Have seen mention of similar from CERN in their presentations, found this
> post on a quick Google.. might help?
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-December/026187.html

Yes, that's exactly the problem we had. We solved it by setting max_files
to 8191 in /etc/libvirt/qemu.conf on all compute hosts. Once that was
applied, we live-migrated the running instances so that they picked up the
increased limit.

--
Simon.
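
For reference, the fix looks roughly like the sketch below. The
max_files = 8191 value and the live-migration step come from the thread;
the restart and verification commands are assumptions (systemd-based
hosts) and may differ per distro. Since each mounted RBD image can hold a
socket to every OSD, a guest's FD usage grows roughly as volumes x OSDs,
which is why nine volumes blew past the default 1024 limit.

    # /etc/libvirt/qemu.conf on each compute host -- raise the per-guest
    # open-file limit for qemu processes (value from the thread):
    max_files = 8191

    # Restart libvirtd so newly started guests get the limit
    # (assumption: systemd-based host):
    systemctl restart libvirtd

    # Verify for a given qemu process (<qemu-pid> is a placeholder):
    grep 'open files' /proc/<qemu-pid>/limits   # effective limit
    ls /proc/<qemu-pid>/fd | wc -l              # current FD count

A running qemu process inherits its file-descriptor limit when it starts,
so existing guests keep the old 1024 cap until they are live-migrated:
migration spawns a fresh qemu process on the destination host, which
starts under the new limit.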