Hi all,

I'm not sure whether this bug belongs to OpenStack or to Ceph, but I'm writing here in the humble hope that someone else has run into it too.

I have configured a test OpenStack installation with Glance images stored in Ceph 0.94.3; Nova uses local storage. When I try to launch an instance from a large image stored in Ceph, it fails to spawn with the following error in nova-conductor.log:

  2015-09-04 11:52:35.076 3605449 ERROR nova.scheduler.utils [req-c6af3eca-f166-45bd-8edc-b8cfadeb0d0b 82c1f134605e4ee49f65015dda96c79a 448cc6119e514398ac2793d043d4fa02 - - -] [instance: 18c9f1d5-50e8-426f-94d5-167f43129ea6] Error from last host: slpeah005 (node slpeah005.cloud):
  [u'Traceback (most recent call last):\n',
   u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2220, in _do_build_and_run_instance\n    filter_properties)\n',
   u'  File "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2363, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=six.text_type(e))\n',
   u'RescheduledException: Build of instance 18c9f1d5-50e8-426f-94d5-167f43129ea6 was re-scheduled: [Errno 32] Corrupt image download. Checksum was 625d0686a50f6b64e57b1facbc042248 expected 4a7de2fbbd01be5c6a9e114df145b027\n']

Nova tries three different hosts, gets the same error on every single one, and then fails to spawn the instance. A small CirrOS image boots just fine; the problem only shows up with large images, around 10 GB in size.

I also watched the /var/lib/nova/instances/_base directory and saw that the image actually is being downloaded, but at some point the download is interrupted for some unknown reason and the instance gets deleted. In syslog I found many messages like these:

  Sep 4 12:51:37 slpeah003 ceph-osd: 2015-09-04 12:51:37.735094 7f092dfd1700 -1 osd.3 3025 heartbeat_check: no reply from osd.22 since back 2015-09-04 12:51:31.834203 front 2015-09-04 12:51:31.834203 (cutoff 2015-09-04 12:51:32.735011)
  Sep 4 12:51:37 slpeah003 ceph-osd: 2015-09-04 12:51:37.735099 7f092dfd1700 -1 osd.3 3025 heartbeat_check: no reply from osd.23 since back 2015-09-04 12:51:31.834203 front 2015-09-04 12:51:31.834203 (cutoff 2015-09-04 12:51:32.735011)
  Sep 4 12:51:37 slpeah003 ceph-osd: 2015-09-04 12:51:37.735104 7f092dfd1700 -1 osd.3 3025 heartbeat_check: no reply from osd.24 since back 2015-09-04 12:51:31.834203 front 2015-09-04 12:51:31.834203 (cutoff 2015-09-04 12:51:32.735011)
  Sep 4 12:51:37 slpeah003 ceph-osd: 2015-09-04 12:51:37.735108 7f092dfd1700 -1 osd.3 3025 heartbeat_check: no reply from osd.26 since back 2015-09-04 12:51:31.834203 front 2015-09-04 12:51:31.834203 (cutoff 2015-09-04 12:51:32.735011)
  Sep 4 12:51:37 slpeah003 ceph-osd: 2015-09-04 12:51:37.735118 7f092dfd1700 -1 osd.3 3025 heartbeat_check: no reply from osd.27 since back 2015-09-04 12:51:31.834203 front 2015-09-04 12:51:31.834203 (cutoff 2015-09-04 12:51:32.735011)

I have also monitored the number of open file descriptors of the nova-compute process ("echo /proc/NOVA_COMPUTE_PID/fd/* | wc -w", as Jan advised earlier on this list), but it never goes above 102.

It also looks like the problem appeared only with 0.94.3; on 0.94.2 everything worked just fine!

The exact diagnostic commands I'm planning to run next are in the P.S. below, in case anyone wants to compare notes. Would be very grateful for any help!

Vasily.
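P.S. First, I want to download the image through the Glance API by hand, which is the same path nova-compute takes when it populates _base, and compare the result against the checksum Glance has on record. A rough sketch; IMAGE_ID is a placeholder for the Glance image UUID (not the instance UUID from the traceback above):

  IMAGE_ID=<glance-image-uuid>               # placeholder, fill in your own
  glance image-show $IMAGE_ID | grep checksum
  glance image-download --file /tmp/test.img $IMAGE_ID
  md5sum /tmp/test.img                       # should match the Glance checksum

If the checksums disagree here too, the corruption can be reproduced without Nova being involved at all.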
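If that manual download also comes out corrupt, reading the same data straight out of RBD should tell whether the bad bytes come from Ceph itself or from the Glance API layer. This assumes the default Glance RBD layout (pool "images", image named after its UUID, snapshot called "snap"); adjust the names if your setup differs:

  # Stream the image snapshot to stdout ("-") and checksum it.
  rbd -p images export $IMAGE_ID@snap - | md5sum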
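Second, the file-descriptor check from Jan's earlier mail, wrapped in a loop so the count can be watched for the whole time the instance is spawning instead of being sampled once:

  PID=$(pgrep -of nova-compute)    # oldest process matching the name
  while sleep 1; do
      echo "$(date '+%H:%M:%S') $(ls /proc/$PID/fd | wc -l)"
  done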
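Finally, given the heartbeat_check noise and the fact that 0.94.2 worked, I want to rule out a partly upgraded cluster and keep an eye on cluster health while a download is in flight. Something along these lines (the "daemon" command has to be run on the host that carries the OSD in question):

  ceph -s                     # overall cluster state during the download
  ceph health detail          # names the OSDs involved in any flapping
  ceph --version              # version of the locally installed binaries
  ceph daemon osd.3 version   # version of the running daemon, via its admin socket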