Re: Nova fails to download image from Glance backed with Ceph

Thanks for the response!

There is plenty of free space in /var/lib/nova/instances on every compute host.
Glance image-download works as expected.
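
For anyone wanting to reproduce the checks, something along these lines should
do (a sketch only; the image ID is a placeholder and "images" assumes the
default Glance RBD pool name):

    # plenty of free space in the Nova instances directory
    df -h /var/lib/nova/instances

    # downloading the same image directly from Glance completes, and its
    # md5sum can be compared against the checksum Glance reports
    glance image-download --file /tmp/test.img <IMAGE_ID>
    md5sum /tmp/test.img

    # per Sebastien's suggestion, exporting the image straight from the
    # Ceph pool on the compute node rules out Glance and the LBs
    rbd export images/<IMAGE_ID> /tmp/test-rbd.img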

2015-09-04 21:27 GMT+08:00 Jan Schermer <jan@xxxxxxxxxxx>:
> Didn't you run out of space? Happened to me when a customer tried to create a 1TB image...
>
> Z.
>
>> On 04 Sep 2015, at 15:15, Sebastien Han <seb@xxxxxxxxxx> wrote:
>>
>> Just to take away a possible issue from infra (LBs etc).
>> Did you try to download the image on the compute node? Something like rbd export?
>>
>>> On 04 Sep 2015, at 11:56, Vasiliy Angapov <angapov@xxxxxxxxx> wrote:
>>>
>>> Hi all,
>>>
>>> I'm not sure whether this bug actually belongs to OpenStack or to Ceph,
>>> but I'm writing here in the humble hope that someone else has hit this issue too.
>>>
>>> I configured a test OpenStack deployment with Glance images stored in
>>> Ceph 0.94.3. Nova uses local storage.
>>> But when I try to launch an instance from a large image stored in Ceph,
>>> it fails to spawn with the following error in nova-conductor.log:
>>>
>>> 2015-09-04 11:52:35.076 3605449 ERROR nova.scheduler.utils
>>> [req-c6af3eca-f166-45bd-8edc-b8cfadeb0d0b
>>> 82c1f134605e4ee49f65015dda96c79a 448cc6119e514398ac2793d043d4fa02 - -
>>> -] [instance: 18c9f1d5-50e8-426f-94d5-167f43129ea6] Error from last
>>> host: slpeah005 (node slpeah005.cloud): [u'Traceback (most recent call
>>> last):\n', u'  File
>>> "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2220,
>>> in _do_build_and_run_instance\n    filter_properties)\n', u'  File
>>> "/usr/lib/python2.7/site-packages/nova/compute/manager.py", line 2363,
>>> in _build_and_run_instance\n    instance_uuid=instance.uuid,
>>> reason=six.text_type(e))\n', u'RescheduledException: Build of instance
>>> 18c9f1d5-50e8-426f-94d5-167f43129ea6 was re-scheduled: [Errno 32]
>>> Corrupt image download. Checksum was 625d0686a50f6b64e57b1facbc042248
>>> expected 4a7de2fbbd01be5c6a9e114df145b027\n']
>>>
>>> So Nova tries 3 different hosts, hits the same error message on every
>>> single one, and then fails to spawn the instance.
>>> I've tried a small Cirros image and it works fine with that; the issue
>>> only happens with large images, around 10 GB in size.
>>> I also looked into the /var/lib/nova/instances/_base folder and found
>>> that the image is actually being downloaded, but at some point the
>>> download is interrupted for an unknown reason and the instance gets
>>> deleted.
>>>
>>> I looked at the syslog and found many messages like this:
>>> Sep  4 12:51:37 slpeah003 ceph-osd: 2015-09-04 12:51:37.735094
>>> 7f092dfd1700 -1 osd.3 3025 heartbeat_check: no reply from osd.22 since
>>> back 2015-09-04 12:51:31.834203 front 2015-09-04 12:51:31.834203
>>> (cutoff 2015-09-04 12:51:32.735011)
>>> Sep  4 12:51:37 slpeah003 ceph-osd: 2015-09-04 12:51:37.735099
>>> 7f092dfd1700 -1 osd.3 3025 heartbeat_check: no reply from osd.23 since
>>> back 2015-09-04 12:51:31.834203 front 2015-09-04 12:51:31.834203
>>> (cutoff 2015-09-04 12:51:32.735011)
>>> Sep  4 12:51:37 slpeah003 ceph-osd: 2015-09-04 12:51:37.735104
>>> 7f092dfd1700 -1 osd.3 3025 heartbeat_check: no reply from osd.24 since
>>> back 2015-09-04 12:51:31.834203 front 2015-09-04 12:51:31.834203
>>> (cutoff 2015-09-04 12:51:32.735011)
>>> Sep  4 12:51:37 slpeah003 ceph-osd: 2015-09-04 12:51:37.735108
>>> 7f092dfd1700 -1 osd.3 3025 heartbeat_check: no reply from osd.26 since
>>> back 2015-09-04 12:51:31.834203 front 2015-09-04 12:51:31.834203
>>> (cutoff 2015-09-04 12:51:32.735011)
>>> Sep  4 12:51:37 slpeah003 ceph-osd: 2015-09-04 12:51:37.735118
>>> 7f092dfd1700 -1 osd.3 3025 heartbeat_check: no reply from osd.27 since
>>> back 2015-09-04 12:51:31.834203 front 2015-09-04 12:51:31.834203
>>> (cutoff 2015-09-04 12:51:32.735011)
>>>
>>> I've also tried monitoring the number of file descriptors held by the
>>> nova-compute process, but it never exceeds 102 ("echo
>>> /proc/NOVA_COMPUTE_PID/fd/* | wc -w", as Jan advised earlier on this list).
>>> It also seems the problem appeared only in 0.94.3; with 0.94.2
>>> everything worked just fine!
>>>
>>> Would be very grateful for any help!
>>>
>>> Vasily.
>>
>>
>> Cheers.
>> ––––
>> Sébastien Han
>> Senior Cloud Architect
>>
>> "Always give 100%. Unless you're giving blood."
>>
>> Mail: seb@xxxxxxxxxx
>> Address: 11 bis, rue Roquépine - 75008 Paris
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



