Re: RBD boot from volume weirdness in OpenStack

Josh,

Do you mind if I ask you a few follow-up questions?  I can ask on the
OpenStack ML if needed, but I think you are the most knowledgeable
person for these...

1. To get "efficient volumes from images" (i.e. volumes that are a COW
copy of the image), do the images and volumes need to live in the same
pool?  I have glance configured to use a pool called "glanceimages",
and nova-volume/Cinder uses a second pool called "nova-volume".  Is
this always going to prevent the COW process from working?  If I check
out my volume, I see this:

# rbd -p nova-volume info volume-8c30ee47-5ec3-4600-b332-1bdc2a650837
rbd image 'volume-8c30ee47-5ec3-4600-b332-1bdc2a650837':
	size 220 MB in 55 objects
	order 22 (4096 KB objects)
	block_name_prefix: rb.0.1f04.4ba87ea2
	parent:  (pool -1)

If the COW process is actually working, I think I'll see a parent
other than (pool -1), correct?
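
For comparison, my understanding is that a volume that really is a
COW clone should show a parent line pointing back at the glance
image, roughly like this (the image id and snapshot name here are
just placeholders):

# rbd -p nova-volume info volume-<uuid>
rbd image 'volume-<uuid>':
	...
	parent: glanceimages/<image-id>@<snapshot>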

I had split glance/cinder into different RADOS pools because I figured
it would give me more flexibility (I could set different replication
sizes and CRUSH rules per pool) and potentially more security
(different cephx clients/keys, so the Glance key never has to live on
the nova-compute nodes, only on the glance node).  But this isn't a
strict requirement.
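
Just to illustrate the kind of separation I have in mind, the caps
would be something along these lines (the client names are mine, and
the exact cap syntax may vary by version):

# ceph auth get-or-create client.glance mon 'allow r' osd 'allow rwx pool=glanceimages'
# ceph auth get-or-create client.novavolume mon 'allow r' osd 'allow rwx pool=nova-volume'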

2. Do you know if "raw" is the only disk format accepted for
boot-from-volume?  I did the whole "create volume from image" step,
and my source image was a qcow2.  But when I do the boot-from-volume,
the -drive line contains format=raw.  Not sure how to control that at
this point -- there is no metadata attached to the volume that
indicates whether it is qcow2 or raw.  I'll have to dig into the code
and see if it looks for anything.  Thought you might know...
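
If raw really is the requirement, I suppose the workaround is to
convert the image before uploading it to glance, something like this
(file and image names made up):

# qemu-img convert -O raw myimage.qcow2 myimage.raw
# glance image-create --name myimage-raw --disk-format raw --container-format bare --file myimage.raw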

3.  I edited my libvirt XML to say raw instead of qcow2, and the VM
started to boot!  Hooray -- boot-from-volume over RBD!  But then
console.log shows stuff like this (I've pasted the edited disk
element below the log for reference):

Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... done.
[    1.044112] EXT4-fs (vda1): mounted filesystem with ordered data
mode. Opts: (null)
Begin: Running /scripts/local-bottom ... [    1.052379] FDC 0 is a S82078B
done.
done.
Begin: Running /scripts/init-bottom ... done.
[    1.156951] Refined TSC clocksource calibration: 2266.803 MHz.
[    1.796114] end_request: I/O error, dev vda, sector 16065
[    1.800018] Buffer I/O error on device vda1, logical block 0
[    1.800018] lost page write due to I/O error on vda1
[    1.805294] EXT4-fs (vda1): re-mounted. Opts: (null)
cloud-init start-local running: Thu, 25 Oct 2012 16:06:34 +0000. up
2.86 seconds^M
no instance data found in start-local^M
[    3.802465] end_request: I/O error, dev vda, sector 1257161
[    3.803629] Buffer I/O error on device vda1, logical block 155137
[    3.804020] Buffer I/O error on device vda1, logical block 155138
....


And that just continues on, and obviously the VM is unusable.  Any
thoughts on why that might happen?  Did you ever run into this during
your testing?
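
For reference, the disk element I ended up with looks roughly like
this (volume and secret uuids elided, and the auth section only
applies if cephx is in use):

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source protocol='rbd' name='nova-volume/volume-...'/>
      <auth username='novavolume'>
        <secret type='ceph' uuid='...'/>
      </auth>
      <target dev='vda' bus='virtio'/>
    </disk>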

I'm thinking that I probably need to not use UEC images for this --
they try to go in and resize the file system and things like that.  I
should probably just make a bunch of fixed-size images (10G, 20G,
etc.) and make volumes from those.  Right now, I'm not even positive
that the RBD has been formatted with a filesystem.
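
One thing I plan to try, to check whether there is actually a
filesystem on the volume (assuming the rbd kernel module is available
and the image maps to /dev/rbd0; the volume name is just a
placeholder):

# rbd map nova-volume/volume-<uuid> --id novavolume
# file -s /dev/rbd0
# rbd unmap /dev/rbd0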

Regards,

 - Travis

On Thu, Oct 25, 2012 at 11:51 AM, Travis Rhoden <trhoden@xxxxxxxxx> wrote:
> Awesome, thanks Josh.  I misspoke -- my client was 0.48.1.  Glad
> upgrading to 0.48.2 will do the trick!  Thanks again.
>
> On Thu, Oct 25, 2012 at 11:42 AM, Josh Durgin <josh.durgin@xxxxxxxxxxx> wrote:
>> On 2012-10-25 08:22, Travis Rhoden wrote:
>>>
>>> I've been trying to take advantage of the code additions made by Josh
>>> Durgin to OpenStack Folsom for combining boot-from-volume and Ceph
>>> RBD.  First off, nice work Josh!  I'm hoping you folks can help me out
>>> with something strange I am seeing.  The question may be more
>>> OpenStack-related than Ceph-related, but hear me out first.
>>>
>>> I created a new volume (to use for boot-from-volume) from an existing
>>> image like so:
>>>
>>> #cinder create --display-name uec-test-vol --image-id
>>> 699137a2-a864-4a87-98fa-1684d7677044 5
>>>
>>> This completes just fine.
>>>
>>> Later I try to boot from it, that fails.  Cutting to the chase, here is
>>> why:
>>>
>>> kvm: -drive file=rbd:nova-volume/volume-9f4e4b70-7fbb-4d81-b912-b1c6fcf86c8b,if=none,id=drive-virtio-disk0,format=raw,cache=none:
>>> error reading header from volume-9f4e4b70-7fbb-4d81-b912-b1c6fcf86c8b
>>> kvm: -drive file=rbd:nova-volume/volume-9f4e4b70-7fbb-4d81-b912-b1c6fcf86c8b,if=none,id=drive-virtio-disk0,format=raw,cache=none:
>>> could not open disk image rbd:nova-volume/volume-9f4e4b70-7fbb-4d81-b912-b1c6fcf86c8b: No such file or directory
>>>
>>> It's weird that creating the volume was successful, but that KVM can't
>>> read it.  Poking around a bit more, it was clear why:
>>>
>>> # rbd -n client.novavolume --pool nova-volume ls
>>> <returns nothing>
>>>
>>> # rbd -n client.novavolume ls
>>> volume-9f4e4b70-7fbb-4d81-b912-b1c6fcf86c8b
>>>
>>> Okay, the volume is in the "rbd" pool!  That's really weird, though.
>>> Here are my nova.conf entries:
>>> volume_driver=nova.volume.driver.RBDDriver
>>> rbd_pool=nova-volume
>>> rbd_user=novavolume
>>>
>>>
>>> AND, here are the log entries from nova-volume.log (cleaned up a little):
>>>
>>> rbd create --pool nova-volume --size 5120
>>> volume-9f4e4b70-7fbb-4d81-b912-b1c6fcf86c8b
>>> rbd rm --pool nova-volume volume-9f4e4b70-7fbb-4d81-b912-b1c6fcf86c8b
>>> rbd import --pool nova-volume /tmp/tmplQUwzt
>>> volume-9f4e4b70-7fbb-4d81-b912-b1c6fcf86c8b
>>>
>>> I'm not sure why it goes create/delete/import, but regardless all of
>>> that worked.  More importantly, all these commands used --pool
>>> nova-volume.  So how the heck did that RBD end up in the "rbd" pool
>>> instead of the "nova-volume" pool?  Any ideas?
>>>
>>> Before I hit "send", I figured I should at least test this myself.  Watch
>>> this:
>>>
>>> #rbd create -n client.novavolume --pool nova-volume --size 1024 test
>>> # rbd ls -n client.novavolume --pool nova-volume
>>> test
>>> # rbd export -n client.novavolume --pool nova-volume test /tmp/test
>>> Exporting image: 100% complete...done.
>>> # rbd rm -n client.novavolume --pool nova-volume test
>>> Removing image: 100% complete...done.
>>> # rbd import -n client.novavolume --pool nova-volume /tmp/test test
>>> Importing image: 100% complete...done.
>>> # rbd ls -n client.novavolume --pool nova-volume
>>>
>>> # rbd ls -n client.novavolume --pool rbd
>>> test
>>>
>>>
>>> So it seems that "rbd import" doesn't honor the --pool argument?
>>
>>
>> This was true in 0.48, but it should have been fixed in 0.48.2 (and 0.52).
>> I'll add a note about this to the docs.
>>
>>
>>> I am using 0.53 on the backend, but my client is 0.48.2.  I'll upgrade
>>> that and see if that makes a difference.
>>
>>
>> The ceph-common package in particular should be 0.48.2 or >=0.52.
>>
>>>  - Travis
>>
>>

