Re: kernel errors, timeouts and qemu-img usage

Christoph Raible <c.raible@xxxxxxxxxxxxxxxxxxxx> · Wed, 04 May 2011 11:33:51 +0200

On 03.05.2011 18:43, Tommi Virtanen wrote:
On Tue, May 03, 2011 at 01:50:44PM +0200, Christoph Raible wrote:
First, I always get the following "error" from ceph -w:

"[WRN] message from mon2 was stamped 12.271440s in the future clocks
not synchronized"

But I synchronized my clocks one minute before, against the same
NTP server.

Just running ntp doesn't mean your clocks are synced. For example, it
will refuse to synchronize automatically if the gap is too large.

Here's how you demonstrate your clocks are good:

[0 tv@dreamer ~]$ host pool.ntp.org
pool.ntp.org has address 204.235.61.9
pool.ntp.org has address 66.219.59.208
pool.ntp.org has address 169.229.70.95
[0 tv@dreamer ~]$ ssh sepia32.ceph.dreamhost.com ntpdate -q 204.235.61.9
server 204.235.61.9, stratum 2, offset -31.351031, delay 0.09187
  3 May 09:25:27 ntpdate[8303]: step time server 204.235.61.9 offset -31.351031 sec
[0 tv@dreamer ~]$ ssh sepia80.ceph.dreamhost.com ntpdate -q 204.235.61.9
server 204.235.61.9, stratum 2, offset 0.000159, delay 0.09181
  3 May 09:24:59 ntpdate[373]: adjust time server 204.235.61.9 offset 0.000159 sec
[0 tv@dreamer ~]$

See how one of the clocks is more than 30 seconds off, and the other
one is near-perfect.
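
The same check can be scripted across several nodes. The following is a minimal sketch, assuming the `ntpdate -q` output has already been captured as text (for example over ssh, as above). The function names and the 0.05 s threshold are illustrative assumptions, not Ceph settings.

```python
import re

# Matches the per-server summary line printed by `ntpdate -q`, e.g.
# "server 204.235.61.9, stratum 2, offset -31.351031, delay 0.09187"
SERVER_LINE = re.compile(
    r"server\s+(?P<server>\S+),\s+stratum\s+\d+,\s+"
    r"offset\s+(?P<offset>[-+]?\d+(?:\.\d+)?),"
)


def parse_offsets(ntpdate_output):
    """Return a list of (server, offset_seconds) found in the output."""
    return [
        (m.group("server"), float(m.group("offset")))
        for m in SERVER_LINE.finditer(ntpdate_output)
    ]


def clock_is_synced(ntpdate_output, max_offset=0.05):
    """True if every reported offset is within max_offset seconds.

    max_offset is an illustrative threshold (assumption), not a Ceph default.
    Returns False when no offset line is found at all.
    """
    offsets = parse_offsets(ntpdate_output)
    return bool(offsets) and all(abs(off) <= max_offset for _, off in offsets)


if __name__ == "__main__":
    # The two summary lines quoted in this thread.
    sepia32 = "server 204.235.61.9, stratum 2, offset -31.351031, delay 0.09187"
    sepia80 = "server 204.235.61.9, stratum 2, offset 0.000159, delay 0.09181"
    print(clock_is_synced(sepia32))
    print(clock_is_synced(sepia80))
```

Fed the two lines quoted above, the first host fails the check and the second passes.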

I synchronized with ntpdate, but now there is another error. I will look
into it and report back if I find a solution ;)

----------------------------

The second error is that I can't create or start a qemu image on
the Ceph filesystem. I want to start a KVM virtual machine with
virt-manager.

I create an image with

   "qemu-img create -f qcow2 Platte-qcow2.img 10G"

When I choose that image and try to start a virtual machine with
it, the virtual machine never starts. It hangs while looking for
the "harddisk".

Creating an image with virt-manager doesn't work either. After 2-3
minutes there is a timeout and I have to kill the virt-manager job.

Does anyone have experience with this?

Are you using rbd, or just qcow2 images in files stored in a Ceph
mount?

I only use qcow2 images stored in a Ceph mount. I can't use rbd because
I have to copy images from another location to the Ceph mount.

If rbd, please provide more details on what exactly you did.

If just qcow2 files on ceph, then this seems to be very similar to the
problems you reported below; your setup seems unable to handle heavy
IO, for some reason.

No, that's not the problem. Even after I resolved the I/O problem, I
still can't connect to or use any qemu / KVM images. I tested with the
following image types: qcow2, raw & img.

-----------------------------

The third error is the following, shown in /var/log/messages:

http://pastebin.com/dnwVRf5F

Are those timeouts normal?

They look somewhat similar to the issues I've seen with more than one MDS
and a write-heavy workload. At this point you probably don't want two
MDSes active. All of my problems went away when I started testing
against clusters with just one MDS.

I have now tested with only one MDS and it works fine :D That's strange;
it looks like a problem with the synchronisation between the MDSes.

-----------------------------

The last error I got for today is the following:

http://pastebin.com/UmrCRuhq

This happened when I was creating a dummy file with:

   dd if=/dev/zero of=meineDatei count=5000000

This one looks like the underlying filesystem cannot handle the write
load, and makes the OSD daemon hang.

Your ceph.conf says "osd data = /data/osd$id", but your partition list
earlier claimed /dev/sda6 is "ceph fs mounted to /mnt/data". I'm
assuming these are supposed to be the same, and you're using ext4.

I'm sorry, I posted the wrong link to my ceph.conf :( This is the
right one (I updated both ;) )

http://pastebin.com/VanmWmX5

I don't recall seeing many people having this kind of problem with
ext4. You might want to check what happens if you shut off ceph,
and try that dd directly to the underlying disk. If that works well,
please check back and we can continue figuring that one out.

This works fine, with an average speed of 40 MB/s (not full speed ;) )
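
For scale, the dd command above gives no bs= argument, so dd falls back to its default 512-byte block size. The arithmetic below is a rough sketch of how much data that test writes and how long it takes at the reported rate. It assumes "MB" means 10**6 bytes, as in dd's own summary line; the helper names are made up for illustration.

```python
DD_DEFAULT_BLOCK_SIZE = 512  # bytes; dd's default when bs= is not given


def dd_bytes_written(count, block_size=DD_DEFAULT_BLOCK_SIZE):
    """Total bytes written by `dd count=<count>` with the given block size."""
    return count * block_size


def seconds_at(total_bytes, mb_per_second):
    """Transfer time in seconds, assuming 1 MB = 10**6 bytes."""
    return total_bytes / (mb_per_second * 10**6)


if __name__ == "__main__":
    total = dd_bytes_written(5_000_000)
    print(total)                  # 2560000000 bytes, about 2.56 GB
    print(seconds_at(total, 40))  # 64.0 seconds at 40 MB/s
```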

BTW, your config says "devs = /dev/sda1". The actual config option is
"btrfs devs", so that should be ignored completely, but it seems
there's some confusion in the air.
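
A quick way to spot such a stray key is to scan the config text for a bare "devs" option. This is only a sketch: the parsing is a simplified line scan (comments starting with ";" or "#", key before the first "="), not Ceph's own config parser, and the host name in the sample is hypothetical.

```python
def find_bare_devs(conf_text):
    """Return (line_number, stripped_line) pairs whose key is exactly 'devs'.

    Simplified scan: drops ';' and '#' comments and treats everything
    before the first '=' as the key. Not Ceph's real parser.
    """
    hits = []
    for number, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.split(";", 1)[0].split("#", 1)[0].strip()
        if "=" not in line:
            continue  # section headers, blank lines, comments
        # Collapse internal whitespace so "btrfs   devs" == "btrfs devs".
        key = " ".join(line.split("=", 1)[0].split()).lower()
        if key == "devs":
            hits.append((number, raw.strip()))
    return hits


if __name__ == "__main__":
    # Hypothetical fragment for illustration only.
    sample = """
[osd.0]
    host = node1
    devs = /dev/sda1
[osd.1]
    btrfs devs = /dev/sdb1
"""
    for number, line in find_bare_devs(sample):
        print(f"line {number}: '{line}' is ignored; did you mean 'btrfs devs'?")
```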
