Hi,

we're trying to give ceph a try on our compute cluster. Initial stress
tests passed without problems, but over the weekend a couple of cosd
processes died. Now access to the ceph mount point blocks, and mounting
the ceph dir fails with

  mount: 192.168.1.141:6789,192.168.1.145:6789,192.168.1.150:6789:/: can't read superblock

Attempts to restart the cosd on the affected storage nodes fail with

  # /usr/local/bin/cosd -f -i 6 -c /etc/ceph/ceph.conf
  ** WARNING: Ceph is still under heavy development, and is only suitable for **
  **          testing and review.  Do not trust it with important data.       **
  starting osd6 at 0.0.0.0:6800/2685 osd_data /var/ceph/osd6 /var/ceph/osd6/journal
  terminate called after throwing an instance of 'std::bad_alloc'
    what():  std::bad_alloc
  Aborted

Stracing the cosd process shows that it calls mmap() with silly values
for the "fd" and the "length" parameters:

  mmap(NULL, 18446744073709436928, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)

I briefly looked at the source code and noticed that raw_mmap_pages() in
include/buffer.h seems to call mmap() with an unsigned int rather than
with a size_t as the second (length) parameter. Since

  18446744073709436928 = 0xfffffffffffe4000

this looks like an integer overflow. But maybe it is just uninitialized
garbage.

I've tried the v0.20.2 and the testing branch of the ceph git repo. Both
versions of cosd show the same behaviour.

Our ceph file system is 5.5T large; we have 7 cosds, 3 cmons and 3 cmds.
See the ceph.conf below for details.

Any idea how to get back the data? If you need further debugging info,
don't hesitate to ask.

Thanks
Andre

---

[global]
        ; enable secure authentication
        auth supported = cephx

        osd journal size = 100    ; measured in MB

; You need at least one monitor. You need at least three if you want to
; tolerate any node failures. Always create an odd number.
[mon]
        mon data = /var/ceph/mon$id

        ; some minimal logging (just message traffic) to aid debugging
        debug ms = 1

[mon0]
        host = node141
        mon addr = 192.168.1.141:6789
[mon1]
        host = node145
        mon addr = 192.168.1.145:6789
[mon2]
        host = node150
        mon addr = 192.168.1.150:6789

; You need at least one mds. Define two to get a standby.
[mds]
        ; where the mds keeps its secret encryption keys
        keyring = /var/ceph/keyring.$name
[mds0]
        host = node141
[mds1]
        host = node145
[mds2]
        host = node150

; osd
; You need at least one. Two if you want data to be replicated.
; Define as many as you like.
[osd]
        ; This is where the btrfs volume will be mounted.
        osd data = /var/ceph/osd$id

        ; Ideally, make this a separate disk or partition. A few GB
        ; is usually enough; more if you have fast disks. You can use
        ; a file under the osd data dir if need be
        ; (e.g. /data/osd$id/journal), but it will be slower than a
        ; separate disk or partition.
        osd journal = /var/ceph/osd$id/journal

[osd0]
        host = node141
[osd1]
        host = node145
[osd2]
        host = node150
[osd3]
        host = node146
[osd4]
        host = node147
[osd5]
        host = node149
[osd6]
        host = node142

--
The only person who always got his work done by Friday was Robinson Crusoe