Hi Andre,

On Mon, 31 May 2010, Andre Noll wrote:
> we're trying to give ceph a try on our compute cluster. Initial stress
> tests passed without problems,

Cool!

> but over the weekend a couple of cosd processes died and now access to
> the ceph mount point blocks and mounting the ceph dir fails with

Hmm :(

> Stracing the cosd process shows that it calls mmap() with silly values
> for the "fd" and the "length" parameter:
>
> mmap(NULL, 18446744073709436928, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)

The mmap code in buffer.h is actually never called, so my guess is that
posix_memalign() or some other library implementation is doing it.

Can you get a stack trace? Either look at the core file with gdb or run
cosd under gdb. Alternatively, the osd startup log with

    debug osd = 20
    debug filestore = 20

in the [osd] section of ceph.conf would help narrow it down. There is
probably a missing check in the osd mount/initialization code, but it's
hard to guess where.

Thanks!
sage

>
> I briefly looked at the source code and noticed that raw_mmap_pages()
> in include/buffer.h seems to call mmap() with an unsigned int
> rather than with a size_t as the second (length) parameter. Since
>
>     18446744073709436928 = 0xfffffffffffe4000
>
> this looks like an integer overflow. But maybe it is just uninitialized
> garbage.
>
> I've tried the v0.20.2 and the testing branch of the ceph git
> repo. Both versions of cosd show the same behaviour.
>
> Our ceph file system is 5.5T large, we have 7 cosds, 3 cmons and 3 cmds,
> see the ceph.conf below for details.
>
> Any idea how to get back the data? If you need further debugging info,
> don't hesitate to ask.
>
> Thanks
> Andre
> ---
>
> [global]
>     ; enable secure authentication
>     auth supported = cephx
>     osd journal size = 100 ; measured in MB
>
> ; You need at least one monitor. You need at least three if you want to
> ; tolerate any node failures. Always create an odd number.
> [mon]
>     mon data = /var/ceph/mon$id
>     ; some minimal logging (just message traffic) to aid debugging
>     debug ms = 1
> [mon0]
>     host = node141
>     mon addr = 192.168.1.141:6789
> [mon1]
>     host = node145
>     mon addr = 192.168.1.145:6789
> [mon2]
>     host = node150
>     mon addr = 192.168.1.150:6789
>
> ; You need at least one mds. Define two to get a standby.
> [mds]
>     ; where the mds keeps its secret encryption keys
>     keyring = /var/ceph/keyring.$name
> [mds0]
>     host = node141
> [mds1]
>     host = node145
> [mds2]
>     host = node150
>
> ; osd
> ; You need at least one. Two if you want data to be replicated.
> ; Define as many as you like.
> [osd]
>     ; This is where the btrfs volume will be mounted.
>     osd data = /var/ceph/osd$id
>
>     ; Ideally, make this a separate disk or partition. A few GB
>     ; is usually enough; more if you have fast disks. You can use
>     ; a file under the osd data dir if need be
>     ; (e.g. /data/osd$id/journal), but it will be slower than a
>     ; separate disk or partition.
>     osd journal = /var/ceph/osd$id/journal
>
> [osd0]
>     host = node141
> [osd1]
>     host = node145
> [osd2]
>     host = node150
> [osd3]
>     host = node146
> [osd4]
>     host = node147
> [osd5]
>     host = node149
> [osd6]
>     host = node142
> --
> The only person who always got his work done by Friday was Robinson Crusoe
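
A note on the number in the strace output, since the arithmetic is easy to
check: 18446744073709436928 is 2^64 - 114688, so read as a signed 64-bit
value the length is -114688. An unsigned int widened to a 64-bit size_t
could not exceed 0xffffffff, so this pattern looks more like a negative size
being computed somewhere than a truncated length, though that is only a
guess without a stack trace. A minimal Python sketch of the check (the
helper name as_signed64 and the 4 KiB page size are assumptions made for
illustration, not anything from the ceph source):

```python
# Reinterpret the mmap() length from the strace output as a signed value.
# Only STRACE_LENGTH comes from the report; everything else is illustrative.

STRACE_LENGTH = 18446744073709436928  # second argument to mmap() in the trace
PAGE_SIZE = 4096                      # assumption: 4 KiB pages


def as_signed64(value: int) -> int:
    """Interpret an unsigned 64-bit integer as two's-complement signed."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    return value - 2**64 if value >= 2**63 else value


signed_length = as_signed64(STRACE_LENGTH)

print(hex(STRACE_LENGTH))            # 0xfffffffffffe4000
print(signed_length)                 # -114688
print(-signed_length // PAGE_SIZE)   # 28
print(STRACE_LENGTH > 0xFFFFFFFF)    # True: too large for a 32-bit unsigned int
```

If the value really is a negative size, it corresponds to 28 pages of
4096 bytes, which may help when matching it against sizes in the osd log.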