Re: pg 0.xxxx on [] is laggy

Hi Sebastien,

On Tue, 29 Jun 2010, Sébastien Paolacci wrote:
> First of all, many thanks for this wonderful piece of software, which
> actually looks very promising. It's a segment that imho definitely
> lacks credible open source alternatives to the (sometimes infamous
> and inefficient) proprietary systems.

Thanks!

> So I've just pulled the unstable branch from last Sunday; here are a
> few outcomes (local vm, sorry for that, but as for a first try...):
> 
> - Build: transparent, which is actually not so common for the unstable
> branch of a project said not to be mature ;). Thanks.
> 
> - Config: it's a bit difficult to understand the real meaning of all
> the available options (the Debian and SUSE dedicated pages are however
> very helpful), so documentation is sparse, as expected, and I should
> have started by reading the code anyway (so my bad in the end).
> 
> - First setup attempt left me with a "mon fs missing 'whoami'.. did
> you run mkcephfs?" (see end of mail). I just echoed a "0" into
> "/data/mon0/whoami" and it did start up.

It looks like when you ran /etc/init.d/ceph it tried to start 
/usr/bin/cmon, although I notice lots of /usr/local in your mkcephfs 
output.  Did you by chance build from source and then 'make install', and 
then also install a .deb or .rpm?  The "mon fs missing 'whoami'" is an old 
error message that no longer appears in the 'unstable' branch, so there is 
an old binary or old source involved somewhere.
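
If you want to double-check which copy is actually being run, something 
like this should tell you (assuming the package is named 'ceph'; adjust 
for your distro):

	which cmon                  # /usr/local/bin/cmon => stale 'make install' copy
	dpkg -L ceph | grep cmon    # where the .deb put its copy

If 'which' points at /usr/local, removing or rebuilding that copy should 
make the old error message go away.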

> - cfuse -m 127.0.01:5678/ /mnt/ceph eats all my memory and crashes
> with a bad_alloc

Fixed this.. there was an IP address parsing error.  The trailing '/' 
shouldn't be there, and wasn't getting ignored.
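
In the meantime, dropping the trailing '/' (and, assuming you meant the 
monitor address from your ceph.conf, port 6789 rather than 5678) should 
avoid the crash:

	cfuse -m 127.0.0.1:6789 /mnt/ceph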

> - cfuse /mnt/ceph is however working as expected. Creating files and
> browsing /mnt/ceph content provides the desired result; dbench -D
> /mnt/ceph/ -t 10 2 however seems to wait endlessly for completion. On
> the cfuse side, I'm getting (what seems to be) an endless series of "pg
> 0.xxx on [] is laggy"

That means the OSD isn't responding to some request(s).  Did cosd start?  
Does a 'ceph -s' show some osds are 'up'?  If cosd crashed, the output log 
or a gdb backtrace would be helpful.
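
If cosd is still running but just not responding, a backtrace from the 
live process is also useful (assuming a single cosd on the box):

	gdb -p $(pidof cosd)
	(gdb) thread apply all bt
	(gdb) detach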

> root@debian-vm1:/home/seb# cat /etc/ceph/ceph.conf | grep -v '^;'
> [global]
>        pid file = /var/run/ceph/$name.pid
>        debug ms = 10
I wouldn't put this in [global] or you will clutter up output from things 
like 'ceph -s'.
> [mon]
	debug ms = 1    ; is usually enough msgr output
>        mon data = /data/mon$id
> [mon0]
>        host = debian-vm1
>        mon addr = 127.0.0.1:6789
> [mds]
	debug ms = 1    ; is usually enough msgr output
> [mds0]
>        host = debian-vm1
> [osd]
	debug ms = 1    ; is usually enough msgr output
>        sudo = true
>        osd data = /data/osd$id
>        osd journal = /data/osd$id/journal
>        osd journal size = 128
>        filestore journal writeahead = true
> [osd0]
>        host = debian-vm1
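
For reference, with 'debug ms' moved out of [global] and into the 
per-daemon sections as above, the relevant bits would read:

	[global]
	       pid file = /var/run/ceph/$name.pid
	[mon]
	       debug ms = 1
	       mon data = /data/mon$id
	[mds]
	       debug ms = 1
	[osd]
	       debug ms = 1

That keeps 'ceph -s' and friends quiet while still giving you messenger 
logging from the daemons.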

sage
