Hi Dan,

the full root partition is the very first thing you have to solve. This >can<
be responsible for the misbehaviour, but it is for >sure< a general problem
you >need< to solve.

So:

1. Clean up /
2. Restart the server
3. Check if it is working and, if not, what the exact error messages are

If it is working, great. If not, tell us what the daemons/VMs report on
start-up, including the logs.
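For step 1, something along these lines usually does it. This is only a rough
sketch: the service names below are my assumption for a stock Proxmox 4.x node
using the sysvinit-style Ceph scripts your logs show, so double-check them
before running anything.

  # show the biggest space consumers on / only (-x stays on this filesystem)
  du -xh / 2>/dev/null | sort -h | tail -20

  # usual suspects on a hypervisor node: apt cache, old logs, stray ISOs/dumps
  apt-get clean                 # drops the package cache under /var/cache/apt
  du -sh /var/log/*             # look for runaway log files before trimming them

  # for steps 2 and 3: reboot, or restart the stack and check it
  systemctl restart pve-cluster pvedaemon pveproxy pvestatd
  service ceph start            # same sysvinit-style start that appears in your logs
  ceph -s                       # the cluster should report HEALTH_OK again
  df -h /                       # confirm / is no longer at 100%

A plain reboot instead of the restarts is fine too, and often simpler on a node
that is already half broken.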
Good luck!

--
Mit freundlichen Gruessen / Best regards

Oliver Dzombic
IP-Interactive

mailto:info@xxxxxxxxxxxxxxxxx

Anschrift:

IP Interactive UG ( haftungsbeschraenkt )
Zum Sonnenberg 1-3
63571 Gelnhausen

HRB 93402 beim Amtsgericht Hanau
Geschäftsführung: Oliver Dzombic

Steuer Nr.: 35 236 3622 1
UST ID: DE274086107


On 29.03.2016 at 11:54, Dan Moses wrote:
>
> Our setup matches this one exactly for Proxmox and Ceph:
> https://pve.proxmox.com/wiki/Ceph_Server. The brand of SSDs may not be the
> same, but they are the same sizes or larger and are enterprise quality.
>
> Filesystem            Size  Used Avail Use% Mounted on
> udev                   10M     0   10M   0% /dev
> tmpfs                  13G  1.1G   12G   9% /run
> /dev/dm-0              28G   28G     0 100% /
> tmpfs                  32G   66M   32G   1% /dev/shm
> tmpfs                 5.0M     0  5.0M   0% /run/lock
> tmpfs                  32G     0   32G   0% /sys/fs/cgroup
> /dev/mapper/pve-data   56G  655M   55G   2% /var/lib/vz
> tmpfs                 100K     0  100K   0% /run/lxcfs/controllers
> cgmfs                 100K     0  100K   0% /run/cgmanager/fs
> /dev/fuse              30M   28K   30M   1% /etc/pve
> /dev/sdc1             3.7T  1.3T  2.4T  36% /var/lib/ceph/osd/ceph-0
> /dev/sdd1             3.7T  765G  2.9T  21% /var/lib/ceph/osd/ceph-1
> /dev/sde1             3.7T  663G  3.0T  18% /var/lib/ceph/osd/ceph-2
> /dev/sdf1             3.7T  677G  3.0T  19% /var/lib/ceph/osd/ceph-3
>
> I can see /dev/dm-0 shows 100% full. Would this cause the error, since this
> is just a VM? Please advise what we can do to resolve this.
>
>
> --------
> Hi Dan,
>
> Various Proxmox daemons don't look happy on startup also.
>
> Are you using a single Samsung SSD for your OSD journals on this host?
> Is that SSD ok?
>
> Brian
>
>
> On Tue, Mar 29, 2016 at 5:22 AM, Dan Moses <dan@xxxxxxxxxxxxxxxxxxx> wrote:
>> Any suggestions to fix this issue? We are using Ceph with Proxmox and VMs
>> won't start due to these Ceph errors.
>>
>> This in turn prevents any VM from starting up. This is a live server,
>> please advise.
>>
>> Mar 28 22:01:22 pm3 systemd[1]: Unit ceph.service entered failed state.
>> Mar 28 22:09:00 pm3 systemd[1]: Unit ceph-mon.2.1459218879.795083638.service entered failed state.
>> Mar 28 22:10:49 pm3 console-setup[1642]: failed.
>> Mar 28 22:10:49 pm3 kernel: [ 2.605140] ata6.00: READ LOG DMA EXT failed, trying unqueued
>> Mar 28 22:10:49 pm3 kernel: [ 2.605167] ata6.00: failed to get NCQ Send/Recv Log Emask 0x1
>> Mar 28 22:10:49 pm3 kernel: [ 2.605456] ata6.00: failed to get NCQ Send/Recv Log Emask 0x1
>> Mar 28 22:10:49 pm3 pmxcfs[1795]: [quorum] crit: quorum_initialize failed: 2
>> Mar 28 22:10:49 pm3 pmxcfs[1795]: [confdb] crit: cmap_initialize failed: 2
>> Mar 28 22:10:49 pm3 pmxcfs[1795]: [dcdb] crit: cpg_initialize failed: 2
>> Mar 28 22:10:49 pm3 pmxcfs[1795]: [status] crit: cpg_initialize failed: 2
>> Mar 28 22:10:49 pm3 pvecm[1798]: ipcc_send_rec failed: Connection refused
>> Mar 28 22:10:49 pm3 pvecm[1798]: ipcc_send_rec failed: Connection refused
>> Mar 28 22:10:49 pm3 pvecm[1798]: ipcc_send_rec failed: Connection refused
>> Mar 28 22:11:20 pm3 ceph[1891]: failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.5 --keyring=/var/lib/ceph/osd/ceph-5/keyring osd crush create-or-move -- 5 3.64 host=pm3 root=default'
>> Mar 28 22:11:20 pm3 ceph[1891]: ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.5']' returned non-zero exit status 1
>> Mar 28 22:11:50 pm3 ceph[1891]: failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.7 --keyring=/var/lib/ceph/osd/ceph-7/keyring osd crush create-or-move -- 7 3.64 host=pm3 root=default'
>> Mar 28 22:11:50 pm3 ceph[1891]: ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.7']' returned non-zero exit status 1
>> Mar 28 22:12:21 pm3 ceph[1891]: failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.9 --keyring=/var/lib/ceph/osd/ceph-9/keyring osd crush create-or-move -- 9 3.64 host=pm3 root=default'
>> Mar 28 22:12:21 pm3 ceph[1891]: ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.9']' returned non-zero exit status 1
>> Mar 28 22:12:51 pm3 ceph[1891]: failed: 'timeout 30 /usr/bin/ceph -c /etc/ceph/ceph.conf --name=osd.11 --keyring=/var/lib/ceph/osd/ceph-11/keyring osd crush create-or-move -- 11 3.64 host=pm3 root=default'
>> Mar 28 22:12:51 pm3 ceph[1891]: ceph-disk: Error: ceph osd start failed: Command '['/usr/sbin/service', 'ceph', '--cluster', 'ceph', 'start', 'osd.11']' returned non-zero exit status 1
>> Mar 28 22:12:51 pm3 ceph[1891]: ceph-disk: Error: One or more partitions failed to activate

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com