Re: Ceph 10.2.11 - Status not working

On Tue, Dec 18, 2018 at 10:23 AM Mike O'Connor <mike@xxxxxxxxxx> wrote:
>
> Hi All
>
> I have a ceph cluster which has been working without issues for about 2
> years now; it was upgraded to 10.2.11 about 6 months ago.
>
> root@blade3:/var/lib/ceph/mon# ceph status
> 2018-12-18 10:42:39.242217 7ff770471700  0 -- 10.1.5.203:0/1608630285 >>
> 10.1.5.207:6789/0 pipe(0x7ff768000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
> c=0x7ff768001f90).fault
> 2018-12-18 10:42:45.242745 7ff770471700  0 -- 10.1.5.203:0/1608630285 >>
> 10.1.5.207:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1
> c=0x7ff768002410).fault
> 2018-12-18 10:42:51.243230 7ff770471700  0 -- 10.1.5.203:0/1608630285 >>
> 10.1.5.207:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1
> c=0x7ff768002f40).fault
> 2018-12-18 10:42:54.243452 7ff770572700  0 -- 10.1.5.203:0/1608630285 >>
> 10.1.5.205:6789/0 pipe(0x7ff768000c80 sd=4 :0 s=1 pgs=0 cs=0 l=1
> c=0x7ff768008060).fault
> 2018-12-18 10:42:57.243715 7ff770471700  0 -- 10.1.5.203:0/1608630285 >>
> 10.1.5.207:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1
> c=0x7ff768003580).fault
> 2018-12-18 10:43:03.244280 7ff7781b9700  0 -- 10.1.5.203:0/1608630285 >>
> 10.1.5.205:6789/0 pipe(0x7ff7680051e0 sd=3 :0 s=1 pgs=0 cs=0 l=1
> c=0x7ff768003670).fault
>
> All systems can ping each other. I simply cannot see why it's failing.
>
>
> ceph.conf
>
> [global]
>      auth client required = cephx
>      auth cluster required = cephx
>      auth service required = cephx
>      cluster network = 10.1.5.0/24
>      filestore xattr use omap = true
>      fsid = 42a0f015-76da-4f47-b506-da5cdacd030f
>      keyring = /etc/pve/priv/$cluster.$name.keyring
>      osd journal size = 5120
>      osd pool default min size = 1
>      public network = 10.1.5.0/24
>      mon_pg_warn_max_per_osd = 0
>
> [client]
>      rbd cache = true
> [osd]
>      keyring = /var/lib/ceph/osd/ceph-$id/keyring
>      osd max backfills = 1
>      osd recovery max active = 1
>      osd_disk_threads = 1
>      osd_disk_thread_ioprio_class = idle
>      osd_disk_thread_ioprio_priority = 7
> [mon.2]
>      host = blade5
>      mon addr = 10.1.5.205:6789
> [mon.1]
>      host = blade3
>      mon addr = 10.1.5.203:6789
> [mon.3]
>      host = blade7
>      mon addr = 10.1.5.207:6789
> [mon.0]
>      host = blade1
>      mon addr = 10.1.5.201:6789
> [mds]
>          mds data = /var/lib/ceph/mds/mds.$id
>          keyring = /var/lib/ceph/mds/mds.$id/mds.$id.keyring
> [mds.0]
>          host = blade1
> [mds.1]
>          host = blade3
> [mds.2]
>          host = blade5
> [mds.3]
>          host = blade7
>
>
> Any ideas ? more information ?

The system on which you are running the "ceph" client, blade3
(10.1.5.203), is trying to contact the monitors on 10.1.5.207 (blade7)
port 6789 and 10.1.5.205 (blade5) port 6789. You need to check that the
ceph-mon binary is running on blade7 and blade5, that the monitors are
listening on port 6789, and that that port is reachable from blade3.
The simplest explanation is that the MONs are not running. The next
simplest is that there is a firewall interfering with blade3's ability
to connect to port 6789 on those machines. Check the above and see
what you find.
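
A quick way to work through those checks (a sketch only, assuming a
Jewel-era install with admin sockets in the default location; adjust
names to your setup):

    # On blade5 and blade7: is the monitor running and listening on 6789?
    ps aux | grep ceph-mon
    ss -tlnp | grep 6789        # or: netstat -tlnp | grep 6789

    # From blade3: is the monitor port reachable, or is a firewall in the way?
    nc -zv 10.1.5.205 6789
    nc -zv 10.1.5.207 6789
    iptables -L -n | grep 6789

    # If a mon daemon is up, ask it directly via its admin socket
    # (mon IDs taken from the ceph.conf above):
    ceph daemon mon.2 mon_status    # run on blade5
    ceph daemon mon.3 mon_status    # run on blade7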

-- 
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


