Re: osd down (for about 2 minutes) error after adding a new host to my cluster

On Tue, Jan 8, 2013 at 1:31 PM, Isaac Otsiabah <zmoo76b@xxxxxxxxx> wrote:
>
>
> Hi Greg, it appears to be a timing issue: with the flag (debug ms=1) turned on, the system ran slower and the failure became harder to reproduce. I ran it several times and finally got osd.0 to fail using the default crush map. The attached tar file contains log files for all components on g8ct plus the ceph.conf.
>
> I started with a 1-node cluster on host g8ct (osd.0, osd.1, osd.2) and then added host g13ct (osd.3, osd.4, osd.5).
>
>
>
>  id    weight  type name       up/down reweight
> -1      6       root default
> -3      6               rack unknownrack
> -2      3                       host g8ct
> 0       1                               osd.0   down    1
> 1       1                               osd.1   up      1
> 2       1                               osd.2   up      1
> -4      3                       host g13ct
> 3       1                               osd.3   up      1
> 4       1                               osd.4   up      1
> 5       1                               osd.5   up      1
>
>
>
> The error messages are in ceph.log and ceph-osd.0.log:
>
> ceph.log:2013-01-08 05:41:38.080470 osd.0 192.168.0.124:6801/25571 3 : [ERR] map e15 had wrong cluster addr (192.168.0.124:6802/25571 != my 192.168.1.124:6802/25571)
> ceph-osd.0.log:2013-01-08 05:41:38.080458 7f06757fa710  0 log [ERR] : map e15 had wrong cluster addr (192.168.0.124:6802/25571 != my 192.168.1.124:6802/25571)

Thanks. I had a brief look through these logs on Tuesday and want to
spend more time with them because they have some odd stuff in them. It
*looks* like the OSD is starting out using a single IP for both the
public and cluster networks and then switching over at some point,
which is...odd.
Knowing more details about how your network is actually set up would
be very helpful.
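For reference, here is a minimal sketch of the kind of public/cluster split the error message implies. The subnets are assumptions based on the addresses in your log (192.168.0.x as public, 192.168.1.x as cluster); substitute whatever your hosts actually use:

    [global]
            ; assumed subnets, adjust to match your actual interfaces
            public network  = 192.168.0.0/24
            cluster network = 192.168.1.0/24

If both hosts really do have interfaces on both subnets, the addresses each OSD registers in the map (visible with "ceph osd dump") should stay on those networks rather than flipping between them across map epochs.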
-Greg

