Re: all OSDs crash on start

On Wed, Jul 17, 2013 at 4:40 AM, Vladislav Gorbunov <vadikgo@xxxxxxxxx> wrote:
> Sorry, I did not send this to ceph-users earlier.
>
> I checked the mon.1 log and found that the cluster was not in
> HEALTH_OK when I set the crush ruleset on the iscsi pool:
> 2013-07-14 15:52:15.715871 7fe8a852a700  0 log [INF] : pgmap
> v16861121: 19296 pgs: 19052 active+clean, 73
> active+remapped+wait_backfill, 171 active+remapped+backfilling;
> 9023 GB data, 18074 GB used, 95096 GB / 110 TB avail;
> 21245KB/s rd, 1892KB/s wr, 443op/s; 49203/4696557 degraded (1.048%)
> 2
> 2013-07-14 15:52:15.870389 7fe8a852a700  0 mon.1@0(leader) e23
> handle_command mon_command(osd pool set iscsi crush_ruleset 3 v 0) v1
> ...
> 2013-07-14 15:52:35.930465 7fe8a852a700  1 mon.1@0(leader).osd e77415
> prepare_failure osd.2 10.166.10.27:6801/12007 from osd.56
> 10.166.10.29:6896/18516 is reporting failure:1
> 2013-07-14 15:52:35.930641 7fe8a852a700  0 log [DBG] : osd.2
> 10.166.10.27:6801/12007 reported failed by osd.56
> 10.166.10.29:6896/18516
>

Okay, I think you need to back up and provide a simple timeline of
what you did and what you know about the cluster state at that time.
I'm particularly interested in anything you did after the OSDs
started crashing, but I want to know about what happened before as
well.
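
If it helps with building that timeline, here is a rough sketch (not
an official tool) that pulls the timestamped command and failure
entries out of a monitor log and sorts them. It assumes each log entry
sits on one line starting with a "YYYY-MM-DD HH:MM:SS.ffffff" stamp, as
in the excerpt you quoted; the function name and keyword list are my
own and only illustrative.

```python
import re
from datetime import datetime

# Leading timestamp as it appears in the quoted mon.1 excerpt (assumed format).
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+(.*)$")

# Substrings taken from the quoted excerpt; extend the tuple as needed.
KEYWORDS = ("handle_command", "prepare_failure")


def timeline(lines, keywords=KEYWORDS):
    """Return (datetime, message) pairs for matching log lines, oldest first."""
    events = []
    for line in lines:
        m = STAMP.match(line)
        if m and any(k in m.group(2) for k in keywords):
            when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            events.append((when, m.group(2)))
    return sorted(events)


if __name__ == "__main__":
    # Two entries from the quoted log, rejoined onto single lines.
    sample = [
        "2013-07-14 15:52:35.930465 7fe8a852a700  1 mon.1@0(leader).osd e77415 "
        "prepare_failure osd.2 10.166.10.27:6801/12007 from osd.56 "
        "10.166.10.29:6896/18516 is reporting failure:1",
        "2013-07-14 15:52:15.870389 7fe8a852a700  0 mon.1@0(leader) e23 "
        "handle_command mon_command(osd pool set iscsi crush_ruleset 3 v 0) v1",
    ]
    for when, msg in timeline(sample):
        print(when.isoformat(), msg)
```

Run over the full log (for example, timeline(open(path)) with path
pointing at your unzipped copy), this should list the ruleset change
ahead of the first failure report, which is the ordering I would like
you to confirm.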

> Could this indicate that osd.56 distributed the bad map to the
> cluster's OSD servers? Does this mean that you cannot change the
> crushmap of a cluster that is not in HEALTH_OK without losing the
> whole cluster?

You can absolutely change the crush map on a cluster which is in an
unhealthy state. That's not the problem, at least not on its own.

> full log at https://dl.dropboxusercontent.com/u/2296931/ceph/ceph-mon.1.log.bak.zip
> (1.7MB)
>
>> If a bad map somehow got distributed to the OSDs then cleaning it up
>> is unfortunately going to take a lot of work without any well-defined
>> processes.
> Does this mean that all data was lost?

*If* that actually happened somehow (it shouldn't be able to happen,
generally), then depending on how much time and money you are willing
to invest you might have lost it, yes.

Looking at your monitor log, it appears the monitor crashed when you
tried to inject the crush map at about 2013-07-14 16:54:57. Is that
when your OSDs started crashing, or was something wrong with them
before that?
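
One way to answer that from the log itself: a small sketch, assuming
the same one-entry-per-line format as the excerpt above, that splits
the OSDs named in prepare_failure entries into those reported before
and after a cutoff time. The helper name is hypothetical, and the
cutoff is just the injection time I read off your log.

```python
import re
from datetime import datetime

# Captures the timestamp (to the second) and the OSD being reported as failed.
FAILURE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+ .*prepare_failure (osd\.\d+) "
)


def split_failures(lines, cutoff):
    """Return (before, after): sets of OSD names reported failed around cutoff."""
    before, after = set(), set()
    for line in lines:
        m = FAILURE.match(line)
        if not m:
            continue
        when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        (before if when < cutoff else after).add(m.group(2))
    return before, after


if __name__ == "__main__":
    cutoff = datetime(2013, 7, 14, 16, 54, 57)  # approximate injection time
    sample = [
        "2013-07-14 15:52:35.930465 7fe8a852a700  1 mon.1@0(leader).osd e77415 "
        "prepare_failure osd.2 10.166.10.27:6801/12007 from osd.56 "
        "10.166.10.29:6896/18516 is reporting failure:1",
    ]
    before, after = split_failures(sample, cutoff)
    print("before:", sorted(before), "after:", sorted(after))
```

If the "before" set is already large on the full log, the OSDs were in
trouble ahead of the injection attempt.
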
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com