Re: [Help: pool not responding] Now osd crash

Hello,
I have probably restarted the OSD, or marked it in and out, too many times. Now I get this when starting it:

root@proxmox-zotac:~# /usr/bin/ceph-osd -i 1 --pid-file /var/run/ceph/osd.1.pid -c /etc/ceph/ceph.conf --cluster ceph -f   
starting osd.1 at :/0 osd_data /var/lib/ceph/osd/ceph-1 /var/lib/ceph/osd/ceph-1/journal
osd/PG.cc: In function 'static int PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*, ceph::bufferlist*)' thread 7f7fd358e880 time 2016-03-09 00:08:09.193975
osd/PG.cc: 2868: FAILED assert(r > 0)
ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x76) [0xc03c46]
2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, ceph::buffer::list*)+0x4ab) [0x7c616b]
3: (OSD::load_pgs()+0xa20) [0x6a9170]
4: (OSD::init()+0xc84) [0x6ac204]
5: (main()+0x2839) [0x632459]
6: (__libc_start_main()+0xf5) [0x7f7fd08b3b45]
7: /usr/bin/ceph-osd() [0x64c087]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
2016-03-09 00:08:09.196669 7f7fd358e880 -1 osd/PG.cc: In function 'static int PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*, ceph::bufferlist*)' thread 7f7fd358e880 time 2016-03-09 00:08:09.193975
osd/PG.cc: 2868: FAILED assert(r > 0)

ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x76) [0xc03c46]
2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, ceph::buffer::list*)+0x4ab) [0x7c616b]
3: (OSD::load_pgs()+0xa20) [0x6a9170]
4: (OSD::init()+0xc84) [0x6ac204]
5: (main()+0x2839) [0x632459]
6: (__libc_start_main()+0xf5) [0x7f7fd08b3b45]
7: /usr/bin/ceph-osd() [0x64c087]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

    0> 2016-03-09 00:08:09.196669 7f7fd358e880 -1 osd/PG.cc: In function 'static int PG::peek_map_epoch(ObjectStore*, spg_t, epoch_t*, ceph::bufferlist*)' thread 7f7fd358e880 time 2016-03-09 00:08:09.193975
osd/PG.cc: 2868: FAILED assert(r > 0)

ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x76) [0xc03c46]
2: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, ceph::buffer::list*)+0x4ab) [0x7c616b]
3: (OSD::load_pgs()+0xa20) [0x6a9170]
4: (OSD::init()+0xc84) [0x6ac204]
5: (main()+0x2839) [0x632459]
6: (__libc_start_main()+0xf5) [0x7f7fd08b3b45]
7: /usr/bin/ceph-osd() [0x64c087]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

terminate called after throwing an instance of 'ceph::FailedAssertion'
*** Caught signal (Aborted) **
in thread 7f7fd358e880
ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: /usr/bin/ceph-osd() [0xb04503]
2: (()+0xf8d0) [0x7f7fd24268d0]
3: (gsignal()+0x37) [0x7f7fd08c7067]
4: (abort()+0x148) [0x7f7fd08c8448]
5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7f7fd11b4b3d]
6: (()+0x5ebb6) [0x7f7fd11b2bb6]
7: (()+0x5ec01) [0x7f7fd11b2c01]
8: (()+0x5ee19) [0x7f7fd11b2e19]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x247) [0xc03e17]
10: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, ceph::buffer::list*)+0x4ab) [0x7c616b]
11: (OSD::load_pgs()+0xa20) [0x6a9170]
12: (OSD::init()+0xc84) [0x6ac204]
13: (main()+0x2839) [0x632459]
14: (__libc_start_main()+0xf5) [0x7f7fd08b3b45]
15: /usr/bin/ceph-osd() [0x64c087]
2016-03-09 00:08:09.203630 7f7fd358e880 -1 *** Caught signal (Aborted) **
in thread 7f7fd358e880

ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: /usr/bin/ceph-osd() [0xb04503]
2: (()+0xf8d0) [0x7f7fd24268d0]
3: (gsignal()+0x37) [0x7f7fd08c7067]
4: (abort()+0x148) [0x7f7fd08c8448]
5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7f7fd11b4b3d]
6: (()+0x5ebb6) [0x7f7fd11b2bb6]
7: (()+0x5ec01) [0x7f7fd11b2c01]
8: (()+0x5ee19) [0x7f7fd11b2e19]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x247) [0xc03e17]
10: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, ceph::buffer::list*)+0x4ab) [0x7c616b]
11: (OSD::load_pgs()+0xa20) [0x6a9170]
12: (OSD::init()+0xc84) [0x6ac204]
13: (main()+0x2839) [0x632459]
14: (__libc_start_main()+0xf5) [0x7f7fd08b3b45]
15: /usr/bin/ceph-osd() [0x64c087]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

    0> 2016-03-09 00:08:09.203630 7f7fd358e880 -1 *** Caught signal (Aborted) **
in thread 7f7fd358e880

ceph version 0.94.6 (e832001feaf8c176593e0325c8298e3f16dfb403)
1: /usr/bin/ceph-osd() [0xb04503]
2: (()+0xf8d0) [0x7f7fd24268d0]
3: (gsignal()+0x37) [0x7f7fd08c7067]
4: (abort()+0x148) [0x7f7fd08c8448]
5: (__gnu_cxx::__verbose_terminate_handler()+0x15d) [0x7f7fd11b4b3d]
6: (()+0x5ebb6) [0x7f7fd11b2bb6]
7: (()+0x5ec01) [0x7f7fd11b2c01]
8: (()+0x5ee19) [0x7f7fd11b2e19]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x247) [0xc03e17]
10: (PG::peek_map_epoch(ObjectStore*, spg_t, unsigned int*, ceph::buffer::list*)+0x4ab) [0x7c616b]
11: (OSD::load_pgs()+0xa20) [0x6a9170]
12: (OSD::init()+0xc84) [0x6ac204]
13: (main()+0x2839) [0x632459]
14: (__libc_start_main()+0xf5) [0x7f7fd08b3b45]
15: /usr/bin/ceph-osd() [0x64c087]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


2016-03-02 9:38 GMT+01:00 Mario Giammarco <mgiammarco@xxxxxxxxx>:
Here it is:

 cluster ac7bc476-3a02-453d-8e5c-606ab6f022ca
     health HEALTH_WARN
            4 pgs incomplete
            4 pgs stuck inactive
            4 pgs stuck unclean
            1 requests are blocked > 32 sec
     monmap e8: 3 mons at {0=10.1.0.12:6789/0,1=10.1.0.14:6789/0,2=10.1.0.17:6789/0}
            election epoch 840, quorum 0,1,2 0,1,2
     osdmap e2405: 3 osds: 3 up, 3 in
      pgmap v5904430: 288 pgs, 4 pools, 391 GB data, 100 kobjects
            1090 GB used, 4481 GB / 5571 GB avail
                 284 active+clean
                   4 incomplete
  client io 4008 B/s rd, 446 kB/s wr, 23 op/s
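
As a quick cross-check of the status output above, here is a minimal sketch that pulls the per-state PG counts out of `ceph -s` style text and confirms they add up to the pgmap total. The function name `pg_state_counts` and the regular expressions are assumptions made for illustration, based only on the layout pasted in this message; they are not part of Ceph, and other releases may format the output differently.

```python
import re

# Excerpt of the pgmap section exactly as pasted in this thread.
STATUS = """\
      pgmap v5904430: 288 pgs, 4 pools, 391 GB data, 100 kobjects
            1090 GB used, 4481 GB / 5571 GB avail
                 284 active+clean
                   4 incomplete
"""


def pg_state_counts(status_text):
    """Return (total_pgs, {state: count}) parsed from `ceph -s` style text.

    Hypothetical helper: it assumes the layout shown in this thread.
    """
    total = None
    states = {}
    for line in status_text.splitlines():
        stripped = line.strip()
        # Header line: "pgmap v<version>: <N> pgs, ..."
        header = re.match(r"pgmap v\d+: (\d+) pgs", stripped)
        if header:
            total = int(header.group(1))
            continue
        # State line: "<count> <state>[+<state>...]" and nothing else.
        state = re.fullmatch(r"(\d+) ([a-z_]+(?:\+[a-z_]+)*)", stripped)
        if state:
            states[state.group(2)] = int(state.group(1))
    return total, states


total, states = pg_state_counts(STATUS)
not_clean = {s: n for s, n in states.items() if s != "active+clean"}
print(total, states, not_clean)
```

The capacity line ("1090 GB used ...") is skipped because a state name is assumed to be lowercase words joined by `+`, so only the two state lines contribute to the counts.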


2016-03-02 9:31 GMT+01:00 Shinobu Kinjo <skinjo@xxxxxxxxxx>:
Is "ceph -s" still showing you the same output?

>     cluster ac7bc476-3a02-453d-8e5c-606ab6f022ca
>      health HEALTH_WARN
>             4 pgs incomplete
>             4 pgs stuck inactive
>             4 pgs stuck unclean
>      monmap e8: 3 mons at
> {0=10.1.0.12:6789/0,1=10.1.0.14:6789/0,2=10.1.0.17:6789/0}
>             election epoch 832, quorum 0,1,2 0,1,2
>      osdmap e2400: 3 osds: 3 up, 3 in
>       pgmap v5883297: 288 pgs, 4 pools, 391 GB data, 100 kobjects
>             1090 GB used, 4481 GB / 5571 GB avail
>                  284 active+clean
>                    4 incomplete

Cheers,
S

----- Original Message -----
From: "Mario Giammarco" <mgiammarco@xxxxxxxxx>
To: "Lionel Bouton" <lionel-subscription@xxxxxxxxxxx>
Cc: "Shinobu Kinjo" <skinjo@xxxxxxxxxx>, ceph-users@xxxxxxxxxxxxxx
Sent: Wednesday, March 2, 2016 4:27:15 PM
Subject: Re: Help: pool not responding

I tried setting min_size=1, but unfortunately nothing changed.
Thanks for the idea.

2016-02-29 22:56 GMT+01:00 Lionel Bouton <lionel-subscription@xxxxxxxxxxx>:

> On 29/02/2016 22:50, Shinobu Kinjo wrote:
>
> the fact that they are optimized for benchmarks and certainly not
> Ceph OSD usage patterns (with or without internal journal).
>
> Are you assuming that SSHD is causing the issue?
> If you could elaborate on this more, it would be helpful.
>
>
> Probably not (unless they turn out to be extremely unreliable under Ceph
> OSD usage patterns, which would surprise me).
>
> For incomplete PGs, the documentation seems to describe well enough what
> should be done:
> http://docs.ceph.com/docs/master/rados/operations/pg-states/
>
> The relevant text:
>
> *Incomplete* Ceph detects that a placement group is missing information
> about writes that may have occurred, or does not have any healthy copies.
> If you see this state, try to start any failed OSDs that may contain the
> needed information or temporarily adjust min_size to allow recovery.
>
> We don't have the full history, but the most probable cause of these
> incomplete PGs is that min_size is set to 2 or 3 and at some point the 4
> incomplete PGs didn't have as many replicas as the min_size value. So if
> setting min_size to 2 isn't enough, setting it to 1 should unfreeze them.
>
> Lionel
>
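
The min_size reasoning quoted above can be illustrated with a deliberately simplified model: a PG is treated as able to serve I/O only while its number of available replicas is at least min_size. This is a sketch under that single assumption, not Ceph's actual peering logic. The PG IDs and replica counts below are hypothetical, and the function names are invented for illustration. The model ignores peering history, which may be one reason lowering min_size does not always clear an incomplete PG.

```python
def pg_can_serve_io(available_replicas, min_size):
    """Simplified rule: I/O is allowed only with at least min_size replicas."""
    return available_replicas >= min_size


def blocked_pgs(pg_replicas, min_size):
    """Return the sorted IDs of PGs that fall below min_size."""
    return sorted(
        pg for pg, count in pg_replicas.items()
        if not pg_can_serve_io(count, min_size)
    )


# Hypothetical replica counts per PG (not taken from this cluster).
pg_replicas = {"1.a": 3, "1.b": 1, "2.3": 2}

for min_size in (3, 2, 1):
    print(min_size, blocked_pgs(pg_replicas, min_size))
```

In this toy model, dropping min_size from 3 to 2 frees the PG with two replicas, and dropping it to 1 frees the PG with a single replica, which is the progression the quoted advice describes.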


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
