Re: Can't start OSD - one OSD is always down.

Hi Craig Lewis,

My pool holds 300 TB of data, so I can't recreate a new pool and then copy the data over with "ceph cp pool" (it would take a very long time).
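
(For reference, the copy I am trying to avoid would look roughly like this; assuming "ceph cp pool" refers to the rados cppool command, and using hypothetical pool names. On 300 TB it has to copy every object, which is why it is not an option here.)

# ceph osd pool create images-new 2048          (hypothetical destination pool)
# rados cppool images images-new                (object-by-object copy of the whole pool)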

I upgraded Ceph to Giant (0.86), but the error is still there :((
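
(To confirm that every daemon actually restarted on the new version after the upgrade, something like this should be enough:)

# ceph -v                          (version of the local binaries)
# ceph tell osd.* version          (version reported by each running OSD)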

I think my problem is the "objects misplaced (0.320%)":

# ceph pg 23.96 query
             "num_objects_missing_on_primary": 0,
              "num_objects_degraded": 0,
              "num_objects_misplaced": 79,

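(To pull just those counters out of the very long query output, plain grep is enough; no assumptions about the exact JSON layout:)

# ceph pg 23.96 query | grep -E 'num_objects_(missing_on_primary|degraded|misplaced)'
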
  cluster xxxxxx-xxxxx-xxxxx-xxxxx
     health HEALTH_WARN 225 pgs degraded; 2 pgs repair; 225 pgs stuck degraded; 263 pgs stuck unclean; 225 pgs stuck undersized; 225 pgs undersized; recovery 308759/54799506 objects degraded (0.563%); 175270/54799506 objects misplaced (0.320%); 1/130 in osds are down;
            flags noout,nodeep-scrub
      pgmap v28905830: 14973 pgs, 23 pools, 70255 GB data, 17838 kobjects
            206 TB used, 245 TB / 452 TB avail
            308759/54799506 objects degraded (0.563%); 175270/54799506 objects misplaced (0.320%)
               14708 active+clean
                  38 active+remapped
                 225 active+undersized+degraded                                                                                                         
  client io 35068 kB/s rd, 71815 kB/s wr, 4956 op/s
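
(To see which PGs are behind those counts, rather than just the totals, something like this should list them:)

# ceph health detail | grep -E 'undersized|degraded|remapped'
# ceph pg dump_stuck unclean              (dumps the PGs stuck in a not-clean state)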

- Checking the Ceph log:

2014-10-28 15:33:59.733177 7f6a7f1ab700  5 osd.21 pg_epoch: 103718 pg[23.96( v 103713'171086 (103609'167229,103713'171086] local-les=103715 n=85 ec=25000 les/c 103715/103710 103714/103714/103236) [92,21,78] r=1 lpr=103714 pi=100280-103713/118 luod=0'0 crt=103713'171086 active] enter Started/ReplicaActive/RepNotRecovering

Then many failed entries like the following are logged, for many objects (e.g. c03fe096/rbd_data.5348922ae8944a.000000000000306b, ...):

2014-10-28 15:33:59.343435 7f6a7e1a9700  5 -- op tracker -- seq: 1793, time: 2014-10-28 15:33:59.343435, event: done, op: MOSDPGPush(23.96 103718 [PushOp(c03fe096/rbd_data.5348922ae8944a.000000000000306b/head//24, version: 103622'283374, data_included: [0~4194304], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(c03fe096/rbd_data.5348922ae8944a.000000000000306b/head//24@103622'283374, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false)),PushOp(4120f096/rbd_data.7a63d32ae8944a.0000000000000083/head//24, version: 103679'295624, data_included: [0~4194304], data_size: 0, omap_header_size: 0, omap_entries_size: 0, attrset_size: 2, recovery_info: ObjectRecoveryInfo(4120f096/rbd_data.7a63d32ae8944a.0000000000000083/head//24@103679'295624, copy_subset: [0~4194304], clone_subset: {}), after_progress: ObjectRecoveryProgress(!first, data_recovered_to:4194304, data_complete:true, omap_recovered_to:, omap_complete:true), before_progress: ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:false))])
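
(To follow what osd.21 is doing with this PG, something along these lines can be used; the log path assumes the default location, and the second line temporarily raises the OSD's log verbosity:)

# grep 'pg\[23.96' /var/log/ceph/ceph-osd.21.log | tail -n 50
# ceph tell osd.21 injectargs '--debug-osd 20 --debug-ms 1'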

Thanks!

--
Tuan
HaNoi-VietNam

 

On 2014-10-28 01:35, Craig Lewis wrote:

My experience is that once you hit this bug, those PGs are gone. I tried marking the primary OSD out, which just moved the problem to the new primary OSD. Luckily for me, my affected PGs were being replicated to the secondary cluster. I ended up deleting the whole pool and recreating it.
 
Which pools are 7 and 23? It's possible that it's something that's easy to replace.
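
(You can map those pool IDs to names straight from the OSD map, e.g.:)

# ceph osd lspools                    (lists "id name" pairs)
# ceph osd dump | grep '^pool'        (pool details, including id and name)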
 


On Fri, Oct 24, 2014 at 9:26 PM, Ta Ba Tuan <tuantb@xxxxxxxxxx> wrote:
Hi Craig, Thanks for replying.
When I started that OSD, the log from "ceph -w" warned that PGs 7.9d8, 23.596, 23.9c6, and 23.63 can't recover, as in the pasted log.

Those PGs are in the "active+degraded" state.
# ceph pg map 7.9d8
osdmap e102808 pg 7.9d8 (7.9d8) -> up [93,49] acting [93,49]

(When I start osd.21, pg 7.9d8 and the three remaining PGs change to the "active+recovering" state.) osd.21 is still down after the following logs:

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
