Help fixing clobbered OSDs

Hi all,

I haven't had much sleep and have accidentally started an OSD on a mount point mapped to two disks containing OSD data. At least I think that was the case; I'm unable to explain how it happened, or whether it was even the cause. Yeah... that tired...

What I think happened was that OSD.9's disk was mounted over OSD.15's disk. OSD.15 may or may not have been running at the time. OSD.15 now fails with this error:

   -33> 2013-03-18 21:42:38.610114 7f8048773760  5 filestore(/srv/ceph/osd/osd.15) test_mount basedir /srv/ceph/osd/osd.15 journal /dev/sdd4
   -32> 2013-03-18 21:42:38.610153 7f8048773760  1 -- 0.0.0.0:6860/22287 messenger.start
   -31> 2013-03-18 21:42:38.610181 7f8048773760  1 -- :/0 messenger.start
   -30> 2013-03-18 21:42:38.610196 7f8048773760  1 -- 0.0.0.0:6862/22287 messenger.start
   -29> 2013-03-18 21:42:38.610207 7f8048773760  1 -- 0.0.0.0:6861/22287 messenger.start
   -28> 2013-03-18 21:42:38.610299 7f8048773760  2 osd.15 0 mounting /srv/ceph/osd/osd.15 /dev/sdd4
   -27> 2013-03-18 21:42:38.610309 7f8048773760  5 filestore(/srv/ceph/osd/osd.15) basedir /srv/ceph/osd/osd.15 journal /dev/sdd4
   -26> 2013-03-18 21:42:38.610325 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) mount fsid is 71dbf00f-ae22-4366-b610-064107e26697
   -25> 2013-03-18 21:42:38.727408 7f8048773760  0 filestore(/srv/ceph/osd/osd.15) mount FIEMAP ioctl is supported and appears to work
   -24> 2013-03-18 21:42:38.727423 7f8048773760  0 filestore(/srv/ceph/osd/osd.15) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
   -23> 2013-03-18 21:42:38.727829 7f8048773760  0 filestore(/srv/ceph/osd/osd.15) mount did NOT detect btrfs
   -22> 2013-03-18 21:42:38.852287 7f8048773760  0 filestore(/srv/ceph/osd/osd.15) mount syscall(SYS_syncfs, fd) fully supported
   -21> 2013-03-18 21:42:38.852379 7f8048773760  0 filestore(/srv/ceph/osd/osd.15) mount found snaps <>
   -20> 2013-03-18 21:42:38.852401 7f8048773760  5 filestore(/srv/ceph/osd/osd.15) mount op_seq is 25638742
   -19> 2013-03-18 21:42:38.986099 7f8048773760 20 filestore (init)dbobjectmap: seq is 1
   -18> 2013-03-18 21:42:38.986123 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) open_journal at /dev/sdd4
   -17> 2013-03-18 21:42:38.986150 7f8048773760  0 filestore(/srv/ceph/osd/osd.15) mount: enabling WRITEAHEAD journal mode: btrfs not detected
   -16> 2013-03-18 21:42:38.986154 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) list_collections
   -15> 2013-03-18 21:42:38.989878 7f8044d1a700 20 filestore(/srv/ceph/osd/osd.15) sync_entry waiting for max_interval 5.000000
   -14> 2013-03-18 21:42:38.993422 7f8048773760  0 journal  kernel version is 3.6.9
   -13> 2013-03-18 21:42:39.012659 7f8048773760  0 journal  kernel version is 3.6.9
   -12> 2013-03-18 21:42:39.060070 7f80277fe700 20 filestore(/srv/ceph/osd/osd.15) flusher_entry start
   -11> 2013-03-18 21:42:39.060091 7f80277fe700 20 filestore(/srv/ceph/osd/osd.15) flusher_entry sleeping
   -10> 2013-03-18 21:42:39.060091 7f8048773760  2 osd.15 0 boot
    -9> 2013-03-18 21:42:39.060104 7f8048773760 15 filestore(/srv/ceph/osd/osd.15) read meta/23c2fcde/osd_superblock/0//-1 0~0
    -8> 2013-03-18 21:42:39.060503 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) FileStore::read meta/23c2fcde/osd_superblock/0//-1 0~332/332
    -7> 2013-03-18 21:42:39.060587 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) stat meta/16ef7597/infos/head//-1 = 0 (size 0)
    -6> 2013-03-18 21:42:39.060622 7f8048773760 15 filestore(/srv/ceph/osd/osd.15) read meta/4edc6dd9/osdmap.33122/0//-1 0~0
    -5> 2013-03-18 21:42:39.061131 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) FileStore::read meta/4edc6dd9/osdmap.33122/0//-1 0~117079/117079
    -4> 2013-03-18 21:42:39.063115 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) list_collections
    -3> 2013-03-18 21:42:39.064753 7f8048773760 15 filestore(/srv/ceph/osd/osd.15) collection_getattr /srv/ceph/osd/osd.15/current/0.39_head 'info'
    -2> 2013-03-18 21:42:39.064780 7f8048773760 10 filestore(/srv/ceph/osd/osd.15) collection_getattr /srv/ceph/osd/osd.15/current/0.39_head 'info' = 1
    -1> 2013-03-18 21:42:39.064798 7f8048773760 15 filestore(/srv/ceph/osd/osd.15) omap_get_values meta/16ef7597/infos/head//-1
     0> 2013-03-18 21:42:39.066873 7f8048773760 -1 osd/PG.cc: In function 'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::bufferlist*)' t$
osd/PG.cc: 2393: FAILED assert(values.size() == 1)

 ceph version 0.58 (ba3f91e7504867a52a83399d60917e3414e8c3e2)
 1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&, ceph::buffer::list*)+0x469) [0x680199]
 2: (OSD::load_pgs()+0x1909) [0x6219d9]
 3: (OSD::init()+0xd07) [0x634e27]
 4: (main()+0x2deb) [0x5640cb]
 5: (__libc_start_main()+0xfd) [0x308921ecdd]
 6: ceph-osd() [0x560f29]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.


OSD.9's error is now:

2013-03-18 21:44:08.652017 7f3547fff700 20 filestore(/srv/ceph/osd/osd.9) flusher_entry start
2013-03-18 21:44:08.652147 7f3547fff700 20 filestore(/srv/ceph/osd/osd.9) flusher_entry sleeping
2013-03-18 21:44:08.652333 7f35557f9760  5 filestore(/srv/ceph/osd/osd.9) umount /srv/ceph/osd/osd.9
2013-03-18 21:44:08.652374 7f3547fff700 20 filestore(/srv/ceph/osd/osd.9) flusher_entry awoke
2013-03-18 21:44:08.652386 7f3547fff700 20 filestore(/srv/ceph/osd/osd.9) flusher_entry finish
2013-03-18 21:44:08.652404 7f354f7fe700 20 filestore(/srv/ceph/osd/osd.9) sync_entry force_sync set
2013-03-18 21:44:08.653267 7f35557f9760  5 filestore(/srv/ceph/osd/osd.9) test_mount basedir /srv/ceph/osd/osd.9 journal /dev/sda10
2013-03-18 21:44:08.653543 7f35557f9760  5 filestore(/srv/ceph/osd/osd.9) basedir /srv/ceph/osd/osd.9 journal /dev/sda10
2013-03-18 21:44:08.653559 7f35557f9760 10 filestore(/srv/ceph/osd/osd.9) mount fsid is f6ca54bf-d38e-4618-b411-a11b7c94bacb
2013-03-18 21:44:08.776538 7f35557f9760  0 filestore(/srv/ceph/osd/osd.9) mount FIEMAP ioctl is supported and appears to work
2013-03-18 21:44:08.776555 7f35557f9760  0 filestore(/srv/ceph/osd/osd.9) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-03-18 21:44:08.776955 7f35557f9760  0 filestore(/srv/ceph/osd/osd.9) mount did NOT detect btrfs
2013-03-18 21:44:08.951303 7f35557f9760  0 filestore(/srv/ceph/osd/osd.9) mount syscall(SYS_syncfs, fd) fully supported
2013-03-18 21:44:08.951390 7f35557f9760  0 filestore(/srv/ceph/osd/osd.9) mount found snaps <>
2013-03-18 21:44:08.951414 7f35557f9760  5 filestore(/srv/ceph/osd/osd.9) mount op_seq is 27929237
2013-03-18 21:44:09.151817 7f35557f9760 20 filestore (init)dbobjectmap: seq is 1
2013-03-18 21:44:09.151837 7f35557f9760 10 filestore(/srv/ceph/osd/osd.9) open_journal at /dev/sda10
2013-03-18 21:44:09.151863 7f35557f9760  0 filestore(/srv/ceph/osd/osd.9) mount: enabling WRITEAHEAD journal mode: btrfs not detected
2013-03-18 21:44:09.151874 7f35557f9760 10 filestore(/srv/ceph/osd/osd.9) list_collections
2013-03-18 21:44:09.155418 7f354d7fa700 20 filestore(/srv/ceph/osd/osd.9) sync_entry waiting for max_interval 5.000000
2013-03-18 21:44:09.159408 7f35557f9760  0 journal  kernel version is 3.6.9
2013-03-18 21:44:09.163808 7f35557f9760  0 journal  kernel version is 3.6.9
2013-03-18 21:44:09.168595 7f3533fff700 20 filestore(/srv/ceph/osd/osd.9) flusher_entry start
2013-03-18 21:44:09.168620 7f3533fff700 20 filestore(/srv/ceph/osd/osd.9) flusher_entry sleeping
2013-03-18 21:44:09.168703 7f35557f9760 15 filestore(/srv/ceph/osd/osd.9) read meta/23c2fcde/osd_superblock/0//-1 0~0
2013-03-18 21:44:09.168822 7f35557f9760 10 filestore(/srv/ceph/osd/osd.9) FileStore::read meta/23c2fcde/osd_superblock/0//-1 0~332/332
2013-03-18 21:44:09.168846 7f35557f9760 -1 osd.9 0 read_superblock superblock says osd.15, but i (think i) am osd.9
2013-03-18 21:44:09.168851 7f35557f9760 -1 osd.9 0 OSD::init() : unable to read osd superblock
2013-03-18 21:44:09.168855 7f35557f9760  5 filestore(/srv/ceph/osd/osd.9) umount /srv/ceph/osd/osd.9
2013-03-18 21:44:09.168911 7f3533fff700 20 filestore(/srv/ceph/osd/osd.9) flusher_entry awoke
2013-03-18 21:44:09.168920 7f3533fff700 20 filestore(/srv/ceph/osd/osd.9) flusher_entry finish
2013-03-18 21:44:09.168956 7f354d7fa700 20 filestore(/srv/ceph/osd/osd.9) sync_entry force_sync set
2013-03-18 21:44:09.169820 7f35557f9760 -1  ** ERROR: osd init failed: (22) Invalid argument
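
The telling line in that log is the read_superblock message. As a minimal sketch (not a Ceph tool), the following Python pulls the two OSD ids out of such a line so the same check can be run across many log files. The function name and the regular expression are assumptions for illustration; the pattern is based only on the message wording shown in the log above.

```python
import re

# Pattern modelled on the single message seen in the log above; other
# Ceph versions may word it differently (assumption).
MISMATCH = re.compile(
    r"read_superblock superblock says osd\.(\d+), but i \(think i\) am osd\.(\d+)"
)


def find_superblock_mismatch(log_text):
    """Return (id_in_superblock, id_of_daemon) pairs found in log_text."""
    return [(int(a), int(b)) for a, b in MISMATCH.findall(log_text)]


# The line below is copied from the OSD.9 log in this message.
sample = (
    "2013-03-18 21:44:09.168846 7f35557f9760 -1 osd.9 0 read_superblock "
    "superblock says osd.15, but i (think i) am osd.9"
)
print(find_superblock_mismatch(sample))
```

A non-empty result means the data directory the daemon opened carries another OSD's superblock, which is what the log above reports for osd.9.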

OSD.9 originally had this error:

2013-03-18 18:41:59.526134 7f11d1cab760  0 ceph version 0.58 (ba3f91e7504867a52a83399d60917e3414e8c3e2), process ceph-osd, pid 12536
2013-03-18 18:41:59.613435 7f11d1cab760  0 filestore(/srv/ceph/osd/osd.9) mount FIEMAP ioctl is supported and appears to work
2013-03-18 18:41:59.613451 7f11d1cab760  0 filestore(/srv/ceph/osd/osd.9) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-03-18 18:41:59.613879 7f11d1cab760  0 filestore(/srv/ceph/osd/osd.9) mount did NOT detect btrfs
2013-03-18 18:41:59.696583 7f11d1cab760  0 filestore(/srv/ceph/osd/osd.9) mount syscall(SYS_syncfs, fd) fully supported
2013-03-18 18:41:59.697124 7f11d1cab760  0 filestore(/srv/ceph/osd/osd.9) mount found snaps <>
2013-03-18 18:41:59.731863 7f11d1cab760 -1 filestore(/srv/ceph/osd/osd.9) Error initializing leveldb: IO error: /srv/ceph/osd/osd.9/current/omap/MANIFEST-008335: No$

2013-03-18 18:41:59.731920 7f11d1cab760 -1  ** ERROR: error converting store /srv/ceph/osd/osd.9: (1) Operation not permitted
2013-03-18 18:45:04.169338 7f625a845760  0 ceph version 0.58 (ba3f91e7504867a52a83399d60917e3414e8c3e2), process ceph-osd, pid 16404
2013-03-18 18:45:04.611473 7f625a845760  0 filestore(/srv/ceph/osd/osd.9) mount FIEMAP ioctl is supported and appears to work
2013-03-18 18:45:04.611490 7f625a845760  0 filestore(/srv/ceph/osd/osd.9) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
2013-03-18 18:45:04.611907 7f625a845760  0 filestore(/srv/ceph/osd/osd.9) mount did NOT detect btrfs
2013-03-18 18:45:04.777939 7f625a845760  0 filestore(/srv/ceph/osd/osd.9) mount syscall(SYS_syncfs, fd) fully supported
2013-03-18 18:45:04.778060 7f625a845760  0 filestore(/srv/ceph/osd/osd.9) mount found snaps <>
2013-03-18 18:45:04.778546 7f625a845760 -1 filestore(/srv/ceph/osd/osd.9) Error initializing leveldb: IO error: /srv/ceph/osd/osd.9/current/omap/MANIFEST-008335: No$

2013-03-18 18:45:04.778606 7f625a845760 -1  ** ERROR: error converting store /srv/ceph/osd/osd.9: (1) Operation not permitted

What I think has happened is that OSD.9 has had its metadata clobbered by OSD.15. In doing so, OSD.15 has killed itself.
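
One way to check for the kind of overlapping mount suspected here is to look for mount points that appear more than once in the mount table. The following is a rough Python sketch, assuming the text comes from /proc/mounts, where a mount stacked over another normally shows up as a second entry for the same path. The function name, the device names, and the xfs filesystem type in the sample are hypothetical; only the OSD paths come from this message.

```python
from collections import defaultdict


def stacked_mounts(mounts_text):
    """Map each mount point listed more than once to its devices, in listed order."""
    by_point = defaultdict(list)
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            device, mount_point = fields[0], fields[1]
            by_point[mount_point].append(device)
    return {mp: devs for mp, devs in by_point.items() if len(devs) > 1}


# Hypothetical sample in /proc/mounts format; devices and fs type are made up.
sample = """\
/dev/sdd1 /srv/ceph/osd/osd.15 xfs rw,noatime 0 0
/dev/sda9 /srv/ceph/osd/osd.15 xfs rw,noatime 0 0
/dev/sdb1 /srv/ceph/osd/osd.3 xfs rw,noatime 0 0
"""
print(stacked_mounts(sample))
```

On a live host the input would be the contents of /proc/mounts, for example `stacked_mounts(open("/proc/mounts").read())`. Any path in the result has more than one filesystem mounted on it, and the last device listed is the one currently visible at that path.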

The question is whether it's possible to restore the OSDs without destroying them. The data appears intact and the metadata looks OK at a glance, but the LevelDB store and the superblock have definitely taken a hit.

Thanks in advance
-Matt
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
