Hi,
I use ceph 0.94 from the wheezy repo (deb http://eu.ceph.com/debian-hammer wheezy main) inside jessie. 0.94.1 installs without trouble, but the upgrade to 0.94.2 does not go through cleanly:

dpkg -l | grep ceph
ii  ceph            0.94.1-1~bpo70+1  amd64  distributed storage and file system
ii  ceph-common     0.94.2-1~bpo70+1  amd64  common utilities to mount and interact with a ceph storage cluster
ii  ceph-fs-common  0.94.2-1~bpo70+1  amd64  common utilities to mount and interact with a ceph file system
ii  ceph-fuse       0.94.2-1~bpo70+1  amd64  FUSE-based client for the Ceph distributed file system
ii  ceph-mds        0.94.2-1~bpo70+1  amd64  metadata server for the ceph distributed file system
ii  libcephfs1      0.94.2-1~bpo70+1  amd64  Ceph distributed file system client library
ii  python-cephfs   0.94.2-1~bpo70+1  amd64  Python libraries for the Ceph libcephfs library

This is why I switched back to wheezy (and a clean 0.94.2), but then all OSDs on that node failed to start. Switching back to the jessie system disk did not solve the problem either, because only 3 OSDs started again...

My conclusion: if one of my (partly broken) jessie OSD nodes dies now (e.g. a failed system SSD), I need less than an hour for a new system (wheezy), around two hours to reinitialize all OSDs (format fresh, install ceph) and around two days to refill the whole node.

Udo

On 23.07.2015 13:21, Haomai Wang wrote:
> Did you use an upstream ceph version previously? Or did you shut down
> the running ceph-osd when upgrading the osd?
>
> How many osds hit this problem?
>
> This assert failure means that the osd detected an upgraded pg meta object
> but failed to read the meta keys from the object (or one key is missing).
>
> On Thu, Jul 23, 2015 at 7:03 PM, Udo Lembke <ulembke@xxxxxxxxxxxx> wrote:
>> On 21.07.2015 12:06, Udo Lembke wrote:
>>> Hi all,
>>> ...
>>>
>>> Normally I would say that if one OSD node dies, I simply reinstall the OS and ceph and I'm back again... but this
>>> looks bad for me.
>>> Unfortunately the system also doesn't start 9 of the OSDs after I switched back to the old system disk... (only
>>> three of the big OSDs are running well)
>>>
>>> What is the best solution for that? Empty one node (crush weight 0), fresh reinstall of OS/ceph, reinitialise all OSDs?
>>> This will take a long, long time, because we use 173 TB in this cluster...
>>>
>>
>> Hi,
>> answering myself in case anybody has similar issues and finds this posting.
>>
>> Emptying the whole node takes too long.
>> I used the puppet wheezy system and had to recreate all OSDs (in this case I needed to empty the first blocks of the
>> journal before creating the OSD again).
>>
>> Udo
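
A minimal sketch of forcing all hammer packages onto the same point release, assuming the mixed state in the dpkg output above is just a held-back ceph package; the version string is copied from that output and whether apt can satisfy it on this mixed wheezy/jessie setup is not certain:

  apt-get update
  # Pin every ceph package listed above to the same 0.94.2 build;
  # apt will refuse if a dependency genuinely cannot be satisfied on jessie.
  apt-get install ceph=0.94.2-1~bpo70+1 ceph-common=0.94.2-1~bpo70+1 \
      ceph-fs-common=0.94.2-1~bpo70+1 ceph-fuse=0.94.2-1~bpo70+1 \
      ceph-mds=0.94.2-1~bpo70+1 libcephfs1=0.94.2-1~bpo70+1 \
      python-cephfs=0.94.2-1~bpo70+1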
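
A rough sketch of the "crush weight 0" drain mentioned in the quoted question, assuming the OSD ids on the affected node are known; the ids used here are placeholders, not values from the thread:

  # Set the CRUSH weight of every OSD on the node to 0 so data migrates away
  # (osd ids 10-13 are placeholders; substitute the node's real ids)
  for id in 10 11 12 13; do
      ceph osd crush reweight osd.$id 0
  done
  # Watch recovery/backfill until the node no longer holds data
  ceph -w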
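
And a sketch of the recreation step described in the self-answer (emptying the first blocks of the journal before creating the OSD again); /dev/sdb and /dev/sdc1 are placeholder devices and the 100 MB count is an assumption, not a value from the original mail:

  # Overwrite the start of the old journal partition (placeholder device!)
  dd if=/dev/zero of=/dev/sdc1 bs=1M count=100

  # Re-create the OSD on the data disk, reusing the cleaned journal partition
  # (ceph-disk ships with hammer; a puppet module may wrap the same calls)
  ceph-disk prepare /dev/sdb /dev/sdc1
  ceph-disk activate /dev/sdb1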