Re: ceph-osd failure following 0.92 -> 0.94 upgrade

If you dig into the list archives I think somebody else went through
this when the issue was discovered and recovered successfully. But I
don't know the details. :)
-Greg

On Thu, Apr 9, 2015 at 3:38 PM, Dirk Grunwald
<Dirk.Grunwald@xxxxxxxxxxxx> wrote:
> Aha. That would have been useful to see -- I saw the notice about 0.93, but
> not that.
>
> When I roll back to v0.92, I get a different error (see below).
>
> This doesn't seem very happy - any suggestions?
>
>
> root@zfs2:~/XYZZY/v92# ceph-osd -d -i 4 --flush-journal
> 2015-04-09 16:31:44.756113 7f987f822900  0 ceph version 0.92
> (00a3ac3b67d93860e7f0b6e07319f11b14d0fec0), process ceph-osd, pid 12605
> 2015-04-09 16:31:44.758743 7f987f822900  0
> filestore(/var/lib/ceph/osd/ceph-4) backend btrfs (magic 0x9123683e)
> 2015-04-09 16:31:44.807613 7f987f822900  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: FIEMAP
> ioctl is supported and appears to work
> 2015-04-09 16:31:44.807673 7f987f822900  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: FIEMAP
> ioctl is disabled via 'filestore fiemap' config option
> 2015-04-09 16:31:45.148028 7f987f822900  0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: syncfs(2)
> syscall fully supported (by glibc and kernel)
> 2015-04-09 16:31:45.148163 7f987f822900  0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature: CLONE_RANGE
> ioctl is supported
> 2015-04-09 16:31:45.923009 7f987f822900  0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature: SNAP_CREATE
> is supported
> 2015-04-09 16:31:45.923673 7f987f822900  0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature: SNAP_DESTROY
> is supported
> 2015-04-09 16:31:45.923979 7f987f822900  0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature: START_SYNC
> is supported (transid 372081)
> 2015-04-09 16:31:46.381367 7f987f822900  0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature: WAIT_SYNC is
> supported
> 2015-04-09 16:31:46.724449 7f987f822900  0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature:
> SNAP_CREATE_V2 is supported
> 2015-04-09 16:31:47.473175 7f987f822900  0
> filestore(/var/lib/ceph/osd/ceph-4) mount: enabling PARALLEL journal mode:
> fs, checkpoint is enabled
>  HDIO_DRIVE_CMD(identify) failed: Invalid argument
> 2015-04-09 16:31:47.495711 7f987f822900  1 journal _open
> /var/lib/ceph/osd/ceph-4/journal fd 16: 1072693248 bytes, block size 4096
> bytes, directio = 1, aio = 1
> terminate called after throwing an instance of
> 'ceph::buffer::malformed_input'
>   what():  buffer::malformed_input: __PRETTY_FUNCTION__ unknown encoding
> version > 8
> *** Caught signal (Aborted) **
>  in thread 7f987f822900
>  ceph version 0.92 (00a3ac3b67d93860e7f0b6e07319f11b14d0fec0)
>  1: ceph-osd() [0xac511a]
>  2: (()+0x10340) [0x7f987e4da340]
>  3: (gsignal()+0x39) [0x7f987c979cc9]
>  4: (abort()+0x148) [0x7f987c97d0d8]
>
>
> On Thu, Apr 9, 2015 at 3:22 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> On Thu, Apr 9, 2015 at 2:05 PM, Dirk Grunwald
>> <Dirk.Grunwald@xxxxxxxxxxxx> wrote:
>> > Ceph cluster, U14.10 base system, OSDs using BTRFS, journal on a
>> > partition of the same disk (done using ceph-deploy)
>> >
>> > I had been running 0.92 without (significant) issue. I upgraded
>> > to Hammer (0.94) by modifying /etc/apt/sources.list, then running
>> > apt-get update and apt-get upgrade.
>> >
>> > Upgraded and restarted ceph-mon and then ceph-osd
>> >
>> > Most of the 50 OSDs are in a failure cycle with the error
>> > "os/Transaction.cc: 504: FAILED assert(ops == data.ops)"
>> >
>> > Right now, the entire cluster is useless because of this.
>> >
>> > Any suggestions?
>>
>> It looks like maybe it's under the v80.x section instead of general
>> upgrading, but the release notes include:
>>
>> * If you are upgrading specifically from v0.92, you must stop all OSD
>>   daemons and flush their journals (``ceph-osd -i NNN
>>   --flush-journal``) before upgrading.  There was a transaction
>>   encoding bug in v0.92 that broke compatibility.  Upgrading from v0.93,
>>   v0.91, or anything earlier is safe.
>>
>
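A minimal sketch of the pre-upgrade step from the release note quoted above, for a host with many OSDs. It assumes the default OSD data layout visible in the log (/var/lib/ceph/osd/ceph-<id>) and that every ceph-osd daemon on the host has already been stopped through the init system; it does not stop them itself. It has to run while the v0.92 packages are still installed, before apt-get upgrade. OSD_BASE, DRY_RUN and flush_all_journals are hypothetical names for this sketch, not part of Ceph; the only Ceph command used is the `ceph-osd -i NNN --flush-journal` invocation from the release note.

```shell
#!/bin/sh
# Sketch: flush the journal of every OSD on this host before upgrading
# away from v0.92. ASSUMPTIONS: default layout /var/lib/ceph/osd/ceph-<id>,
# all ceph-osd daemons already stopped, v0.92 binaries still installed.
# OSD_BASE and DRY_RUN are hypothetical knobs for this sketch only.

flush_all_journals() {
    base="${OSD_BASE:-/var/lib/ceph/osd}"
    dry="${DRY_RUN:-0}"
    status=0
    for dir in "$base"/ceph-*; do
        # If the glob matched nothing, it stays literal; skip it.
        [ -d "$dir" ] || continue
        # Strip everything up to the last "/ceph-" to get the OSD id.
        id="${dir##*/ceph-}"
        if [ "$dry" = "1" ]; then
            # Only print what would be run.
            echo "ceph-osd -i $id --flush-journal"
        else
            if ! ceph-osd -i "$id" --flush-journal; then
                echo "journal flush FAILED for osd.$id" >&2
                status=1
            fi
        fi
    done
    # Nonzero if any flush failed.
    return $status
}

flush_all_journals
```

Running it once with DRY_RUN=1 in the environment prints the commands without executing them. A nonzero exit status means at least one flush failed, and it is probably safest not to upgrade that host until the failing OSD has been looked at.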
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



