The solution that would have prevented this (now hours-long) fix on my part was
buried in material labeled as "upgrade from 0.80x giant".
It would help to bring this issue to the forefront, like the single 0.93 issue that is called out.
On Thu, Apr 9, 2015 at 5:34 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
If you dig into the list archives I think somebody else went through
this when the issue was discovered and recovered successfully. But I
don't know the details. :)
-Greg
On Thu, Apr 9, 2015 at 3:38 PM, Dirk Grunwald
<Dirk.Grunwald@xxxxxxxxxxxx> wrote:
> Aha. That would have been useful to see -- I saw the notice about 0.93, but
> not that.
>
> when I roll back to v0.92, I get a different error (see below)
>
> This doesn't seem very happy - any suggestions?
>
>
> root@zfs2:~/XYZZY/v92# ceph-osd -d -i 4 --flush-journal
> 2015-04-09 16:31:44.756113 7f987f822900 0 ceph version 0.92
> (00a3ac3b67d93860e7f0b6e07319f11b14d0fec0), process ceph-osd, pid 12605
> 2015-04-09 16:31:44.758743 7f987f822900 0
> filestore(/var/lib/ceph/osd/ceph-4) backend btrfs (magic 0x9123683e)
> 2015-04-09 16:31:44.807613 7f987f822900 0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: FIEMAP
> ioctl is supported and appears to work
> 2015-04-09 16:31:44.807673 7f987f822900 0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: FIEMAP
> ioctl is disabled via 'filestore fiemap' config option
> 2015-04-09 16:31:45.148028 7f987f822900 0
> genericfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_features: syncfs(2)
> syscall fully supported (by glibc and kernel)
> 2015-04-09 16:31:45.148163 7f987f822900 0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature: CLONE_RANGE
> ioctl is supported
> 2015-04-09 16:31:45.923009 7f987f822900 0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature: SNAP_CREATE
> is supported
> 2015-04-09 16:31:45.923673 7f987f822900 0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature: SNAP_DESTROY
> is supported
> 2015-04-09 16:31:45.923979 7f987f822900 0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature: START_SYNC
> is supported (transid 372081)
> 2015-04-09 16:31:46.381367 7f987f822900 0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature: WAIT_SYNC is
> supported
> 2015-04-09 16:31:46.724449 7f987f822900 0
> btrfsfilestorebackend(/var/lib/ceph/osd/ceph-4) detect_feature:
> SNAP_CREATE_V2 is supported
> 2015-04-09 16:31:47.473175 7f987f822900 0
> filestore(/var/lib/ceph/osd/ceph-4) mount: enabling PARALLEL journal mode:
> fs, checkpoint is enabled
> HDIO_DRIVE_CMD(identify) failed: Invalid argument
> 2015-04-09 16:31:47.495711 7f987f822900 1 journal _open
> /var/lib/ceph/osd/ceph-4/journal fd 16: 1072693248 bytes, block size 4096
> bytes, directio = 1, aio = 1
> terminate called after throwing an instance of
> 'ceph::buffer::malformed_input'
> what(): buffer::malformed_input: __PRETTY_FUNCTION__ unknown encoding
> version > 8
> *** Caught signal (Aborted) **
> in thread 7f987f822900
> ceph version 0.92 (00a3ac3b67d93860e7f0b6e07319f11b14d0fec0)
> 1: ceph-osd() [0xac511a]
> 2: (()+0x10340) [0x7f987e4da340]
> 3: (gsignal()+0x39) [0x7f987c979cc9]
> 4: (abort()+0x148) [0x7f987c97d0d8]
>
>
> On Thu, Apr 9, 2015 at 3:22 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>
>> On Thu, Apr 9, 2015 at 2:05 PM, Dirk Grunwald
>> <Dirk.Grunwald@xxxxxxxxxxxx> wrote:
>> > Ceph cluster, U14.10 base system, OSD's using BTRFS, journal on same
>> > disk as
>> > partition
>> > (done using ceph-deploy)
>> >
>> > I had been running 0.92 without (significant) issue. I upgraded
>> > to Hammer (0.94) by modifying /etc/apt/sources.list, apt-get update,
>> > apt-get
>> > upgrade
>> >
>> > Upgraded and restarted ceph-mon and then ceph-osd
>> >
>> > Most of the 50 OSD's are in a failure cycle with the error
>> > "os/Transaction.cc: 504: FAILED assert(ops == data.ops)"
>> >
>> > Right now, the entire cluster is useless because of this.
>> >
>> > Any suggestions?
>>
>> It looks like maybe it's under the v0.80.x section instead of general
>> upgrading, but the release notes include:
>>
>> * If you are upgrading specifically from v0.92, you must stop all OSD
>> daemons and flush their journals (``ceph-osd -i NNN
>> --flush-journal``) before upgrading. There was a transaction
>> encoding bug in v0.92 that broke compatibility. Upgrading from v0.93,
>> v0.91, or anything earlier is safe.
>>
>
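For anyone else still on v0.92, the release-note step quoted above (stop every
OSD, then flush its journal, before upgrading) could be scripted roughly as
follows. This is only a sketch under assumptions: it guesses the local OSD ids
from directories named ceph-<id> under /var/lib/ceph/osd (the layout seen in the
log above), and it assumes the upstart form "stop ceph-osd id=<id>" for stopping
a daemon, which may differ on other init systems. The names OSD_ROOT, DRY_RUN,
list_osd_ids, run and flush_all_osds are made up for illustration. By default it
only prints the commands.

```shell
#!/bin/sh
# Sketch: print (or run) the stop-and-flush sequence for every OSD on this host.
# Assumption: OSD data dirs are named <OSD_ROOT>/ceph-<id>.
# Assumption: "stop ceph-osd id=<id>" is the right way to stop an OSD here (upstart).

OSD_ROOT="${OSD_ROOT:-/var/lib/ceph/osd}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# List local OSD ids by stripping everything up to "ceph-" from each directory name.
list_osd_ids() {
    for dir in "$OSD_ROOT"/ceph-*; do
        [ -d "$dir" ] || continue
        echo "${dir##*/ceph-}"
    done
}

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# Stop each OSD, then flush its journal; give up at the first failure.
flush_all_osds() {
    for id in $(list_osd_ids); do
        run stop ceph-osd id="$id" || return 1             # assumed upstart job
        run ceph-osd -i "$id" --flush-journal || return 1  # from the release note
    done
}

flush_all_osds
```

Run it once as-is to review the printed commands, then again with DRY_RUN=0 on
each OSD host while the v0.92 binaries are still installed, and only then
upgrade the packages.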
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com