Re: v12.2.7 Luminous released

Hi all,

This is just to report that I upgraded smoothly from 12.2.6 to
12.2.7 (BlueStore only, after being bitten by the "damaged mds"
consequence of the bad checksum on mds journal 0x200).
This was a really bad problem for CephFS. Fortunately, that cluster
was not yet in production (which is why I didn't ask myself too many
questions before upgrading to 12.2.6).

Many, many thanks to all who provided help, and to the brave
developers, who probably didn't have much fun those days either ;-)

Le mardi 17 juillet 2018 à 18:28 +0200, Abhishek Lekshmanan a écrit :
> This is the seventh bugfix release of Luminous v12.2.x long term
> stable release series. This release contains several fixes for
> regressions in the v12.2.6 and v12.2.5 releases.  We recommend that
> all users upgrade. 
> 
> *NOTE* The v12.2.6 release has serious known regressions. While
> 12.2.6 wasn't formally announced on the mailing lists or blog, the
> packages were built and available on download.ceph.com since last
> week. If you installed this release, please see the upgrade
> procedure below.
> 
> *NOTE* The v12.2.5 release has a potential data corruption issue
> with erasure coded pools. If you ran v12.2.5 with erasure coding,
> please see below.
> 
> The full blog post, along with the complete changelog, is published
> at the official Ceph blog at
> https://ceph.com/releases/12-2-7-luminous-released/
> 
> Upgrading from v12.2.6
> ----------------------
> 
> v12.2.6 included an incomplete backport of an optimization for
> BlueStore OSDs that avoids maintaining both the per-object checksum
> and the internal BlueStore checksum.  Due to the accidental omission
> of a critical follow-on patch, v12.2.6 corrupts (fails to update) the
> stored per-object checksum value for some objects.  This can result
> in an EIO error when trying to read those objects.
> 
> #. If your cluster uses FileStore only, no special action is
>    required.  This problem only affects clusters with BlueStore.
> 
> #. If your cluster has only BlueStore OSDs (no FileStore), then you
>    should enable the following OSD option::
> 
>      osd skip data digest = true
> 
>    This stops the OSD from setting the full-object digest, and makes
>    it ignore any existing digests, whenever the primary for a PG is
>    BlueStore.
> 
> #. If you have a mix of BlueStore and FileStore OSDs, then you should
>    enable the following OSD option::
> 
>      osd distrust data digest = true
> 
>    This stops the OSD from setting the full-object digest, and makes
>    it ignore any existing digests, in all cases.  This weakens the
>    data integrity checks for FileStore (although those checks were
>    always only opportunistic).
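To illustrate the two options above, a minimal ceph.conf sketch (the [osd] section placement is my assumption; the announcement only names the options):

```ini
# ceph.conf -- enable exactly ONE of the two options, per the cases above
[osd]
# all-BlueStore cluster:
osd skip data digest = true

# mixed BlueStore/FileStore cluster (use instead of the line above):
#osd distrust data digest = true
```

Restarting the OSDs picks up the change; applying it to running daemons via `ceph tell osd.* injectargs` should also work, though I'd verify the result with `ceph daemon osd.N config get` on one OSD before relying on it.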
> 
> If your cluster includes BlueStore OSDs and was affected, deep scrubs
> will generate errors about mismatched CRCs for affected objects.
> Currently the repair operation does not know how to correct them
> (since all replicas do not match the expected checksum it does not
> know how to proceed).  These warnings are harmless in the sense that
> IO is not affected and the replicas are all still in sync.  The
> number of affected objects is likely to drop (possibly to zero) on
> its own over time as those objects are modified.  We expect to
> include a scrub improvement in v12.2.8 to clean up any remaining
> objects.
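A minimal sketch for spotting the affected PGs after a deep scrub. It only parses text, so the exact `ceph health detail` line format it matches is an assumption; adjust the pattern to your output:

```shell
# Hypothetical helper: pull PG ids out of 'ceph health detail' lines like
#   pg 2.1f is active+clean+inconsistent, acting [1,2]
# so each id can be fed to 'rados list-inconsistent-obj'.
inconsistent_pgs() {
  awk '$1 == "pg" && /inconsistent/ {print $2}'
}

# Against a live cluster (not run here):
#   ceph health detail | inconsistent_pgs | while read -r pg; do
#     rados list-inconsistent-obj "$pg" --format=json-pretty
#   done
```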
> 
> Additionally, see the notes below, which apply to both v12.2.5 and
> v12.2.6.
> 
> Upgrading from v12.2.5 or v12.2.6
> ---------------------------------
> 
> If you used v12.2.5 or v12.2.6 in combination with erasure coded
> pools, there is a small risk of corruption under certain workloads.
> Specifically, when:
> 
> * An erasure coded pool is in use
> * The pool is busy with successful writes
> * The pool is also busy with updates that result in an error result
>   to the librados user.  RGW garbage collection is the most common
>   example of this (it sends delete operations on objects that don't
>   always exist).
> * Some OSDs are reasonably busy.  One known example of such load is
>   FileStore splitting, although in principle any load on the cluster
>   could also trigger the behavior.
> * One or more OSDs restart.
> 
> This combination can trigger an OSD crash and possibly leave PGs in
> a state where they fail to peer.
> 
> Notably, upgrading a cluster involves OSD restarts and as such may
> increase the risk of encountering this bug.  For this reason, for
> clusters with erasure coded pools, we recommend the following upgrade
> procedure to minimize risk:
> 
> 1. Install the v12.2.7 packages.
> 2. Temporarily quiesce IO to the cluster::
> 
>      ceph osd pause
> 
> 3. Restart all OSDs and wait for all PGs to become active.
> 4. Resume IO::
> 
>      ceph osd unpause
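A sketch of steps 2-4 as a script, assuming systemd-managed OSDs. The `ceph pg stat` wording the predicate matches on is an assumption, so treat this as an outline rather than a drop-in tool:

```shell
# Hypothetical outline of the quiesce-restart-resume procedure.
# The predicate is factored out so it can be sanity-checked offline.
pgs_all_active() {
  # succeeds when the 'ceph pg stat' summary line reports no PGs
  # that are still inactive, peering, or down
  case "$1" in
    *inactive*|*peering*|*down*) return 1 ;;
    *) return 0 ;;
  esac
}

# Against a live cluster (not run here):
#   ceph osd pause
#   systemctl restart ceph-osd.target     # on every OSD host
#   until pgs_all_active "$(ceph pg stat)"; do sleep 5; done
#   ceph osd unpause
```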
> 
> This will cause an availability outage for the duration of the OSD
> restarts.  If this is unacceptable, a *riskier* alternative is to
> disable RGW garbage collection (the primary known cause of these
> rados operations) for the duration of the upgrade:
> 
> 1. Set ``rgw_enable_gc_threads = false`` in ceph.conf
> 2. Restart all radosgw daemons
> 3. Upgrade and restart all OSDs
> 4. Remove ``rgw_enable_gc_threads = false`` from ceph.conf
> 5. Restart all radosgw daemons
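For step 1, a sketch of the ceph.conf change (the section name below is a placeholder; use whichever section your radosgw daemons actually read):

```ini
# ceph.conf -- hypothetical radosgw section name; match your instance id
[client.rgw.gateway1]
rgw_enable_gc_threads = false
```

Once all OSDs run v12.2.7, drop the line again and restart the radosgw daemons (steps 4-5); otherwise garbage collection stays disabled indefinitely.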
> 
> Upgrading from other versions
> -----------------------------
> 
> If your cluster did not run v12.2.5 or v12.2.6 then none of the above
> issues apply to you and you should upgrade normally.
> 
> v12.2.7 Changelog
> -----------------
> 
> * mon/AuthMonitor: improve error message (issue#21765, pr#22963,
> Douglas Fuller)
> * osd/PG: do not blindly roll forward to log.head (issue#24597,
> pr#22976, Sage Weil)
> * osd/PrimaryLogPG: rebuild attrs from clients (issue#24768,
> pr#22962, Sage Weil)
> * osd: work around data digest problems in 12.2.6 (version 2)
> (issue#24922, pr#23055, Sage Weil)
> * rgw: objects in cache never refresh after rgw_cache_expiry_interval
> (issue#24346, pr#22369, Casey Bodley, Matt Benjamin)
> 
> Notable changes in v12.2.6 Luminous
> ===================================
> 
> :note: This is a broken release with serious known regressions.  Do
> not install it.  The release notes below are to help track the
> changes that went into 12.2.6 and are hence part of 12.2.7.
> 
> 
> - *Auth*:
> 
>   * In 12.2.4 and earlier releases, keyring caps were not checked
>     for validity, so the caps string could be anything. As of 12.2.6,
>     caps strings are validated, and providing a keyring with an
>     invalid caps string to, e.g., "ceph auth add" will result in an
>     error.
>   * CVE 2018-1128: auth: cephx authorizer subject to replay attack
>     (issue#24836, Sage Weil)
>   * CVE 2018-1129: auth: cephx signature check is weak (issue#24837,
>     Sage Weil)
>   * CVE 2018-10861: mon: auth checks not correct for pool ops
>     (issue#24838, Jason Dillaman)
> 
> 
> - The config-key interface can store arbitrary binary blobs but JSON
>   can only express printable strings.  If binary blobs are present,
>   the 'ceph config-key dump' command will show them as something like
>   ``<<< binary blob of length N >>>``.
> 
> The full changelog for 12.2.6 is published in the release blog.
> 
> Getting ceph:
> * Git at git://github.com/ceph/ceph.git
> * Tarball at http://download.ceph.com/tarballs/ceph-12.2.7.tar.gz
> * For packages, see
>   http://docs.ceph.com/docs/master/install/get-packages/
> * Release git sha1: 3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5
> 
-- 
Nicolas Huillard
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



