Re: v12.2.8 Luminous released

This is a friendly reminder that multi-active MDS clusters must be
reduced to a single active MDS during upgrades [1].
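
For anyone scripting this, here is a rough sketch of the command
sequence, not an official procedure. It assumes the Luminous command
forms `ceph fs set <fs_name> max_mds 1` and `ceph mds deactivate
<fs_name>:<rank>` as I recall them from [1]; please check them against
that page before use. The function name and the file system name
"cephfs" are made up for illustration, and the script only prints the
commands rather than running them.

```python
def mds_reduction_commands(fs_name, active_ranks):
    """Build the CLI commands to shrink a file system to one active MDS.

    fs_name and active_ranks are supplied by the caller; nothing here
    talks to a cluster.
    """
    if active_ranks < 1:
        raise ValueError("active_ranks must be at least 1")
    # First cap the number of ranks at 1.
    commands = ["ceph fs set {} max_mds 1".format(fs_name)]
    # Then deactivate non-zero ranks from highest to lowest; rank 0 stays.
    for rank in range(active_ranks - 1, 0, -1):
        commands.append("ceph mds deactivate {}:{}".format(fs_name, rank))
    return commands


if __name__ == "__main__":
    for cmd in mds_reduction_commands("cephfs", 3):
        print(cmd)
```

When running these by hand, wait for each rank to finish stopping
(check `ceph status`) before deactivating the next one.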

In the case of v12.2.8, the CEPH_MDS_PROTOCOL version has changed, so
if you try to upgrade one MDS in a multi-active cluster it will get
stuck in the resolve state, logging:

conn(0x55e3d9671000 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH
pgs=0 cs=0 l=0).handle_connect_reply connect protocol version
mismatch, my 31 != 30
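
If you want to check MDS logs for this condition, a small sketch like
the following may help. It only matches the message text shown above;
the function name is made up, and I am assuming the message may be
wrapped across lines as it is in this mail.

```python
import re

# Matches the tail of the handle_connect_reply message quoted above.
MISMATCH_RE = re.compile(
    r"connect protocol version mismatch, my (\d+) != (\d+)"
)


def find_protocol_mismatch(log_text):
    """Return (my_version, peer_version) or None if no mismatch is logged."""
    # Collapse line wraps and repeated spaces so a wrapped message matches.
    flat = " ".join(log_text.split())
    match = MISMATCH_RE.search(flat)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    sample = """conn(0x55e3d9671000 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH
pgs=0 cs=0 l=0).handle_connect_reply connect protocol version
mismatch, my 31 != 30"""
    print(find_protocol_mismatch(sample))
```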

Cheers, Dan

[1] http://docs.ceph.com/docs/luminous/cephfs/upgrading/

On Wed, Sep 5, 2018 at 4:20 PM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
>
> Thanks for the release!
>
> We've updated some test clusters (rbd, cephfs) and it looks good so far.
>
> -- dan
>
>
> On Tue, Sep 4, 2018 at 6:30 PM Abhishek Lekshmanan <abhishek@xxxxxxxx> wrote:
> >
> >
> > We're glad to announce the next point release in the Luminous v12.2.X
> > stable release series. This release contains a range of bugfixes and
> > stability improvements across all the components of ceph. For detailed
> > release notes with links to tracker issues and pull requests, refer to
> > the blog post at http://ceph.com/releases/v12-2-8-released/
> >
> > Upgrade Notes from previous luminous releases
> > ---------------------------------------------
> >
> > When upgrading from v12.2.5 or v12.2.6 please note that upgrade caveats from
> > 12.2.5 will apply to any _newer_ luminous version including 12.2.8. Please read
> > the notes at https://ceph.com/releases/12-2-7-luminous-released/#upgrading-from-v12-2-6
> >
> > For clusters that installed the broken 12.2.6 release, 12.2.7 fixed the
> > regression and introduced a workaround option `osd distrust data digest = true`,
> > but 12.2.7 clusters still generated health warnings like ::
> >
> >   [ERR] 11.288 shard 207: soid
> >   11:1155c332:::rbd_data.207dce238e1f29.0000000000000527:head data_digest
> >   0xc8997a5b != data_digest 0x2ca15853
> >
> >
> > 12.2.8 improves the deep scrub code to automatically repair these
> > inconsistencies. Once the entire cluster has been upgraded and fully deep
> > scrubbed, and all such inconsistencies are resolved, it will be safe to disable
> > the `osd distrust data digest = true` workaround option.
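
To track whether any of these digest warnings remain after the deep
scrubs, something like the sketch below could be used to pull the
fields out of the message. It is based only on the example warning
quoted above, so the pattern may need adjusting for other message
variants; the function name is made up for illustration.

```python
import re

# Pattern derived from the single example warning in the release notes.
DIGEST_RE = re.compile(
    r"(\S+) shard (\d+): soid (\S+) "
    r"data_digest (0x[0-9a-fA-F]+) != data_digest (0x[0-9a-fA-F]+)"
)


def parse_digest_mismatch(log_text):
    """Return a dict describing the mismatch, or None if nothing matches."""
    # The warning may be wrapped over several lines; flatten it first.
    flat = " ".join(log_text.split())
    match = DIGEST_RE.search(flat)
    if match is None:
        return None
    pg, shard, soid, first, second = match.groups()
    return {
        "pg": pg,
        "shard": int(shard),
        "soid": soid,
        "digests": (int(first, 16), int(second, 16)),
    }


if __name__ == "__main__":
    sample = """[ERR] 11.288 shard 207: soid
    11:1155c332:::rbd_data.207dce238e1f29.0000000000000527:head data_digest
    0xc8997a5b != data_digest 0x2ca15853"""
    print(parse_digest_mismatch(sample))
```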
> >
> > Changelog
> > ---------
> > * bluestore: set shard correctly for existing Collection (issue#24761, pr#22860, Jianpeng Ma)
> > * build/ops: Boost system library is no longer required to compile and link example librados program (issue#25054, pr#23202, Nathan Cutler)
> > * build/ops: Bring back diff -y for non-FreeBSD (issue#24396, issue#21664, pr#22848, Sage Weil, David Zafman)
> > * build/ops: install-deps.sh fails on newest openSUSE Leap (issue#25064, pr#23179, Kyr Shatskyy)
> > * build/ops: Mimic build fails with -DWITH_RADOSGW=0 (issue#24437, pr#22864, Dan Mick)
> > * build/ops: order rbdmap.service before remote-fs-pre.target (issue#24713, pr#22844, Ilya Dryomov)
> > * build/ops: rpm: silence osd block chown (issue#25152, pr#23313, Dan van der Ster)
> > * cephfs-journal-tool: Fix purging when importing a zero-length journal (issue#24239, pr#22980, yupeng chen, zhongyan gu)
> > * cephfs: MDSMonitor: uncommitted state exposed to clients/mdss (issue#23768, pr#23013, Patrick Donnelly)
> > * ceph-fuse mount failed because no mds (issue#22205, pr#22895, liyan)
> > * ceph-volume add a __release__ string, to help version-conditional calls (issue#25170, pr#23331, Alfredo Deza)
> > * ceph-volume: adds test for `ceph-volume lvm list /dev/sda` (issue#24784, issue#24957, pr#23350, Andrew Schoen)
> > * ceph-volume: do not use stdin in luminous (issue#25173, issue#23260, pr#23367, Alfredo Deza)
> > * ceph-volume enable the ceph-osd during lvm activation (issue#24152, pr#23394, Dan van der Ster, Alfredo Deza)
> > * ceph-volume expand on the LVM API to create multiple LVs at different sizes (issue#24020, pr#23395, Alfredo Deza)
> > * ceph-volume lvm.activate conditional mon-config on prime-osd-dir (issue#25216, pr#23397, Alfredo Deza)
> > * ceph-volume lvm.batch remove non-existent sys_api property (issue#34310, pr#23811, Alfredo Deza)
> > * ceph-volume lvm.listing only include devices if they exist (issue#24952, pr#23150, Alfredo Deza)
> > * ceph-volume: process.call with stdin in Python 3 fix (issue#24993, pr#23238, Alfredo Deza)
> > * ceph-volume: PVolumes.get() should return one PV when using name or uuid (issue#24784, pr#23329, Andrew Schoen)
> > * ceph-volume: refuse to zap mapper devices (issue#24504, pr#23374, Andrew Schoen)
> > * ceph-volume: tests.functional inherit SSH_ARGS from ansible (issue#34311, pr#23813, Alfredo Deza)
> > * ceph-volume tests/functional run lvm list after OSD provisioning (issue#24961, pr#23147, Alfredo Deza)
> > * ceph-volume: unmount lvs correctly before zapping (issue#24796, pr#23128, Andrew Schoen)
> > * ceph-volume: update batch documentation to explain filestore strategies (issue#34309, pr#23825, Alfredo Deza)
> > * change default filestore_merge_threshold to -10 (issue#24686, pr#22814, Douglas Fuller)
> > * client: add inst to asok status output (issue#24724, pr#23107, Patrick Donnelly)
> > * client: fixup parallel calls to ceph_ll_lookup_inode() in NFS FSAL (issue#22683, pr#23012, huanwen ren)
> > * client: increase verbosity level for log messages in helper methods (issue#21014, pr#23014, Rishabh Dave)
> > * client: update inode fields according to issued caps (issue#24269, pr#22783, "Yan, Zheng")
> > * common: Abort in OSDMap::decode() during qa/standalone/erasure-code/test-erasure-eio.sh (issue#23492, pr#23025, Sage Weil)
> > * common/DecayCounter: set last_decay to current time when decoding decay counter (issue#24440, pr#22779, Zhi Zhang)
> > * doc: ceph-bluestore-tool manpage not getting rendered correctly (issue#24800, pr#23177, Nathan Cutler)
> > * filestore: add pgid in filestore pg dir split log message (issue#24878, pr#23454, Vikhyat Umrao)
> > * let "ceph status" use base 10 when printing numbers not sizes (issue#22095, pr#22680, Jan Fajerski, Kefu Chai)
> > * librados: fix buffer overflow for aio_exec python binding (issue#23964, pr#22708, Aleksei Gutikov)
> > * librbd: force 'invalid object map' flag on-disk update (issue#24434, pr#22753, Mykola Golub)
> > * librbd: utilize the journal disabled policy when removing images (issue#23512, pr#23595, Jason Dillaman)
> > * mds: don't report slow request for blocked filelock request (issue#22428, pr#22782, "Yan, Zheng")
> > * mds: dump recent events on respawn (issue#24853, pr#23213, Patrick Donnelly)
> > * mds: handle discontinuous mdsmap (issue#24856, pr#23169, "Yan, Zheng")
> > * mds: increase debug level for dropped client cap msg (issue#24855, pr#23214, Patrick Donnelly)
> > * mds: low wrlock efficiency due to dirfrags traversal (issue#24467, pr#22885, Xuehan Xu)
> > * mds: print mdsmap processed at low debug level (issue#24852, pr#23212, Patrick Donnelly)
> > * mds: scrub doesn't always return JSON results (issue#23958, pr#23222, Venky Shankar)
> > * mds: unset deleted vars in shutdown_pass (issue#23766, pr#23015, Patrick Donnelly)
> > * mgr: add units to performance counters (issue#22747, pr#23266, Ernesto Puerta, Rubab Syed)
> > * mgr: ceph osd safe-to-destroy crashes the mgr (issue#23249, pr#22806, Sage Weil)
> > * mgr/MgrClient: Protect daemon_health_metrics (issue#23352, pr#23459, Kjetil Joergensen, Brad Hubbard)
> > * mon: Add option to view IP addresses of clients in output of 'ceph features' (issue#21315, pr#22773, Paul Emmerich)
> > * mon/HealthMonitor: do not send MMonHealthChecks to pre-luminous mon (issue#24481, pr#22655, Sage Weil)
> > * os/bluestore: fix flush_commit locking (issue#21480, pr#22904, Sage Weil)
> > * os/bluestore: fix incomplete faulty range marking when doing compression (issue#21480, pr#22909, Igor Fedotov)
> > * os/bluestore: fix races on SharedBlob::coll in ~SharedBlob (issue#24859, pr#23064, Radoslaw Zarzynski)
> > * osdc: Fix the wrong BufferHead offset (issue#24484, pr#22865, dongdong tao)
> > * osd: do_sparse_read(): Verify checksum earlier so we will try to repair and missed backport (issue#24875, pr#23379, xie xingguo, David Zafman)
> > * osd: eternal stuck PG in 'unfound_recovery' (issue#24373, pr#22546, Sage Weil)
> > * osd: may get empty info at recovery (issue#24588, pr#22862, Sage Weil)
> > * osd/OSDMap: CRUSH_TUNABLES5 added in jewel, not kraken (issue#25057, pr#23227, Sage Weil)
> > * osd/Session: fix invalid iterator dereference in Session::have_backoff() (issue#24486, pr#22729, Sage Weil)
> > * pjd: cd: too many arguments (issue#24307, pr#22883, Neha Ojha)
> > * PurgeQueue sometimes ignores Journaler errors (issue#24533, pr#22811, John Spray)
> > * pybind: pybind/mgr/mgr_module: make rados handle available to all modules (issue#24788, issue#25102, pr#23235, Ernesto Puerta, Sage Weil)
> > * pybind: Python bindings use iteritems method which is not Python 3 compatible (issue#24779, pr#22918, Nathan Cutler, Kefu Chai)
> > * pybind: rados.pyx: make all exceptions accept keyword arguments (issue#24033, pr#22979, Rishabh Dave)
> > * rbd: fix issues in IEC unit handling (issue#26927, issue#26928, pr#23776, Jason Dillaman)
> > * repeated eviction of idle client until some IO happens (issue#24052, pr#22780, "Yan, Zheng")
> > * rgw: add curl_low_speed_limit and curl_low_speed_time config to avoid the thread hangs in data sync (issue#25019, pr#23144, Mark Kogan, Zhang Shaowen)
> > * rgw: add unit test for cls bi list command (issue#24483, pr#22846, Orit Wasserman, Xinying Song)
> > * rgw: do not ignore EEXIST in RGWPutObj::execute (issue#22790, pr#23207, Matt Benjamin)
> > * rgw: fail to recover index from crash luminous backport (issue#24640, issue#24280, pr#23130, Tianshan Qu)
> > * rgw: fix gc may cause a large number of read traffic (issue#24767, pr#22984, Xin Liao)
> > * rgw: fix the bug of radosgw-admin zonegroup set requires realm (issue#21583, pr#22767, lvshanchun)
> > * rgw: have a configurable authentication order (issue#23089, pr#23501, Abhishek Lekshmanan)
> > * rgw: index complete miss zones_trace set (issue#24590, pr#22820, Tianshan Qu)
> > * rgw: Invalid Access-Control-Request-Request may bypass validate_cors_rule_method (issue#24223, pr#22934, Jeegn Chen)
> > * rgw: meta and data notify thread miss stop cr manager (issue#24589, pr#22822, Tianshan Qu)
> > * rgw-multisite: endless loop in RGWBucketShardIncrementalSyncCR (issue#24603, pr#22817, cfanz)
> > * rgw performance regression for luminous 12.2.4 (issue#23379, pr#22930, Mark Kogan)
> > * rgw: radosgw-admin reshard status command should print text for reshar… (issue#23257, pr#23019, Orit Wasserman)
> > * rgw: "radosgw-admin objects expire" always returns ok even if the pro… (issue#24592, pr#23000, Zhang Shaowen)
> > * rgw: require --yes-i-really-mean-it to run radosgw-admin orphans find (issue#24146, pr#22985, Matt Benjamin)
> > * rgw: REST admin metadata API paging failure bucket & bucket.instance: InvalidArgument (issue#23099, pr#22932, Matt Benjamin)
> > * rgw: set cr state if aio_read err return in RGWCloneMetaLogCoroutine (issue#24566, pr#22942, Tianshan Qu)
> > * spdk: fix ceph-osd crash when activate SPDK (issue#24371, pr#22686, tone-zhang)
> > * tools/ceph-objectstore-tool: split filestore directories offline to target hash level (issue#21366, pr#23418, Zhi Zhang)
> >
> > Getting ceph:
> > ------------
> > * Git at git://github.com/ceph/ceph.git
> > * Tarball at http://download.ceph.com/tarballs/ceph-12.2.8.tar.gz
> > * For packages, see http://docs.ceph.com/docs/master/install/get-packages/
> > * Release git sha1: ae699615bac534ea496ee965ac6192cb7e0e07c0
> >
> >
> > --
> > Abhishek Lekshmanan
> > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton,
> > HRB 21284 (AG Nürnberg)