Re: 10.2.4 Jewel released -- IMPORTANT

Hi everyone,

Please hold off on upgrading to this release.  It triggers a bug in 
SimpleMessenger that causes threads for broken connections to spin, eating 
CPU.

We're making sure we understand the root cause and preparing a fix.

Thanks!
sage




On Wed, 7 Dec 2016, Abhishek L wrote:

> This point release fixes several important bugs in RBD mirroring, RGW
> multi-site, CephFS, and RADOS.
> 
> We recommend that all v10.2.x users upgrade. Also note the following when upgrading from hammer.
> 
> Upgrading from hammer
> ---------------------
> 
> When the last hammer OSD in a cluster containing jewel MONs is
> upgraded to jewel, as of 10.2.4 the jewel MONs will issue this
> warning: "all OSDs are running jewel or later but the
> 'require_jewel_osds' osdmap flag is not set" and change the
> cluster health status to HEALTH_WARN.
> 
> This is a signal for the admin to run "ceph osd set require_jewel_osds".
> Setting the flag completes the upgrade path, and no more pre-Jewel OSDs
> may be added to the cluster.
> 
> 
> Notable Changes
> ---------------
> * build/ops: aarch64: Compiler-based detection of crc32 extended CPU type is broken (issue#17516 , pr#11492 , Alexander Graf)
> * build/ops: allow building RGW with LDAP disabled (issue#17312 , pr#11478 , Daniel Gryniewicz)
> * build/ops: backport 'logrotate: Run as root/ceph' (issue#17381 , pr#11201 , Boris Ranto)
> * build/ops: ceph installs stuff in %_udevrulesdir but does not own that directory (issue#16949 , pr#10862 , Nathan Cutler)
> * build/ops: ceph-osd-prestart.sh fails confusingly when data directory does not exist (issue#17091 , pr#10812 , Nathan Cutler)
> * build/ops: disable LTTng-UST in openSUSE builds (issue#16937 , pr#10794 , Michel Normand)
> * build/ops: i386 tarball gitbuilder failure on master (issue#16398 , pr#10855 , Vikhyat Umrao, Kefu Chai)
> * build/ops: include more files in "make dist" tarball (issue#17560 , pr#11431 , Ken Dreyer)
> * build/ops: incorrect value of CINIT_FLAG_DEFER_DROP_PRIVILEGES (issue#16663 , pr#10278 , Casey Bodley)
> * build/ops: remove SYSTEMD_RUN from initscript (issue#7627 , issue#16441 , issue#16440 , pr#9872 , Vladislav Odintsov)
> * build/ops: systemd: add install section to rbdmap.service file (issue#17541 , pr#11158 , Jelle vd Kooij)
> * common: Enable/Disable of features is allowed even if the features are already enabled/disabled (issue#16079 , pr#11460 , Lu Shi)
> * common: Log.cc: Assign LOG_INFO priority to syslog calls (issue#15808 , pr#11231 , Brad Hubbard)
> * common: Proxied operations shouldn't result in error messages if replayed (issue#16130 , pr#11461 , Vikhyat Umrao)
> * common: Request exclusive lock if owner sends -ENOTSUPP for proxied maintenance op (issue#16171 , pr#10784 , Jason Dillaman)
> * common: msgr/async: Messenger thread long time lock hold risk (issue#15758 , pr#10761 , Wei Jin)
> * doc: fix description for rsize and rasize (issue#17357 , pr#11171 , Andreas Gerstmayr)
> * filestore: can get stuck in an unbounded loop during scrub (issue#17859 , pr#12001 , Sage Weil)
> * fs: Failure in snaptest-git-ceph.sh (issue#17172 , pr#11419 , Yan, Zheng)
> * fs: Log path as well as ino when detecting metadata damage (issue#16973 , pr#11418 , John Spray)
> * fs: client: FAILED assert(root_ancestor->qtree == __null) (issue#16066 , issue#16067 , pr#10107 , Yan, Zheng)
> * fs: client: add missing client_lock for get_root (issue#17197 , pr#10921 , Patrick Donnelly)
> * fs: client: fix shutdown with open inodes (issue#16764 , pr#10958 , John Spray)
> * fs: client: nlink count is not maintained correctly (issue#16668 , pr#10877 , Jeff Layton)
> * fs: multimds: allow_multimds not required when max_mds is set in ceph.conf at startup (issue#17105 , pr#10997 , Patrick Donnelly)
> * librados: memory leaks from ceph::crypto (WITH_NSS) (issue#17205 , pr#11409 , Casey Bodley)
> * librados: modify Pipe::connect() to return the error code (issue#15308 , pr#11193 , Vikhyat Umrao)
> * librados: remove new setxattr overload to avoid breaking the C++ ABI (issue#18058 , pr#12207 , Josh Durgin)
> * librbd: cannot disable journaling or remove non-mirrored, non-primary image (issue#16740 , pr#11337 , Jason Dillaman)
> * librbd: discard after write can result in assertion failure (issue#17695 , pr#11644 , Jason Dillaman)
> * librbd::Operations: update notification failed: (2) No such file or directory (issue#17549 , pr#11420 , Jason Dillaman)
> * mds: Crash in Client::_invalidate_kernel_dcache when reconnecting during unmount (issue#17253 , pr#11414 , Yan, Zheng)
> * mds: Duplicate damage table entries (issue#17173 , pr#11412 , John Spray)
> * mds: Failure in dirfrag.sh (issue#17286 , pr#11416 , Yan, Zheng)
> * mds: Failure in snaptest-git-ceph.sh (issue#17271 , pr#11415 , Yan, Zheng)
> * mon: Ceph Status - Segmentation Fault (issue#16266 , pr#11408 , Brad Hubbard)
> * mon: Display full flag in ceph status if full flag is set (issue#15809 , pr#9388 , Vikhyat Umrao)
> * mon: Error EINVAL: removing mon.a at 172.21.15.16:6789/0, there will be 1 monitors (issue#17725 , pr#12267 , Joao Eduardo Luis)
> * mon: OSDMonitor: only reject MOSDBoot based on up_from if inst matches (issue#17899 , pr#12067 , Samuel Just)
> * mon: OSDMonitor: Missing nearfull flag set (issue#17390 , pr#11272 , Igor Podoski)
> * mon: Upgrading 0.94.6 -> 0.94.9 saturating mon node networking (issue#17365 , issue#17386 , pr#11679 , Sage Weil, xie xingguo)
> * mon: ceph mon Segmentation fault after set crush_ruleset ceph 10.2.2 (issue#16653 , pr#10861 , song baisen)
> * mon: crash: crush/CrushWrapper.h: 940: FAILED assert(successful_detach) (issue#16525 , pr#10496 , Kefu Chai)
> * mon: don't crash on invalid standby_for_fscid (issue#17466 , pr#11389 , John Spray)
> * mon: fix missing osd metadata (again) (issue#17685 , pr#11642 , John Spray)
> * mon: osdmonitor: decouple adjust_heartbeat_grace and min_down_reporters (issue#17055 , pr#10757 , Zengran Zhang)
> * mon: the %USED of ceph df is wrong (issue#16933 , pr#10860 , Kefu Chai)
> * osd: condition OSDMap encoding on features (issue#18015 , pr#12167 , Sage Weil)
> * osd: PG::_update_calc_stats wrong for CRUSH_ITEM_NONE up set items (issue#16998 , pr#10883 , Samuel Just)
> * osd: PG::choose_acting valgrind error or ./common/hobject.h: 182: FAILED assert(!max || (*this == hobject_t(hobject_t::get_max()))) (issue#13967 , pr#10885 , Tao Chang)
> * osd: Potential crash during journal::Replay shut down (issue#16433 , pr#10645 , Jason Dillaman)
> * osd: add peer_addr in heartbeat_check log message (issue#15762 , pr#9739 , Vikhyat Umrao, Sage Weil)
> * osd: adjust scrub boundary to object without SnapSet (issue#17470 , pr#11311 , Samuel Just)
> * osd: ceph osd df does not show summarized info correctly if one or more OSDs are out (issue#16706 , pr#10759 , xie xingguo)
> * osd: journal: do not prematurely flag object recorder as closed (issue#17590 , pr#11634 , Jason Dillaman)
> * osd: mark_all_unfound_lost() leaves unapplied changes (issue#16156 , pr#10886 , Samuel Just)
> * osd: segfault in ObjectCacher::FlusherThread (issue#16610 , pr#10864 , Yan, Zheng)
> * qa: remove EnumerateObjects from librados upgrade tests (pr#11728 , Josh Durgin)
> * rbd: Disabling pool mirror mode with registered peers results in orphaned mirrored images (issue#16984 , pr#10857 , Jason Dillaman)
> * rbd: ImageWatcher: use after free within C_UnwatchAndFlush (issue#17289 , issue#17254 , pr#11466 , Jason Dillaman)
> * rbd: Prevent the creation of a clone from a non-primary mirrored image (issue#16449 , pr#10650 , Mykola Golub)
> * rbd: RBD should restrict mirror enable/disable actions on parents/clones (issue#16056 , pr#11459 , zhuangzeqiang)
> * rbd: TestJournalReplay: sporadic assert(m_state == STATE_READY || m_state == STATE_STOPPING) failure (issue#17566 , pr#11590 , Jason Dillaman)
> * rbd: bench io-size should not be larger than image size (issue#16967 , pr#10796 , Jason Dillaman)
> * rbd: ceph 10.2.2 rbd status on image format 2 returns (2) No such file or directory (issue#16887 , pr#10652 , Jason Dillaman)
> * rbd: helgrind: TestLibRBD.TestIOPP potential deadlock closing an image with read-ahead enabled (issue#17198 , pr#11463 , Jason Dillaman)
> * rbd: image.stat() call in librbdpy fails sometimes (issue#17310 , pr#11464 , Jason Dillaman)
> * rbd: krbd qa scripts and concurrent.sh test fix (issue#17223 , pr#11018 , Ilya Dryomov)
> * rbd: krbd-related CLI patches (issue#17554 , pr#11400 , Ilya Dryomov)
> * rbd: mirror: improve resiliency of stress test case (issue#16855 , issue#16555 , issue#14738 , issue#15259 , issue#17446 , issue#17355 , issue#16538 , issue#16974 , issue#17283 , issue#17317 , issue#17416 , issue#16227 , pr#11433 , Mykola Golub, Ricardo Dias, Jason Dillaman)
> * rbd: rbd-nbd IO hang (issue#16921 , pr#11467 , Jason Dillaman)
> * rbd: update_features API needs to support backwards/forward compatibility (issue#17330 , pr#11462 , Jason Dillaman)
> * rgw: COPY broke multipart files uploaded under dumpling (issue#16435 , pr#10866 , Yehuda Sadeh)
> * rgw: Config parameter rgw keystone make new tenants in radosgw multitenancy does not work (issue#17293 , pr#11473 , SirishaGuduru)
> * rgw: Do not archive metadata by default (issue#17256 , pr#11321 , Pavan Rallabhandi, Matt Benjamin)
> * rgw: ERROR: got unexpected error when trying to read object: -2 (issue#17111 , pr#11472 , Yang Honggang)
> * rgw: Modification for TEST S3 ACCESS section in INSTALL CEPH OBJECT GATEWAY page (issue#15603 , pr#11475 , la-sguduru)
> * rgw: RGW loses realm/period/zonegroup/zone data: period overwritten if somewhere in the cluster is still running Hammer (issue#17371 , pr#11519 , Orit Wasserman)
> * rgw: RGWDataSyncCR fails on errors from RGWListBucketIndexesCR (issue#17073 , pr#11330 , Casey Bodley)
> * rgw: S3 object versioning fails when applied on a non-master zone (issue#16494 , pr#11367 , Yehuda Sadeh)
> * rgw: add orphan options to radosgw-admin --help and man page (issue#17281 , issue#17280 , pr#11139 , Ken Dreyer, Thomas Serlin)
> * rgw: back off bucket sync on failures, don't store marker (issue#16742 , pr#11021 , Yehuda Sadeh)
> * rgw: combined LDAP backports (issue#17544 , issue#17185 , pr#11332 , Harald Klein, Matt Benjamin)
> * rgw: cors auto memleak (issue#16564 , pr#10656 , Yan Jun)
> * rgw: default quota fixes (issue#16410 , pr#10832 , Pavan Rallabhandi, Daniel Gryniewicz)
> * rgw: doc: description of multipart part entity is wrong (issue#17504 , pr#11342 , weiqiaomiao)
> * rgw: don't loop forever when reading data from 0 sized segment. (issue#17692 , pr#11626 , Marcus Watts)
> * rgw: fix put_acls for objects starting and ending with underscore (issue#17625 , pr#11669 , Orit Wasserman)
> * rgw: fix regression with handling double underscore (issue#17443 , issue#16856 , pr#11563 , Yehuda Sadeh, Orit Wasserman)
> * rgw: handle empty POST condition (issue#17635 , pr#11662 , Yehuda Sadeh)
> * rgw: metadata sync can skip markers for failed/incomplete entries (issue#16759 , pr#10657 , Yehuda Sadeh)
> * rgw: nfs backports (issue#17393 , issue#17311 , issue#17367 , issue#17319 , issue#17321 , issue#17322 , issue#17323 , issue#17325 , issue#17326 , issue#17327 , pr#11335 , Min Chen, Yan Jun, Weibing Zhang, Matt Benjamin)
> * rgw: period commit loses zonegroup changes: region_map converted repeatedly (issue#17051 , pr#10890 , Casey Bodley)
> * rgw: period commit return error when the current period has a zonegroup which doesn't have a master zone (issue#17110 , pr#10867 , weiqiaomiao)
> * rgw: radosgw daemon core when reopen logs (issue#17036 , pr#10868 , weiqiaomiao)
> * rgw: rgw file uses too much CPU in gc/idle thread (issue#16976 , pr#10889 , Matt Benjamin)
> * rgw: s3tests-test-readwrite failing with 500 (issue#16930 , pr#11471 , Yehuda Sadeh)
> * rgw: upgrade from old multisite to new multisite fails (issue#16751 , pr#10891 , Orit Wasserman)
> * rgw: response information is error when getting token of swift account (issue#15195 , pr#11474 , Qiankun Zheng)
> * rgw: user email can be modified to empty when it has values (issue#13286 , pr#11469 , Yehuda Sadeh, Weijun Duan)
> * tests: ceph-disk must ignore debug monc (issue#17607 , pr#11548 , Loic Dachary)
> * tests: fix TestClsRbd.mirror_image failure in upgrade:jewel-x-master-distro-basic-vps (issue#16529 , pr#10888 , Jason Dillaman)
> * tests: scsi_debug fails /dev/disk/by-partuuid (issue#17100 , pr#11411 , Loic Dachary)
> * tests: test/ceph_test_msgr: do not use Message::middle for holding transient… (issue#17365 , issue#17728 , issue#16955 , pr#11742 , Haomai Wang, Kefu Chai, Michal Jarzabek, Sage Weil)
> * tools: Missing comma in ceph-create-keys causes concatenation of arguments (issue#17815 , pr#11822 , Patrick Donnelly)
> * tools: add a tool to rebuild mon store from OSD (issue#17179 , issue#17400 , pr#11126 , Kefu Chai, xie xingguo)
> * tools: ceph-create-keys: sometimes blocks forever if mds allow is set (issue#16255 , pr#11417 , John Spray)
> * tools: ceph-disk should timeout when a lock cannot be acquired (issue#16580 , pr#10758 , Loic Dachary)
> * tools: ceph-disk: expected systemd unit failures are confusing (issue#15990 , pr#10884 , Boris Ranto)
> * tools: ceph-disk: using a regular file as a journal fails (issue#16280 , issue#17662 , pr#11657 , Jayashree Candadai, Anirudha Bose, Loic Dachary, Shylesh Kumar)
> * tools: ceph-objectstore-tool crashes if --journal-path <a-directory> (issue#17307 , pr#11407 , Kefu Chai)
> * tools: ceph-objectstore-tool: add a way to split filestore directories offline (issue#17220 , pr#11252 , Josh Durgin)
> * tools: ceph-post-file: use new ssh key (issue#14267 , pr#11746 , David Galloway)
> 
> For more detailed information, refer to the complete changelog[1] and the
> release notes[2].
> 
> Getting Ceph
> ------------
> 
> * Git at git://github.com/ceph/ceph.git
> * Tarball at http://download.ceph.com/tarballs/ceph-10.2.4.tar.gz
> * For packages, see http://ceph.com/docs/master/install/get-packages
> * For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy
> 
> [1]: http://docs.ceph.com/docs/master/_downloads/v10.2.4.txt
> [2]: http://docs.ceph.com/docs/master/release-notes/#v10-2-4-jewel
> 
> Best,
> --
> Abhishek Lekshmanan
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
> 
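
The "Upgrading from hammer" note in the quoted announcement can be sketched
as a small read-only check. This is a minimal sketch under stated
assumptions, not an official Ceph tool: it assumes the `ceph` CLI is on the
PATH and that `ceph health detail` prints the warning text quoted in the
announcement. The names WARNING_MARKER, needs_require_jewel_osds and report
are hypothetical and exist only for this illustration. The script only
reports what it finds; it never sets the flag itself.

```python
import subprocess

# Substring of the warning quoted in the announcement. Matching on the flag
# name and the "is not set" tail is an assumption about the output format.
WARNING_MARKER = "'require_jewel_osds' osdmap flag is not set"


def needs_require_jewel_osds(health_text: str) -> bool:
    """Return True if the health output contains the jewel flag warning."""
    # Collapse all whitespace so a line-wrapped message still matches.
    normalized = " ".join(health_text.split())
    return WARNING_MARKER in normalized


def report() -> str:
    """Run `ceph health detail` and return a suggestion; changes nothing."""
    result = subprocess.run(
        ["ceph", "health", "detail"],
        capture_output=True,
        text=True,
        check=True,
    )
    if needs_require_jewel_osds(result.stdout):
        return "Warning present; consider: ceph osd set require_jewel_osds"
    return "Warning not present; no action suggested"
```

Calling report() on a host with admin credentials returns a one-line
suggestion. Keeping the text check separate from the subprocess call means
the matching logic can be exercised without a running cluster.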