Hi everyone, This is the first release candidate for Kraken, the next stable release series. There have been major changes from jewel with many features being added. Please note the upgrade process from jewel, before upgrading. Major Changes from Jewel ------------------------ - *RADOS*: * The new *BlueStore* backend now has a stable disk format and is passing our failure and stress testing. Although the backend is still flagged as experimental, we encourage users to try it out for non-production clusters and non-critical data sets. * RADOS now has experimental support for *overwrites on erasure-coded* pools. Because the disk format and implementation are not yet finalized, there is a special pool option that must be enabled to test the new feature. Enabling this option on a cluster will permanently bar that cluster from being upgraded to future versions. * We now default to the AsyncMessenger (``ms type = async``) instead of the legacy SimpleMessenger. The most noticeable difference is that we now use a fixed sized thread pool for network connections (instead of two threads per socket with SimpleMessenger). * Some OSD failures are now detected almost immediately, whereas previously the heartbeat timeout (which defaults to 20 seconds) had to expire. This prevents IO from blocking for an extended period for failures where the host remains up but the ceph-osd process is no longer running. * There is a new ``ceph-mgr`` daemon. It is currently collocated with the monitors by default, and is not yet used for much, but the basic infrastructure is now in place. * The size of encoded OSDMaps has been reduced. * The OSDs now quiesce scrubbing when recovery or rebalancing is in progress. - *RGW*: * RGW now supports a new zone type that can be used for metadata indexing via Elasticseasrch. * RGW now supports the S3 multipart object copy-part API. * It is possible now to reshard an existing bucket. Note that bucket resharding currently requires that all IO (especially writes) to the specific bucket is quiesced. * RGW now supports data compression for objects. * Civetweb version has been upgraded to 1.8 * The Swift static website API is now supported (S3 support has been added previously). * S3 bucket lifecycle API has been added. Note that currently it only supports object expiration. * Support for custom search filters has been added to the LDAP auth implementation. * Support for NFS version 3 has been added to the RGW NFS gateway. * A Python binding has been created for librgw. - *RBD*: * RBD now supports images stored in an *erasure-coded* RADOS pool using the new (experimental) overwrite support. Images must be created using the new rbd CLI "--data-pool <ec pool>" option to specify the EC pool where the backing data objects are stored. Attempting to create an image directly on an EC pool will not be successful since the image's backing metadata is only supported on a replicated pool. * The rbd-mirror daemon now supports replicating dynamic image feature updates and image metadata key/value pairs from the primary image to the non-primary image. * The number of image snapshots can be optionally restricted to a configurable maximum. * The rbd Python API now supports asynchronous IO operations. - *CephFS*: * libcephfs function definitions have been changed to enable proper uid/gid control. The library version has been increased to reflect the interface change. * Standby replay MDS daemons now consume less memory on workloads doing deletions. * Scrub now repairs backtrace, and populates `damage ls` with discovered errors. * A new `pg_files` subcommand to `cephfs-data-scan` can identify files affected by a damaged or lost RADOS PG. * The false-positive "failing to respond to cache pressure" warnings have been fixed. Upgrading from Jewel -------------------- * All clusters must first be upgraded to Jewel 10.2.z before upgrading to Kraken 11.2.z (or, eventually, Luminous 12.2.z). * The ``sortbitwise`` flag must be set on the Jewel cluster before upgrading to Kraken. The latest Jewel (10.2.4+) releases issue a health warning if the flag is not set, so this is probably already set. If it is not, Kraken OSDs will refuse to start and will print and error message in their log. Upgrading --------- * The list of monitor hosts/addresses for building the monmap can now be obtained from DNS SRV records. The service name used in when querying the DNS is defined in the "mon_dns_srv_name" config option, which defaults to "ceph-mon". * The 'osd class load list' config option is a list of object class names that the OSD is permitted to load (or '*' for all classes). By default it contains all existing in-tree classes for backwards compatibility. * The 'osd class default list' config option is a list of object class names (or '*' for all classes) that clients may invoke having only the '*', 'x', 'class-read', or 'class-write' capabilities. By default it contains all existing in-tree classes for backwards compatibility. Invoking classes not listed in 'osd class default list' requires a capability naming the class (e.g. 'allow class foo'). * The 'rgw rest getusage op compat' config option allows you to dump (or not dump) the description of user stats in the S3 GetUsage API. This option defaults to false. If the value is true, the reponse data for GetUsage looks like:: "stats": { "TotalBytes": 516, "TotalBytesRounded": 1024, "TotalEntries": 1 } If the value is false, the reponse for GetUsage looks as it did before:: { 516, 1024, 1 } * The 'osd out ...' and 'osd in ...' commands now preserve the OSD weight. That is, after marking an OSD out and then in, the weight will be the same as before (instead of being reset to 1.0). Previously the mons would only preserve the weight if the mon automatically marked and OSD out and then in, but not when an admin did so explicitly. * The 'ceph osd perf' command will display 'commit_latency(ms)' and 'apply_latency(ms)'. Previously, the names of these two columns are 'fs_commit_latency(ms)' and 'fs_apply_latency(ms)'. We remove the prefix 'fs_', because they are not filestore specific. * Monitors will no longer allow pools to be removed by default. The setting mon_allow_pool_delete has to be set to true (defaults to false) before they allow pools to be removed. This is a additional safeguard against pools being removed by accident. * If you have manually specified the monitor user rocksdb via the ``mon keyvaluedb = rocksdb`` option, you will need to manually add a file to the mon data directory to preserve this option:: echo rocksdb > /var/lib/ceph/mon/ceph-`hostname`/kv_backend New monitors will now use rocksdb by default, but if that file is not present, existing monitors will use leveldb. The ``mon keyvaluedb`` option now only affects the backend chosen when a monitor is created. * The 'osd crush initial weight' option allows you to specify a CRUSH weight for a newly added OSD. Previously a value of 0 (the default) meant that we should use the size of the OSD's store to weight the new OSD. Now, a value of 0 means it should have a weight of 0, and a negative value (the new default) means we should automatically weight the OSD based on its size. If your configuration file explicitly specifies a value of 0 for this option you will need to change it to a negative value (e.g., -1) to preserve the current behavior. * The `osd crush location` config option is no longer supported. Please update your ceph.conf to use the `crush location` option instead. * The static libraries are no longer included by the debian development packages (lib*-dev) as it is not required per debian packaging policy. The shared (.so) versions are packaged as before. * The libtool pseudo-libraries (.la files) are no longer included by the debian development packages (lib*-dev) as they are not required per https://wiki.debian.org/ReleaseGoals/LAFileRemoval and https://www.debian.org/doc/manuals/maint-guide/advanced.en.html. * The jerasure and shec plugins can now detect SIMD instruction at runtime and no longer need to be explicitly configured for different processors. The following plugins are now deprecated: jerasure_generic, jerasure_sse3, jerasure_sse4, jerasure_neon, shec_generic, shec_sse3, shec_sse4, and shec_neon. If you use any of these plugins directly you will see a warning in the mon log file. Please switch to using just 'jerasure' or 'shec'. * The librados omap get_keys and get_vals operations include a start key and a limit on the number of keys to return. The OSD now imposes a configurable limit on the number of keys and number of total bytes it will respond with, which means that a librados user might get fewer keys than they asked for. This is necessary to prevent careless users from requesting an unreasonable amount of data from the cluster in a single operation. The new limits are configured with `osd_max_omap_entries_per_request`, defaulting to 131,072, and 'osd_max_omap_bytes_per_request', defaulting to 4MB. Due to the really long changelog in this release, please read the detailed feature list here: http://ceph.com/releases/v11-1-0-kraken-released/ The debian and rpm packages are available at the usual locations at http://download.ceph.com/debian-kraken/ and http://download.ceph.com/rpm-kraken respectively. For more details refer below. Getting Ceph ------------ * Git at git://github.com/ceph/ceph.git * Tarball at http://download.ceph.com/tarballs/ceph-11.1.0.tar.gz * For packages, see http://ceph.com/docs/master/install/get-packages * For ceph-deploy, see http://ceph.com/docs/master/install/install-ceph-deploy Best, Abhishek -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html