Re: To backport or not to backport

On Thu, Jul 4, 2019 at 8:00 AM Stefan Kooman <stefan@xxxxxx> wrote:
Hi,

Now that the release cadence has been set, it's time for another
discussion :-).

During Ceph Day NL we had a panel Q&A [1]. One of the topics discussed
was backports. Occasionally users ask for functionality in newer
releases to be backported to older releases (that are still in
support).

Ceph is quite unique as a project in the sense that new functionality
gets backported to older releases. Sometimes functionality even changes
during the lifetime of a release: I recall the "ceph-volume" change to
LVM early in the Luminous release. While backports can enrich the user
experience of a Ceph operator, they are not without risk. There have
been several issues with "incomplete" backports and/or unforeseen
circumstances that had the opposite effect: downtime of (part of) Ceph
services. The ones that come to mind are:

- MDS (cephfs damaged) mimic backport (13.2.2)
- RADOS (pg log hard limit) luminous / mimic backport (12.2.8 / 13.2.2)

I would like to define a simple rule for when to backport:

- Only backport fixes that do not introduce new functionality, but address
  (impaired) functionality already present in the release.

An example, IMHO, of a backport that matches these criteria was the
"bitmap_allocator" fix. It fixed a real problem, not some corner case.
Don't get me wrong: it is important to catch corner cases, but doing so
should not put the majority of clusters at risk.

The time and effort saved with this approach could instead be spent on
one of the new focus areas Sage mentioned during his keynote at
Cephalocon Barcelona: quality. That means quality of the backports that
are needed, and improved testing, especially of upgrades to newer
releases. If upgrades are seamless, people are more willing to upgrade,
because hey, it just works(tm). Upgrades should be boring.

How many clusters (not Nautilus ;-)) are running with "bitmap_allocator"
or with pglog_hardlimit enabled? If a new feature is not enabled by
default and it's unclear how "stable" it is, operators tend not to
enable it, defeating the purpose of the backport.

Backporting fixes to older releases can be considered a "business
opportunity" for the likes of Red Hat, SUSE, Fujitsu, etc., especially
for users who want a system that "keeps on running forever" and never
needs "dangerous" updates.

This is my view on the matter, please let me know what you think of
this.

Gr. Stefan

P.S. Just to make things clear: this thread is in _no way_ intended to pick on
anybody.


[1]: https://pad.ceph.com/p/ceph-day-nl-2019-panel

I prefer a released version to be fairly static, with no new features introduced, only bug fixes. For one, I'd prefer not to have to read the release notes to figure out how dangerous a "bug-fix" release might be. The fixes in a released version should be tested extremely well so that it "Just Works".

By not backporting new features, I think developers get more time to bake the features into the new version and are freed up to focus on the forward direction of the product. If I want a new feature, then the burden is on me (or my vendor), not the developers, to test a new version and verify that it works in my environment.

I wholeheartedly support only bug fixes and security fixes going into released versions.
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
