To backport or not to backport


Hi,

Now that the release cadence has been set, it's time for another
discussion :-).

During Ceph Day NL we had a panel Q&A [1]. One of the topics discussed
was backports. Occasionally users ask for functionality from newer
releases to be backported to older releases (that are still supported).

Ceph is quite unique as a project in that new functionality gets
backported to older releases. Sometimes functionality even changes
during the lifetime of a release: I recall the "ceph-volume" change to
LVM at the beginning of the Luminous release. While backports can enrich
the user experience of a Ceph operator, they are not without risk. There
have been several issues with "incomplete" backports and/or unforeseen
circumstances that had the reverse effect: downtime of (part of) Ceph
services. The ones that come to mind are:

- MDS (cephfs damaged): mimic backport (13.2.2)
- RADOS (pg log hard limit): luminous / mimic backport (12.2.8 / 13.2.2)

I would like to define a simple rule of when to backport:

- Only backport fixes that do not introduce new functionality, but address
  (impaired) functionality already present in the release.

An example of a backport that, IMHO, matches these criteria is the
"bitmap_allocator" fix. It fixed a real problem, not some corner case.
Don't get me wrong: it is important to catch corner cases, but fixing
them should not put the majority of clusters at risk.

The time and effort saved with this approach can be spent on one of the
new focus areas Sage mentioned during his keynote talk at Cephalocon
Barcelona: quality. That means the quality of the backports that are
needed, and improved testing, especially for upgrades to newer releases.
If upgrades are seamless, people are more willing to upgrade, because
hey, it just works(tm). Upgrades should be boring.

How many clusters (not nautilus ;-)) are running with "bitmap_allocator" or
with the pglog_hardlimit enabled? If a new feature is not enabled by
default and it's unclear how "stable" it is to use, operators tend to not
enable it, defeating the purpose of the backport.

Backporting fixes to older releases can be considered a "business
opportunity" for the likes of Red Hat, SUSE, Fujitsu, etc. Especially
for users that want a system that "keeps on running forever" and never
needs "dangerous" updates.

This is my view on the matter; please let me know what you think.

Gr. Stefan

P.S. Just to make things clear: this thread is in _no way_ intended to pick on
anybody.


[1]: https://pad.ceph.com/p/ceph-day-nl-2019-panel

-- 
| BIT BV  https://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx


