It's great to see discussion about the future Fedora package SCM again. However, I think that some people (myself included) are getting bogged down in the details of how to manage branches in a source code repository or how particular SCMs work. Before settling details like these, there's one big question that we need to answer so that we can make intelligent decisions about SCMs and how we should use them: do we want to move away from RPM's philosophy of using pristine sources plus patches to build binary packages? Answering this question is a fundamental first step before we decide “what's next.”

Currently, every package in Fedora consists of pristine sources, patches to the pristine sources, and a spec file that tells the builders how to take the pristine sources, apply the patches, and output a binary package. All of this package building information is managed in a source code repository. The build system pulls the information out of the package repository to build the binary RPMs.

This process has served the Fedora Project well so far. The packages in Fedora, thanks to a lot of hard work by package maintainers and reviewers (guided by the packaging guidelines), are already very high quality and getting better all the time.

What hasn't been done well so far is the management of the patches that we apply to the sources. As far as the package repository is concerned, patches appear out of thin air. This is because patches have not been managed in a central and public manner by the Fedora Project (other than storing them as blobs in the package repository). Package maintainers have had to devise their own systems for developing and managing patches, which has meant that patches have been developed in an ad-hoc and private manner. This has made life difficult in a number of ways.
It's harder for package maintainers to manage patches, both on their own and in collaboration with other package maintainers. It's harder to communicate changes to upstream and downstream developers.

Managing patches in a central and public manner will have the following benefits:

1. Make the package maintainer's work easier. It becomes easier to forward-port patches as new upstream releases become available, to backport bug fixes and security patches that have been committed to the upstream SCM but are not yet part of a formal release, and to collaborate with other package maintainers on patches. Making the task of managing patches easier will indirectly improve the quality of the packages, because packages can be updated more readily.

2. Make it easier to communicate changes to upstream developers, so that patches that fix bugs or add features can benefit the wider F/OSS community and the package maintainer doesn't need to maintain the patches indefinitely. Fairly or unfairly, there's the perception that many patches sit in our CVS and never get pushed upstream. Of course, some patches should never get pushed upstream since they represent Fedora-specific policy, but in most cases we'd all be better off if our patches were incorporated upstream.

3. Make it easier for downstream developers (e.g. the RHEL and OLPC engineers or anyone else who repackages Fedora) to add their own customizations and communicate those changes back to Fedora or the upstream developers. Of particular concern here would be making it easier for downstream distributions to apply patches that implement policies specific to those distributions while keeping it easy for them to track changes in the Fedora patches.

There are (at least) two different approaches that we could take to managing patches in a central, public manner.
One method would keep the traditional package repositories and add separate patch management repositories. This method would preserve the pristine source plus patches philosophy of RPM. Another, more radical approach would be to integrate the package and patch management repositories into one.

With separate package and patch management repositories, package management remains largely the same. The SCM technology used to maintain the package repository might change, but it would remain a collection of pristine sources, patches, and spec files. The patch management repository for a package would be different: it would consist of a “vendor” branch containing the unmodified upstream code and a number of “patch” branches that represent the patches to the upstream code that appear in the package repository. Ultimately, every patch that appears in the package repository would have a branch in the patch management repository. Since the development of patches would now happen in a central, public manner, it would be easier to communicate changes upstream, downstream, and within the Fedora community. Doing things in a central, public manner also means that we can develop tools and procedures that will make managing patches easier for the package maintainer.

If we want to be more radical, we could integrate the package and the patch repositories. Package building would no longer use pristine sources and patches to produce a binary package. Instead, the build system would pull already-patched code out of the repository and build the binary package from there.

The advantage of integrating the patch and package repositories would be a reduction in the package maintainer's work.
With separate package repositories and patch repositories, the package maintainer has to do some work to export patches from the patch repository and import them into the package repository. (I believe that we could develop tools to minimize the amount of work it takes to export and import patches, but there would always be some manual steps.)

The primary disadvantage of integrating patch and package management is that we move away from RPM's philosophy of using pristine sources. Pristine sources have been required because they make it easy to verify that our copy of the sources matches the upstream copy, by comparing MD5 checksums or checking GPG signatures. With an integrated repository there are no longer pristine sources, so it's not possible to verify that our copy of the code matches the upstream copy by comparing checksums or signatures on tarballs. Verifying the code integrity is still possible with the integrated repository, but it's potentially much more difficult.

Another disadvantage of integrated management is that, unless package maintainers are disciplined, it can become difficult to separate changes to the code that implement Fedora-specific policies (changing defaults in configuration files, moving files around to suit the FHS, etc.) from changes that fix bugs or security problems (and thus might be of interest to upstream developers). Guidelines on how to manage vendor, patch, and policy branches will help maintain that discipline, but vigilance will be necessary.

So would moving to an integrated package and patch repository be worth it? It's hard to say. Some of the predecessors of RPM used modified sources to build packages, and one of the complaints about those early package management systems was that it was hard to keep track of local changes to the code. However, with modern source code management systems, keeping track of changes to the code shouldn't be an issue.
In either case, we need to get the development and maintenance of patches “out in the open,” and that means having patches developed in central, public repositories.