Yesterday at the Java SIG meeting we started a discussion about the future of packaging Maven artifacts in Fedora (i.e. the new guidelines). We didn't get very far with the guidelines, and the main reason seems to be that the new way of packaging Maven artifacts described in them causes effective POM files to be installed in place of raw POM files.

Let me begin with my apologies for not communicating this matter earlier. I didn't consider this change controversial and I didn't think that anyone would have any problems with it. Because there were many doubts (and even accusations), I'll try to describe in detail why, in my opinion, this change is needed.

The current problem
===================

Maven has an advanced and powerful dependency mechanism. It has features such as dependency scopes, exclusions and optional dependencies (if you want, you can find detailed information about them in the Maven documentation [5]; describing them here wouldn't make much sense). RPM, on the other hand, has a much simpler dependency mechanism. You can either require some package along with all its requirements, or not require it at all. There is no way to express different dependency scopes in RPM packages.

Let me give you a simple example. A and B are JAR artifacts, X is a POM artifact (aka parent POM) and P is some Maven plugin.

  artifact B requires artifact A (scope: compile)
  artifact A inherits from artifact X
  artifact X requires plugin P

And some ASCII-art graph (feel free to skip it if you don't like it or if it's unreadable for you; all the information represented in the graph is present in the text too).

           ,---.    ,---.
           | X |--->| P |
           `---'    `---'
             ^
             |
  ,---.    ,---.
  | B |--->| A |
  `---'    `---'

In this case it is no wonder that package B should require package A, because A is needed even at run time. The question of whether A should require X, or whether X should require P, is more problematic.
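
To make the mismatch concrete, here is a minimal Java sketch of the example graph with typed edges. It is a hypothetical model, not XMvn code; the class name, the `Kind` enum and the `closure` helper are illustrative assumptions (records need Java 16 or later). Following only compile-scope edges from B gives what B needs at run time, while following every edge, which is what happens when each Maven relation collapses into a plain RPM Requires, also drags in X and P.

```java
import java.util.*;

/**
 * Hypothetical model of the example above (not XMvn code):
 * B -> A (compile dependency), A -> X (parent POM), X -> P (plugin).
 */
public class DependencyExample {

    /** Kinds of Maven relations that a single RPM Requires cannot tell apart. */
    enum Kind { COMPILE, PARENT, PLUGIN }

    record Edge(String to, Kind kind) {}

    static final Map<String, List<Edge>> GRAPH = Map.of(
            "B", List.of(new Edge("A", Kind.COMPILE)),
            "A", List.of(new Edge("X", Kind.PARENT)),
            "X", List.of(new Edge("P", Kind.PLUGIN)),
            "P", List.of());

    /** Everything reachable from start when following only edges of the given kinds. */
    static Set<String> closure(String start, Set<Kind> follow) {
        Set<String> seen = new TreeSet<>();
        Deque<String> todo = new ArrayDeque<>(List.of(start));
        while (!todo.isEmpty()) {
            for (Edge e : GRAPH.get(todo.pop())) {
                if (follow.contains(e.kind()) && seen.add(e.to())) {
                    todo.push(e.to());
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // What B really needs at run time: compile-scope dependencies only.
        System.out.println(closure("B", EnumSet.of(Kind.COMPILE)));      // [A]
        // What gets installed if every relation becomes a plain RPM Requires.
        System.out.println(closure("B", EnumSet.allOf(Kind.class)));     // [A, P, X]
    }
}
```

The gap between the two printed sets is exactly the information RPM has no way to express.
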

There are 4 possibilities in this case, which I'll describe in more detail.

Case 1: A requires X and X requires P.
In this case, if you want to install package B, you'll have to install P as a transitive dependency. All dependencies of P will be installed too; for example, Maven plugins require Maven, and possibly much more. This solution is not good because it causes many unneeded packages to be installed.

Case 2: A does not require X (and X does require P).
Now you get correct runtime dependencies from the perspective of package B. But to build package B you'll need to manually add BuildRequires: X, because otherwise Maven would fail to resolve artifact X, which is referenced from POM A. This solution is not good because (1) you'd need to manually specify BuildRequires: X in the spec file of package B, and (2) plugin P is installed when building package B even though this plugin is not needed.

Case 3: X does not require P (but A does require X).
Now when installing B you get only a single unneeded dependency (i.e. package X). We could live with that, especially because X is a small POM-only package. Building B is simple: you don't need to specify extra BuildRequires. However, in this case, to build package A you need to add BuildRequires: P to the spec file of package A. If package X changes (for example, a plugin is added or removed), then you need to change package A too (to add or remove the respective BuildRequires). This is error-prone and tedious.

Case 4: A does not require X and X does not require P.
This is just a combination of cases 2 and 3. It adds no benefits and combines their disadvantages, so this case is not acceptable either.

To summarize: cases 1 and 4 are unacceptable. Cases 2 and 3 could work, but they have major disadvantages, which get much more problematic as the number of involved packages increases. Both cases require maintaining BuildRequires in places different from where they arise.
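
The four cases can be checked mechanically. The Java sketch below is a hypothetical model, not XMvn code: two switches decide whether package A Requires X and whether package X Requires P, and `evaluate` (an illustrative name) reports what installing B drags in beyond A, what building B drags in beyond A and X (Maven must resolve A's parent POM, but never runs P), and which BuildRequires a packager must write by hand ("B:X" meaning "BuildRequires: X in B's spec file").

```java
import java.util.*;

/** Hypothetical model of the four packaging cases for the example graph. */
public class PackagingCases {

    /** Everything RPM pulls in transitively from the given roots. */
    static Set<String> closure(Map<String, List<String>> requires, String... roots) {
        Set<String> seen = new TreeSet<>(List.of(roots));
        Deque<String> todo = new ArrayDeque<>(seen);
        while (!todo.isEmpty()) {
            for (String dep : requires.getOrDefault(todo.pop(), List.of())) {
                if (seen.add(dep)) {
                    todo.push(dep);
                }
            }
        }
        return seen;
    }

    /** Summarizes one case as a string (illustrative format). */
    static String evaluate(boolean aRequiresX, boolean xRequiresP) {
        Map<String, List<String>> requires = new HashMap<>();
        requires.put("B", List.of("A"));                 // B needs A at run time in every case
        if (aRequiresX) requires.put("A", List.of("X"));
        if (xRequiresP) requires.put("X", List.of("P"));

        // Installing B: only A is really needed.
        Set<String> installExtra = closure(requires, "B");
        installExtra.removeAll(Set.of("B", "A"));

        // BuildRequires that must be written by hand in some spec file.
        List<String> manual = new ArrayList<>();
        if (!aRequiresX) manual.add("B:X");   // Maven must resolve A's parent POM
        if (!xRequiresP) manual.add("A:P");   // building A runs plugin P declared in X

        // Building B: the buildroot gets A plus any manual BuildRequires;
        // only A and X (for POM resolution) are actually needed.
        Set<String> buildRoots = new TreeSet<>(Set.of("A"));
        if (!aRequiresX) buildRoots.add("X");
        Set<String> buildExtra = closure(requires, buildRoots.toArray(new String[0]));
        buildExtra.removeAll(Set.of("A", "X"));

        return "install B extra=" + installExtra + " build B extra=" + buildExtra
                + " manual=" + manual;
    }

    public static void main(String[] args) {
        System.out.println("case 1: " + evaluate(true, true));
        System.out.println("case 2: " + evaluate(false, true));
        System.out.println("case 3: " + evaluate(true, false));
        System.out.println("case 4: " + evaluate(false, false));
    }
}
```

Running it, case 1 reports extra=[P, X] when installing B, case 2 needs the manual "B:X" and still pulls P into B's build, case 3 installs the extra X and needs "A:P", and case 4 needs both manual entries; every case pays with either bloat or hand-maintained BuildRequires.
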
When updating a single package, one would need to investigate whether any related packages need updating too. If you don't update related packages, you get inaccurate requires (which in turn leads to build failures, dependency bloat, or both at the same time).

I tried to improve the situation of Maven packaging in Fedora and I thought of many different solutions. I can't really explain in one email (which is getting too long anyway) all the possibilities I considered during the whole 6-month process of designing and implementing XMvn; explaining even some of them would require knowledge of Maven internals. Instead I'll propose the solution which, in my judgement, is the best, and try to show that it's better than the current situation.

[5] http://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html

Proposed solution - effective POMs
==================================

First let me explain what I mean by "effective POM". An effective POM is basically a POM that includes the metadata from all its ancestor POMs (parent POM, parent of parent, and so on). Effective POMs don't need to explicitly declare a parent POM, because they already have all settings copied from their ancestors; inheriting from a parent would be a no-op.

So what happens in the previous example if package A installed an effective POM instead of the raw (upstream) POM? The most important consequence is that we can simplify the requires:

           ,---.    ,---.
           | X |--->| P |
           `---'    `---'

  ,---.    ,---.
  | B |--->| A |
  `---'    `---'

All requires are accurate now (minimal and correct):

1) Package A no longer needs to require X, because POM A is effective and doesn't reference POM X.
2) Installing binary packages (A or B) doesn't bring in any unneeded dependencies on parent POM packages or Maven plugins.
3) Building package B doesn't bring in plugin P (which is not needed to build B).
4) Building package A automatically brings in plugin P (which is needed to build A). Plugin P is installed automatically because A BuildRequires X, which pulls in P.
5) All packages declare Requires or BuildRequires only for things that are specified in their own POMs. With this solution you don't *ever* need to declare Requires or BuildRequires on dependencies added in POMs from other packages.

Let me highlight two things:

1) With this solution, effective POMs need to be installed only in packages shipping binaries (like A). POM-only packages (like X) still install raw POMs.

2) As I showed, the noticeable improvement in dependencies is a direct consequence of installing an effective POM instead of a raw POM in package A. Any proposal like "let's keep XMvn but revert installing effective POMs" would nullify that benefit, which is pretty much the whole reason for using XMvn and automated dependency generation in the first place.

Some time ago, automated generation of Maven and OSGi provides and requires was implemented. It was fully enabled for OSGi (mainly because OSGi dependencies are much simpler than Maven ones), but auto-requires for Maven artifacts were not enabled. They were disabled because we didn't install effective POMs, and without them the automatically generated requires wouldn't be sane (as I showed above).

Effective POMs - evil or not
============================

Several matters related to effective POMs were raised at the meeting (and after it). I'll try to comment on them.

1. "Effective POMs are bundling other POMs and because of that they need to be forbidden in Fedora."

Explanation: The only thing that needs to be copied from the parent POM is simple metadata; to be more specific: groupId, dependency artifact names and extension artifact names. No code is bundled or anything like that. Only simple, small metadata that would otherwise have to be added manually, in the form of package Requires or BuildRequires, in order to get things working.

2. "Effective POMs are unreadable."

Explanation: Effective POMs aren't meant to be read by people; they are meant for machine processing.
You can install raw POMs next to effective POMs if you feel people need them for reading (as a form of documentation). This is as if you said that we should install C source code and interpret it instead of installing machine code, because the latter is unreadable.

3. "Using effective POMs breaks compatibility of the Fedora system artifact repository with upstream Maven."

Explanation: First of all, our repository has a different structure from upstream Maven repositories, and there is no way to use it directly from upstream Maven. Secondly, effective POMs are valid POMs (not some custom format), and as such they can be parsed and used by unmodified upstream Maven. Installing effective POMs instead of raw POMs doesn't change anything in terms of compatibility with upstream Maven.

4. "If there is a bug in a parent POM, then all dependent packages have to be rebuilt to fix the bug."

Explanation: If the bug does not concern declared dependencies, then dependent packages don't need to be rebuilt, because all metadata besides dependencies is meaningless in an effective POM installed with a binary package (in the future, all the meaningless data will be stripped off to reduce POM size and improve readability). But if the bug is in the dependencies declared in the POM, then dependent packages would have to be rebuilt anyway, no matter whether raw or effective POMs are used. The rebuild is needed because if a dependency in the parent POM changed, this change needs to be reflected in updated Requires or BuildRequires of other packages. If you don't update dependent packages, you silently introduce packaging bugs.

5. "If some packages install effective POMs and some raw POMs, then dependencies become incorrect."

Explanation: That is simply not true. Mixing effective POMs and raw POMs in Fedora could expose *existing* packaging bugs in other packages.
There are cases where packages don't declare all of their dependencies, but people don't notice these bugs because other packages have excessive dependencies that cover the missing requires. Reducing dependencies to the minimum creates the possibility that fixing one bug (excessive dependencies) exposes another bug in a different package. Using effective POMs doesn't introduce new bugs by itself. Moreover, having both effective and raw POMs in the distribution is hopefully a transitional state, and I hope that at some point in the future all non-POM packages will install effective POMs.

6. "Installing effective POMs instead of raw POMs is a deviation from upstream."

Explanation: The reason POM files are installed with Fedora packages is that Maven needs to process them automatically when building dependent packages. Internally, Maven creates an effective POM very early in the build process and uses this effective POM during the build. Hence installing effective POMs has the same semantics as installing raw POMs. The difference is that our installed POMs will have a slightly different structure, possibly won't include all the information (which would be meaningless in that context anyway), and won't be byte-identical to upstream POMs. But again, POMs should be treated as data for machine processing, so a different structure is acceptable as long as the semantics are the same. It's like how in Fedora we install JAR files (or .so files) that are not bit-identical to upstream ones but have the same semantics (the same runtime behaviour, implementing exactly the same algorithms, because they are compiled from the same source code).

I hope this explains why installing effective POMs is needed and covers most of the concerns about them. If you have any questions or comments, please feel free to share them.
Contrary to what some people seem to believe, I do not refuse to listen to concerns about Maven in Fedora, and I would appreciate any (constructive) criticism or comments.

--
Mikolaj Izdebski
--
java-devel mailing list
java-devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/java-devel