Git and Maven

There was a recent comment to a bug I posted in the Maven Git SCM Provider that triggered some thoughts. The comment was:

“GIT is a distributed SCM. There IS NO CENTRAL repository. Accept it.

Doing a push during the release process is counter to the GIT model.”

In general, the discussions around that bug have been quite interesting and very different from what I expected when I posted it. My reason for calling it a bug was that an unqualified ‘push’ tries to push everything in your local git repository to the origin repository. That can fail for some branch that you’ve not kept up to date, even if the push is perfectly legal for the branch that you’re currently releasing. Typically, that other branch has moved on a bit at the origin, so your local copy is a couple of commits behind. A failed push in that state will abort the maven release process and leave you with some pretty tricky cleaning up to do (edit: Marta has posted about how to fix that). A lot of people commenting on the bug have argued that, since Git is distributed, the push shouldn’t be done at all, or should be made optional.
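
To make the difference concrete, here is a minimal sketch of a push that names exactly one branch rather than everything in the local repository. It only builds the command line and does not run git. The function name and the branch name in the usage note are made up for illustration; the `git push <remote> <src>:<dst>` refspec form itself is standard git.

```python
from typing import List


def qualified_push_command(remote: str, branch: str) -> List[str]:
    """Build a 'git push' argument list that pushes only one branch.

    Hypothetical helper, not part of Maven or the Git SCM provider.
    """
    if not remote or not branch:
        raise ValueError("both a remote and a branch name are required")
    # An explicit refspec 'branch:branch' limits the push to that branch,
    # so a stale copy of some other branch cannot make the push fail.
    refspec = f"{branch}:{branch}"
    return ["git", "push", remote, refspec]
```

For example, `qualified_push_command("origin", "release-1.x")` yields the arguments for `git push origin release-1.x:release-1.x`, which leaves every other local branch alone.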

I think that the issue here is that there is an impedance mismatch between Git and Maven. While Git is a distributed version control system – that of course also supports a centralised model perfectly well – the Maven model is fundamentally a centralised one. This is one case where the two models conflict, and my opinion is that the push should indeed happen, just in a way that is less likely to break. The push should happen because when doing a Maven release, supporting Maven’s centralised model is more important than supporting Git’s distributed model.

The main reason why Maven needs to be centralised is the way that artifact versions are managed. If releasing can be done by different people from local repositories without any central coordination, there is a big risk of different people creating different artifacts with the same version number. The act of creating a Maven release is in fact saying that “This binary package is version 2.1 of this artifact, and it will never change”. There should never be two different artifacts called 2.1. Git of course gets around this problem by using hashes of the things it version controls instead of sequential numbers: if two things are identical, they will have the same hash code, and so the same version identifier. Maven produces artifacts on a higher conceptual level, where sequential version numbers are important, so there needs to be a central location that determines what is the next version number to use and provides a ‘master’ copy of the published artifacts.
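
As a rough illustration of why a single coordinating location matters, the sketch below models a central registry that refuses to let a released version change. The class and its method are invented for this example and are not how any real Maven repository manager is implemented. The content hash stands in for Git-style identity (Git itself hashes objects with a type and length header; plain SHA-1 of the bytes is used here only to keep the sketch short).

```python
import hashlib
from typing import Dict, Tuple


class ReleaseRegistry:
    """Toy central registry: one (artifact, version) maps to one content hash."""

    def __init__(self) -> None:
        self._released: Dict[Tuple[str, str], str] = {}

    def publish(self, artifact: str, version: str, content: bytes) -> str:
        digest = hashlib.sha1(content).hexdigest()
        key = (artifact, version)
        existing = self._released.get(key)
        if existing is not None and existing != digest:
            # Two different binaries claiming the same version number:
            # exactly the situation a released version must never allow.
            raise ValueError(f"{artifact} {version} is already released with different content")
        self._released[key] = digest
        return digest
```

With two uncoordinated registries, each could happily accept its own, different “2.1”; with one registry, the second attempt is rejected.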

I’ve also thought a bit about centralised versus distributed version management and when the different choices might work, but I think I’ll leave that for another post at another time (EDIT: that time was now). Either way, I think that regardless of the virtues of distributed version management systems like Git, Maven artifacts need to be managed centrally. It would be interesting to think about what a distributed dependency management system would look like…


  1. #1 by struberg on February 20, 2010 - 17:49

    thanks petter, really good sum up!
    Maybe I’ll hack in a backdoor to disable pushes in March. Imo the final way to go is not to push while staging and to perform the push if we do the full release.

    LieGrue,
    strub

  2. #2 by Hiram Chirino on February 24, 2010 - 13:56

    Hi Petter,

    I think maven and git are actually on the same model in this regard.

    Even in maven a ‘release’ can be staged and then ‘dropped’ via something like a nexus repository manager.
    Git also has a similar feature. You can commit local changes and then use ‘git rebase -i’ to drop previous commits. It re-writes local history so that it looks like the commit never happened.

    But this feature only works if you have NOT pushed the changes to a remote repo. You can’t re-write the history of a remote git repo.

    Since the ‘release’ plugin does not necessarily push a release to public repos, the release plugin should not push a release into the public scm repo either.

    If the release gets manually promoted in staging, then it can also be manually pushed from the local git repo to the publicly shared repo.

    • #3 by pettermahlen on February 24, 2010 - 18:17

      Hiram,

      I agree with you that with both Git and Maven, you can ‘test’ releases or commits locally before making them official. The main point I was trying to make – and I didn’t realise this until I read and thought about your comment on the bug report – is that Maven’s model with sequential version numbers needs a central source for those version numbers, and that therefore Maven’s model is centralised per se.

      So Maven is not distributed whether or not Git is, and when doing a Maven release I think you’re working primarily within the Maven framework.

      I guess that it isn’t really necessary to mandate publishing the sources (through the tag created by the maven-release process) just because you publish a certain version of an artifact (through the deploy part of the maven-release process), but I think it usually makes sense to do so. It could certainly be said that the release process bridges the two systems rather than happening within either, so perhaps some compromise like what Mark is suggesting is the best.

  3. #4 by Jakub Narebski on February 28, 2010 - 15:03

    Please note that while the default behavior of git-push is to push all matching branches, this can be configured using `push.default` configuration variable. This variable can be set to: ‘nothing’, ‘matching’, ‘tracking’ or ‘current’. See `man git-config` for details.

  4. #5 by struberg on February 28, 2010 - 22:06

    Hi Jakub!

    First of all a personal thanks to you for providing lots of patches to git :)

    A short explanation why it _currently_ is the way it is ;) there are 2 main pieces involved in the release process:

    a.) the maven-release-manager and
    b.) the respective maven-scm-provider

    So if we talk about that area, we should think a bit further and also consider other DSCMs like hg, etc also.

    The a.) doesn’t take much care about different scm providers. In the release:prepare step it simply applies changes to poms, scm:checkin them and later scm:tag them. The question is what kind of abstraction the scm-api should serve? Our initial idea (git was the first full DSCM provider, I think) was that we should ‘emulate’ the behaviour of the existing workflow, because that’s what most users would expect, and it also wouldn’t require us to change any other central maven parts. That means we took a fairly centralistic approach and by default, all the ‘write’ commands also push to the upstream repo.

    For the technical side:
    In the GitCheckInCommand of the gitexe scm provider (I also wrote a jgit version of that provider, but that is not yet released) I explicitly determine the upstream repo URL and both local and remote branches.

    In fact the GitCheckInCommand first executes a git-add / git-commit -a and after that executes a git-push [pushUrl] branchName:branchName

    See:
    https://svn.apache.org/repos/asf/maven/scm/trunk/maven-scm-providers/maven-scm-providers-git/maven-scm-provider-gitexe/src/main/java/org/apache/maven/scm/provider/git/gitexe/command/checkin/GitCheckInCommand.java

    The GitTagCommand is pretty similar, but the push for a tag obviously looks a bit different as you know.

    https://svn.apache.org/repos/asf/maven/scm/trunk/maven-scm-providers/maven-scm-providers-git/maven-scm-provider-gitexe/src/main/java/org/apache/maven/scm/provider/git/gitexe/command/tag/GitTagCommand.java

    A change of this default behaviour would be really easy in terms of hacking. I’ll have to release a few projects/plugins in the next weeks and then take an hour to do that.

    So be prepared ;)

    PS: @all anyone who likes to help out on a _decent_ submodules support is really welcome!
    That’s one of the missing areas, I need to solve. Git has no sparse checkout support. This is not a priori bad, but thus you cannot tag nor release submodules independently with maven. Currently you’ll need to release your project as a whole (including all submodules)
    There are 2 steps needed to fix this: 1.) submodules support 2.) auto-submodules detection while release:perform.

    LieGrue,
    strub

    PS: I’m not sure who wrote it in all the long discussions about that topic. But basically maven is all about reproducibility. As is with SCMs. With git, you always have your sha1, and thus you don’t need a mandatory push. With maven there is no such thing. One have to set the version number themself. And if 2 people release 2 different artifacts with the same version number, then all is broken (because not reproducible).

  5. #6 by Jakub Narebski on March 1, 2010 - 09:39

    > That’s one of the missing areas, I need to solve. Git has no sparse checkout support.

    git version 1.7.0 has sparse checkout. From RelNotes-1.7.0.txt:

    * “sparse checkout” feature allows only part of the work tree to be checked out.

  6. #7 by Andreas Krey on March 29, 2010 - 11:22

    > One have to set the version number themself.

    Why? Why not first tag the commit you want to build from (possibly including pushing the tag to grand central), and then let the build script use the output of ‘git describe [--dirty]’ to name the release?

    • #8 by Petter Måhlén on March 29, 2010 - 17:16

      I may misunderstand the question, but I think the main thing you achieve by naming releases yourself is an easily understandable sequence of versions. With SHA-1 codes, there is nothing that indicates whether a version is newer or older than another, whereas 2.2 is clearly newer than 2.1.

      • #9 by Dominic Mitchell on March 29, 2010 - 20:00

        You might think that. Which is newer: 2.0.1 or 2.1? :)

      • #10 by Andreas Krey on March 31, 2010 - 15:31

        No, I wanted to have it the other way around. First create a tag (and push it to central), and when that succeeds check out that tag and build and release it. And most importantly: within the build use the tag information from the VCS (‘git describe’ or ‘svn info’) to give the release its name, and don’t patch it into the files under version control.

        I’m much opposed to have ‘build.release=1.2.3’ in a file under version control.

        ‘git describe’ also has the interesting property of producing names like ‘1.2.3-16-gabbbe55’ when the currently checked out commit is 16 commits away from the tag ‘1.2.3’, thus we get increasing ‘names’ even for untagged states, and the hash part makes it possible to identify the commit in question. You wouldn’t want to hand out such builds, though.

      • #11 by Petter Måhlén on March 31, 2010 - 17:24

        Well, with Maven, you already have a file saying which version you’re working on, the pom.xml file. What the release plugin does (among other things) is help you update that. It does things in the opposite order to what you’re suggesting: it lets you specify the release number for the version you’re releasing, then tags the source code in the repository, then pushes the build out everywhere. If you’re not using Maven, then what you’re suggesting seems like a good idea (for a while, we used git-describe to label builds from our CI system), but if you are using Maven, I would recommend using the release plugin instead.

  7. #12 by Mark Derricutt on July 6, 2010 - 22:12

    Personally I still think the main problem with maven enforcing a push on release is the assumption that there even IS a remote/upstream repository.

    If I’ve started a new project on my laptop, ran git init and committed and I want to release an artifact into my local repository, I don’t HAVE an upstream repository ( yet ).

    And in any case, upstream might not even be the ROOT upstream repo.

  8. #13 by struberg on July 7, 2010 - 09:49

    Mark,
    there is no such thing as a ‘local release’ in maven.
    If you do a
    $> mvn install
    then maven will copy the built artifacts to your local repo (~/.m2/repository).

    If you do a
    $> mvn deploy
    maven will do a scp (or whatever) to the upstream repository (maven.central, etc) which is defined in .

    And doing a ‘maven release’ (mvn release:prepare + mvn release:perform) always will perform a mvn deploy.

    And now the main argument is: if you push the artifacts (the freshly built jar) to an upstream repo, then the fresh tag should get pushed also.

    LieGrue,
    strub

  9. #14 by John on July 16, 2010 - 00:23

    strub,

    Mark meant his GIT repository, not the maven repository. Mark’s use-case is that there is only ONE git repository in the whole world for his project’s source code: to where will the git scm plugin push commits?

    There is no question here about maven uploading a released artifact into a maven repository. The question here is what should maven’s git scm plugin be doing in the release plugin with the changes it makes to the code.

    The maven release plugin HAS to commit its changes. In a centralized VCS, the plugin HAS to push those changes to the centralized source repository. In a decentralized VCS, the plugin HAS to commit the changes to the LOCAL source repository. It DOES NOT have to push the changes to any other repository. Auto-push would make sense, but would still not be required I think, if there were a blessed central git repository that had to stay in sync with the artifacts available in the blessed maven repository. Auto-push does not make sense if the release plugin is operating on the blessed repository directly.

    I think the git scm plugin in maven should do nothing except commit and tag the local git repository. This will be adequate and fit nicely into EVERY git workflow. It is what I expect as a git user.

  10. #15 by struberg on July 18, 2010 - 09:42

    John,
    You have to admit that those projects are really rare, aren’t they? Because even the smallest projects should not be stored only on a notebook. At least not if they are so important that you like to ‘release’ them. For such projects, there is also no need to use the release plugin. Simply upgrade your poms, tag your sources and you are done.

    LieGrue,
    strub

  11. #16 by John on July 20, 2010 - 11:55

    strub,

    1) I think it is very difficult if not impossible to be certain how frequent this use-case could be. 2) There is little foundation for the assumption: only important projects are ‘released’. 3) Someone needs to define ‘important’ without being insulting.

    Regardless, here is a valid git workflow that a non-trivial project may use and that the plugin will have difficulty with: the project release manager ‘releases’ directly from the public repository. (I guess this won’t work in public repository hosting services like github because login access to the hosted filesystem is required). I may run release on my public repository for my own reasons and why not? For what purpose would the plugin restrict a project from this workflow?

    Regards,
    John

    • #17 by Petter Måhlén on July 21, 2010 - 06:12

      John,

      As far as I am aware, the plugin now allows you to disable pushing with a command line flag. So the workflow you describe is possible.

      My take is that it is probably not the way you would typically want to do it, so I am glad it isn’t the main focus of the plugin developers. They should be free to smooth the most common paths rather than corner cases. No offense intended, I just have never seen an open source or commercial setup like what you describe, so it doesn’t seem important as a use case. The current choice of supporting it without focusing on it seems like a good compromise.

  12. #18 by Hiram Chirino on July 21, 2010 - 14:05

    Hi Petter,

    Anyone who uses a Nexus repository to ‘stage’ their releases would benefit from not pushing to the central repository.

    Once the release is staged it usually goes through a final QA pass to make sure the release is good. I’ve been involved in lots of releases where the staged version failed the QA and the release has to be re-rolled. If the git repo had been pushed, you have now committed to GIT version history that you did a release which in fact has not occurred yet since it has only been staged.

    • #19 by Petter Måhlén on July 21, 2010 - 16:11

      Hi Hiram,

      That’s true of course – the way we do it at work is to simply make an additional release if we should find a problem during final staging, which fortunately happens pretty rarely. We don’t worry too much about version 1.20 being extremely short-lived and replaced by 1.21 within a day or so, but I can certainly see that being a problem in other cases. I’m not sure how best to handle replacing version 1.20 with a new one given that Maven treats released versions as immutable. It feels a bit wrong to replace version 1.20 with a different 1.20 (I guess you have to manually clean out local repositories to do that, and that could go wrong too), but it also feels a bit wrong to have a released version that lives only for a short time. I guess which of those problems you ‘select’ depends on the project.

      • #20 by Andy Schlaikjer on February 23, 2011 - 06:55

        > (I guess you have to manually clean out local repositories to do that, and that could go wrong too)

        See Sonatype OSS maven repo usage guidelines for example usage of staging repositories.
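
The ‘git describe’ naming scheme raised in comments #7 and #10 can be sketched as follows. This is only an illustration of how a build script might take apart a name like ‘1.2.3-16-gabbbe55’; the class and function names are invented here, and the pattern assumes the default output format of ‘git describe’ (tag, commit count, ‘g’ plus abbreviated hash, and an optional ‘-dirty’ suffix when `--dirty` is used). Tags that themselves end in something resembling that suffix would be misread.

```python
import re
from typing import NamedTuple, Optional


class Described(NamedTuple):
    tag: str                         # nearest tag, e.g. '1.2.3'
    commits_since_tag: int           # 0 when the commit itself is tagged
    abbreviated_hash: Optional[str]  # None when the commit itself is tagged
    dirty: bool                      # True if the name ends in '-dirty'


_DESCRIBE = re.compile(
    r"^(?P<tag>.+?)"
    r"(?:-(?P<count>\d+)-g(?P<hash>[0-9a-f]+))?"
    r"(?P<dirty>-dirty)?$"
)


def parse_describe(name: str) -> Described:
    """Split a 'git describe' style name into its parts (hypothetical helper)."""
    match = _DESCRIBE.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognisable describe name: {name!r}")
    count = int(match.group("count")) if match.group("count") else 0
    return Described(match.group("tag"), count, match.group("hash"),
                     match.group("dirty") is not None)
```

A build could then treat a result with `commits_since_tag == 0` and `dirty == False` as a proper release and anything else as an intermediate build, which matches the remark that you wouldn’t want to hand out the untagged ones.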
