FHIR Chat · Git help needed · committers/git-help

Stream: committers/git-help

Topic: Git help needed


view this post on Zulip Grahame Grieve (Oct 07 2021 at 21:14):

We're getting a lot of feedback from HL7 committees: We really really hate git, and we're totally powerless to deal with it

view this post on Zulip Grahame Grieve (Oct 07 2021 at 21:15):

and when we try to ask for help, experts just tell us that we're using the wrong git client, but that doesn't actually help at all

view this post on Zulip Grahame Grieve (Oct 07 2021 at 21:15):

we need to do something about this. But what?

view this post on Zulip Grahame Grieve (Oct 07 2021 at 21:17):

I'm terrified of git conflicts - it's super stressful because I might merge wrong, and git's language is totally confusing. Perhaps I'm just stupid but in my head, there's other people's changes, my changes, and the product. Git gives me 'base', 'merge-head', and 'head'. Which of those is which, and could they choose less useful language?

view this post on Zulip Grahame Grieve (Oct 07 2021 at 21:19):

so you can imagine how committee members feel.

view this post on Zulip Grahame Grieve (Oct 07 2021 at 21:20):

and then, after a merge, git tells you you're committing hundreds of files. Did you change them? Is that a whoopsie?

view this post on Zulip Gino Canessa (Oct 07 2021 at 22:03):

I don't think the problem is git per se... We are building a product via many async distributed 'teams' that have varying levels of comfort with the tools. Version management is a hard problem, even more so for people not used to dealing with it. Short of something like a custom software platform for editing the FHIR spec (which I can't imagine being feasible), I don't know how much of a difference the tooling can make.

But, we should be able to reduce the burden.

First, I'd suggest identifying the 'repeat offenders' to see if they can be removed/not checked-in, e.g.:

  • Adding resources often causes conflicts in the 'top level' files (e.g., oids.ini, resourcelist.html, etc.) - can we move these out of 'normal' editing, make them generated files, etc.?
  • I often have conflicts with things like cache files (e.g., null.cache) - they are nice to have, but I would be willing to trade a longer 'first build' for not dealing with those conflicts.
  • There is a lot of 'build' stuff as well as the actual content source. If those files cause problems, they could be split into different repos (e.g., if people are making changes for local builds and accidentally causing problems).

Second, I'm guessing that some educational content would go a long way. Everything from 'setting up your environment correctly' to things like 'how to deal with long-lived branches' to 'how to add a new resource/operation/etc.'. I'm guessing a lot of frustration comes from not knowing.

After mulling for a bit (but not wanting to rewrite all the above =), another possibility would be to have someone (paid) responsible for managing the repo. Merging branches, warning people of long-lived ones, deleting stale ones, etc. ... localize the pain to a single person/team and compensate them for dealing with it (e.g., ops).

view this post on Zulip Jean Duteau (Oct 07 2021 at 22:14):

i was writing up a bunch of stuff and then Gino posted and that was a lot of what I was going to say. :)

can I step back and ask why you decided on Git as your version control platform? Git imposes a certain way of thinking about your source files and how they are managed and that has been imposed on us because of the decision to use Git.

the other question is what alternative would we use if we decided to move away from Git?

I do think that treating the FHIR specification as set of source files was a good decision. And if you do that, then you need a way to manage and control the source files. But it is a different way of thinking then many specification developers at HL7 are used to thinking. I agree with Gino's suggestions on reducing the burden - identify files that are pain points, provide some education to new committers, single out repository managers who can help and/or do some of the required tasks in managing a repository of this size.

view this post on Zulip Grahame Grieve (Oct 07 2021 at 22:22):

the files that are the pain points that I've heard about are the txCache files, but they don't really matter - fix them however way. It's source merge conflicts that are the real problem from what I've heard, since they matter

view this post on Zulip Grahame Grieve (Oct 07 2021 at 22:23):

note that the build is 3hours + longer when I rebuild the txCache. I don't think we want to have build times like that

view this post on Zulip Grahame Grieve (Oct 07 2021 at 23:56):

why choose git? because of the workflows around it , and in the hope that we could manage it. The second has turned out to not be true

view this post on Zulip Vassil Peytchev (Oct 08 2021 at 00:17):

There are quite a few things to unpack here.

  1. Git != GitHub. Are the beneficial workflows those of git mainly, or the add-ons of GitHub?

  2. Related to the previous - are our needs better served by git-flow or GitHub flow

  3. The creator of Git has stated that GitHub "creates absolutely useless garbage merges, and you should never ever use the github interfaces to merge anything" - could that be one of the reasons merges are such an issue?

  4. There are multiple ways to do the same things - e.g. pull requests from branches, vs. pull requests from forks. Is one better than the others?

  5. Should all merges be done without fast-forward (--no-ff parameter)?

  6. Do git clients provide sufficient added value compared to the command line interface?

view this post on Zulip Josh Mandel (Oct 08 2021 at 00:27):

(On 3, I don't think Linus's gripes are things we particularly worry about. They're to do with things like signatures which haven't been a pain point for us, not about the graph structure of the repository history.)

view this post on Zulip Vassil Peytchev (Oct 08 2021 at 01:29):

On the txCache files, would adding a step after the local build to add any new files generated due to the build, and then commit them and any modified existing ones with auto-add? I think the problems occur when these files are not updated after the build, is that the case?

view this post on Zulip Rob Hausam (Oct 08 2021 at 13:14):

I think that the txcache files really are the source of most of the merge issues that I have had and have heard of - probably even up to 80-90% of them? I think those cache files are still changing far too often (and often apparently unnecessarily), and any diffs on them to see what has changed and why it has changed are for the most part completely useless. I agree with Gino that it may be better to trade a longer 'first build' for avoiding having to deal with those conflicts - add the folder to .gitignore and manage it entirely locally. And Vassil I think is also right that many times these files are not being checked in consistently - and then when someone does check them in that probably means that the changes and the potential conflicts are greater. But the easiest way may be to not check them in at all.

view this post on Zulip Lloyd McKenzie (Oct 08 2021 at 16:42):

I think asking people to use command line is a non-starter. Some of the GUI tools aren't that intuitive, but the commandline is far worse.

view this post on Zulip Jean Duteau (Oct 08 2021 at 16:57):

i actually have no problem with the commandline. it's pretty much dead simple since there is quite a restricted set of commands you're going to use

view this post on Zulip Lloyd McKenzie (Oct 08 2021 at 17:32):

Yes, but you're not an example of someone who struggles with Git :)

view this post on Zulip Joel Schneider (Oct 08 2021 at 18:03):

I tend to stick with the git command line, rather than trying to second-guess what some GUI tool might actually be doing.

view this post on Zulip Rob Hausam (Oct 08 2021 at 18:41):

I'm also one who from my experience finds it often easier and more straightforward to work with the command line rather than the various tools that I've used so far (e.g., GitKraken, TortoiseGit).

view this post on Zulip Craig Newman (Oct 08 2021 at 18:43):

As someone without much knowledge of the process (and in your target audience), I actually prefer the command line to a tool provided that there is a clear set of steps I can follow (and better yet, cut and paste). Too many menu choices in most of the tools I've tried out.

view this post on Zulip Rob Hausam (Oct 08 2021 at 18:43):

I haven't wanted to or found a need to use any of the tools for quite some time (although I started by trying GitKraken).

view this post on Zulip Grahame Grieve (Oct 08 2021 at 19:10):

I can't stand the command line. so much typing of file names - what a waste of time

view this post on Zulip Grahame Grieve (Oct 08 2021 at 19:11):

and you might like saying to make the first build last longer - very much longer, but you all like that I can wear that pain once for all of you. otherwise I'll periodically wipe your cache and you'll all have three hour builds.

view this post on Zulip Josh Mandel (Oct 08 2021 at 19:12):

You very rarely need to type out full file names with tab completion :-)

But I'm not trying to say that there is a one size fits all solution.

view this post on Zulip Josh Mandel (Oct 08 2021 at 19:13):

I do think it would be helpful to have an in-depth technical discussion, maybe a live conversation, about the overall caching strategies so that we could explore whether other options might be able to meet the needs.

view this post on Zulip Jean Duteau (Oct 08 2021 at 19:24):

well, i deleted my entire vscache directory to see how long it would take and I just got a timeout so I couldn't check :)

view this post on Zulip Jean Duteau (Oct 08 2021 at 19:34):

okay i'm trying with just deleting the files in the vscache directory and not the entire directory :)

view this post on Zulip Gino Canessa (Oct 08 2021 at 20:31):

Grahame Grieve said:

and you might like saying to make the first build last longer - very much longer, but you all like that I can wear that pain once for all of you. otherwise I'll periodically wipe your cache and you'll all have three hour builds.

I'd argue it's not we 'all'... but yeah, that's usually what users/customers want. They are asking someone else to take on pain so they don't have to.

The tradeoff doesn't have to be paid in longer build times - for example, it can be paid in development time (though I worry that right now, it's still asking you to bear most of the pain - but that's another discussion). There are lots of options here: make txCache something that only updates from the main branch builds with another directory for 'local' changes; move txCache into another repo/project/package altogether; make it so that local builds don't modify txCache and just build the local things they need on the fly each time; or if it's only an issue in your environment, identify a process that keeps a 'local' copy of the txCache to swap in and out; etc..

I don't know enough about the caching mechanism of the build process to know what would be the simplest to implement. But, if the majority of conflicts are in the txCache, then that seems like the best place to start addressing it.

But to my knowledge, there is no tool that allows different people to work on the same files at the same time, has a 'branching' concept for building/testing/exploring, and does not have to deal with merge conflicts. I don't think it's a tool limitation, but rather a process one (or a physics one, if you want to blame it on linear time =).

view this post on Zulip Lloyd McKenzie (Oct 08 2021 at 20:54):

Could we have a dual-layer cache - one generated locally and not committed, and one generated by the CI build and auto-committed? Because there are two layers, we wouldn't have conflicts and everyone would get the benefit of the CI-build maintained cache.

view this post on Zulip Vassil Peytchev (Oct 09 2021 at 01:06):

Wouldn't that mean a 3-hour CI build?

view this post on Zulip Josh Mandel (Oct 09 2021 at 01:09):

The CI build can support caching strategies. I think the main thing (for me anyway) is to understand the problem better.

view this post on Zulip Lloyd McKenzie (Oct 09 2021 at 01:39):

The CI build wouldn't have to rebuild from scratch. It would retain its own cache (which it commits) and would simply update it. The complexity would be in managing a 2-level cache.

view this post on Zulip Jean Duteau (Oct 09 2021 at 01:40):

Gino Canessa said:

But to my knowledge, there is no tool that allows different people to work on the same files at the same time, has a 'branching' concept for building/testing/exploring, and does not have to deal with merge conflicts. I don't think it's a tool limitation, but rather a process one (or a physics one, if you want to blame it on linear time =).

I think that the merge conflicts are exacerbated by:
a) we have a lot of people making PRs
b) the length of time that our checks take when a PR is submitted is much longer than normal
In previous and current projects that I'm on, the checks run the tests and they take no more than 10 minutes, so the time from a PR request to merging it is short.

view this post on Zulip Lloyd McKenzie (Oct 09 2021 at 01:40):

Only issue would be the CI-build colliding with itself. If that happened, I think it'd just revert and not bother committing the cache updates and leave it to the next CI build that didn't collide to commit.

view this post on Zulip Josh Mandel (Oct 09 2021 at 02:01):

I'm still interested in understanding whether there's an approach to caching that would avoid version control conflicts altogether.

view this post on Zulip Grahame Grieve (Oct 09 2021 at 03:38):

the thing about the txCache files is that it's easy to resolve - just abandon one or the other and move on. It's the other merges that matter. And while it's true that non-locking version control programs lead to merge, I was never afraid of svn merges, and I'm always afraid of git merges

view this post on Zulip Jean Duteau (Oct 09 2021 at 07:48):

why would you be afraid of merges via git vs svn? i'm more afraid of svn merges because of the way that svn treats branches. as well, I don't think that the concept of a pull request where you can run pre-update checks before a merge occurs exists in SVN. It didn't back when I was using SVN and a quick web search seems to imply that it doesn't.

view this post on Zulip Grahame Grieve (Oct 11 2021 at 23:48):

well, our infrastructure is better. It's a lot quicker to build without the cache. But it doesn't work - I have some bugs to find and fix

view this post on Zulip Josh Mandel (Oct 11 2021 at 23:55):

A couple of notes reflecting on this thread:

  • Before migrating FHIR from svn to git (and this was a primary motivation for the migration), we had a broken build in svn a lot of the time. People would by necessity push their possibly-broken updates onto the already broken head of the repo, which compounded the problem. We don't have that anymore; our main branch in git is protected by pre-merge checks that are really pretty effective. (Yes, this has a cost in the effort of managing these merges, which I'm not trying to minimize.)

  • Grahame when you encounter a situation with a scary merge, maybe jot down the commit hashes of the branches you're merging, and share here; without trying to intervene midstream, we can review after the fact and see if there are any workflow improvements to suggest.

view this post on Zulip Grahame Grieve (Oct 11 2021 at 23:59):

well, most of them are java code, but sure, I can do that

view this post on Zulip Josh Mandel (Oct 12 2021 at 00:05):

Cool. (The actual java semantics involved in the merge process wouldn't be different between svn and git, right?)

view this post on Zulip Grahame Grieve (Oct 12 2021 at 00:08):

probably not.

view this post on Zulip Grahame Grieve (Oct 12 2021 at 00:09):

it may be the UI - I just can't tell what it's doing line wise. But sometimes, I have no idea what it's even showing me. I had one last week - the diff showed fundamantal change to the structure, but none of the recent commits had anything like that. I just reverted the repo and made my changes again.

view this post on Zulip Grahame Grieve (Oct 12 2021 at 00:22):

oh - and I'm doing this locally as well

view this post on Zulip Josh Mandel (Oct 12 2021 at 00:44):

Locally resolving is good (but cases where the parent branches exist in GitHub will make the best examples for us to review after the fact).

view this post on Zulip John Moehrke (Oct 12 2021 at 19:47):

besides the vscache and other "files that I didn't change, so I don't know what I should do with"..... The other issue I come up against is just understanding the "right" way to do something. These "somethings" might seem obvious to the GIT elite, but I am never quite sure what to do when I am told things like my branch being behind blah blah... these "somethings" should be identified and a set of visual tutorials (we started this github process with some nice visual tutorials).

view this post on Zulip Grahame Grieve (Oct 13 2021 at 23:04):

follow up on this. We could free up a few $$ to pay someone to document 'standard procedures' to resolve common challenges for editors. Would this be useful?

view this post on Zulip Grahame Grieve (Oct 13 2021 at 23:05):

(irrespective of what else we do about cache files)

view this post on Zulip Vassil Peytchev (Oct 14 2021 at 02:09):

I am writing up some expanded wording and diagrams on this, but I don't know exactly what are "common challenges". Would it be useful to investigate your current 202110 branch?

view this post on Zulip Grahame Grieve (Oct 14 2021 at 02:10):

don't know. possibly not - there's a lot of moving parts on that one and it's highly abnormal

view this post on Zulip Vassil Peytchev (Oct 14 2021 at 02:21):

Here are some of the points I am trying to address

  • The slight difference between creating pull requests from a branch vs from a fork
  • Regularly merge master/main into the local feature branch
  • Suggest use of kdiff3 to resolve conflicts.
  • How to commit large number of modified files without typing the file names

view this post on Zulip Vassil Peytchev (Oct 29 2021 at 21:04):

I think I have identified an issue that can explain at least some of the unexpected problems when merging things. I also think that it can be mitigated with some creative scripting.

The problem: All along we have been advising people to clone master and then to run the publish process. This makes sense, as we want to make sure that the local setup is done correctly, and the build can succeed. The problem is that the build modifies tracked files - the famous vscache files, but on occasion others, for example tools/tx/snomed/snomed.xml. This results in a set of files that are modified relative to the master branch but not part of it. If we now create a new branch locally, these modified files are mixing with the new branch. In most(?) cases that is OK, but I think it is a source of endless aggravation of the "death by a thousand cuts" type.

  • Possible mitigation 1

    • Change the steps for making changes to be:

      1. clone master locally
      2. create the feature branch
      3. run the build in the feature branch
      4. make changes
      5. run the build
      6. ...etc.
    • change the build script to not allow the build to run in the master branch

    • change the build script to auto commit the changes to any uncommitted files after the build
  • Possible mitigation 2

    • Disallow direct branching and pull requests from branches, forcing all PRs to be from a fork. I think this will do more damage than good.
    • change the build script to auto commit the changes to any uncommitted files after the build

@Grahame Grieve @Josh Mandel @Lloyd McKenzie @Mark Iantorno

view this post on Zulip Grahame Grieve (Oct 29 2021 at 21:12):

This results in a set of files that are modified relative to the master branch but not part of it. If we now create a new branch locally, these modified files are mixing with the new branch

I don't know what those sentences mean

view this post on Zulip Vassil Peytchev (Oct 29 2021 at 21:36):

Let's take tools/tx/snomed/snomed.xml - somehow the single quote in a description (cow's milk) got mangled into a sequence of the characters '?' '€' '™' - before the build the euro and tm characters were represented by the corresponding entity codes, after the build they were converted to the actual characters. The file is now marked as modified. git status will show it as modified, but not staged for commit.

If you create a branch: git checkout -b test_branch all the modified files that are not staged for commit will be in the new branch. git status will show them as part of the current branch. In most cases that is not a problem - if you now commit the modified files while in the test_branch, they will not show up in the master branch as modified.

I think the problems may arise when not everything is properly committed, or things get committed by mistake in the master branch instead of the test_branch.

view this post on Zulip Grahame Grieve (Oct 29 2021 at 21:37):

I don't understand why changing the branch makes any difference. You still have a modified file to do something about

view this post on Zulip Vassil Peytchev (Oct 29 2021 at 21:47):

True, but are the modifications from the first build run necessary after your run the build after you make changes? Does it matter if you commit them before you run your next build?

I think the more we limit the chances of confusion of which branch we are modifying, the less will people experience edge cases that are hard to resolve.

view this post on Zulip Grahame Grieve (Oct 29 2021 at 21:58):

it means someone else should have committed them

view this post on Zulip Grahame Grieve (Nov 01 2021 at 20:32):

are the modifications from the first build run necessary after your run the build after you make changes?

no. they'll just keep getting made

Does it matter if you commit them before you run your next build?

no

view this post on Zulip Grahame Grieve (Nov 01 2021 at 20:34):

so it presently takes my build 17 minutes longer to run without the vsCache. I can get rid of the files in vsCache, which means that

  • everybody's first build of a new checkout will get ~20mins longer
  • the ci-build will get 20min longer

view this post on Zulip Vassil Peytchev (Nov 01 2021 at 20:41):

I think it is fairly straight forward to force committing any changes in vscache, source, and tools as part of the build script. That will save the 20 minutes of build, and will avoid issues with missing generated files.

view this post on Zulip Vassil Peytchev (Nov 01 2021 at 20:42):

I will create a PR to modify publish.sh

view this post on Zulip Grahame Grieve (Nov 01 2021 at 20:45):

I don't understand how a build script can force people to commit something when it doesn't cover the commit?

view this post on Zulip Vassil Peytchev (Nov 01 2021 at 20:53):

The script will list all new files that have not been added into the "stage" (i.e. no git add). Then it will loop through them, and if they are under one of tools/ vscache/ or source/ it will do a git add to them. After that, a git commit -am "Auto-commit courtesy to the publish script" will commit both those, and any modified but not committed files.

view this post on Zulip Grahame Grieve (Nov 01 2021 at 20:54):

wooah so git commits everything any time the build runs?

view this post on Zulip Vassil Peytchev (Nov 01 2021 at 21:13):

Under certain folders. It can be limited to vscache/ only.
What is the use case for not committing everything that goes into the build?

view this post on Zulip Grahame Grieve (Nov 01 2021 at 21:25):

I'm working on some stuff, and I run the build to see if it works.

view this post on Zulip Vassil Peytchev (Nov 01 2021 at 21:26):

What's the downside of having committed it?

view this post on Zulip Josh Mandel (Nov 02 2021 at 03:06):

Commits should generally reflect an intent, and the commit history should tell a story (not just the story "Then I built. Then I built. Then I built...").

view this post on Zulip Josh Mandel (Nov 02 2021 at 03:07):

But if there were really practical wins in abandoning this ideal... I think they'd be worth evaluating.

view this post on Zulip Grahame Grieve (Nov 04 2021 at 07:13):

looks likely that I'm going to replace vsCache altogether anyway, and they'll no longer be in version control. Josh and I worked out an alternative approach that will work better, actually, and it doesn't have merge clashes.

view this post on Zulip Grahame Grieve (Nov 04 2021 at 07:14):

I'm not going to describe it quite yet - still doing research whether it's feasible. But if it is... than those files will disappear (main build first, and then, when that's bedded down, the IGs as well)

view this post on Zulip Vassil Peytchev (Nov 04 2021 at 13:41):

Another possible source of merge clashes might be the use of rebase instead of merge. While we never mention rebase in any of the instructions, there was at least one mention of it in previous answers to git questions.

Anyone seeing an issue with explicitly stating in the instructions not to use rebase for PRs?

view this post on Zulip Grahame Grieve (Nov 04 2021 at 18:19):

I don't know what it is and would never do it

view this post on Zulip Rob Hausam (Nov 04 2021 at 18:32):

@Grahame Grieve Rebase definitely has its uses. I just used it to streamline our commits (primarily consolidating and removing all of the excess "update txcache" commits) and then moving only the needed updates from our IPS development (i.e. 'connectathon*') branches into the CI build. The interactive rebase worked quite well for that. But typically you do not want to do that (or at least do it very judiciously) with any commits that have been pushed to a central repository and others may have accessed and are using. I agree with @Vassil Peytchev that inappropriate use of rebase would certainly be a source of merge conflicts.

view this post on Zulip Josh Mandel (Nov 04 2021 at 18:36):

Yeah, I wouldn't "forbid" rebasing for users that want to take this upon themselves during branch development on their own machines, but we've (since the beginning) configured the GitHub web interface for the FHIR repo to only allow merges, not rebases:

image.png

view this post on Zulip Vassil Peytchev (Nov 04 2021 at 18:39):

How about the step before - locally merge master into feature branch?

view this post on Zulip Josh Mandel (Nov 04 2021 at 19:03):

This is something developers can control how they want to do on their own side. I think our recommendations should be just to do simple merges, but if a developer wants to rebase for some reason it's really up to them, or I would say it's below the abstraction barrier because we don't have direct insight into what happens there.

view this post on Zulip Rob Hausam (Nov 04 2021 at 19:08):

Agree.

view this post on Zulip Vassil Peytchev (Nov 04 2021 at 19:24):

I think using rebase to pull changes from master into the feature branch since the branch was created is dangerous. I want to put a warning on step 8 here not to use rebase.

view this post on Zulip Grahame Grieve (Nov 10 2021 at 06:17):

see https://chat.fhir.org/#narrow/stream/179165-committers/topic/txCache.20is.20out about the tx-cache

view this post on Zulip Josh Mandel (Nov 10 2021 at 14:59):

Fine by me to add a warning; folks should not use rebase unless they know what they're doing and why

view this post on Zulip John Moehrke (Nov 10 2021 at 16:52):

who is using rebase without knowing what it is or why... I certainly don't have a clue

view this post on Zulip Oliver Egger (Nov 10 2021 at 20:39):

For individuals, rebasing makes a lot of sense :grinning: see https://medium.datadriveninvestor.com/git-rebase-vs-merge-cc5199edd77c. i often us it for ig development before merging it into the main to have the history streamlined.

view this post on Zulip Vassil Peytchev (Nov 10 2021 at 21:03):

Key paragraph from the link, and, AFAIK, reflecting generally accepted best practices (emphasis mine):

If the feature branch you are getting changes from is shared with other developers, rebasing is not recommended, because the rebasing process will create inconsistent repositories. For individuals, rebasing makes a lot of sense.

Since pull requests need to pass the build checks, I view these branches as shared with other developers. But mainly, the step when you merge master into the feature branch in order to run the build locally and confirm that nothing is broken, should never be a rebase.


Last updated: Apr 12 2022 at 19:14 UTC