FHIR Chat · R5 Bash Error · committers

Stream: committers

Topic: R5 Bash Error


view this post on Zulip Marc Duteau (Mar 16 2021 at 21:51):

Ran into the following error while making a pull request to the master branch:

mv: cannot stat '/home/***/uploading/www/branches/r5-mnm-changes-1': No such file or directory
find: './Pharmacy20210218': No such file or directory
##[error]Bash exited with code '1'.

It looks similar to the issue that @Mark Iantorno ran into with the R4B branch if that helps, though I'm not sure if that issue got resolved.

view this post on Zulip Lloyd McKenzie (Mar 17 2021 at 14:18):

You might ask here: https://chat.fhir.org/#narrow/stream/179293-committers.2Fgit-help. That's where the experts who promised to help with Git issues hang out. (Unfortunately, I'm not one of them...)

view this post on Zulip Jean Duteau (Mar 17 2021 at 15:31):

This isn't a git error, it's an error in the build process. It's basically the same error that some of us saw on the R4B branch and it's affecting this pull request on R5.

view this post on Zulip Mark Iantorno (Mar 17 2021 at 18:00):

Yeah, this is an error that is occurring server side when it tries to upload the built branch to the server. What I don't understand is why it only happens to some people and not others... It's trying to look for a branch name that doesn't exist

view this post on Zulip Jean Duteau (Mar 18 2021 at 18:58):

@Mark Iantorno now I have the same error on a different branch/pull request:

Publishing to target jd-pharmacy
501M    .
962M    .
1.5G    .
mv: cannot stat '/home/***/uploading/www/branches/jd-pharmacy': No such file or directory
find: './Consent-for-R4B': No such file or directory
##[error]Bash exited with code '1'.

view this post on Zulip David Pyke (Mar 18 2021 at 18:59):

Consent-for-R4B was merged and deleted. You need to update your branch list

view this post on Zulip David Pyke (Mar 18 2021 at 19:00):

git fetch -p

view this post on Zulip Mark Iantorno (Mar 18 2021 at 19:02):

Thanks David, I had not had the chance to look into a solution for this

view this post on Zulip Jean Duteau (Mar 18 2021 at 20:20):

@David Pyke Are you saying I need to do that? Or does Mark need to do that?

view this post on Zulip David Pyke (Mar 18 2021 at 20:20):

you need to do that

view this post on Zulip Jean Duteau (Mar 18 2021 at 20:21):

okay, except that I started with a brand new clone of the repository this morning, made a branch, made changes, pushed the changes, and made a pull request. Why would my branch list be out-of-date?

view this post on Zulip David Pyke (Mar 18 2021 at 20:22):

It does. I get that all the time. I even have a script to run to clean it up after a pull request

view this post on Zulip David Pyke (Mar 18 2021 at 20:23):

git fetch -p
git branch --unset-upstream
git checkout master
git pull

view this post on Zulip Melva Peters (Mar 18 2021 at 20:24):

Is this something new? I've never seen this and there are branches merged and deleted all of the time.

view this post on Zulip Jean Duteau (Mar 18 2021 at 20:25):

i don't think that will solve anything. your git commands don't do anything to my branch and certainly don't do anything to the build pipeline that is set up and is trying to copy a branch that a) may not exist and b) I don't even reference in my pull request.

view this post on Zulip Jean Duteau (Mar 18 2021 at 20:28):

yeah, this isn't going to solve anything because it doesn't change anything on my branch. the problem is still in Mark's build pipeline scripts.

view this post on Zulip Michelle (Moseman) Miller (Mar 19 2021 at 13:47):

I'm now getting a similar error for the first time today (and did the exact same steps as earlier this week without issue).
https://github.com/HL7/fhir/pull/1184/

mv: cannot stat '/home/***/uploading/www/branches/michelle-miller-30769-Procedure-method-extension-cardinality': No such file or directory

What is the fix?

view this post on Zulip Lloyd McKenzie (Mar 19 2021 at 13:49):

@Mark Iantorno @Grahame Grieve Given that R5 changes for the upcoming ballot are due in 11 days, these issues are a definite problem. Is one of you digging into them?

view this post on Zulip Mark Iantorno (Mar 19 2021 at 14:01):

I will dig into it now

view this post on Zulip Mark Iantorno (Mar 19 2021 at 15:08):

Alright, so here's where I'm at so far. I cannot replicate this locally...yet. I was able to run the same publish commands locally on my box with no issues...in fact, you can see Michelle's published branch here: https://build.fhir.org/branches/michelle-miller-30769-Procedure-method-extension-cardinality/

I'm unsure why there are conflicts with branch names...that no longer exist. I'm trying once again with my own, new, local branch, but I suspect that will succeed as well. I think this has to do with the way that Azure pipelines is cloning the repo. I'm going to look into that, and attempt to replicate the exact steps azure uses to clone from git to see why this might be happening.

view this post on Zulip Mark Iantorno (Mar 19 2021 at 15:16):

So, I can replicate this now. The uploads _are_ actually working, it's just returning an error that the folder name doesn't exist? I will figure out why that is happening.

view this post on Zulip Mark Iantorno (Mar 19 2021 at 15:41):

It always works the second time, the issue is that the server tries to execute a move to a folder that doesn't exist yet

view this post on Zulip Mark Iantorno (Mar 19 2021 at 15:42):

@Josh Mandel you sent me this code snippet before, but in the code here: https://github.com/FHIR/auto-ig-builder/blob/master/images/ci-build/publish#L22-L25

view this post on Zulip Mark Iantorno (Mar 19 2021 at 15:42):

specifically line 23

view this post on Zulip Mark Iantorno (Mar 19 2021 at 15:43):

shouldn't it be mkdir -p ~/uploading/www/branches/$DEPLOY_TO_BRANCH

view this post on Zulip Mark Iantorno (Mar 19 2021 at 15:43):

I think line 24 is where the error is occurring

view this post on Zulip Jean Duteau (Mar 19 2021 at 15:44):

if you don't mind me asking, why is it trying to move the folder? Isn't the point of this pull request process just to see if the build passed without errors? Why do we then need to move that branch anywhere when it is just trying to be merged and might be deleted shortly?

view this post on Zulip Mark Iantorno (Mar 19 2021 at 15:45):

This is occurring on the server, I didn't write this code, so I am unsure. If I had to guess, I would assume the uploaded tar is unzipped and then moved to the proper directory

view this post on Zulip Mark Iantorno (Mar 19 2021 at 15:46):

The issue is that the server is returning an error, which while not detrimental to the build process, is being caught by azure and it is flagging it as a fail

view this post on Zulip Mark Iantorno (Mar 19 2021 at 15:46):

which is the correct thing for azure to do

view this post on Zulip Jean Duteau (Mar 19 2021 at 15:47):

understood, except I'm wondering why this step is occurring at all. In the azure pipeline for FHIR core builds, it seems like it should just build the branch and if that passes without error, stop.

view this post on Zulip Mark Iantorno (Mar 19 2021 at 15:47):

I'm just doing a test right now, re-running this build here: https://dev.azure.com/fhir-pipelines/fhir-publisher/_build/results?buildId=2841&view=logs&j=ab68b630-6476-573a-954f-d89e4292687e

view this post on Zulip Mark Iantorno (Mar 19 2021 at 15:48):

I'm curious if the behaviour I'm seeing locally on my machine (fail always the first time, succeeds afterwards) is going to be mirrored by the pipeline

view this post on Zulip Mark Iantorno (Mar 19 2021 at 15:48):

if it is, then I'm 99% sure this is the issue

view this post on Zulip Mark Iantorno (Mar 19 2021 at 15:48):

if it fails again, I will have to start digging again

view this post on Zulip Mark Iantorno (Mar 19 2021 at 16:02):

@Jean Duteau Some users want to preview their build on the website before the merge?

view this post on Zulip Mark Iantorno (Mar 19 2021 at 16:05):

Yeah the upload I just tested worked the second time. This is a server side problem with the way the files are moved around.

view this post on Zulip Jean Duteau (Mar 19 2021 at 16:06):

sure, but that seems like it should be a different process from the pull request. Obviously the error needs to be fixed but for merging pull requests, all we need is confirmation that the branch build succeeds.

view this post on Zulip Mark Iantorno (Mar 19 2021 at 16:07):

It's more part of the CI process, in theory, we want to do a full upload and test before merging into master

view this post on Zulip Mark Iantorno (Mar 19 2021 at 16:08):

because, if it does fail, we want it to fail on the branch and not master

view this post on Zulip Mark Iantorno (Mar 19 2021 at 16:08):

a full CI is not just about "does it build"

view this post on Zulip Mark Iantorno (Mar 19 2021 at 16:08):

gotta run it end to end. ...Ideally, we should have a process that tests if the branch webpage is live and working

view this post on Zulip Jean Duteau (Mar 19 2021 at 16:12):

i guess. i'm not sure that is really needed, but I don't know enough about the CI pipeline :)

view this post on Zulip Mark Iantorno (Mar 19 2021 at 16:13):

There is a balance, I push things from a purely technical point of view, and then I get push back from the standards world.

view this post on Zulip Mark Iantorno (Mar 19 2021 at 16:13):

the hope is that we meet somewhere that works for both worlds

view this post on Zulip Mark Iantorno (Mar 19 2021 at 16:16):

I'm re-running everyone's jobs right now, in the hope that they succeed the second time as we've seen

view this post on Zulip Mark Iantorno (Mar 19 2021 at 16:17):

I'm going to ignore error on the upload step for now, in hopes that I can talk with @Josh Mandel to resolve this

view this post on Zulip Josh Mandel (Mar 19 2021 at 16:40):

Mark Iantorno: shouldn't it be mkdir -p ~/uploading/www/branches/$DEPLOY_TO_BRANCH

No, if you do:

$ mkdir test-ig
$ mkdir /tmp/test-ig
$ mv test-ig/ /tmp/test-ig

then you wind up with /tmp/test-ig/test-ig when you want /tmp/test-ig.
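The nesting behaviour described above can be checked in a scratch directory. This is a minimal sketch, not the publish script itself: the directory names (src1, dest1, src2, dest2) are made up for illustration, and it only shows that mv nests the source when the destination already exists and renames it when the destination does not.

```shell
# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)

# Case 1: the destination already exists, so mv places the source INSIDE it.
mkdir "$work/src1" "$work/dest1"
mv "$work/src1" "$work/dest1"
nested=$(ls "$work/dest1")

# Case 2: the destination does not exist, so mv renames the source to it.
mkdir "$work/src2"
mv "$work/src2" "$work/dest2"

echo "case 1: dest1 now contains: $nested"
if [ -d "$work/dest2" ] && [ ! -e "$work/dest2/src2" ]; then
  echo "case 2: src2 was renamed to dest2, no nesting"
fi
```

This would explain why creating the destination with mkdir -p before the mv changes where the files end up.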

view this post on Zulip Josh Mandel (Mar 19 2021 at 16:41):

Mark Iantorno: I think line 24 is where the error is occurring

 mv ~/uploading/$TARGET/ ~/uploading/www/branches/$DEPLOY_TO_BRANCH

How is this failing? Maybe I'm being dense here --- what specifically is going wrong, and why do you think so?

view this post on Zulip Mark Iantorno (Mar 19 2021 at 17:50):

So, the error is happening when I execute this command:
tar czf - * | ssh -i $(Agent.TempDirectory)/deploy.rsa -p 2222 $(BUILD_FHIR_ORG_USERNAME)@build.fhir.org ./publish $TARGET_DIRECTORY
...which I don't think should be executing any mv commands locally...unless I truly have no idea what is going on.

I think that error: mv: cannot stat '/home/fhir_upload/uploading/www/branches/<branch-name-here>': No such file or directory is coming from the server. This error only happens the first time a branch is uploaded, every subsequent run succeeds no problem.

So we know from that,

  1. The first run will always fail on a new branch, all runs after that succeed
  2. This is due to the fact that a mv command is being executed, with the destination location being a directory with the branch name
  3. The first run actually still succeeds on the server, despite this error being returned. We know this because the website for that branch is live and working, even after the first run

I think that the first time an upload runs, whatever process is on the server is creating the folder with the branch name successfully, but only after a mv is attempted and failed at some point. I'm not sure how to prove this though.
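One way to see how a server-side message and exit code can surface in the local step: ssh exits with the status of the remote command, and a shell pipeline reports the status of its last command. The sketch below is a simulation under stated assumptions. bash -c stands in for ssh, the remote_publish string is a hypothetical stand-in for the real ./publish script, and old-branch is a made-up folder name.

```shell
# Hypothetical stand-in for the remote ./publish script: it consumes the
# uploaded bytes, prints a diagnostic to stderr, and exits non-zero.
remote_publish='cat >/dev/null; echo "find: ./old-branch: No such file or directory" >&2; exit 1'

# Stand-in for: tar czf - * | ssh ... ./publish $TARGET_DIRECTORY
# bash -c plays the role of ssh; real ssh also exits with the remote status.
pipeline_status=0
remote_stderr=$(printf 'fake tarball bytes' | bash -c "$remote_publish" 2>&1 >/dev/null) \
  || pipeline_status=$?

echo "message seen locally: $remote_stderr"
echo "local pipeline exit status: $pipeline_status"
```

Under these assumptions the local step fails even though nothing failed locally, which matches Azure flagging the Bash task.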

view this post on Zulip Mark Iantorno (Mar 19 2021 at 17:50):

@Josh Mandel

view this post on Zulip Josh Mandel (Mar 19 2021 at 17:55):

mv: cannot stat '/home/fhir_upload/uploading/www/branches/<branch-name-here>': No such file or directory

Yes you're getting this output -- but is the build failing at this step? the || true should prevent failure.

view this post on Zulip Mark Iantorno (Mar 19 2021 at 17:56):

Yeah, I would have thought so as well, but...it is not?

view this post on Zulip Josh Mandel (Mar 19 2021 at 17:57):

So the publish script is printing this and exiting?

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:00):

it's printing that and continuing, but Azure is detecting it and failing the step I'm pretty sure?

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:09):

I'm not sure what it means to continue and to also fail the step. With set -e any failed step should exit the script. My understanding might be off here; is there a minimal example that might help reveal what's happening? Like when I test locally...

$ bash -c 'exit 5'
$ echo $?
5

$ bash -c 'exit 5' || true
$ echo $?
0

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:13):

Once the backlog of branches completes, I will try that, however...

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:13):

this is another example of one that hard fails

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:13):

502M    .
962M    .
1.5G    .
mv: cannot stat '/home/***/uploading/www/branches/michelle-miller-30769-Procedure-method-extension-cardinality': No such file or directory
find: './R4B-EBMonFHIR-Part-3': No such file or directory
find: './R4B-EBMonFHIR-Part-4': No such file or directory
##[error]Bash exited with code '1'.
Finishing: Bash

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:14):

the upload is initiated locally with the command I posted before, but then the server throws those errors

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:14):

the mv command tells us it cannot stat

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:14):

then two find commands fail back to back

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:14):

those are run remotely not locally

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:14):

they cause bash to hard exit

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:15):

the upload operation shouldn't care about those other two branches: R4B-EBMon...-3 & -4

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:16):

yet it tries to find them and do something with them?

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:16):

Why is the server even concerned with them?

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:17):

Mark Iantorno: find commands fail back to back

This is what I was wondering about -- these aren't guarded by a || true, so I could understand how these break a build (though I don't know what's going wrong with them)
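The difference between a guarded and an unguarded failure under set -e can be sketched in a few lines. The inline script below is a hypothetical stand-in, not the real publish script, and uses false in place of the failing mv and find.

```shell
# Hypothetical script body: errexit is on, the first failing command is
# guarded with "|| true", the second is not.
script='
set -e
false || true
echo "after guarded step"
false
echo "never printed"
'

status=0
output=$(bash -c "$script") || status=$?

echo "output: $output"
echo "exit status: $status"
```

The guarded failure lets the script continue, while the unguarded one ends it with a non-zero status, so any error message printed by the guarded command can be a red herring.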

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:18):

Do you know what find is failing on?

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:18):

I do not? This is all just from executing that one upload command

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:18):

Mark Iantorno: upload operation shouldn't care about those other two branches

We use "someone uploaded something" as a trigger for "hey let's clean up old stuff"

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:18):

can we make that clean up process not cause a fail if something goes wrong?

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:19):

Yeah, I'd look at https://github.com/FHIR/auto-ig-builder/blob/master/images/ci-build/publish#L33-L35 as the culprit ; consider adding a guard around this.

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:19):

But... I don't know why that find should be able to fail.

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:19):

like, what is going wrong?

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:20):

no clue

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:20):

if I commit to that branch and that repo will it automatically go into prod?

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:22):

No, there's a manual step to build and re-deploy these.

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:23):

But I'm wary about skipping things without understanding why they break. Let me try manually running this find in the current container and see what happens.

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:24):

KK thank you

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:27):

It... runs fine with exit code 0 and no output.

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:30):

Try with a brand new branch

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:30):

It consistently happens on the first try with a new branch

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:32):

I'm running this find command directly, not in the context of a branch upload.

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:32):

What is './R4B-EBMonFHIR-Part-3'? The find seems to be mentioning it in the output you shared above.

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:33):

I would guess it was a branch at one point that existed

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:33):

And does not anymore

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:34):

So in that log output I sent above the person was trying to upload their branch, but somehow, a job on the server is looking for that other folder belonging to that other branch to do something with

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:35):

Each time an upload is performed we run

find . -path ./master -prune -o -type d -ctime +21  -path './*'

which says "which branches here are >21 days old"?
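A small sketch of how that find expression selects folders, with two labelled substitutions: it tests -mtime instead of -ctime, because a file's ctime cannot be backdated for a demo, and it uses GNU touch syntax (-d '30 days ago'). The folder names stale-branch and fresh-branch are made up. An explicit -print is added so that the pruned ./master is not printed by find's default action.

```shell
work=$(mktemp -d)
mkdir "$work/master" "$work/stale-branch" "$work/fresh-branch"

# Backdate two folders (GNU touch). master is made old as well, to show
# that -prune protects it regardless of age.
touch -d '30 days ago' "$work/master" "$work/stale-branch"

# -path ./master -prune   : skip master and everything under it
# -o                      : otherwise, apply the tests that follow
# -type d -mtime +21      : directories last modified more than 21 days ago
# -path './*'             : exclude "." itself
stale=$(cd "$work" && find . -path ./master -prune -o \
          -type d -mtime +21 -path './*' -print)

echo "selected for cleanup: $stale"
```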

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:35):

The upload has completed though, even though you get the bash upload failed, you can still navigate to build.fhir.org/branches and see the newly uploaded branch

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:35):

Yeah I think this might be a result of that cleaning job

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:35):

Right, so that's consistent with failing near the find step. Which happens after the upload and "move into web hosting tree" is done.

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:37):

When you get an error is it always about "./R4B-EBMonFHIR-Part-3" ?

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:37):

No

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:37):

The branch name varies

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:38):

Are they always branches with something in common? Some special character, some particular age? What's the pattern here?

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:38):

I'll go through the last 5 failed builds and get the branch names

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:38):

one sec

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:39):

OK., thanks!

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:41):

find: './R4B-EBMonFHIR-Part-3': No such file or directory
find: './R4B-EBMonFHIR-Part-4': No such file or directory
find: './R4B-EBMonFHIR-Part-5': No such file or directory
find: './Consent-for-R4B': No such file or directory
find: './R4B-EBMonFHIR-Part-1': No such file or directory
find: './Pharmacy-20210223': No such file or directory
find: './rik-meddef': No such file or directory
find: './Pharmacy20210218': No such file or directory

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:41):

those are the last 4 or so failed pipeline runs

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:44):

Searching Zulip for these names... I see some of these branches (all?) deleted yesterday (like from GH). Should that matter?

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:46):

I am unsure...? It shouldn't matter for the upload push though, right?

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:52):

Shouldn't. But... looking for patterns :)

view this post on Zulip Josh Mandel (Mar 19 2021 at 18:55):

If you have a reliable way to trigger this error, I can try disabling the find step manually and we can test that (will have to be later though -- I'm in a workshop with divided attention).

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:57):

I can trigger the error with a new branch publish

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:57):

let me know and I will run one

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:57):

takes 25 min

view this post on Zulip Mark Iantorno (Mar 19 2021 at 18:57):

to run on azure

view this post on Zulip Mark Iantorno (Mar 19 2021 at 19:08):

@Josh Mandel I'm just stepping away for a few minutes, but if you get your changes in before I get back, you can start the test by opening this PR: https://github.com/HL7/fhir/compare/server_side_error_test_branch?expand=1

view this post on Zulip Mark Iantorno (Mar 19 2021 at 21:01):

@Josh Mandel have you had a chance to push the changes in?

view this post on Zulip Josh Mandel (Mar 19 2021 at 21:05):

Just getting a chance -- OK, edited

view this post on Zulip Josh Mandel (Mar 19 2021 at 21:05):

creating your test PR

view this post on Zulip Mark Iantorno (Mar 19 2021 at 21:05):

ah don't merge!

view this post on Zulip Mark Iantorno (Mar 19 2021 at 21:05):

just create the pr

view this post on Zulip Mark Iantorno (Mar 19 2021 at 22:14):

It worked?

view this post on Zulip Mark Iantorno (Mar 19 2021 at 22:14):

https://github.com/HL7/fhir/pull/1185

view this post on Zulip Mark Iantorno (Mar 19 2021 at 22:14):

I will monitor the issue

view this post on Zulip Mark Iantorno (Mar 19 2021 at 22:14):

can you send me the commit with the changes you made when you get the chance please

view this post on Zulip Josh Mandel (Mar 19 2021 at 22:25):

I'm not comfortable committing these changes because I still don't know why they would help :/

view this post on Zulip Josh Mandel (Mar 19 2021 at 22:25):

But I guess you're confirming that they have helped?

view this post on Zulip Josh Mandel (Mar 19 2021 at 22:27):

I'm going to revert and poke again with another PR

view this post on Zulip Mark Iantorno (Mar 19 2021 at 22:34):

Okay, makes sense

view this post on Zulip Mark Iantorno (Mar 19 2021 at 22:34):

The PR worked

view this post on Zulip Mark Iantorno (Mar 19 2021 at 22:35):

it is still throwing this error: mv: cannot stat '/home/***/uploading/www/branches/server_side_error_test_branch': No such file or directory
but the other ones are not occurring

view this post on Zulip Mark Iantorno (Mar 19 2021 at 22:35):

but that's fine as it's not causing the crash, the other ones were for sure

view this post on Zulip Mark Iantorno (Mar 19 2021 at 22:36):

I also put a || true in the upload command locally to deal with the mv: cannot stat '/home/***/uploading/www/branches/server_side_error_test_branch': No such file or directory issue

view this post on Zulip Josh Mandel (Mar 19 2021 at 22:41):

I don't think the line with mv is ever returning an exit status code other than 0, is it? I'll instrument with more debugging output.

view this post on Zulip Mark Iantorno (Mar 19 2021 at 22:41):

kk

view this post on Zulip Mark Iantorno (Mar 19 2021 at 22:41):

it was just the other two then

view this post on Zulip Mark Iantorno (Mar 19 2021 at 22:41):

and they did not occur

view this post on Zulip Josh Mandel (Mar 19 2021 at 23:28):

OK, got it. Adding an "echo" to the rm arg makes it clear what's happening:

find  .  -path ./master -prune  -o -type d  -path './*'  -exec echo rm  -rf {}  \;
rm -rf ./b
rm -rf ./b/a
rm -rf ./b/b
rm -rf ./a
rm -rf ./a/b
rm -rf ./a/a

... we delete b and then try to delete b/a after b is gone.
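The failure can be reproduced in a scratch directory, along with one way to avoid it. The contents of the linked commit are not shown in this thread, so the repair below is an assumption and not necessarily what was deployed: it restricts find to top-level folders with -mindepth 1 -maxdepth 1, so find never tries to descend into a folder that rm has just removed. The a/b layout mirrors the echo output above.

```shell
work=$(mktemp -d)

# Two identical trees: folders a and b, each with subfolders, plus master.
for tree in broken fixed; do
  mkdir -p "$work/$tree/master" "$work/$tree/a/a" "$work/$tree/a/b" \
           "$work/$tree/b/a" "$work/$tree/b/b"
done

# Shape of the failing cleanup: find removes a folder, then tries to walk
# into it and reports "No such file or directory" with a non-zero exit.
broken_status=0
(cd "$work/broken" && find . -path ./master -prune -o -type d -path './*' \
    -exec rm -rf {} \; 2>/dev/null) || broken_status=$?

# Possible repair (assumption): visit only the top-level folders.
fixed_status=0
(cd "$work/fixed" && find . -mindepth 1 -maxdepth 1 -type d ! -path ./master \
    -exec rm -rf {} \;) || fixed_status=$?

remaining=$(ls "$work/fixed")
echo "broken variant exit status: $broken_status"
echo "fixed variant exit status: $fixed_status, remaining: $remaining"
```

Both variants delete the same folders; the difference is only whether find reports an error afterwards, which is what trips set -e in the calling script.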

view this post on Zulip Josh Mandel (Mar 19 2021 at 23:29):

https://github.com/FHIR/auto-ig-builder/commit/fe48dde73835582f97ec9a6b4a9c29223dc58c47 should take care of it. Deploying...

view this post on Zulip Josh Mandel (Mar 19 2021 at 23:32):

(build.fhir.org should be back in a minute... and back!)

view this post on Zulip Mark Iantorno (Mar 20 2021 at 00:36):

Wonderful. Thanks Josh.


Last updated: Apr 12 2022 at 19:14 UTC