Stream: committers/git-help
Topic: Bash exited with code '137'.
John Moehrke (Oct 31 2018 at 21:45):
my pull failed with Bash exited with code 137 -- Please help https://github.com/HL7/fhir/pull/195
Josh Mandel (Oct 31 2018 at 21:47):
https://fhir-build.visualstudio.com/build.fhir.org/_build/results?buildId=1154&view=logs has the full logs -- did you review?
Josh Mandel (Oct 31 2018 at 21:48):
Looks like a terminology cache issue
John Moehrke (Oct 31 2018 at 21:48):
could be. I looked and can't see anything I can act upon.
John Moehrke (Oct 31 2018 at 21:48):
so, if it is a term cache issue... is there a way to kick it to try again?
Josh Mandel (Oct 31 2018 at 21:48):
@Grahame Grieve does this mean anything to you?
Josh Mandel (Oct 31 2018 at 21:49):
Well, if there's a terminology issue, rerunning won't change the outcome
John Moehrke (Oct 31 2018 at 21:49):
2018-10-31T21:40:36.5102961Z [java] -tx cache miss: $validate {null#logic-library: "null"}: "null" for Include All codes from http://terminology.hl7.org/CodeSystem/library-type
Josh Mandel (Oct 31 2018 at 21:49):
But FYI you can always kick off another build by pushing another commit.
John Moehrke (Oct 31 2018 at 21:57):
so, make a small change that I push in another commit on the branch? (sorry, I am struggling with Git terms)
Josh Mandel (Oct 31 2018 at 21:58):
That'd do it, yeah. I can also kick this off for you, but I don't see how a rebuild would change anything here.
John Moehrke (Oct 31 2018 at 21:59):
the change I made had nothing to do with terminology... so I expect the problem was not foundational to my change
Rob Hausam (Oct 31 2018 at 22:00):
I'm not sure if the -tx cache miss is the issue. There are several of these -tx cache miss notifications that I typically see in the build that don't normally seem to be an issue. The build got to [java] ...validate library-exclusive-breastfeeding-cds-logic before it failed.
Josh Mandel (Oct 31 2018 at 22:00):
Huh, okay. That's interesting then.
Josh Mandel (Oct 31 2018 at 22:00):
Maybe a memory error.
John Moehrke (Oct 31 2018 at 22:01):
the end of the log is rather abrupt
Josh Mandel (Oct 31 2018 at 22:01):
I'll kick off a rebuild right now
John Moehrke (Oct 31 2018 at 22:01):
thanks
Rob Hausam (Oct 31 2018 at 22:01):
sounds reasonable
Josh Mandel (Oct 31 2018 at 22:02):
building.
Josh Mandel (Oct 31 2018 at 22:03):
In general, we have a lot of trouble constraining the amount of memory the build process uses. We have a _JAVA_OPTIONS=-Xmx3200m env var which (I thought) was supposed to limit usage, but even with just two builds running at the same time, our 16Gb VM sometimes kills one.
Josh Mandel (Oct 31 2018 at 22:06):
16999 ubuntu 20 0 7266128 3.931g 24152 S 335.5 25.1 6:51.21 java
12247 ubuntu 20 0 7480120 4.743g 24044 S 199.3 30.3 82:16.41 java
I'm not sure how it gets to 4.7Gb with the limits that should be imposed.
Josh Mandel (Oct 31 2018 at 22:08):
Yeah, these two jobs are pushing very close to 12Gb, which is the amount of free RAM, even though they should be constrained to <6.4Gb between them.
Josh Mandel (Oct 31 2018 at 22:10):
Java helpfully notes:
2018-10-31T21:41:30.5755733Z [java] Picked up _JAVA_OPTIONS: -Xmx3200m
in its output -- but then doesn't respect the -Xmx limit. Am I misunderstanding how this limit is fundamentally supposed to work? @Grahame Grieve @James Agnew ? Is there a different way to prevent Java from eating more than the available RAM?
Rob Hausam (Oct 31 2018 at 22:15):
I always thought that limit did work properly when I've used it (as when I need a lot of RAM for an OWL reasoner to classify a large ontology). In those cases for me the JVM (normally using Oracle) hasn't seemed to exceed the set limit. So I don't know why it would in this case, since it says that it successfully "picked up" the option.
Rob Hausam (Oct 31 2018 at 22:23):
The -Xmx option applies to only the heap space, so the process can use more RAM than that (not sure how much more is possible). And if you exceed the heap space you'll get an out of memory error, so that by itself wouldn't actually solve the problem.
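The heap-only scope of -Xmx can be checked from inside a JVM. The following is a minimal sketch, not part of the FHIR build: the class name HeapVsNonHeap and the helper describe are made up for illustration. It prints the heap and non-heap figures that the standard java.lang.management API reports. Memory such as thread stacks and native allocations is not covered by either figure, so the process RSS seen in top can be higher still.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapVsNonHeap {
    /** Formats one MemoryUsage snapshot in MiB (hypothetical helper). */
    public static String describe(String label, MemoryUsage u) {
        // getMax() returns -1 when the JVM defines no maximum for this area.
        String max = u.getMax() < 0 ? "undefined" : (u.getMax() >> 20) + " MiB";
        return label + ": used=" + (u.getUsed() >> 20) + " MiB"
                + ", committed=" + (u.getCommitted() >> 20) + " MiB"
                + ", max=" + max;
    }

    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        // The heap maximum is what -Xmx controls.
        System.out.println(describe("heap", mem.getHeapMemoryUsage()));
        // Non-heap areas (metaspace, code cache, ...) are sized separately.
        System.out.println(describe("non-heap", mem.getNonHeapMemoryUsage()));
    }
}
```

Running it with, say, java -Xmx3200m HeapVsNonHeap should show a heap maximum near the requested value, while the non-heap line is reported independently of it.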
Grahame Grieve (Oct 31 2018 at 22:42):
sig137 = running out of heap
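For reference, shells conventionally report an exit status of 128 + N when a process is terminated by signal N, so 137 corresponds to signal 9 (SIGKILL). On Linux that is the signal the kernel's out-of-memory killer sends, which would fit the abrupt end of the log. A small sketch of the decoding (the class and method names are illustrative only):

```java
public class ExitCodes {
    /**
     * Returns the signal number encoded in a shell exit status,
     * or -1 if the status does not follow the 128 + N convention.
     */
    public static int signalFromExitCode(int exitCode) {
        return (exitCode > 128 && exitCode < 256) ? exitCode - 128 : -1;
    }

    public static void main(String[] args) {
        // 137 - 128 = 9, i.e. SIGKILL
        System.out.println("exit 137 -> signal " + signalFromExitCode(137));
    }
}
```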
Grahame Grieve (Oct 31 2018 at 22:42):
what's running out of memory? local, or the build?
John Moehrke (Oct 31 2018 at 22:47):
the second attempt built successfully, so it seems it was a temporary infrastructure failure.
Josh Mandel (Oct 31 2018 at 23:33):
The build is running out, when two competing Java builds run simultaneously. There's plenty of RAM on the machine (16Gb) to accommodate both, but I can't figure out how to limit each process effectively.
Josh Mandel (Oct 31 2018 at 23:33):
I can always just throw more VMs at the problem, but I'd like to understand this.
Josh Mandel (Nov 01 2018 at 00:52):
jinfo on a running build (currently using 5.5Gb RAM) shows me:
VM Flags:
Non-default VM flags: -XX:CICompilerCount=4 -XX:CompressedClassSpaceSize=796917760 -XX:InitialHeapSize=1468006400 -XX:MaxHeapSize=3355443200 -XX:MaxMetaspaceSize=805306368 -XX:MaxNewSize=1118306304 -XX:MinHeapDeltaBytes=524288 -XX:NewSize=489160704 -XX:OldSize=978845696 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseParallelGC
Command line: -Xmx2000m -Xms1400m -XX:MaxMetaspaceSize=768m -Djava.awt.headless=true -Djava.util.logging.config.file=logging.properties -Xmx10000m -Xmx3200m
... whereas I would have thought this should be limited to Xmx + MaxMetaspaceSize == 3200 + 768 == ~4000Mb.
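The expected ceiling can be worked out from the byte values in the jinfo output. This sketch only redoes that arithmetic; the class name JinfoBudget is made up, and the two constants are copied from the flags shown above. MaxHeapSize comes out at 3200 MiB, matching the last -Xmx on the command line.

```java
public class JinfoBudget {
    /** Converts bytes to whole MiB (1 MiB = 1024 * 1024 bytes). */
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long maxHeapBytes = 3_355_443_200L;    // -XX:MaxHeapSize from jinfo
        long maxMetaspaceBytes = 805_306_368L; // -XX:MaxMetaspaceSize from jinfo

        long heapMiB = toMiB(maxHeapBytes);
        long metaspaceMiB = toMiB(maxMetaspaceBytes);

        System.out.println("heap max      = " + heapMiB + " MiB");
        System.out.println("metaspace max = " + metaspaceMiB + " MiB");
        System.out.println("sum           = " + (heapMiB + metaspaceMiB) + " MiB");
    }
}
```

The sum is 3968 MiB, which is the "~4000Mb" figure; anything the process uses beyond that has to come from memory these two flags do not bound.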
Josh Mandel (Nov 01 2018 at 01:03):
Looking at the running process with https://github.com/patric-r/jvmtop I see "NHMAX" is 1700Mb, so that would account for total usage approaching 5Gb, though I'm not sure what factors contribute to that NHMAX value.
Josh Mandel (Nov 01 2018 at 01:05):
It's getNonHeapMemoryUsage ----> getMax()
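One way to see what feeds into that non-heap maximum is to list the individual non-heap memory pools through the same java.lang.management API. This is a sketch under assumptions: the class name NonHeapPools is hypothetical, pool names vary by JVM version and vendor, and whether jvmtop derives NHMAX from exactly these pools is not confirmed here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class NonHeapPools {
    /** Returns one formatted line per non-heap memory pool of this JVM. */
    public static List<String> nonHeapPoolLines() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.NON_HEAP) {
                continue; // skip heap pools such as eden or old gen
            }
            MemoryUsage u = pool.getUsage();
            if (u == null) {
                continue; // the pool is no longer valid
            }
            // getMax() is -1 when the pool has no defined maximum.
            String max = u.getMax() < 0 ? "undefined" : (u.getMax() >> 20) + " MiB";
            lines.add(pool.getName() + ": used=" + (u.getUsed() >> 20)
                    + " MiB, max=" + max);
        }
        return lines;
    }

    public static void main(String[] args) {
        nonHeapPoolLines().forEach(System.out::println);
    }
}
```

On a HotSpot JVM the list typically includes metaspace, the compressed class space, and the code cache, each with its own limit.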
Last updated: Apr 12 2022 at 19:14 UTC