Stream: committers
Topic: ci-build length limit
Grahame Grieve (Sep 05 2019 at 22:54):
hey @Josh Mandel - check http://build.fhir.org/ig/HL7/UTG/branches/master/build.log - what is the timeout limit for an IG build on the ci-build?
Josh Mandel (Sep 05 2019 at 23:27):
https://github.com/FHIR/auto-ig-builder/blob/master/triggers/ig-commit-trigger/job.json#L11
Josh Mandel (Sep 05 2019 at 23:27):
Looks like 30min
Grahame Grieve (Sep 05 2019 at 23:43):
hmmm but the log seems to be about 11 min... I'm investigating
Grahame Grieve (Sep 06 2019 at 01:04):
@Josh Mandel so I fixed something up, and committed it back to the repo, which should have kicked off another build.. but it did not. is there any way I can investigate why nothing happened?
Grahame Grieve (Sep 06 2019 at 01:45):
well, it didn't appear in the notification stream here, but it did try, and the log is here: https://build.fhir.org/ig/HL7/UTG/branches/master/build.log
Grahame Grieve (Sep 06 2019 at 01:46):
there's something off about it - I cut the build time in half, but it still timed out....
Josh Mandel (Sep 06 2019 at 16:16):
It seems to have actually taken >30min and been killed, per the time limit. Is there a reason to think something else happened?
Rob Hausam (Sep 06 2019 at 17:08):
That's what I saw, too. Do we just extend the time limit? Currently the UTG CI build is still failed.
Grahame Grieve (Sep 06 2019 at 19:55):
well, we didn't get notification in the committers channel. that's the first issue. The second is what's going on, I'll investigate. but can we get notification?
Josh Mandel (Sep 06 2019 at 21:58):
Notifications: yes, this is doable but requires a bit of a re-write. I won't get to it this weekend unless someone thinks it's urgent.
Josh Mandel (Sep 06 2019 at 21:59):
Re: time limit, sure I'll update to 1h now.
Josh Mandel (Sep 06 2019 at 22:02):
Just updated at https://github.com/FHIR/auto-ig-builder/commit/1c341475da149452395eba71718b7d9f9248af92
Grahame Grieve (Sep 06 2019 at 22:02):
ok thanks
Josh Mandel (Sep 06 2019 at 22:03):
... and kicked off a build job for UTG.
Grahame Grieve (Sep 06 2019 at 22:11):
thx. 12 min for me. I just added late logging so I can see where it's going wrong
Josh Mandel (Sep 06 2019 at 22:14):
Hmm, that's a big difference in build times.
Josh Mandel (Sep 06 2019 at 22:14):
12 vs >30.
Grahame Grieve (Sep 06 2019 at 22:15):
y
Josh Mandel (Sep 06 2019 at 22:21):
Let's see about throwing some faster CPUs at it.
Josh Mandel (Sep 06 2019 at 22:21):
Just added a new node pool and over the weekend will switch over to it.
Grahame Grieve (Sep 06 2019 at 22:22):
ok thanks. btw, I just learnt something.... maven will happily release a snapshot when it doesn't compile...
Josh Mandel (Sep 06 2019 at 22:22):
Oh, that's delightful.
Grahame Grieve (Sep 06 2019 at 22:23):
this log from running UTG:
Connect to Terminology Server at http://tx.fhir.org (00:30.0066)
Load Content (00:30.0577)
Exception in thread "main" java.lang.Error: Unresolved compilation problems:
    The method Log(String) is undefined for the type Publisher
    The method Log(String) is undefined for the type Publisher
    at org.hl7.fhir.igtools.publisher.Publisher.load(Publisher.java:2541)
    at org.hl7.fhir.igtools.publisher.Publisher.createIg(Publisher.java:716)
    at org.hl7.fhir.igtools.publisher.Publisher.execute(Publisher.java:609)
    at org.hl7.fhir.igtools.publisher.Publisher.main(Publisher.java:6398)
Grahame Grieve (Sep 06 2019 at 23:23):
can we make more memory available?
Grahame Grieve (Sep 06 2019 at 23:32):
ok I have a log now. It runs ok for 15 min or so, and then it runs out of RAM and slows down to a crawl (managing memory). Now that you've extended the time out to 60 minutes, it finally runs out of heap space at 45 minutes. On the ci-build it has 4GB, but I'm running it locally with 7GB
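For anyone wanting to watch for the same pattern locally (steady progress, then a crawl as the heap fills), here is a minimal sketch of heap reporting using the standard `java.lang.Runtime` API. It is not the Publisher's actual logging: the class name `HeapLog` and its methods are hypothetical, made up for illustration.

```java
import java.util.Locale;

// Hypothetical helper, not part of the IG Publisher.
public class HeapLog {

    // Fraction of the maximum heap in use, given raw byte counts.
    public static double usedFraction(long usedBytes, long maxBytes) {
        return maxBytes <= 0 ? 0.0 : (double) usedBytes / (double) maxBytes;
    }

    // One-line summary suitable for dropping into a build log.
    public static String describe(long usedBytes, long maxBytes) {
        long mb = 1024L * 1024L;
        return String.format(Locale.ROOT, "heap used %d MB of %d MB (%.0f%%)",
                usedBytes / mb, maxBytes / mb,
                100.0 * usedFraction(usedBytes, maxBytes));
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // totalMemory() is the heap currently reserved by the JVM;
        // subtracting freeMemory() gives the part actually in use.
        long used = rt.totalMemory() - rt.freeMemory();
        // maxMemory() is the ceiling the JVM may grow to (set with -Xmx).
        System.out.println(describe(used, rt.maxMemory()));
    }
}
```

Printing a line like this at each build phase would show used heap creeping toward the maximum well before an out-of-heap error, which is usually the point where garbage collection starts to dominate the run time.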
Grahame Grieve (Sep 06 2019 at 23:32):
you might ask whether we need all of that RAM... I'll consider that, but it would be a major rework to change that
Rob Hausam (Sep 06 2019 at 23:35):
I've been running UTG locally with 8GB (6GB seemed not quite enough) - and that was before Ted made his latest updates
Ted Klein (Sep 06 2019 at 23:45):
When I added the value sets for CDA (>325 of them), my local build slowed down very considerably. My laptop howls and can do little else for about 25 minutes until it finishes. I have a 4-core i9 with 32GB of RAM. If we cannot get this working over the weekend, I will update the CDA content manifest to only include a few dozen value sets and see if that will run to completion. Or if it will help to debug, I can do that tonight.
Grahame Grieve (Sep 07 2019 at 00:28):
I'm still running in 12 min with all the CDA value sets. And the build is about 1/2 that speed until it hits the RAM wall
Josh Mandel (Sep 07 2019 at 01:09):
I can give it as much RAM as we want.. will have a go at quadrupling RAM tomorrow.
Grahame Grieve (Sep 07 2019 at 02:45):
wooah thanks
Rob Hausam (Sep 07 2019 at 02:59):
built successfully in 19 min with 8GB RAM
Josh Mandel (Sep 07 2019 at 16:36):
Okay, with 20GB RAM on the latest Cascade Lake CPU, builds in 13min (that includes the time for the cluster manager to notice that it needed to boot a new VM, and boot it).
Josh Mandel (Sep 07 2019 at 16:54):
(Er, 11 min when I actually set the right machine type, of which 9 are actual container running time.)
Last updated: Apr 12 2022 at 19:14 UTC