Stream: committers
Topic: slowness
Josh Mandel (Feb 24 2016 at 22:27):
I missed the fact that our builds are taking close to an hour. This is scary.
Grahame Grieve (Feb 24 2016 at 22:33):
it's because I had to undo the change to batch up the example generation
Grahame Grieve (Feb 24 2016 at 22:33):
since that started failing every time,
Grahame Grieve (Feb 24 2016 at 22:34):
instead, I run the example generation for each example.
Grahame Grieve (Feb 24 2016 at 22:34):
hammers the jvm
Grahame Grieve (Feb 24 2016 at 22:34):
and still fails regularly
Grahame Grieve (Feb 24 2016 at 22:34):
I'm going to have to replace the whole spawning a jvm. I have no choice. Will take me several days :-(
Grahame Grieve (Feb 29 2016 at 13:43):
so I did replace the code, and it seems to have helped a little
Grahame Grieve (Feb 29 2016 at 14:09):
@Josh Mandel : maybe we should approach travis again about this: BUILD FAILED
/home/travis/build/hl7-fhir/fhir-svn/build.xml:30: The following error occurred while executing this line:
/home/travis/build/hl7-fhir/fhir-svn/tools/java/org.hl7.fhir.tools.core/build.xml:90: Java returned: 137?
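For context on the `Java returned: 137` line above: on Linux, an exit status above 128 conventionally means the process was terminated by a signal (status minus 128). 137 would then be signal 9 (SIGKILL), which is what the kernel's out-of-memory killer sends. That reading is an assumption about this failure, not something the log confirms. A minimal Python sketch of the decoding, where the helper name `describe_exit_status` is hypothetical:

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (128 + N means killed by signal N)."""
    if status > 128:
        signum = status - 128
        try:
            # Signal names are platform-dependent; fall back to the number.
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with code {status}"


print(describe_exit_status(137))
```

If the kill really does come from memory pressure, it would fit with the memory-bound behaviour reported later in this thread.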
Grahame Grieve (Feb 29 2016 at 14:09):
can we pay for a beefier build machine?
Grahame Grieve (Feb 29 2016 at 14:09):
I no longer have any tools to approach this problem with...
Josh Mandel (Feb 29 2016 at 15:02):
I emailed the Travis CI team to discuss (and cc'd @Grahame Grieve )
Josh Mandel (Mar 04 2016 at 20:56):
The following single item is taking over 80 seconds on my (fast!) laptop:
[java] ...dictionary hspc-qnlab-de 0.073 190sec 936MB
Grahame Grieve (Mar 05 2016 at 00:39):
that's a big item. Generates a lot of files
Keith Boone (Mar 10 2016 at 17:15):
How long does a full build take these days? I'm approaching 100 minutes and still going, when it used to take 17.
Josh Mandel (Mar 10 2016 at 17:16):
It's a real problem. You can see a history of completed CI builds at https://travis-ci.org/hl7-fhir/fhir-svn/builds
Josh Mandel (Mar 10 2016 at 17:16):
But 100 minutes... locally? That's surprising and awful (and beyond the point of unusable). I was totally bummed that the CI builds are taking 25min.
Keith Boone (Mar 10 2016 at 17:17):
It's a complete build from scratch, to be sure that I have everything that I've missed in the last six weeks. Trying to get myself back on track after job shifting.
Josh Mandel (Mar 10 2016 at 17:17):
Generally the Travis CI servers are under-resourced compared to a developer's laptop, so I'm surprised you're seeing longer.
Keith Boone (Mar 10 2016 at 17:17):
that may have some impact.
Josh Mandel (Mar 10 2016 at 17:17):
Yes. But every CI build is from scratch on a machine with no dependencies in place.
Keith Boone (Mar 10 2016 at 17:18):
I don't have a tank like you do. My laptop is probably under-resourced for a build as well
Josh Mandel (Mar 10 2016 at 17:18):
A tank?? My laptop weighs 1.3 kg
Josh Mandel (Mar 10 2016 at 17:19):
But seriously: you don't have <4G RAM, right?
Brett Marquard (Mar 10 2016 at 17:19):
My from scratch build is about 55 mins
Keith Boone (Mar 10 2016 at 17:21):
Yeah, but you probably have 32Gb ram with 1Tb of SSD storage.
Josh Mandel (Mar 10 2016 at 17:22):
Not quite, but I'll take your point ;-)
Josh Mandel (Mar 10 2016 at 17:22):
I'll time my local build.
Keith Boone (Mar 10 2016 at 17:23):
I have 8gb, with 230gb SSD, so I would actually expect it not to be that slow.
Keith Boone (Mar 10 2016 at 17:23):
But this isn't what I'd spec out as a developer's box.
Josh Mandel (Mar 10 2016 at 17:25):
Right, fair enough. The FHIR build only allocates <2G RAM (max) so if you have >2 GB free, the specific amount shouldn't have an effect (unless you tweak the build files).
Keith Boone (Mar 10 2016 at 17:25):
Hmm, I wonder if a tweak would help
Keith Boone (Mar 10 2016 at 17:26):
It's also only really using one of four cores as far as I can tell.
Josh Mandel (Mar 10 2016 at 17:27):
Yes, there are massive opportunities to parallelize the process (specifically, the validation of examples is slow and could occur entirely in parallel)
Josh Mandel (Mar 10 2016 at 17:28):
(I think before we invest in that we should get the whole build running in a "build tool" like ant/mvn/gradle/sbt/whatever, rather than re-writing all this logic ourselves. But this is a massive undertaking that nobody has time/funding/perseverance to do)
John Moehrke (Mar 10 2016 at 17:28):
a build uses all 4 of my cores... bring up task-manager
Keith Boone (Mar 10 2016 at 17:28):
What funding is necessary
John Moehrke (Mar 10 2016 at 17:29):
takes about an hour.
Keith Boone (Mar 10 2016 at 17:29):
It's using all four cores, but only consuming about 25% of CPU...
Josh Mandel (Mar 10 2016 at 17:29):
I mean, one could imagine hiring a firm to re-organize the build tool. But the contract rates would be $$$. And currently probably only Grahame understands how it works.
John Moehrke (Mar 10 2016 at 17:30):
bring the build to front and don't do anything else.
John Moehrke (Mar 10 2016 at 17:30):
go read a book...
Keith Boone (Mar 10 2016 at 17:31):
BUILD SUCCESSFUL
Total time: 103 minutes 29 seconds
Josh Mandel (Mar 10 2016 at 17:32):
Wow. Can you post the log? Would be good to see if it was just uniformly slow, or whether there were particular "cold spots".
Josh Mandel (Mar 10 2016 at 17:41):
Yeah, for me the build is 16m41s. Log online here
John Moehrke (Mar 10 2016 at 17:57):
full build in 17 minutes? that would be reasonable. Even the build server takes almost a full hour. What is your secret?
Josh Mandel (Mar 10 2016 at 17:59):
The server build currently takes 25-27min.
Josh Mandel (Mar 10 2016 at 18:01):
I have a fast laptop, sure (2.5Ghz i7-6500U, 16GB RAM, and SSD -- it's a Dell XPS 13-9350).
John Moehrke (Mar 10 2016 at 18:06):
the big difference from my laptop is your memory... and watching my system it is very memory bound, not CPU.
Josh Mandel (Mar 10 2016 at 18:07):
Yes, I think there are a few steps that are particularly memory bound. When I tried dialing down the memory for the Travis CI build I caused a few steps to get very slow and then eventually crash with Java GC errors.
Josh Mandel (Mar 10 2016 at 18:08):
That's why I'm curious to see Keith's 103-minute log.
John Moehrke (Mar 10 2016 at 18:09):
I just got full build in 38:10
Jason Walonoski (Mar 10 2016 at 18:20):
Full build
BUILD SUCCESSFUL
Total time: 15 minutes 17 seconds
Jason Walonoski (Mar 10 2016 at 18:21):
2015 MacBook Pro; 2.5 GHz i7; 16 GB RAM
John Moehrke (Mar 10 2016 at 18:24):
note my 38:10 on a Dell E7240 (2.5 Ghz I5-4300U, SSD, but only 4 GB RAM)
Grahame Grieve (Mar 10 2016 at 19:46):
my build is round 15 min, except on hot days when my computer is step limited due to GPU heat. Then it takes 45min
Rob Hausam (Mar 10 2016 at 20:17):
my full build this morning was 12 minutes 44 seconds - they've been running between 12 and 15 min.
running on a Linux desktop box with 16G RAM (which also hosts a Windows 10 VM, etc.)
John Moehrke (Mar 10 2016 at 22:04):
Hmm, GE the only ones slow? Others with encrypted drives?
Josh Mandel (Mar 10 2016 at 22:16):
I use whole-disk encryption on linux.
Keith Boone (Mar 10 2016 at 22:39):
See the log here: baseline-build.log
Lloyd McKenzie (Mar 10 2016 at 23:02):
What difference does it make if you constrain out all the non-HL7-maintained implementation guides?
Grahame Grieve (Mar 10 2016 at 23:04):
probably a 1/4 of the time is doing implementation guides
Grahame Grieve (Mar 10 2016 at 23:04):
one thing to do if it's running really slow is to go to %temp% in your explorer, and delete everything
Grahame Grieve (Mar 10 2016 at 23:04):
ntfs slows to a crawl when there's >200k files in a folder
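Before deleting anything, it can help to see how many entries the temp folder actually holds. A minimal Python sketch, assuming `tempfile.gettempdir()` resolves to the same folder as %temp% (it usually does on Windows, but that depends on the environment); `count_entries` is a hypothetical helper name:

```python
import os
import tempfile


def count_entries(folder: str) -> int:
    """Count the direct children of a folder without building a full list."""
    with os.scandir(folder) as entries:
        return sum(1 for _ in entries)


temp_dir = tempfile.gettempdir()
print(f"{temp_dir}: {count_entries(temp_dir)} entries")
```

This only counts direct children, which is the figure that matters for the per-folder slowdown described above.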
Keith Boone (Mar 10 2016 at 23:07):
That only deleted about 10,000 files and 3GB of data so far...
Grahame Grieve (Mar 11 2016 at 22:59):
if people really think it would be better, I could parallelise the validation. I could create one thread per core. Then, instead of taking ages, it would be quicker, but your computer would be pretty much unusable while it was validating
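The one-worker-per-core idea above can be sketched roughly as follows. This is an illustration in Python, not the build tool's code (the real build is Java): `validate_example` is a hypothetical stand-in that always passes, and the example names are made up. In CPython a CPU-bound validator would need processes rather than threads to use every core; threads are used here only to keep the sketch short.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def validate_example(name: str) -> tuple[str, bool]:
    """Hypothetical stand-in for validating one example; always passes here."""
    return name, True


def validate_all(names: list[str]) -> dict[str, bool]:
    """Validate every example using a pool sized to the number of cores."""
    workers = os.cpu_count() or 1  # one worker per core, as proposed above
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so the report stays stable.
        return dict(pool.map(validate_example, names))


print(validate_all(["patient-example", "observation-example"]))
```

Capping the pool at one fewer than the core count would be one way to leave the machine usable during validation, at some cost in speed.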
Josh Mandel (Mar 12 2016 at 00:23):
I think we shouldn't invest in loads of customization here because things will be hard to control and debug. Would rather invest in getting the build operating via standardized build tooling (like gradle or mvn, per my comments from the other day).
Grahame Grieve (Mar 12 2016 at 09:39):
I know that standard tooling is your magic answer for everything, but what you actually mean is, redo the tools with complete dependency tracking and
Grahame Grieve (Mar 12 2016 at 09:40):
completely modular build activities. which would end up slower
Grahame Grieve (Mar 12 2016 at 09:40):
but would be easier to manage, if I could figure out the dependencies
Paul Knapp (Mar 12 2016 at 12:53):
+1 for multi threading.
Paul Knapp (Mar 12 2016 at 13:08):
Does vscache need to be part of the svn maintained files?
Josh Mandel (Mar 12 2016 at 16:06):
Not magic -- hard work. This is an engineering project to be sure, and I'm pretty confident nobody in the core FHIR community is going to take it on.
Bryn Rhodes (Mar 12 2016 at 18:34):
+1 for standardized build tooling. Slowness aside, it would make it easier for people to onboard and help deal with things like slowness.
Grahame Grieve (Mar 12 2016 at 20:13):
Paul - yes. Caching the results from the terminology service significantly decreases the length of the build time, and allows you to run offline, though that's still a challenge for me
James Agnew (Mar 14 2016 at 15:16):
Does that mean the vscache files need to be in SVN though? I've been wondering about this for a while.. Every time I try to do an SVN update using command line SVN it seems to take about 30 manual keypresses to get past all the vscache files...
Grahame Grieve (Mar 14 2016 at 19:57):
30 manual keypresses - i don't understand.... and the point is for them to be in svn
Rob Hausam (Mar 14 2016 at 20:06):
My experience seems similar to James' - using command line svn I frequently have to manually resolve large numbers (sometimes 20+) of vscache file conflicts.
Grahame Grieve (Mar 14 2016 at 20:13):
I'll have a deeper look at this - you shouldn't be resolving conflicts; it means I have something wrong in the logic. Maybe timezones are tripping me over again
Ewout Kramer (Mar 14 2016 at 20:23):
Oh, I have the same. Just thought it was just me since no one complained ;-)
David McKillop (Mar 14 2016 at 22:51):
FYI Grahame - I'm not using the command line SVN and I had approx 50 vscache lines to resolve this morning. I just CTRL click them and "resolve to theirs" in batches, so it's not a biggie to me.
Lloyd McKenzie (Mar 14 2016 at 22:57):
Me too. I regularly delete everything in vscache and then run an update because there's always collisions. Quite happy if this doesn't need to continue . . . :)
Keith Boone (Mar 15 2016 at 19:05):
I've complained about those keypresses... one time I had 288 unresolved issues that I had to press through (after two months of inactivity)
Grahame Grieve (Mar 16 2016 at 21:45):
can someone help me out with this?
- update to the latest version in svn
- run a full build
- pick a file in vscache that has changed during the build, and send it to me, so I can check the diff
Ewout Kramer (Mar 16 2016 at 21:46):
Some not only change, but go missing!
Ewout Kramer (Mar 16 2016 at 21:46):
actionlist.json -> missing
Grahame Grieve (Mar 16 2016 at 21:47):
'go missing' - how? I don't understand that, since there's no deletion in vscache
Ewout Kramer (Mar 16 2016 at 21:49):
Well, I had a build with nothing to commit, but after running the build, there are 187(!) changes: 122 deletions and 65 changed files
Ewout Kramer (Mar 16 2016 at 21:49):
Just to be sure, I'll do it again.
Ewout Kramer (Mar 16 2016 at 21:50):
(don't know whether it was a full build)
Ewout Kramer (Mar 16 2016 at 21:51):
How do I trigger a full build again?
Grahame Grieve (Mar 16 2016 at 22:04):
kill the build and run it again - that's the easiest way
Ewout Kramer (Mar 16 2016 at 22:10):
Yes. So, now just wait for the build to complete. It's been running 15 mins, so about 15 more....
Ewout Kramer (Mar 16 2016 at 22:29):
Ok. done! 28 minutes, 30 seconds. Now, no deleted files, let's take address-use.json:
Index: address-use.json
===================================================================
--- address-use.json (revision 7854)
+++ address-use.json (working copy)
@@ -2,25 +2,25 @@
"url": "http://hl7.org/fhir/ValueSet/address-use",
"outcomes": [
{
- "hash": "|null|null|work|null",
+ "hash": "|null|null|home|null",
"severity": null,
"message": null,
"definition": {
"abstract": false,
- "code": "work",
+ "code": "home",
"definition": null,
- "display": "Work"
+ "display": "Home"
}
},
{
- "hash": "|null|null|home|null",
+ "hash": "|null|null|work|null",
"severity": null,
"message": null,
"definition": {
"abstract": false,
- "code": "home",
+ "code": "work",
"definition": null,
- "display": "Home"
+ "display": "Work"
}
}
]
Ewout Kramer (Mar 16 2016 at 22:30):
Some more: changedfileswitha.zip
Richard Ettema (Mar 16 2016 at 22:49):
FYI - I just completed a successful full build on my local workstation against the latest svn revision and I have 57 changed/updated files under the validation.cache folder. Would you like me to zip them up for you?
Grahame Grieve (Mar 16 2016 at 22:56):
yes Richard, thanks. Ewout's files aren't actually different to mine...
Grahame Grieve (Mar 16 2016 at 22:56):
so you don't have files that have changed in the vscache folder itself?
Ewout Kramer (Mar 16 2016 at 22:58):
All these files are from vscache, and they show as modified in SVN...
Richard Ettema (Mar 16 2016 at 22:58):
Grahame, none of the files in my local vscache folder were changed by my last build - only files in vscache/validation.cache. I'll send you the zip file via email.
Ewout Kramer (Mar 16 2016 at 22:59):
Yes, they are inside validation.cache indeed
Ewout Kramer (Mar 16 2016 at 22:59):
The diff I included in zulip shows that home and work are reversed, so the same codes, but a different order?
Richard Ettema (Mar 16 2016 at 23:00):
FYI - my local workstation is a DELL Precision M4500 running Win7 Pro with 12 Gb memory and a quad-core, hyper-threaded Intel CPU (showing 8 cores). Total build time was 38 minutes 26 seconds.
Ewout Kramer (Mar 16 2016 at 23:03):
So, my light-weight surface pro 3 wasn't doing that bad at 28 minutes!
Richard Ettema (Mar 16 2016 at 23:04):
Ewout, yes, I just did a diff on administrative-gender.json and the ordering has changed; i.e. all the same values are there, just re-arranged.
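The reordering described above (same outcomes, different order) is the kind of churn that a stable sort before writing would avoid. A minimal Python sketch of that idea, using the field names from the address-use.json diff in this thread; `serialize_cache` is a hypothetical helper, and nothing here is claimed to be how the build tool was actually fixed:

```python
import json


def serialize_cache(cache: dict) -> str:
    """Serialize a cache entry with its outcomes in a stable order."""
    stable = dict(cache)
    # Sort by the "hash" field so run-to-run ordering differences disappear.
    stable["outcomes"] = sorted(cache["outcomes"], key=lambda o: o["hash"])
    return json.dumps(stable, indent=2, sort_keys=True)


home = {"hash": "|null|null|home|null",
        "definition": {"code": "home", "display": "Home"}}
work = {"hash": "|null|null|work|null",
        "definition": {"code": "work", "display": "Work"}}
url = "http://hl7.org/fhir/ValueSet/address-use"

a = serialize_cache({"url": url, "outcomes": [work, home]})
b = serialize_cache({"url": url, "outcomes": [home, work]})
print(a == b)
```

With byte-identical output for identical content, SVN would have nothing to flag as modified after a build.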
Richard Ettema (Mar 16 2016 at 23:08):
I think my CPU clock speed may be the issue - only 1.73 GHz. :(
Grahame Grieve (Mar 17 2016 at 00:11):
ok. my computer is committing. once it's committed, Richard, can you update and try again? you shouldn't get any changed files in validation.cache after this
Richard Ettema (Mar 17 2016 at 02:21):
Grahame, dropped offline for a bit... I'll start another full build in a few minutes.
Richard Ettema (Mar 17 2016 at 03:10):
Grahame, my local full build just finished. I don't see any changes in vscache or vscache/validation.cache. Looks good now. :) ttyl
Ewout Kramer (Mar 17 2016 at 09:39):
I'll run an update and retry too...
Brian Postlethwaite (Mar 17 2016 at 09:40):
Had no issues here after a rebuild too
Grahame Grieve (Mar 17 2016 at 10:47):
great. well, that's something. It should make the build a little faster
Paul Knapp (Mar 17 2016 at 19:31):
Yes mine now builds in 19 minutes and change for a full build, but I got 2 new files in vscache, operation-parameter-type and icd-10
Grahame Grieve (Mar 17 2016 at 19:53):
guess you are using codes in your examples that are not committed yet
Paul Knapp (Mar 18 2016 at 10:40):
Ok I'll look into that.
David McKillop (Mar 21 2016 at 01:05):
Grahame, FYI:
1) I've done an update and didn't get any vscache file clashes.
2) doing a full build took just under 23 minutes.
Grahame Grieve (Mar 21 2016 at 01:06):
ok thanks
Paul Knapp (Mar 25 2016 at 08:57):
Full build now down to 16 minutes 30 seconds, nice.
James Agnew (May 05 2016 at 11:12):
John, do you have a virus checker running? I know for a while our corporate-installed virus checker had a policy of scanning every JAR file on every access. This made Java builds 400% slower than they should be.
Grahame Grieve (May 05 2016 at 11:13):
?
James Agnew (May 05 2016 at 11:36):
Like, we had a virus checker running on every laptop supplied by work at one point, and that virus checker (McAfee) had a mandatory policy that every time any application read from a Java JAR file, it would first open that JAR and scan every class inside for viruses (sloooooow). That sucked.
Grahame Grieve (May 05 2016 at 11:43):
I was actually asking about the context of what you wrote
James Agnew (May 05 2016 at 12:17):
Oh, I'm just thinking if John's local build is taking way longer than the CI server does, possibly a similar virus checker could be the culprit.
James Agnew (May 05 2016 at 12:18):
Oh, and I now see I was replying to a comment from weeks ago. I still suck at Zulip obviously. :)
Grahame Grieve (May 05 2016 at 12:19):
I didn't look back far enough - could be me that sucks!
John Moehrke (May 10 2016 at 13:27):
well, no longer have corporate laptop.... but new win10 laptop has only 4 Gig ram, so build takes 90-110 minutes... When I get really worried I will get more ram, and replace spinning HD with SSD. --> Still worthy discussion as this build is approaching too big.
Grahame Grieve (May 10 2016 at 13:59):
past that a long time ago, but not obvious what to do about it
John Moehrke (May 14 2016 at 18:13):
SSD and 12 gig memory have moved build from 110 minutes to 36 minutes.
Keith Boone (May 16 2016 at 12:24):
And no more disk encryption right?
John Moehrke (May 16 2016 at 12:42):
The change from 110 minutes to 38 minutes was purely the move to 12 Gig and SSD.... but yes, this is a personal $400 (+$100) ASUS laptop, so no hard drive encryption.
Last updated: Apr 12 2022 at 19:14 UTC