Stream: IG creation
Topic: tx.fhir.org
Grahame Grieve (Sep 23 2020 at 10:43):
It's run out of disk space again. :frown: I'm reorganizing so it won't happen again - will be down for an hour or more
Grahame Grieve (Sep 23 2020 at 10:43):
I should be asleep. :tears:
Patrick Werner (Sep 23 2020 at 12:21):
thanks Grahame
Grahame Grieve (Oct 20 2020 at 03:38):
I am taking tx.fhir.org down for an upgrade
Grahame Grieve (Oct 20 2020 at 06:57):
ok, it's back. The new terminology server is a fairly large rewrite - it shouldn't behave any differently, but it's totally different internally. I expect it to be more stable, though I haven't yet moved it to Linux (which should help)
Max Masnick (Oct 20 2020 at 09:56):
@Grahame Grieve I'm getting a 500 error:
$ curl -vv tx.fhir.org
* Trying 104.196.166.17...
* TCP_NODELAY set
* Connected to tx.fhir.org (104.196.166.17) port 80 (#0)
> GET / HTTP/1.1
> Host: tx.fhir.org
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Connection: keep-alive
< Content-Type: text/html; charset=ISO-8859-1
< Content-Length: 28
< Date: Tue, 20 Oct 2020 09:55:39 GMT
< X-Request-Id: 369-8685
< Server: Health Intersections FHIR Server
<
* Connection #0 to host tx.fhir.org left intact
Unable to find homepage.html* Closing connection 0
Max Masnick (Oct 20 2020 at 10:23):
It looks like it might just be a configuration issue with the index page, which makes _genonce.sh think the server is down. If I circumvent the connection check (by commenting out curl -sSf tx.fhir.org > /dev/null), everything works.
(Though there were a bunch of changes to the txcache/* files since I last ran a build ~10 hours ago...only mentioning in case this is unexpected with the rewrite.)
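One way to see why the check failed: curl's -f option makes curl exit non-zero when the server answers with an HTTP status of 400 or above, so the 500 for the missing home page looks the same to the script as the server being offline. The following is a minimal sketch of a check that tells the two cases apart. It is not part of _genonce.sh; the function name classify_tx_status is hypothetical, and the only assumption about curl is that -w '%{http_code}' prints 000 when no HTTP response was received.

```shell
#!/bin/sh
# Sketch only (hypothetical helper, not from _genonce.sh): map an HTTP status
# code, as printed by curl's -w '%{http_code}', to a coarse verdict.
classify_tx_status() {
  case "$1" in
    000) echo "unreachable" ;;   # curl got no HTTP response at all
    5??) echo "server-error" ;;  # server answered, but with a 5xx such as the 500 above
    *)   echo "reachable" ;;     # any other status means the server is answering
  esac
}

# Hypothetical use in a build script (needs network access, so left commented out):
# status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 tx.fhir.org)
# [ "$(classify_tx_status "$status")" != "unreachable" ] || echo "tx.fhir.org is offline"
```

With a check like this, a build script could choose to carry on when the server is up but its index page is misconfigured, and stop only when nothing answers at all.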
Grahame Grieve (Oct 20 2020 at 11:08):
oh hmm
Grahame Grieve (Oct 20 2020 at 11:08):
I'll look at the home page - didn't realise that was necessary
Jose Costa Teixeira (Oct 20 2020 at 11:10):
For the check on the _genonce scripts - is there another way to check if the server is online?
Grahame Grieve (Oct 20 2020 at 11:10):
no I'll restore the home page
Jose Costa Teixeira (Oct 20 2020 at 11:10):
ok
Grahame Grieve (Oct 20 2020 at 11:20):
should be good now
Max Masnick (Oct 20 2020 at 11:52):
@Grahame Grieve hmm, I'm now getting an error when I try to build the mCODE IG. This was working fine 30 minutes ago and I don't have any local changes...maybe something with the change you just made?
Generating Snapshots (00:21.0358)
Generating Narratives (00:26.0776)
Publishing Content Failed: null (00:35.0386)
(00:35.0387)
Use -? to get command line help (00:35.0387)
(00:35.0387)
Stack Dump (for debugging): (00:35.0387)
java.lang.NullPointerException
at org.hl7.fhir.r5.context.BaseWorkerContext.validateCodeBatch(BaseWorkerContext.java:808)
at org.hl7.fhir.r5.renderers.ValueSetRenderer.getConceptsForCodes(ValueSetRenderer.java:1001)
at org.hl7.fhir.r5.renderers.ValueSetRenderer.genInclude(ValueSetRenderer.java:825)
at org.hl7.fhir.r5.renderers.ValueSetRenderer.generateComposition(ValueSetRenderer.java:748)
at org.hl7.fhir.r5.renderers.ValueSetRenderer.render(ValueSetRenderer.java:84)
at org.hl7.fhir.r5.renderers.ValueSetRenderer.render(ValueSetRenderer.java:73)
at org.hl7.fhir.r5.renderers.ResourceRenderer.render(ResourceRenderer.java:74)
at org.hl7.fhir.igtools.publisher.Publisher.generateNarratives(Publisher.java:1128)
at org.hl7.fhir.igtools.publisher.Publisher.loadConformance(Publisher.java:3782)
at org.hl7.fhir.igtools.publisher.Publisher.createIg(Publisher.java:870)
at org.hl7.fhir.igtools.publisher.Publisher.execute(Publisher.java:725)
at org.hl7.fhir.igtools.publisher.Publisher.main(Publisher.java:8342)
Max Masnick (Oct 20 2020 at 11:54):
If you need to repro locally, git clone git@github.com:HL7/fhir-mCODE-ig.git and run _genonce on master
Grahame Grieve (Oct 20 2020 at 14:04):
you know, I thought I had tested my new tx.fhir.org server up hill and down dale, but it turns out that the deployed config is ever so slightly different to my testing config, and bingo, you get an NPE
Grahame Grieve (Oct 20 2020 at 14:24):
interesting. starting from an empty cache, the mCODE IG makes 724 calls to tx.fhir.org
Grahame Grieve (Oct 20 2020 at 14:50):
ok should be good now
Niek van Galen (Oct 20 2020 at 15:33):
At Nictiz, we encountered the same problem. Just did a new run and the problem is solved indeed. Thanks @Grahame Grieve!
Max Masnick (Oct 20 2020 at 16:33):
Thank you Grahame!
Max Masnick (Oct 20 2020 at 21:14):
@Grahame Grieve with the mCODE IG, I'm seeing a lot more churn than previously in the txcache, accompanied by very long build times (>20 min).
Previously I would get long build times periodically (e.g., if I hadn't run the publisher for a week) but not this long. Since this morning I've had slow "stale cache" builds more than once, and they seem slower than "stale cache" builds used to be.
Is there anything we can do to improve performance when running the publisher? With a hot cache we typically see build times around 3-5 minutes. In case it matters I've got plenty of bandwidth (>200Mbps down) and ample system resources (modern Mac).
Max Masnick (Oct 20 2020 at 21:15):
Here's some of the timing data from the publisher output:
empty txcache folder on master (16af2c3b00)
Times: loading: 00:01.0446, generate: 23:15.0327, narrative generation: 00:40.0604, realm-rules: 01:25.0364, previous-version: 11:58.0527, jekyll: 00:09.0136, validation: 01:02.0374 (#192), template: 00:01.0821 (#3) (23:38.0398)
re-run with warm cache on master (16af2c3b00)
Times: loading: 00:01.0568, generate: 02:42.0468, narrative generation: 00:30.0735, realm-rules: 00:01.0929, previous-version: 00:04.0340, jekyll: 00:07.0380, validation: 00:26.0246 (#192), template: 00:04.0029 (#3) (03:07.0146)
Grahame Grieve (Oct 20 2020 at 21:35):
that's pretty much what I'd expect?
Max Masnick (Oct 20 2020 at 22:59):
@Grahame Grieve will -tx n/a still avoid the "stale cache" builds? That's the main issue for me -- waiting 20 min for the IG to build is problematic when troubleshooting or iterating. It's fine if it happens once a week, but multiple times a day is less fun :)
Grahame Grieve (Oct 20 2020 at 22:59):
-tx n/a doesn't apply to the IG publisher
Max Masnick (Oct 20 2020 at 23:00):
I guess this is a feature request then to have a way to trust the local cache when being totally up to date isn't important
Grahame Grieve (Oct 20 2020 at 23:00):
why are you having 'stale cache' builds? It's because you're not following the recommended practice of committing the txCache
Grahame Grieve (Oct 20 2020 at 23:00):
there's no flag to not trust the local cache
Max Masnick (Oct 20 2020 at 23:01):
I have no idea why I'm getting the stale cache builds. We are committing txcache to the git repo
Grahame Grieve (Oct 20 2020 at 23:02):
it was empty when I synced it
Max Masnick (Oct 20 2020 at 23:04):
There's a bunch of files in it: https://github.com/HL7/fhir-mCODE-ig/tree/master/input-cache/txcache
Grahame Grieve (Oct 20 2020 at 23:04):
apparently I'm wrong.
Max Masnick (Oct 20 2020 at 23:14):
So I committed a big update to the txcache folder this morning (12 hours ago). If it's helpful, here are the changes to the txcache folder on my system _since_ then (i.e., during the last 12 hours): txcache_changes.txt
Grahame Grieve (Oct 20 2020 at 23:17):
ah. so. The cache gets wiped when the publisher sees that tx.fhir.org has been upgraded. Since I've been doing upgrades today, you'll get the cache being wiped out until you commit after rebuilding it
Max Masnick (Oct 20 2020 at 23:41):
Ahh ok, thanks
Grahame Grieve (Oct 20 2020 at 23:52):
one more coming - just found a bug
Grahame Grieve (Oct 21 2021 at 23:18):
It's down - I'm reconfiguring it to get more space, and hopefully more reliability
Grahame Grieve (Oct 21 2021 at 23:50):
back up and stable
Grahame Grieve (Oct 25 2021 at 00:58):
it's going down again for some more work
Grahame Grieve (Oct 25 2021 at 03:54):
normal service is restored
Diana_Ovelgoenne (Oct 25 2021 at 06:32):
Is it really working? I got these errors just now:
Previous POST attempt returned HTTP<422> from url -> http://tx.fhir.org/r4/CodeSystem/$validate-code?.
Previous POST attempt returned HTTP<413> from url -> http://tx.fhir.org/r4/.
Grahame Grieve (Oct 25 2021 at 06:36):
those are not errors
Grahame Grieve (Oct 25 2021 at 06:36):
status notes to alert the maintainers to something that has to be cleaned up
Diana_Ovelgoenne (Oct 25 2021 at 06:40):
After those there was a stack dump, and IG Publisher couldn't complete
Grahame Grieve (Oct 25 2021 at 06:40):
oh? well, what was that?
Diana_Ovelgoenne (Oct 25 2021 at 06:40):
Previous POST attempt returned HTTP<413> from url -> http://tx.fhir.org/r4/.
Publishing Content Failed: null (03:18.0439)
(03:18.0441)
Use -? to get command line help (03:18.0442)
(03:18.0444)
Stack Dump (for debugging): (03:18.0445)
java.lang.NullPointerException
at org.hl7.fhir.r5.context.BaseWorkerContext.validateCodeBatch(BaseWorkerContext.java:832)
at org.hl7.fhir.r5.renderers.ValueSetRenderer.getConceptsForCodes(ValueSetRenderer.java:1009)
at org.hl7.fhir.r5.renderers.ValueSetRenderer.genInclude(ValueSetRenderer.java:825)
at org.hl7.fhir.r5.renderers.ValueSetRenderer.generateComposition(ValueSetRenderer.java:748)
at org.hl7.fhir.r5.renderers.ValueSetRenderer.render(ValueSetRenderer.java:84)
at org.hl7.fhir.r5.renderers.ValueSetRenderer.render(ValueSetRenderer.java:73)
at org.hl7.fhir.r5.renderers.ResourceRenderer.render(ResourceRenderer.java:74)
at org.hl7.fhir.igtools.publisher.Publisher.generateNarratives(Publisher.java:1159)
at org.hl7.fhir.igtools.publisher.Publisher.loadConformance(Publisher.java:3912)
at org.hl7.fhir.igtools.publisher.Publisher.createIg(Publisher.java:893)
at org.hl7.fhir.igtools.publisher.Publisher.execute(Publisher.java:748)
at org.hl7.fhir.igtools.publisher.Publisher.main(Publisher.java:8566)
Grahame Grieve (Oct 25 2021 at 06:46):
what version is this?
Diana_Ovelgoenne (Oct 25 2021 at 06:54):
I just updated the publisher to 1.1.83 and I'm still getting the same results; the 422 errors are gone, though
Grahame Grieve (Oct 25 2021 at 06:57):
what line does it report for the error?
Diana_Ovelgoenne (Oct 25 2021 at 07:06):
It is on Generating Narratives (02:02.0486)
But I tried with another project, and that one compiles fine, so I will dig into my pages; that is probably where the problem lies. The weird thing is that it was working a couple of weeks ago, and I only changed a diagram this morning :S
Thanks for your help though
Diana_Ovelgoenne (Oct 25 2021 at 07:13):
So I tried with another project and got the same problem
Generating Narratives (02:32.0386)
Previous POST attempt returned HTTP<413> from url -> http://tx.fhir.org/r4/.
Publishing Content Failed: null
Is it something to do with big IGs?
Grahame Grieve (Oct 25 2021 at 09:44):
no. tell me which line....
Diana_Ovelgoenne (Oct 25 2021 at 10:42):
I am using _genonce.bat to call the publisher. Sorry for the stupid question, but how do I get the line number where it crashes so I can give it to you? I tried the logging options, but the temp file isn't getting generated, and when I try to use the Publisher GUI, the Get Summary button just fades away while the ini file is executing.
Diana_Ovelgoenne (Oct 25 2021 at 13:21):
I have now tried using the ontoserver, and my IG Publisher got through the Narratives step (where it produces the stack dump with tx.fhir.org)
Grahame Grieve (Oct 25 2021 at 18:30):
the line number is in the stack dump - I want the stack dump
Grahame Grieve (Oct 26 2021 at 18:37):
I'm about to take it down for an upgrade for a few minutes
Lloyd McKenzie (Oct 26 2021 at 19:19):
@Grahame Grieve - technically this should probably go out on #committers/announce
Grahame Grieve (Oct 26 2021 at 19:33):
probably but this is where everyone complains.
Lloyd McKenzie (Oct 26 2021 at 19:43):
If you post there, and they complain here, then we know whose committer access to revoke because they're not actively monitoring #committers/announce ... :wink:
Last updated: Apr 12 2022 at 19:14 UTC