FHIR Chat · tx.fhir.org · IG creation

Stream: IG creation

Topic: tx.fhir.org


Grahame Grieve (Sep 23 2020 at 10:43):

It's run out of disk space again. :frown:. I'm reorganizing so it won't happen again - will be down for an hour or more

Grahame Grieve (Sep 23 2020 at 10:43):

I should be asleep. :tears:

Patrick Werner (Sep 23 2020 at 12:21):

thanks Grahame

Grahame Grieve (Oct 20 2020 at 03:38):

I am taking tx.fhir.org down for an upgrade

Grahame Grieve (Oct 20 2020 at 06:57):

ok, it's back. The new terminology server is a fairly large rewrite - it shouldn't behave any differently, but it's totally different internally. I expect it to be more stable, though I haven't yet moved it to Linux (which should help)

Max Masnick (Oct 20 2020 at 09:56):

@Grahame Grieve I'm getting a 500 error:

$ curl -vv tx.fhir.org
*   Trying 104.196.166.17...
* TCP_NODELAY set
* Connected to tx.fhir.org (104.196.166.17) port 80 (#0)
> GET / HTTP/1.1
> Host: tx.fhir.org
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 500 Internal Server Error
< Connection: keep-alive
< Content-Type: text/html; charset=ISO-8859-1
< Content-Length: 28
< Date: Tue, 20 Oct 2020 09:55:39 GMT
< X-Request-Id: 369-8685
< Server: Health Intersections FHIR Server
<
* Connection #0 to host tx.fhir.org left intact
Unable to find homepage.html* Closing connection 0

Max Masnick (Oct 20 2020 at 10:23):

It looks like it might just be a configuration issue with the index page, which makes _genonce.sh think the server is down. If I circumvent the connection check (by commenting out curl -sSf tx.fhir.org > /dev/null), everything works.

(Though there were a bunch of changes to the txcache/* files since I last ran a build ~10 hours ago...only mentioning in case this is unexpected with the rewrite.)
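
The check quoted above uses curl's -f flag, which makes curl exit non-zero on any HTTP response of 400 or above, so a 500 from the home page looks the same as an unreachable server. A minimal sketch of a probe that separates the two cases, written in Python rather than the shell that _genonce.sh uses. The function names and the three labels are illustrative assumptions, not part of _genonce.sh or the IG Publisher.

```python
import urllib.error
import urllib.request
from typing import Optional


def probe(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, or timeout: nothing answered.
        return None


def classify(status: Optional[int]) -> str:
    """Map a probe result to 'offline', 'degraded', or 'online'."""
    if status is None:
        return "offline"
    if status >= 500:
        return "degraded"
    return "online"
```

With this split, a call such as classify(probe("http://tx.fhir.org")) would report "degraded" for the 500 shown in the curl trace, and a build script could choose to continue in that case instead of treating the server as down.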

Grahame Grieve (Oct 20 2020 at 11:08):

oh hmm

Grahame Grieve (Oct 20 2020 at 11:08):

I'll look at the home page - didn't realise that was necessary

Jose Costa Teixeira (Oct 20 2020 at 11:10):

For the check on the _genonce scripts - is there another way to check if the server is online?

Grahame Grieve (Oct 20 2020 at 11:10):

no I'll restore the home page

Jose Costa Teixeira (Oct 20 2020 at 11:10):

ok

Grahame Grieve (Oct 20 2020 at 11:20):

should be good now

Max Masnick (Oct 20 2020 at 11:52):

@Grahame Grieve hmm, I'm now getting an error when I try to build the mCODE IG. This was working fine 30 minutes ago and I don't have any local changes...maybe something with the change you just made?

Generating Snapshots                                                             (00:21.0358)
Generating Narratives                                                            (00:26.0776)
Publishing Content Failed: null                                                  (00:35.0386)
                                                                                 (00:35.0387)
Use -? to get command line help                                                  (00:35.0387)
                                                                                 (00:35.0387)
Stack Dump (for debugging):                                                      (00:35.0387)
java.lang.NullPointerException
    at org.hl7.fhir.r5.context.BaseWorkerContext.validateCodeBatch(BaseWorkerContext.java:808)
    at org.hl7.fhir.r5.renderers.ValueSetRenderer.getConceptsForCodes(ValueSetRenderer.java:1001)
    at org.hl7.fhir.r5.renderers.ValueSetRenderer.genInclude(ValueSetRenderer.java:825)
    at org.hl7.fhir.r5.renderers.ValueSetRenderer.generateComposition(ValueSetRenderer.java:748)
    at org.hl7.fhir.r5.renderers.ValueSetRenderer.render(ValueSetRenderer.java:84)
    at org.hl7.fhir.r5.renderers.ValueSetRenderer.render(ValueSetRenderer.java:73)
    at org.hl7.fhir.r5.renderers.ResourceRenderer.render(ResourceRenderer.java:74)
    at org.hl7.fhir.igtools.publisher.Publisher.generateNarratives(Publisher.java:1128)
    at org.hl7.fhir.igtools.publisher.Publisher.loadConformance(Publisher.java:3782)
    at org.hl7.fhir.igtools.publisher.Publisher.createIg(Publisher.java:870)
    at org.hl7.fhir.igtools.publisher.Publisher.execute(Publisher.java:725)
    at org.hl7.fhir.igtools.publisher.Publisher.main(Publisher.java:8342)

Max Masnick (Oct 20 2020 at 11:54):

If you need to repro locally, git clone git@github.com:HL7/fhir-mCODE-ig.git and run _genonce on master

Grahame Grieve (Oct 20 2020 at 14:04):

you know, I thought I had tested my new tx.fhir.org server up hill and down dale, but it turns out that the deployed config is ever so slightly different to my testing config, and bingo, you get an NPE

Grahame Grieve (Oct 20 2020 at 14:24):

interesting. starting from an empty cache, the mCODE IG makes 724 calls to tx.fhir.org

Grahame Grieve (Oct 20 2020 at 14:50):

ok should be good now

Niek van Galen (Oct 20 2020 at 15:33):

At Nictiz, we encountered the same problem. Just did a new run and the problem is solved indeed. Thanks @Grahame Grieve!

Max Masnick (Oct 20 2020 at 16:33):

Thank you Grahame!

Max Masnick (Oct 20 2020 at 21:14):

@Grahame Grieve with the mCODE IG, I'm seeing a lot more churn than previously in the txcache, accompanied by very long build times (>20 min).

Previously I would get long build times periodically (e.g., if I hadn't run the publisher for a week) but not this long. Since this morning I've had slow "stale cache" builds more than once, and they seem slower than "stale cache" builds used to be.

Is there anything we can do to improve performance when running the publisher? With a hot cache we typically see build times around 3-5 minutes. In case it matters I've got plenty of bandwidth (>200Mbps down) and ample system resources (modern Mac).

Max Masnick (Oct 20 2020 at 21:15):

Here's some of the timing data from the publisher output:

empty txcache folder on master (16af2c3b00)
            Times: loading: 00:01.0446, generate: 23:15.0327, narrative generation: 00:40.0604, realm-rules: 01:25.0364, previous-version: 11:58.0527, jekyll: 00:09.0136, validation: 01:02.0374 (#192), template: 00:01.0821 (#3) (23:38.0398)

re-run with warm cache on master (16af2c3b00)
            Times: loading: 00:01.0568, generate: 02:42.0468, narrative generation: 00:30.0735, realm-rules: 00:01.0929, previous-version: 00:04.0340, jekyll: 00:07.0380, validation: 00:26.0246 (#192), template: 00:04.0029 (#3) (03:07.0146)
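
The two "Times:" lines above can be compared phase by phase. A minimal sketch that parses such a line into seconds per phase. It assumes each value reads as minutes:seconds followed by a zero-padded millisecond field, which is an inference from the output and not a documented format. The helper names are hypothetical, and the trailing total in parentheses is deliberately not captured.

```python
import re

# Matches "phase name: mm:ss.mmmm"; phase names are lowercase words
# that may contain spaces or hyphens.
PHASE_RE = re.compile(r"([a-z][a-z -]*?): (\d+:\d+\.\d+)")


def parse_duration(text: str) -> float:
    """Convert 'mm:ss.mmmm' to seconds (assumed format, see lead-in)."""
    minutes, rest = text.split(":")
    seconds, millis = rest.split(".")
    return int(minutes) * 60 + int(seconds) + int(millis) / 1000.0


def parse_times(line: str) -> dict:
    """Return a mapping of phase name to duration in seconds."""
    return {name: parse_duration(value) for name, value in PHASE_RE.findall(line)}


def extra_seconds(cold: dict, warm: dict) -> dict:
    """Per-phase time the cold run spent beyond the warm run."""
    return {name: cold[name] - warm[name] for name in cold if name in warm}


COLD = ("Times: loading: 00:01.0446, generate: 23:15.0327, "
        "narrative generation: 00:40.0604, realm-rules: 01:25.0364, "
        "previous-version: 11:58.0527, jekyll: 00:09.0136, "
        "validation: 01:02.0374 (#192), template: 00:01.0821 (#3) (23:38.0398)")
WARM = ("Times: loading: 00:01.0568, generate: 02:42.0468, "
        "narrative generation: 00:30.0735, realm-rules: 00:01.0929, "
        "previous-version: 00:04.0340, jekyll: 00:07.0380, "
        "validation: 00:26.0246 (#192), template: 00:04.0029 (#3) (03:07.0146)")
```

Calling extra_seconds(parse_times(COLD), parse_times(WARM)) shows which phases account for the difference between the empty-cache and warm-cache runs.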

Grahame Grieve (Oct 20 2020 at 21:35):

that's pretty much what I'd expect?

Max Masnick (Oct 20 2020 at 22:59):

@Grahame Grieve will -tx n/a still avoid the "stale cache" builds? That's the main issue for me -- waiting 20 min for the IG to build is problematic when troubleshooting or iterating. It's fine if it happens once a week, but multiple times a day is less fun :)

Grahame Grieve (Oct 20 2020 at 22:59):

-tx n/a doesn't apply to the IG publisher

Max Masnick (Oct 20 2020 at 23:00):

I guess this is a feature request then to have a way to trust the local cache when being totally up to date isn't important

Grahame Grieve (Oct 20 2020 at 23:00):

why are you having 'stale cache' builds? It's because you're not following the recommended practice of committing the txCache

Grahame Grieve (Oct 20 2020 at 23:00):

there's no flag to not trust the local cache

Max Masnick (Oct 20 2020 at 23:01):

I have no idea why I'm getting the stale cache builds. We are committing txcache to the git repo

Grahame Grieve (Oct 20 2020 at 23:02):

it was empty when I synced it

Max Masnick (Oct 20 2020 at 23:04):

There's a bunch of files in it: https://github.com/HL7/fhir-mCODE-ig/tree/master/input-cache/txcache

Grahame Grieve (Oct 20 2020 at 23:04):

apparently I'm wrong.

Max Masnick (Oct 20 2020 at 23:14):

So I committed a big update to the txcache folder this morning (12 hours ago). If it's helpful, here are the changes to the txcache folder on my system _since_ then (i.e., during the last 12 hours): txcache_changes.txt

Grahame Grieve (Oct 20 2020 at 23:17):

ah. so. The cache gets wiped when the publisher sees that tx.fhir.org has been upgraded. Since I've been doing upgrades today, you'll get the cache being wiped out until you commit after rebuilding it
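
The behaviour described above, where a cache is discarded when the server it was built against has changed, can be sketched with a version stamp stored next to the cached entries. This is an illustration of the general technique only, not the IG Publisher's actual implementation. The stamp file name and the function names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

STAMP_NAME = "server-version.txt"  # hypothetical file name


def refresh_cache(cache_dir: Path, server_version: str) -> bool:
    """Wipe cache_dir if it was built against a different server version.

    Returns True if the cache was wiped (or created), False if it was kept.
    """
    stamp = cache_dir / STAMP_NAME
    cached = stamp.read_text().strip() if stamp.exists() else None
    if cached == server_version:
        return False
    if cache_dir.exists():
        shutil.rmtree(cache_dir)
    cache_dir.mkdir(parents=True)
    stamp.write_text(server_version)
    return True


def demo() -> list:
    """Run three builds: first build, same server, upgraded server."""
    cache = Path(tempfile.mkdtemp()) / "txcache"
    results = [refresh_cache(cache, "1.0")]
    (cache / "entry.cache").write_text("cached response")
    results.append(refresh_cache(cache, "1.0"))
    results.append(refresh_cache(cache, "1.1"))
    return results
```

Under this scheme, a committed cache whose stamp predates a server upgrade is wiped on every fresh checkout until the rebuilt cache, with its new stamp, is committed. That matches the repeated slow builds reported in this thread.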

Max Masnick (Oct 20 2020 at 23:41):

Ahh ok, thanks

Grahame Grieve (Oct 20 2020 at 23:52):

one more coming - just found a bug

Grahame Grieve (Oct 21 2021 at 23:18):

It's down - I'm reconfiguring it to get more space, and hopefully more reliability

Grahame Grieve (Oct 21 2021 at 23:50):

back up and stable

Grahame Grieve (Oct 25 2021 at 00:58):

it's going down again for some more work

Grahame Grieve (Oct 25 2021 at 03:54):

normal service is restored

Diana_Ovelgoenne (Oct 25 2021 at 06:32):

Is it really working? I got these errors just now:
Previous POST attempt returned HTTP<422> from url -> http://tx.fhir.org/r4/CodeSystem/$validate-code?.
Previous POST attempt returned HTTP<413> from url -> http://tx.fhir.org/r4/.

Grahame Grieve (Oct 25 2021 at 06:36):

those are not errors

Grahame Grieve (Oct 25 2021 at 06:36):

they're status notes to alert the maintainers to something that has to be cleaned up

Diana_Ovelgoenne (Oct 25 2021 at 06:40):

After those there was a stack dump, and IG Publisher couldn't complete

Grahame Grieve (Oct 25 2021 at 06:40):

oh? well, what was that?

Diana_Ovelgoenne (Oct 25 2021 at 06:40):

Previous POST attempt returned HTTP<413> from url -> http://tx.fhir.org/r4/.
Publishing Content Failed: null (03:18.0439)
(03:18.0441)
Use -? to get command line help (03:18.0442)
(03:18.0444)
Stack Dump (for debugging): (03:18.0445)
java.lang.NullPointerException
    at org.hl7.fhir.r5.context.BaseWorkerContext.validateCodeBatch(BaseWorkerContext.java:832)
    at org.hl7.fhir.r5.renderers.ValueSetRenderer.getConceptsForCodes(ValueSetRenderer.java:1009)
    at org.hl7.fhir.r5.renderers.ValueSetRenderer.genInclude(ValueSetRenderer.java:825)
    at org.hl7.fhir.r5.renderers.ValueSetRenderer.generateComposition(ValueSetRenderer.java:748)
    at org.hl7.fhir.r5.renderers.ValueSetRenderer.render(ValueSetRenderer.java:84)
    at org.hl7.fhir.r5.renderers.ValueSetRenderer.render(ValueSetRenderer.java:73)
    at org.hl7.fhir.r5.renderers.ResourceRenderer.render(ResourceRenderer.java:74)
    at org.hl7.fhir.igtools.publisher.Publisher.generateNarratives(Publisher.java:1159)
    at org.hl7.fhir.igtools.publisher.Publisher.loadConformance(Publisher.java:3912)
    at org.hl7.fhir.igtools.publisher.Publisher.createIg(Publisher.java:893)
    at org.hl7.fhir.igtools.publisher.Publisher.execute(Publisher.java:748)
    at org.hl7.fhir.igtools.publisher.Publisher.main(Publisher.java:8566)

Grahame Grieve (Oct 25 2021 at 06:46):

what version is this?

Diana_Ovelgoenne (Oct 25 2021 at 06:54):

I just updated the publisher to 1.1.83 and am still getting the same results; the 422 errors are gone, though

Grahame Grieve (Oct 25 2021 at 06:57):

what line does it report for the error?

Diana_Ovelgoenne (Oct 25 2021 at 07:06):

It is on Generating Narratives (02:02.0486)
But I tried with another project, and that one compiles fine, so I will dig into my pages; that is probably where the problem lies. The weird thing is that a couple of weeks ago it was working, and I only changed a diagram this morning :S
Thanks for your help though

Diana_Ovelgoenne (Oct 25 2021 at 07:13):

So I tried with another project and got the same problem
Generating Narratives (02:32.0386)
Previous POST attempt returned HTTP<413> from url -> http://tx.fhir.org/r4/.
Publishing Content Failed: null
Is it something to do with big IGs?

Grahame Grieve (Oct 25 2021 at 09:44):

no. tell me which line....

Diana_Ovelgoenne (Oct 25 2021 at 10:42):

I am using _genonce.bat to call the publisher. Sorry for the stupid question, but how do I get the line number where it crashes so I can give it to you? I tried the logging options, but the temp file isn't getting generated, and when I try to use the Publisher GUI, the Get Summary button just fades away while the ini file is executing.

Diana_Ovelgoenne (Oct 25 2021 at 13:21):

I have now tried using Ontoserver instead, and my IG Publisher got through the Narratives step (where it crashes with a stack dump when using tx.fhir.org)

Grahame Grieve (Oct 25 2021 at 18:30):

the line number is in the stack dump - I want the stack dump
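
In a Java stack dump like the ones posted in this thread, each "at ..." line ends with the source file and line number in parentheses, and the first such line is where the exception was thrown. A minimal sketch that pulls that frame out of pasted publisher output. The names Frame and top_frame are hypothetical helpers, not part of any FHIR tooling.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    method: str  # fully qualified class and method name
    file: str    # Java source file name
    line: int    # line number within that file


# Matches lines such as
#   at org.example.Foo.bar(Foo.java:123)
FRAME_RE = re.compile(r"^\s*at\s+([\w.$<>]+)\((\w+\.java):(\d+)\)\s*$")


def top_frame(stack_dump: str) -> Optional[Frame]:
    """Return the first stack frame found in the text, or None."""
    for raw in stack_dump.splitlines():
        match = FRAME_RE.match(raw)
        if match:
            return Frame(match.group(1), match.group(2), int(match.group(3)))
    return None


SAMPLE = """java.lang.NullPointerException
at org.hl7.fhir.r5.context.BaseWorkerContext.validateCodeBatch(BaseWorkerContext.java:832)
at org.hl7.fhir.r5.renderers.ValueSetRenderer.getConceptsForCodes(ValueSetRenderer.java:1009)
"""
```

For the dump posted earlier, top_frame(SAMPLE) identifies BaseWorkerContext.java, line 832, which is the detail being asked for here.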

Grahame Grieve (Oct 26 2021 at 18:37):

I'm about to take it down for an upgrade for a few minutes

Lloyd McKenzie (Oct 26 2021 at 19:19):

@Grahame Grieve - technically this should probably go out on #committers/announce

Grahame Grieve (Oct 26 2021 at 19:33):

probably but this is where everyone complains.

Lloyd McKenzie (Oct 26 2021 at 19:43):

If you post there, and they complain here, then we know whose committer access to revoke because they're not actively monitoring #committers/announce ... :wink:


Last updated: Apr 12 2022 at 19:14 UTC