FHIR Chat · crucible hanging · crucible

Stream: crucible

Topic: crucible hanging


Ben Spencer (Mar 05 2019 at 13:23):

Hello again

We've deployed a public server and are attempting to run Crucible against it. The run consistently gets stuck after about half an hour, though not always in exactly the same place. Once it is stuck, we can see from our logs that we're no longer receiving HTTP requests from it. I've also run a local Crucible in docker-compose against the same server, and that runs the tests to completion.

https://projectcrucible.org/servers/5c791f4404ebd07fa1000000

Last request from that run was at 13:07:42

Robert Scanlon (Mar 05 2019 at 14:06):

Hmmm, thanks Ben, we'll take a look. Looks like it is hanging on 'Resource Test Supply Delivery' right now. (Attachment: Screen-Shot-2019-03-05-at-9.05.00-AM.png)

Robert Scanlon (Mar 05 2019 at 14:09):

In the past we have had some problems when running the full barrage of tests, though they weren't consistent enough for us to replicate and track down the issue. I believe our running hypothesis was that either the JSON or the XML parsing library caused some kind of fatal error.

Robert Scanlon (Mar 05 2019 at 14:09):

From what you can tell, is it always failing on the same test?

Robert Scanlon (Mar 05 2019 at 14:10):

We also should have a cleanup job that identifies stalled tests, kills them, and resumes the run when this type of thing happens, but if it has been stuck for a while then perhaps that isn't working either.

Ben Spencer (Mar 05 2019 at 14:17):

From what I can tell, it's not always failing on the same test, but it does seem to be roughly in the same place, somewhere in the Base Resources / Clinical Resources / Financial Resources sections

Let me know if I can provide any more information from our end.

Robert Scanlon (Mar 05 2019 at 17:40):

Turns out our process that identifies stalled jobs and restarts them had been disabled due to a cron issue. I re-enabled it, and now the stalled test will get marked as a fatal 'Crucible Error' and Crucible will pick up with the next test in the run. (Attachment: Screen-Shot-2019-03-05-at-12.37.05-PM.png)

Robert Scanlon (Mar 05 2019 at 17:40):

Not ideal, but at least Crucible gracefully recovers now.

Robert Scanlon (Mar 05 2019 at 17:41):

... and the rest of the tests will be run.
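
For readers wondering what such a stalled-job sweep might look like, here is a minimal sketch in Python. It is not Crucible's actual code: the `SuiteJob` record, the `sweep_stalled` and `next_pending` helpers, and the 10-minute threshold are all hypothetical names and values chosen for illustration. The idea is that a periodic job (for example, one triggered by cron) marks any running test with no recent activity as a 'Crucible Error' so the runner can move on to the next pending test.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

# Assumed threshold; the real value used by Crucible is not stated in the thread.
STALL_THRESHOLD = timedelta(minutes=10)


@dataclass
class SuiteJob:
    """Hypothetical record for one test in a run."""
    name: str
    status: str = "pending"  # pending | running | passed | error
    last_activity: Optional[datetime] = None
    message: str = ""


def sweep_stalled(jobs: List[SuiteJob], now: datetime,
                  threshold: timedelta = STALL_THRESHOLD) -> List[SuiteJob]:
    """Mark running jobs with no activity within `threshold` as errored.

    Returns the jobs that were marked, so the caller can log or kill them.
    """
    stalled = []
    for job in jobs:
        if (job.status == "running"
                and job.last_activity is not None
                and now - job.last_activity > threshold):
            job.status = "error"
            job.message = "Crucible Error"
            stalled.append(job)
    return stalled


def next_pending(jobs: List[SuiteJob]) -> Optional[SuiteJob]:
    """Return the first job that has not started yet, or None if all are done."""
    for job in jobs:
        if job.status == "pending":
            return job
    return None


if __name__ == "__main__":
    now = datetime(2019, 3, 5, 13, 40)
    run = [
        SuiteJob("Test A", "passed", now - timedelta(hours=1)),
        SuiteJob("Test B", "running", now - timedelta(minutes=33)),
        SuiteJob("Test C"),
    ]
    for job in sweep_stalled(run, now):
        print(f"marked stalled: {job.name} ({job.message})")
    upcoming = next_pending(run)
    print("resume with:", upcoming.name if upcoming else None)
```

In a setup like this the sweep only helps while its scheduler is running; if the scheduled trigger is disabled, stalled jobs stay in the running state indefinitely, which matches the symptom reported above.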

Ben Spencer (Mar 06 2019 at 08:31):

thanks Robert!

Ben Spencer (Mar 06 2019 at 08:32):

is it likely that the error was caused by something on our end?

Robert Scanlon (Mar 06 2019 at 16:57):

We've seen this happen on other servers, so it is unlikely that you are doing anything wrong. Let me know if things get stuck again -- I expect you'll still see the occasional 'Unrecoverable Crucible Error', but the tests shouldn't hang indefinitely any more

Robert Scanlon (Mar 06 2019 at 16:58):

Thanks for reporting that issue!


Last updated: Apr 12 2022 at 19:14 UTC