Stream: tx.fhir.org/notification
Topic: Prod Tx Server: FHIRServer - 7/19/2021 8:40:37 PM
Terminology Service Monitor Bot (Jul 19 2021 at 20:40):
7/19/2021 8:40:37 PM: Issued start request, service: FHIRServer
Terminology Service Monitor Bot (Jul 19 2021 at 20:41):
7/19/2021 8:41:10 PM: Waiting on service (FHIRServer) after start...
Mark Iantorno (Jul 19 2021 at 20:47):
Oh nice
Terminology Service Monitor Bot (Jul 19 2021 at 20:48):
7/19/2021 8:48:07 PM: Service is up!
Rob Hausam (Jul 19 2021 at 20:48):
Yes, yikes! The service is now re-starting - so that seems to have worked, but there was a lot of churn beforehand. I don't know if @Mark Iantorno or @Gino Canessa may have intervened manually (I haven't seen anything that said so)? The monitor still doesn't seem to quite be there yet, but we are making progress.
Mark Iantorno (Jul 19 2021 at 20:48):
I was actually trying to code today, so I was ignoring Zulip. So...wasn't me
Gino Canessa (Jul 19 2021 at 20:49):
Yes, I'll update it to post messages more frequently as well - right now it only posts a message during a state change.
Gino Canessa (Jul 19 2021 at 20:49):
I haven't touched it
Rob Hausam (Jul 19 2021 at 20:49):
Ok. So that's good - it did work.
Gino Canessa (Jul 19 2021 at 20:50):
That's probably a fair estimate on time... it issues a service stop request, waits until the service is reported as stopped, then issues a start request (which I believe takes some number of minutes to come back up as well)
Rob Hausam (Jul 19 2021 at 20:52):
Actually, I was reading the time wrong - had looked at the one from a few days ago, too. Seems like it only took a few minutes from when the downtime was noticed. It did send quite a few messages, though. :)
Rob Hausam (Jul 19 2021 at 20:53):
The four stop request messages (are they all real?) seem to be of the most potential concern to me.
Gino Canessa (Jul 19 2021 at 20:53):
Ahh, I see that now. I think it's because the service reports to the Service Control Manager that it failed to start (so I issue another start request)... I guess I need to put in some logic to only issue a single start request and wait, regardless of what the SCM says the state is.
Rob Hausam (Jul 19 2021 at 20:54):
Yes, I think so.
Gino Canessa (Jul 19 2021 at 20:55):
Yeah, I didn't have anything that simulates that process (successfully starting but reporting a failure). I don't really want to try and figure out what I would need to write to simulate that =)
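A minimal sketch of the "issue a single start request and wait, regardless of what the SCM says" idea, together with a small fake that starts successfully while reporting a failure. This is not the monitor's actual code: `start_once_and_wait`, `FlakyReportingService`, the timeout and poll values, and the health check are all hypothetical names and assumptions made for illustration.

```python
import time
from typing import Callable


def start_once_and_wait(
    start: Callable[[], None],
    is_healthy: Callable[[], bool],
    timeout_s: float = 600.0,
    poll_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Issue exactly one start request, then poll a health check.

    The state reported by the service manager is deliberately ignored,
    so a "failed to start" report cannot trigger another start request.
    Returns True if the service became healthy before the timeout.
    """
    start()  # the only start request this function ever sends
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_healthy():
            return True
        sleep(poll_s)
    return False


class FlakyReportingService:
    """Hypothetical fake: comes up fine but always reports 'stopped'."""

    def __init__(self, polls_until_up: int = 3) -> None:
        self.start_requests = 0
        self._polls_left = polls_until_up

    def start(self) -> None:
        self.start_requests += 1

    def reported_state(self) -> str:
        # Misleading status, standing in for what the SCM would be told.
        return "stopped"

    def is_healthy(self) -> bool:
        # Becomes healthy a few polls after the first start request.
        if self.start_requests == 0:
            return False
        self._polls_left -= 1
        return self._polls_left <= 0


if __name__ == "__main__":
    svc = FlakyReportingService()
    came_up = start_once_and_wait(
        svc.start, svc.is_healthy, timeout_s=10, poll_s=0
    )
    print(came_up, svc.start_requests, svc.reported_state())
```

In a real monitor the health check would presumably be a request against the server itself rather than a counter, and the timeout would need to cover the "some number of minutes" the service takes to come back up.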
Terminology Service Monitor Bot (Jul 20 2021 at 02:38):
7/20/2021 2:38:17 AM: Service is down! Will restart...
Terminology Service Monitor Bot (Jul 20 2021 at 02:38):
7/20/2021 2:38:18 AM: Issued stop request, service: FHIRServer
Terminology Service Monitor Bot (Jul 20 2021 at 02:38):
7/20/2021 2:38:25 AM: Waiting on service (FHIRServer) after start...
Terminology Service Monitor Bot (Jul 20 2021 at 02:38):
7/20/2021 2:38:25 AM: Waiting on service (FHIRServer) after start...
Terminology Service Monitor Bot (Jul 20 2021 at 02:38):
7/20/2021 2:38:25 AM: Waiting on service (FHIRServer) after start...
Last updated: Apr 12 2022 at 19:14 UTC