Stream: ibm
Topic: Delete Job
Mahesh Dabi (Nov 03 2021 at 04:46):
While testing a case (BDE-10: Bulk Data Server returns "202 Accepted" for delete request) in Inferno we received the following exception. Please let us know how do we handle this.
Job deletion is not supported with the configured (in-memory) database.
com.ibm.fhir.exception.FHIROperationException: Job deletion is not supported with the configured (in-memory) database. [probeId=ac-1e-0-bf-64dd1ae5-19b6-4b78-87b3-829a85a226cd]
at com.ibm.fhir.operation.bulkdata.util.BulkDataExportUtil.buildOperationException(BulkDataExportUtil.java:143)
at com.ibm.fhir.operation.bulkdata.util.BulkDataExportUtil.buildOperationException(BulkDataExportUtil.java:139)
at com.ibm.fhir.operation.bulkdata.client.action.batch.BatchCancelRequestAction.run(BatchCancelRequestAction.java:120)
at com.ibm.fhir.operation.bulkdata.client.BulkDataClient.delete(BulkDataClient.java:517)
at com.ibm.fhir.operation.bulkdata.processor.impl.ExportImportImpl.delete(ExportImportImpl.java:154)
at com.ibm.fhir.operation.bulkdata.StatusOperation.doInvoke(StatusOperation.java:61)
at com.ibm.fhir.server.operation.spi.AbstractOperation.invoke(AbstractOperation.java:57)
at com.ibm.fhir.server.util.FHIRRestHelper.doInvoke(FHIRRestHelper.java:1171)
at com.ibm.fhir.server.resources.Operation.invokeDelete(Operation.java:171)
at com.ibm.fhir.server.resources.Operation$Proxy$_$$_WeldClientProxy.invokeDelete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.ibm.ws.jaxrs20.cdi.component.JaxRsFactoryImplicitBeanCDICustomizer.serviceInvoke(JaxRsFactoryImplicitBeanCDICustomizer.java:342)
at com.ibm.ws.jaxrs20.server.LibertyJaxRsServerFactoryBean.performInvocation(LibertyJaxRsServerFactoryBean.java:641)
at com.ibm.ws.jaxrs20.server.LibertyJaxRsInvoker.performInvocation(LibertyJaxRsInvoker.java:160)
at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:101)
at com.ibm.ws.jaxrs20.server.LibertyJaxRsInvoker.invoke(LibertyJaxRsInvoker.java:273)
at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:213)
at com.ibm.ws.jaxrs20.server.LibertyJaxRsInvoker.invoke(LibertyJaxRsInvoker.java:444)
at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:112)
at org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)
at org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:123)
at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:274)
at com.ibm.ws.jaxrs20.endpoint.AbstractJaxRsWebEndpoint.invoke(AbstractJaxRsWebEndpoint.java:137)
at com.ibm.websphere.jaxrs.server.IBMRestServlet.handleRequest(IBMRestServlet.java:146)
at com.ibm.websphere.jaxrs.server.IBMRestServlet.doDelete(IBMRestServlet.java:131)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:712)
at com.ibm.websphere.jaxrs.server.IBMRestServlet.service(IBMRestServlet.java:96)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1258)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:746)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:443)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.invokeTarget(WebAppFilterChain.java:183)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:94)
at com.ibm.fhir.server.filter.rest.FHIRRestServletFilter.doFilter(FHIRRestServletFilter.java:155)
at javax.servlet.http.HttpFilter.doFilter(HttpFilter.java:127)
at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:197)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:91)
at com.ibm.ws.security.jaspi.JaspiServletFilter.doFilter(JaspiServletFilter.java:56)
at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:197)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:91)
at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:1002)
at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1140)
at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1011)
at com.ibm.ws.webcontainer.servlet.CacheServletWrapper.handleRequest(CacheServletWrapper.java:75)
at com.ibm.ws.webcontainer40.servlet.CacheServletWrapper40.handleRequest(CacheServletWrapper40.java:85)
at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:938)
at com.ibm.ws.webcontainer.osgi.DynamicVirtualHost$2.run(DynamicVirtualHost.java:279)
at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink$TaskWrapper.run(HttpDispatcherLink.java:1159)
at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink.wrapHandlerAndExecute(HttpDispatcherLink.java:428)
at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink.ready(HttpDispatcherLink.java:387)
at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:566)
at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.handleNewRequest(HttpInboundLink.java:500)
at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.processRequest(HttpInboundLink.java:360)
at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.ready(HttpInboundLink.java:327)
at com.ibm.ws.channel.ssl.internal.SSLConnectionLink.determineNextChannel(SSLConnectionLink.java:1100)
at com.ibm.ws.channel.ssl.internal.SSLConnectionLink$MyReadCompletedCallback.complete(SSLConnectionLink.java:675)
at com.ibm.ws.channel.ssl.internal.SSLReadServiceContext$SSLReadCompletedCallback.complete(SSLReadServiceContext.java:1824)
at com.ibm.ws.tcpchannel.internal.WorkQueueManager.requestComplete(WorkQueueManager.java:504)
at com.ibm.ws.tcpchannel.internal.WorkQueueManager.attemptIO(WorkQueueManager.java:574)
at com.ibm.ws.tcpchannel.internal.WorkQueueManager.workerRun(WorkQueueManager.java:958)
at com.ibm.ws.tcpchannel.internal.WorkQueueManager$Worker.run(WorkQueueManager.java:1047)
at com.ibm.ws.threading.internal.ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:238)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Lee Surprenant (Nov 03 2021 at 14:15):
the OpenLiberty jbatch default is to use an in-memory job database. that doesn't support purging (what we do when we see a delete of a completed job).
you'll need to use a real job database to support deletes.
Sagar Sarvankar (Nov 04 2021 at 08:15):
Thank you, Lee and Paul, delete job worked after we did the change suggested by Paul.
Sagar Sarvankar (Nov 04 2021 at 10:11):
Sorry there, the delete job worked just once. Now it fails. I have the configuration in bulkdata.xml as below
<variable name="BATCH_DB_NAME" defaultValue="fhirdb"/>
<variable name="BATCH_DB_SCHEMA" defaultValue="fhir_jbatch"/>
<variable name="BATCH_DB_PORT" defaultValue="5432"/>
<variable name="BATCH_DB_USER" defaultValue="postgres"/>
<variable name="BATCH_DB_PASS" defaultValue="password"/>
<variable name="BATCH_DB_SSL" defaultValue=""/>
<variable name="BATCH_DB_SSL_CERT_PATH" defaultValue=""/>
<dataSource id="fhirbatchDS" jndiName="jdbc/fhirbatchDB" type="javax.sql.XADataSource" statementCacheSize="200" syncQueryTimeoutWithTransactionTimeout="true">
<jdbcDriver javax.sql.XADataSource="org.postgresql.xa.PGXADataSource" libraryRef="sharedLibPostgres"/>
<properties.postgresql
serverName="localhost"
portNumber="${BATCH_DB_PORT}"
databaseName="${BATCH_DB_NAME}"
user="${BATCH_DB_USER}"
password="${BATCH_DB_PASS}"
/>
<connectionManager maxPoolSize="200" minPoolSize="40"/>
</dataSource>
The error message we are getting is as below
An authentication alias should be used instead of defining a user name and password on dataSource[fhirbatchDS].
[11/4/21 9:40:13:988 UTC] 0000001c com.ibm.fhir.server.filter.rest.FHIRRestServletFilter I Received request: tenantId:[default] dsId:[default] user:[fhiruser] method:[DELETE] uri:[https://fhir.secureit.co.in:9443/fhir-server/api/v4/$bulkdata-status?job=LqzauvqtHSmkpChVHo%2B1MQ]
[11/4/21 9:40:14:723 UTC] 0000001c com.ibm.ws.jca.cm.ConnectorService I J2CA8050I: An authentication alias should be used instead of defining a user name and password on dataSource[fhirbatchDS].
[11/4/21 9:40:15:737 UTC] 0000001c com.ibm.fhir.server.resources.FHIRResource E The existing job has failed to delete
com.ibm.fhir.exception.FHIROperationException: The existing job has failed to delete [probeId=ac-1e-0-bf-e4b8d9d3-c645-488f-a4ba-a3478c83e38b]
at com.ibm.fhir.operation.bulkdata.util.BulkDataExportUtil.buildOperationException(BulkDataExportUtil.java:143)
at com.ibm.fhir.operation.bulkdata.util.BulkDataExportUtil.buildOperationException(BulkDataExportUtil.java:139)
at com.ibm.fhir.operation.bulkdata.client.action.batch.BatchCancelRequestAction.deleteJob(BatchCancelRequestAction.java:411)
at com.ibm.fhir.operation.bulkdata.client.action.batch.BatchCancelRequestAction.run(BatchCancelRequestAction.java:118)
at com.ibm.fhir.operation.bulkdata.client.BulkDataClient.delete(BulkDataClient.java:517)
at com.ibm.fhir.operation.bulkdata.processor.impl.ExportImportImpl.delete(ExportImportImpl.java:154)
at com.ibm.fhir.operation.bulkdata.StatusOperation.doInvoke(StatusOperation.java:61)
at com.ibm.fhir.server.operation.spi.AbstractOperation.invoke(AbstractOperation.java:57)
at com.ibm.fhir.server.util.FHIRRestHelper.doInvoke(FHIRRestHelper.java:1171)
at com.ibm.fhir.server.resources.Operation.invokeDelete(Operation.java:171)
at com.ibm.fhir.server.resources.Operation$Proxy$_$$_WeldClientProxy.invokeDelete(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.ibm.ws.jaxrs20.cdi.component.JaxRsFactoryImplicitBeanCDICustomizer.serviceInvoke(JaxRsFactoryImplicitBeanCDICustomizer.java:342)
at com.ibm.ws.jaxrs20.server.LibertyJaxRsServerFactoryBean.performInvocation(LibertyJaxRsServerFactoryBean.java:641)
at com.ibm.ws.jaxrs20.server.LibertyJaxRsInvoker.performInvocation(LibertyJaxRsInvoker.java:160)
at org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:101)
at com.ibm.ws.jaxrs20.server.LibertyJaxRsInvoker.invoke(LibertyJaxRsInvoker.java:273)
at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:213)
at com.ibm.ws.jaxrs20.server.LibertyJaxRsInvoker.invoke(LibertyJaxRsInvoker.java:444)
at org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:112)
at org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)
at org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)
at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:308)
at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:123)
at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:274)
at com.ibm.ws.jaxrs20.endpoint.AbstractJaxRsWebEndpoint.invoke(AbstractJaxRsWebEndpoint.java:137)
at com.ibm.websphere.jaxrs.server.IBMRestServlet.handleRequest(IBMRestServlet.java:146)
at com.ibm.websphere.jaxrs.server.IBMRestServlet.doDelete(IBMRestServlet.java:131)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:712)
at com.ibm.websphere.jaxrs.server.IBMRestServlet.service(IBMRestServlet.java:96)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.service(ServletWrapper.java:1258)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:746)
at com.ibm.ws.webcontainer.servlet.ServletWrapper.handleRequest(ServletWrapper.java:443)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.invokeTarget(WebAppFilterChain.java:183)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:94)
at com.ibm.fhir.server.filter.rest.FHIRRestServletFilter.doFilter(FHIRRestServletFilter.java:155)
at javax.servlet.http.HttpFilter.doFilter(HttpFilter.java:127)
at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:197)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:91)
at com.ibm.ws.security.jaspi.JaspiServletFilter.doFilter(JaspiServletFilter.java:56)
at com.ibm.ws.webcontainer.filter.FilterInstanceWrapper.doFilter(FilterInstanceWrapper.java:197)
at com.ibm.ws.webcontainer.filter.WebAppFilterChain.doFilter(WebAppFilterChain.java:91)
at com.ibm.ws.webcontainer.filter.WebAppFilterManager.doFilter(WebAppFilterManager.java:1002)
at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1140)
at com.ibm.ws.webcontainer.filter.WebAppFilterManager.invokeFilters(WebAppFilterManager.java:1011)
at com.ibm.ws.webcontainer.servlet.CacheServletWrapper.handleRequest(CacheServletWrapper.java:75)
at com.ibm.ws.webcontainer40.servlet.CacheServletWrapper40.handleRequest(CacheServletWrapper40.java:85)
at com.ibm.ws.webcontainer.WebContainer.handleRequest(WebContainer.java:938)
at com.ibm.ws.webcontainer.osgi.DynamicVirtualHost$2.run(DynamicVirtualHost.java:279)
at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink$TaskWrapper.run(HttpDispatcherLink.java:1159)
at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink.wrapHandlerAndExecute(HttpDispatcherLink.java:428)
at com.ibm.ws.http.dispatcher.internal.channel.HttpDispatcherLink.ready(HttpDispatcherLink.java:387)
at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.handleDiscrimination(HttpInboundLink.java:566)
at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.handleNewRequest(HttpInboundLink.java:500)
at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.processRequest(HttpInboundLink.java:360)
at com.ibm.ws.http.channel.internal.inbound.HttpInboundLink.ready(HttpInboundLink.java:327)
at com.ibm.ws.channel.ssl.internal.SSLConnectionLink.determineNextChannel(SSLConnectionLink.java:1100)
at com.ibm.ws.channel.ssl.internal.SSLConnectionLink$MyReadCompletedCallback.complete(SSLConnectionLink.java:675)
at com.ibm.ws.channel.ssl.internal.SSLReadServiceContext$SSLReadCompletedCallback.complete(SSLReadServiceContext.java:1824)
at com.ibm.ws.tcpchannel.internal.WorkQueueManager.requestComplete(WorkQueueManager.java:504)
at com.ibm.ws.tcpchannel.internal.WorkQueueManager.attemptIO(WorkQueueManager.java:574)
at com.ibm.ws.tcpchannel.internal.WorkQueueManager.workerRun(WorkQueueManager.java:958)
at com.ibm.ws.tcpchannel.internal.WorkQueueManager$Worker.run(WorkQueueManager.java:1047)
at com.ibm.ws.threading.internal.ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:238)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[11/4/21 9:40:15:752 UTC] 0000001c com.ibm.fhir.server.filter.rest.FHIRRestServletFilter I Completed request[1.765 secs]: tenantId:[default] dsId:[default] user:[fhiruser] method:[DELETE] uri:[https://fhir.secureit.co.in:9443/fhir-server/api/v4/$bulkdata-status?job=LqzauvqtHSmkpChVHo%2B1MQ] status:[500]
Sagar Sarvankar (Nov 05 2021 at 03:09):
Lee, did you get a chance to look at this?
Sagar Sarvankar (Nov 05 2021 at 06:50):
Hello Lee and Paul,
You can ignore the earlier message.
Here is the scenario:
After starting the ubuntu server, the very first time the 'job delete' works fine, at least we get the correct response code. But when I visited the logs, there were errors and the job delete was not completely successful. Please see an attached snapshot of different logs (SuccessfulJobDelete_Log.zip). SuccessfulJobDelete_Log.zip
When we attempted a fresh bulk export and further job delete it failed, I think it is because the earlier job delete was not completely successful. Please see attached logs (SecondJobDelete_Log.zip). SecondJobDelete_Log.zip
Please let me know if you need more information.
Thank you.
Lee Surprenant (Nov 05 2021 at 11:31):
Do you know how to retrieve the "job logs"? The console logs you sent are interesting, but maybe the job logs will have more info on the first exception:
[INFO ] FFDC1015I: An FFDC Incident has been created: "java.lang.NullPointerException com.ibm.jbatch.container.artifact.proxy.ItemReaderProxy 40" at ffdc_21.11.05_05.43.03.0.log
doc: https://ibm.github.io/FHIR/guides/FHIRServerUsersGuide#4105-job-logs
Sagar Sarvankar (Nov 05 2021 at 12:38):
Hello Lee,
Please see attached logs folder logs.zip after I tried a fresh bulk export followed by a successful delete. This was run through Inferno, and Inferno runs some other test cases apart from Bulk export and Delete. Please check only logs related to DELETE job.
Thank you.
Sagar Sarvankar (Nov 09 2021 at 04:13):
Hello @Lee Surprenant ,
Did you get a chance to check the logs? Please let me know if you need more information.
Thank you,
Sagar
Sagar Sarvankar (Dec 15 2021 at 10:35):
Hello @Lee Surprenant .
We installed the latest IBM FHIR server, but the delete job fails intermittently. Did you get a chance to see the logs attached in the earlier chat here? Please let us know your thoughts.
Thank you,
Sagar
Lee Surprenant (Dec 15 2021 at 12:55):
Hi Sagar, thanks for your patience. I suggest you open an issue at https://github.com/IBM/FHIR/issues and attach your log.
It looks to me like inferno is doing a Group/$export with the default resource types which results in an export job with 66 partitions.
The job starts to run these partitions (4 at a time) and then we receive the DELETE.
When our server receives the DELETE, it will take one of two actions:
- If the job is running, it will try to stop the job
- If the job is completed, it will purge the job
The job execution's step1 logs all look good to me...they recognize the DELETE and exit with a status of STOPPED.
But then the overall execution gets marked with status FAILED for some reason:
[11/5/21 12:29:18:990 UTC] com.ibm.jbatch.container.controller.impl.PartitionedStepCont startedPartitions = [0, 1, 2, 3], numThreads = 3
[11/5/21 12:29:18:990 UTC] com.ibm.jbatch.container.context.impl.JobContextImpl Returing ref (non-clone) to job context properties
[11/5/21 12:29:19:168 UTC] com.ibm.jbatch.container.controller.impl.PartitionedStepCont Calling analyzeStatus(): PartitionReplyMsg: partition #: 3exec info: TopLevelNameIntanceExecutionInfo:jobName=bulkgroupexportchunkjob:instanceId=2:executionId=2, PARTITION_FINAL_STATUS, FAILED, FAILED
[11/5/21 12:29:19:168 UTC] com.ibm.jbatch.container.controller.impl.PartitionedStepCont CWWKY0013I: Partition 3 ended with batch status FAILED and exit status FAILED for step step1 for job instance 2 and job execution 2.
[11/5/21 12:29:19:168 UTC] com.ibm.jbatch.container.controller.impl.PartitionedStepCont Iterate through loop with finishedPartitions = [com.ibm.jbatch.container.controller.impl.PartitionedStepControllerImpl$FinishedPartition@4e6ab5ff, com.ibm.jbatch.container.controller.impl.PartitionedStepControllerImpl$FinishedPartition@66ff0a1e, com.ibm.jbatch.container.controller.impl.PartitionedStepControllerImpl$FinishedPartition@53a4855b]
[11/5/21 12:29:19:168 UTC] com.ibm.jbatch.container.controller.impl.PartitionedStepCont startedPartitions = [0, 1, 2, 3, 4], numThreads = 3
[11/5/21 12:29:19:168 UTC] com.ibm.jbatch.container.context.impl.JobContextImpl Returing ref (non-clone) to job context properties
[11/5/21 12:29:19:168 UTC] com.ibm.jbatch.container.controller.impl.PartitionedStepCont For partitioned step: step1, the partition # 1 ended with status 'COMPLETED
[11/5/21 12:29:19:168 UTC] com.ibm.jbatch.container.controller.impl.PartitionedStepCont For partitioned step: step1, the partition # 0 ended with status 'COMPLETED
[11/5/21 12:29:19:168 UTC] com.ibm.jbatch.container.controller.impl.PartitionedStepCont For partitioned step: step1, the partition # 3 ended with status 'FAILED
[11/5/21 12:29:19:168 UTC] com.ibm.jbatch.container.controller.impl.BaseStepControllerI CWWKY0019W: Step step1 failed with batch status FAILED and exit status null for job instance 2 and job execution 2.
[11/5/21 12:29:19:465 UTC] com.ibm.jbatch.container.controller.impl.BaseStepControllerI CWWKY0030I: An exception occurred while running the step step1.
com.ibm.jbatch.container.exception.BatchContainerRuntimeException: One or more partitions failed: [For partitioned step: step1, the partition # 3 ended with status 'FAILED]
at com.ibm.jbatch.container.controller.impl.PartitionedStepControllerImpl.checkFinishedPartitionsForFailures(PartitionedStepControllerImpl.java:1021)
at com.ibm.jbatch.container.controller.impl.PartitionedStepControllerImpl.invokeCoreStep(PartitionedStepControllerImpl.java:475)
at com.ibm.jbatch.container.controller.impl.BaseStepControllerImpl.execute(BaseStepControllerImpl.java:295)
at com.ibm.jbatch.container.controller.impl.PartitionedStepControllerImpl.execute(PartitionedStepControllerImpl.java:201)
at com.ibm.jbatch.container.controller.impl.ExecutionTransitioner.doExecutionLoop(ExecutionTransitioner.java:118)
at com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl.executeCoreTransitionLoop(WorkUnitThreadControllerImpl.java:96)
at com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl.executeWorkUnit(WorkUnitThreadControllerImpl.java:178)
at com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl$AbstractControllerHelper.runExecutionOnThread(WorkUnitThreadControllerImpl.java:503)
at com.ibm.jbatch.container.controller.impl.WorkUnitThreadControllerImpl.runExecutionOnThread(WorkUnitThreadControllerImpl.java:92)
at com.ibm.jbatch.container.util.BatchWorkUnit.run(BatchWorkUnit.java:113)
at com.ibm.ws.context.service.serializable.ContextualRunnable.run(ContextualRunnable.java:79)
at com.ibm.ws.threading.internal.ExecutorServiceImpl$RunnableWrapper.run(ExecutorServiceImpl.java:238)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Still, it proceeds to set the overall job batch status (and the Job Exist Status) to STOPPED.
Then, we try the purge and it fails with the following:
[11/5/21 12:29:19:436 UTC] 00000031 com.ibm.ws.logging.internal.impl.IncidentImpl I FFDC1015I: An FFDC Incident has been created: "com.ibm.ws.jbatch.rest.internal.resources.RequestException: The specified job instance, 2, cannot be purged because it has active job executions. com.ibm.ws.jbatch.rest.internal.resources.RequestRouter 71" at ffdc_21.11.05_12.29.19.1.log
[11/5/21 12:29:19:464 UTC] 0000001d com.ibm.ws.logging.internal.impl.IncidentImpl I FFDC1015I: An FFDC Incident has been created: "com.ibm.jbatch.container.exception.BatchContainerRuntimeException: One or more partitions failed: [For partitioned step: step1, the partition # 3 ended with status 'FAILED] com.ibm.jbatch.container.controller.impl.BaseStepControllerImpl 296" at ffdc_21.11.05_12.29.19.0.log
It will take some work to investigate any further, but we already have a couple folks looking into bulkdata right now, so the timing is good. Thanks again.
Sagar Sarvankar (Dec 16 2021 at 06:10):
Hello @Lee Surprenant
Thank you for your reply, I have created an issue at https://github.com/IBM/FHIR/issues/3120 . Also attached fresh set of logs in the issue.
Please assign it appropriately and let me know if you need more information.
Thank you,
Sagar
Paul Bastide (Dec 16 2021 at 13:39):
HI @Sagar Sarvankar basically your job is a STOPPING state with "batchStatus": "STOPPING",[\n]"
it's looping over this status. When you run with trace.specification = *=fine
you essentially bury the machine, and extend the amount of time it takes to process the batch job being stopped. Could you retry with *=info:- org.jboss.*=FINE:io.openliberty.microprofile.*=FINE:com.ibm.fhir.*=FINE:javax.batch.*=FINE:com.ibm.ws.jbatch.*=FINE:com.ibm.ws.*=INFO:com.ibm.jbatch.*=FINE:com.ibm.ws.jbatch.rest.*=FINEST
? Also is there a reason you running with OpenJDK 1.8? Using the OpenJ9 (Semeru JVM) may actually help performance wise - https://developer.ibm.com/languages/java/semeru-runtimes/downloads/
Sagar Sarvankar (Dec 17 2021 at 12:41):
Hello @Paul Bastide ,
Thank you for your quick reply. I did the changes that you mentioned, but those did not work.
Here is the pattern I identified in the cases that it passes and fails.
Using Postman:
Performed two different requests
GET https://fhirserver.justtest.in:9443/fhir-server/api/v4/Group/17db31227ca-52d96331-11d8-4ebd-8bc5-f4ef2d39a4ec/$export
DELETE https://fhirserver.justtest.in:9443/fhir-server/api/v4/$bulkdata-status?job=k_d8cTAU%2BUeVEwqURPZ3oA
After the GET is completed, if I quickly do a DELETE then job gets deleted and returns success.
After the GET is completed, if I take my own time to send DELETE, then it returns Internal Server Error (500).
Looking at the logs I observed that, if the DELETE is sent while the export is in progress it works, else it returns 500. I am not sure if that observation is correct. I have attached the Postman logs of both the above scenarios. Could you please check that? Postman.zip
Using Inferno:
Tried several times and it just worked once, whereas it failed all the times otherwise. Could you please check the logs for the same here? Inferno.zip
We are very close in passing the Inferno tool with IBM FHIR. If the DELETE works fine, we will be good.
Please let me know your thoughts.
Regards,
Sagar
Last updated: Apr 12 2022 at 19:14 UTC