FHIR Chat · documentation on Elasticsearch · hapi

Stream: hapi

Topic: documentation on Elasticsearch


Rob Hausam (Apr 15 2021 at 21:02):

Is there additional documentation that anyone is aware of for setting up and using Elasticsearch - primarily for terminology (but not limited to that)? I'm looking specifically at the jpaserver-starter (5.3.0). The documentation in the README is fairly minimal and leaves (at least for me) a few things not quite clear. I also didn't find very much on it doing searches on this Zulip stream and in the Google Group. One thing in particular is, I'm not sure how things are supposed to work alongside or instead of the Lucene searching and what the proper settings are in application.yaml (or anywhere else) for that. And what is the best way to check if Elasticsearch is working correctly and the necessary indexes are all there? Any suggestions or answers will be appreciated!

Rob Hausam (Apr 20 2021 at 14:10):

@Jens Villadsen Do you have some thoughts/suggestions on this - particularly about the proper settings in application.yaml (James suggested checking with you on that)?

Jens Villadsen (Apr 20 2021 at 15:02):

I do, but I don't have the time to go into it right now. I'm currently working on a national COVID solution that takes up 150% of my time. I'll have suggestions and thoughts on this matter after the 7th of May.

Rob Hausam (Apr 20 2021 at 15:48):

Thanks, Jens. No problem and definitely understand the priorities!

Rob Hausam (Apr 28 2021 at 21:34):

@Sean McIlvenna Do you have any insights or suggestions for the application.yaml Elasticsearch settings? What particularly isn't clear to me is how to set username and password when the Elasticsearch endpoint is on localhost. My local Elasticsearch endpoint on Ubuntu 20.04 (installed from the Ubuntu repo) doesn't require (or even directly allow?) any credentials, so I haven't figured out how to populate those parameters. I tried commenting those two parameters out, but that gave: java.lang.IllegalArgumentException: Username may not be null. I also tried leaving the values empty. That didn't give an exception, but so far it hasn't worked either (value set expansions in particular are failing).

Sean McIlvenna (May 02 2021 at 21:13):

Hi @Rob Hausam, would love to say I know... I didn't set up any of the Elasticsearch stuff, so I'm a bit in the dark. I'll take a quick glance and see if I can spot anything, though.

Sean McIlvenna (May 02 2021 at 21:16):

So, I see this in the hapi-fhir project:

        final CredentialsProvider credentialsProvider =
                new BasicCredentialsProvider();
        credentialsProvider.setCredentials(AuthScope.ANY,
                new UsernamePasswordCredentials(theUsername, thePassword));
        RestClientBuilder clientBuilder = RestClient.builder(
                new HttpHost(stripHostOfScheme(theHostname), thePort, determineScheme(theHostname)))
                .setHttpClientConfigCallback(httpClientBuilder -> httpClientBuilder
                        .setDefaultCredentialsProvider(credentialsProvider));

Sean McIlvenna (May 02 2021 at 21:16):

It appears that it should be modified to not always assume basic credentials...
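
One way to read this suggestion is as a guard that attaches Basic credentials only when a username is actually configured. The sketch below shows that guard using the JDK's `java.net.http` API instead of the Elasticsearch `RestClientBuilder` quoted above, so it illustrates the idea only and is not the HAPI FHIR code or a proposed patch. The class name, method names, and the `localhost:9200` URL are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Optional;

// Hypothetical helper: not part of hapi-fhir or the Elasticsearch client.
final class OptionalBasicAuth {

    // Returns a Basic Authorization header value only when a username is configured.
    static Optional<String> basicAuthHeader(String username, String password) {
        if (username == null || username.isBlank()) {
            return Optional.empty(); // no credentials configured: send none
        }
        String pass = (password == null) ? "" : password;
        String raw = username + ":" + pass;
        String encoded = Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
        return Optional.of("Basic " + encoded);
    }

    // Builds a GET request and adds the Authorization header only if one was produced.
    static HttpRequest buildRequest(String baseUrl, String username, String password) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(baseUrl)).GET();
        basicAuthHeader(username, password).ifPresent(h -> builder.header("Authorization", h));
        return builder.build();
    }

    public static void main(String[] args) {
        // Assumed local, unsecured endpoint; adjust to your environment.
        HttpRequest anonymous = buildRequest("http://localhost:9200", null, null);
        HttpRequest authed = buildRequest("http://localhost:9200", "user", "pass");
        System.out.println(anonymous.headers().firstValue("Authorization").isPresent()); // false
        System.out.println(authed.headers().firstValue("Authorization").isPresent());    // true
    }
}
```

In the quoted snippet, the analogous change would presumably be to call `setDefaultCredentialsProvider` only when a username is present, but that is an assumption about how a fix might look, not something confirmed in this thread.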

Sean McIlvenna (May 02 2021 at 21:16):

But, this is part of the core hapi-fhir project, not the hapi-fhir-jpaserver-starter project... so, it's not as easy to get that changed

Rob Hausam (May 02 2021 at 21:17):

Thanks, Sean. I think I've figured out some of it. I do have Elasticsearch running now and it seems to be indexing - but I'm not sure yet how well and completely it's working. It seems to not really care (at least in my particular case running on localhost) what the username and password are, as long as they are there.
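
For the question of whether Elasticsearch is up and which indexes exist, a rough smoke check is to call its `_cluster/health` and `_cat/indices` REST endpoints directly. The sketch below assumes an unsecured node at `http://localhost:9200`, as in the setup described here; the class and method names are hypothetical. Which index names HAPI creates is not covered in this thread, so the listing still has to be inspected by hand.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical smoke check; it only prints what Elasticsearch reports.
final class ElasticsearchSmokeCheck {

    // Joins a base URL and a path without producing a double slash.
    static URI endpoint(String baseUrl, String path) {
        String base = baseUrl.endsWith("/") ? baseUrl.substring(0, baseUrl.length() - 1) : baseUrl;
        return URI.create(base + path);
    }

    // Performs a GET and returns the status code followed by the response body.
    static String get(HttpClient client, URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + "\n" + response.body();
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed default; pass another base URL as the first argument if needed.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:9200";
        HttpClient client = HttpClient.newHttpClient();
        try {
            // Cluster status (green / yellow / red) and node counts.
            System.out.println(get(client, endpoint(baseUrl, "/_cluster/health?pretty")));
            // One row per index, including document counts.
            System.out.println(get(client, endpoint(baseUrl, "/_cat/indices?v")));
        } catch (IOException e) {
            System.out.println("Elasticsearch not reachable at " + baseUrl + ": " + e);
        }
    }
}
```

Comparing the document counts from `_cat/indices` before and after loading terminology content gives a coarse indication that indexing is happening, though it does not prove that every expected index is present.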

Sean McIlvenna (May 02 2021 at 21:17):

nice!

Sean McIlvenna (May 02 2021 at 21:18):

I'd suggest submitting an issue requesting that this functionality be changed to not always require user/pass

Sean McIlvenna (May 02 2021 at 21:18):

seems like a reasonable change to me

Rob Hausam (May 02 2021 at 21:21):

Here is my current application.yaml. Any further thoughts you have would be great.

dsh (May 02 2021 at 21:22):

@Rob Hausam I am fairly new to FHIR, so this may be very basic stuff for you. But what's your primary driver for moving to Elasticsearch? Is there an expectation of performance gains on REST queries?

Rob Hausam (May 02 2021 at 21:34):

@dsh No, probably not that basic at all - I'm still working through figuring it out. The motivation for trying Elasticsearch was a couple of things. We were trying to implement type-ahead search using $expand + filter on ValueSet, but the existing search behavior was doing partial string matching but only with an exact match on the full entered text, rather than basing the search on tokens (which would provide a much more natural search result). Elasticsearch seems potentially able to provide much more configurability and capability in that regard. Plus the HAPI documentation states that Elasticsearch is now available and that it is likely to become the default in the future, so I thought it would make sense to go ahead and give it a try now and see what it would be able to do.
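
The difference between matching the full entered text as one substring and matching per token can be shown with a small, library-free sketch. It does not reproduce how HAPI, Lucene, or Elasticsearch analyze text; the class name, method names, and the sample display string are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical illustration of two filter strategies for type-ahead search.
final class TypeAheadMatch {

    // The whole filter text must appear in the display as one contiguous substring.
    static boolean substringMatch(String display, String filter) {
        return display.toLowerCase(Locale.ROOT).contains(filter.toLowerCase(Locale.ROOT).trim());
    }

    // Lower-cases the text and splits it on anything that is not a letter or digit.
    static List<String> tokens(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Every filter token must be a prefix of at least one display token.
    static boolean tokenPrefixMatch(String display, String filter) {
        List<String> displayTokens = tokens(display);
        return tokens(filter).stream()
                .allMatch(f -> displayTokens.stream().anyMatch(d -> d.startsWith(f)));
    }

    public static void main(String[] args) {
        String display = "Acute myocardial infarction";
        String filter = "myo inf";
        System.out.println(substringMatch(display, filter));   // false
        System.out.println(tokenPrefixMatch(display, filter)); // true
    }
}
```

With the substring approach, a user typing "myo inf" gets no hit because that exact character sequence never occurs in the display text, whereas the token-based approach matches each typed fragment independently.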

dsh (May 03 2021 at 09:21):

Thanks for sharing that @Rob Hausam

David Meyers (Jan 10 2022 at 11:02):

In initial tests of queries against very large resource volumes (e.g., more than 5 million Observation resources), we did significantly better with the standard Lucene engine than with the Elasticsearch integration (e.g., 6 minutes to retrieve all resources via self + next links, versus 7 minutes with Elasticsearch). Full-text searches via the "contains" operator were also significantly faster.
Are there any other experience reports here? This could of course be due to configuration or other technical details.

Kevin Mayfield (Jan 10 2022 at 11:53):

Elasticsearch is built on top of Lucene, so Lucene should always be faster.
Elasticsearch adds horizontal scalability.

David Meyers (Jan 10 2022 at 12:39):

Thanks for the clarification @Kevin Mayfield. Are there any tips (configuration, infrastructure, ...) for avoiding constant timeouts on initial "_summary=count" queries (without prior caching) for resource types with high entity counts? Sure, you could increase the timeout parameter (currently 60 seconds afaik), but I think that just counting the resources shouldn't take that long. We are currently using a MySQL database.


Last updated: Apr 12 2022 at 19:14 UTC