FHIR Chat · Full text searching · hapi

Stream: hapi

Topic: Full text searching


Luke Swindale (May 15 2018 at 00:04):

Has anyone had experience using HAPI for Lucene full-text searching? I have an autocomplete that looks up Medication resources using a 'prefix, any order' approach. We're using fhir-jpa-server-base:3.2.0 with Hibernate Search, backed by MySQL. Achieving 'prefix, any order' was a bit of a challenge and the result is very ugly, so I'm wondering whether there's a better way.

Request: http://localhost:8080/my-app/Medication?_content=blister%20onda&_profile=https://www.healthterminologies.gov.au/StructureDefinition/MyMedication

Problem:
A full-text search using _content queried the "myContentText" field in the index, not "myContentTextEdgeNGram". It looks as if the author originally intended to search the latter field too, since it is commented out on line 86 of FulltextSearchSvcImpl.java. As a result, the search only returned matches on whole words, not on prefixes.

Additionally, the search results did not appear to be in relevance order. For my autocomplete I only want to show the first page of results and not bother paging through the rest. However, since relevance ordering is not honored, I don't get back the results I'd expect. The cause is that IFulltextSearchSvc.search() returns the matching resource ids in relevance order, but SearchBuilder.java then uses those ids in a SQL IN query, which does not honor the order of the provided ids.
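
To make the ordering problem concrete, here is a minimal, self-contained sketch in plain Java (no HAPI or JPA classes). It shows relevance-ordered ids coming back from an IN-style fetch in a different order, and one database-agnostic way to restore the ranking in memory. The class and method names are hypothetical and the "fetched" list merely stands in for the database result; this is not how SearchBuilder works internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: restores relevance order after an unordered IN (...) fetch. */
class RelevanceOrder {

    /**
     * Sorts the fetched ids by their position in the relevance-ordered list.
     * Ids that are missing from the ranking sort last.
     */
    static List<Long> reorder(List<Long> rankedPids, List<Long> fetchedPids) {
        Map<Long, Integer> rank = new HashMap<>();
        for (int i = 0; i < rankedPids.size(); i++) {
            rank.putIfAbsent(rankedPids.get(i), i);
        }
        List<Long> result = new ArrayList<>(fetchedPids);
        result.sort(Comparator.comparingInt((Long pid) -> rank.getOrDefault(pid, Integer.MAX_VALUE)));
        return result;
    }

    public static void main(String[] args) {
        List<Long> ranked = Arrays.asList(42L, 7L, 19L);  // order returned by the full-text index
        List<Long> fetched = Arrays.asList(7L, 19L, 42L); // e.g. primary-key order from the IN query
        System.out.println(reorder(ranked, fetched));     // prints [42, 7, 19]
    }
}
```

An in-memory sort like this only helps when the whole result set is fetched at once; if the database does the paging, the ordering has to be part of the query itself.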

Solution 1 - Prefix searching:
To achieve this I created my own implementation of IFulltextSearchSvc, which extends FulltextSearchSvcImpl and overrides FulltextSearchSvcImpl.search() and FulltextSearchSvcImpl.addTextSearch(). To configure my new implementation I included the following in my application configuration:

@Configuration
@EnableTransactionManagement()
public class MyServerConfig extends BaseJavaConfigDstu3 {
    @Bean(autowire = Autowire.BY_TYPE)
    public IFulltextSearchSvc searchDaoDstu3() {
        return new MyFulltextSearchSvc();
    }
}

I then attempted to simply uncomment the field definition for "myContentTextEdgeNGram". This did make Lucene query the desired field. However, the resulting query was too complex and over-boosted the first term in the provided search text. To achieve a reasonable result I rewrote the query using plain Lucene rather than the Hibernate Search QueryBuilder. Both queries are shown below (for the search text "blister onda").

Hibernate Search query builder (ranked my expected result of 'ZOFRAN ondansetron ... blister pack' wayyyyyy down the result set):
+(myContentText:"blister onda"~2^4.0 ((myContentTextEdgeNGram:bli myContentTextEdgeNGram:blis myContentTextEdgeNGram:blist myContentTextEdgeNGram:bliste myContentTextEdgeNGram:blister myContentTextEdgeNGram:blister myContentTextEdgeNGram:blister o myContentTextEdgeNGram:blister on myContentTextEdgeNGram:blister ond myContentTextEdgeNGram:blister onda)^2.0)) +myResourceType:medication

Lucene query (ranked my expected result of 'ZOFRAN ondansetron ... blister pack' at the top of the result set):
+(myContentTextEdgeNGram:blister myContentTextEdgeNGram:onda) +myResourceType:medication
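
As a rough illustration of why the simpler query gives 'prefix, any order' behaviour, the sketch below models an edge n-gram field in plain Java without Lucene. It assumes the field is indexed as per-word prefixes with a minimum length of 3 (a guess based on the "bli" token in the query above, not something confirmed from the HAPI analyzer configuration), and it counts matching words as a crude stand-in for Lucene scoring. All names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical, Lucene-free model of matching query words against edge n-grams. */
class EdgeNGramSketch {

    /** Returns every prefix of each word that has at least minGram characters. */
    static Set<String> edgeNGrams(String text, int minGram) {
        Set<String> grams = new HashSet<>();
        for (String word : text.toLowerCase().trim().split("\\s+")) {
            for (int end = minGram; end <= word.length(); end++) {
                grams.add(word.substring(0, end));
            }
        }
        return grams;
    }

    /** Counts how many query words are equal to an indexed prefix. */
    static int matchCount(String query, Set<String> indexedGrams) {
        int matches = 0;
        for (String word : query.toLowerCase().trim().split("\\s+")) {
            if (indexedGrams.contains(word)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Set<String> grams = edgeNGrams("ZOFRAN ondansetron blister pack", 3);
        System.out.println(matchCount("blister onda", grams)); // 2: both words match
        System.out.println(matchCount("onda blister", grams)); // 2: word order does not matter
        System.out.println(matchCount("setron", grams));       // 0: not a prefix of any word
    }
}
```

Because each query word is looked up as an exact term, a document scores on how many of the typed prefixes it contains, with no extra weight given to the first word.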

BEFORE:

    private void addTextSearch(QueryBuilder theQueryBuilder, BooleanJunction<?> theBoolean, List<List<? extends IQueryParameterType>> theTerms, String theFieldName, String theFieldNameEdgeNGram, String theFieldNameNGram) {
        if (theTerms == null) {
            return;
        }
        for (List<? extends IQueryParameterType> nextAnd : theTerms) {
            Set<String> terms = new HashSet<String>();
            for (IQueryParameterType nextOr : nextAnd) {
                StringParam nextOrString = (StringParam) nextOr;
                String nextValueTrimmed = StringUtils.defaultString(nextOrString.getValue()).trim();
                if (isNotBlank(nextValueTrimmed)) {
                    terms.add(nextValueTrimmed);
                }
            }
            if (terms.isEmpty() == false) {
                if (terms.size() == 1) {
                    //@formatter:off
                    Query textQuery = theQueryBuilder
                        .phrase()
                        .withSlop(2)
                        .onField(theFieldName).boostedTo(4.0f)
//                      .andField(theFieldNameEdgeNGram).boostedTo(2.0f)
//                      .andField(theFieldNameNGram).boostedTo(1.0f)
                        .sentence(terms.iterator().next().toLowerCase()).createQuery();
                    //@formatter:on

                    theBoolean.must(textQuery);
                } else {
                    String joinedTerms = StringUtils.join(terms, ' ');
                    theBoolean.must(theQueryBuilder.keyword().onField(theFieldName).matching(joinedTerms).createQuery());
                }
            }
        }
    }

AFTER:

    private void addTextSearch(QueryBuilder theQueryBuilder, BooleanJunction<?> theBoolean, List<List<? extends IQueryParameterType>> theTerms, String theFieldName, String theFieldNameEdgeNGram, String theFieldNameNGram) {
        if (theTerms == null) {
            return;
        }
        for (List<? extends IQueryParameterType> nextAnd : theTerms) {
            Set<String> terms = new HashSet<String>();
            for (IQueryParameterType nextOr : nextAnd) {
                StringParam nextOrString = (StringParam) nextOr;
                String nextValueTrimmed = StringUtils.defaultString(nextOrString.getValue()).trim();
                if (isNotBlank(nextValueTrimmed)) {
                    terms.add(nextValueTrimmed);
                }
            }
            if (terms.isEmpty() == false) {
                // Match each lower-cased word against the edge n-gram field, so that
                // prefixes match in any order and no single word is boosted.
                BooleanQuery.Builder booleanQuery = new BooleanQuery.Builder();
                for (String term : terms) {
                    // Split on runs of whitespace so repeated spaces yield no empty tokens.
                    for (String word : term.toLowerCase().split("\\s+")) {
                        booleanQuery.add(new TermQuery(new Term(theFieldNameEdgeNGram, word)), BooleanClause.Occur.SHOULD);
                    }
                }
                theBoolean.must(booleanQuery.build());
            }
        }
    }

Solution 2 - Relevance Ordering:
Firstly, I should point out that this solution is specific to MySQL, although there are similar variations for most other common databases. Additionally, the way I inject my implementation is VERY ugly and far from ideal. Since there was no way of configuring an alternate SearchBuilder, the only way I could apply a fix (without forking HAPI) was to replace SearchBuilder.java with a copy in our own source code, keeping the same package name so that the classloader picks up our copy instead... I did say it was ugly. Depending on any responses I may just create a branch of HAPI and a pull request.

Anyway, given that IFulltextSearchSvc.search() returned the relevance ordered set of resource ids, I then added an order by condition to the subsequent SearchBuilder query that fetched the corresponding set of resources. The result seems legit. The top results returned in my autocomplete are now the most relevant.

BEFORE (around line 1370):

         /*
         * Fulltext search
         */
        if (myParams.containsKey(Constants.PARAM_CONTENT) || myParams.containsKey(Constants.PARAM_TEXT)) {
            if (myFulltextSearchSvc == null) {
                if (myParams.containsKey(Constants.PARAM_TEXT)) {
                    throw new InvalidRequestException("Fulltext search is not enabled on this service, can not process parameter: " + Constants.PARAM_TEXT);
                } else if (myParams.containsKey(Constants.PARAM_CONTENT)) {
                    throw new InvalidRequestException("Fulltext search is not enabled on this service, can not process parameter: " + Constants.PARAM_CONTENT);
                }
            }

            List<Long> pids;
            if (myParams.getEverythingMode() != null) {
                pids = myFulltextSearchSvc.everything(myResourceName, myParams);
            } else {
                pids = myFulltextSearchSvc.search(myResourceName, myParams);
            }
            if (pids.isEmpty()) {
                // Will never match
                pids = Collections.singletonList(-1L);
            }

            myPredicates.add(myResourceTableRoot.get("myId").as(Long.class).in(pids));
        }

AFTER (around line 1370):

         /*
         * Fulltext search
         */
        if (myParams.containsKey(Constants.PARAM_CONTENT) || myParams.containsKey(Constants.PARAM_TEXT)) {
            if (myFulltextSearchSvc == null) {
                if (myParams.containsKey(Constants.PARAM_TEXT)) {
                    throw new InvalidRequestException("Fulltext search is not enabled on this service, can not process parameter: " + Constants.PARAM_TEXT);
                } else if (myParams.containsKey(Constants.PARAM_CONTENT)) {
                    throw new InvalidRequestException("Fulltext search is not enabled on this service, can not process parameter: " + Constants.PARAM_CONTENT);
                }
            }

            List<Long> pids;
            if (myParams.getEverythingMode() != null) {
                pids = myFulltextSearchSvc.everything(myResourceName, myParams);
            } else {
                pids = myFulltextSearchSvc.search(myResourceName, myParams);
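                // MySQL-specific fix: order by each id's position in the relevance-ordered pid list.
                // Skipped when there are no matches, since there is nothing to order.
                if (!pids.isEmpty()) {
                    outerQuery.orderBy(myBuilder.asc(myBuilder.function("FIND_IN_SET", Integer.class,
                            myResourceTableRoot.get("myId"), myBuilder.literal(StringUtils.join(pids, ",")))));
                }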
            }
            if (pids.isEmpty()) {
                // Will never match
                pids = Collections.singletonList(-1L);
            }
            myPredicates.add(myResourceTableRoot.get("myId").as(Long.class).in(pids));
        }
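
For databases without FIND_IN_SET, one possible variation is a standard SQL CASE expression that maps each id to its rank. The sketch below only builds the ORDER BY text in plain Java; the class name, method name and the RES_ID column are placeholders chosen for illustration, and this has not been wired into SearchBuilder or tested against HAPI.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: builds a portable ORDER BY clause from relevance-ordered ids. */
class OrderByCaseSketch {

    /**
     * Maps each id to its position in the ranked list; ids not in the list sort last.
     * The column name is concatenated as-is, so it must be a trusted constant.
     */
    static String orderByCase(String column, List<Long> rankedPids) {
        if (rankedPids.isEmpty()) {
            return ""; // nothing to order by
        }
        StringBuilder sql = new StringBuilder("ORDER BY CASE ").append(column);
        for (int i = 0; i < rankedPids.size(); i++) {
            sql.append(" WHEN ").append(rankedPids.get(i)).append(" THEN ").append(i);
        }
        return sql.append(" ELSE ").append(rankedPids.size()).append(" END").toString();
    }

    public static void main(String[] args) {
        System.out.println(orderByCase("RES_ID", Arrays.asList(42L, 7L, 19L)));
        // prints: ORDER BY CASE RES_ID WHEN 42 THEN 0 WHEN 7 THEN 1 WHEN 19 THEN 2 ELSE 3 END
    }
}
```

The clause grows with the number of ids, so like the FIND_IN_SET approach it is probably best suited to short result lists such as an autocomplete.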

Sooo... the above changes have resulted in a much more useful full-text search for my implementation, but they are a bit ugly (particularly the SearchBuilder hack). Please let me know if there is a much simpler way of achieving the same thing, or if I've fallen into any traps or missed any important considerations. At the very least, maybe my hack will prove useful to others.


Last updated: Apr 12 2022 at 19:14 UTC