In our project we have a Client entity that can hold one or more Name entities and one or more UsAddress entities. This is what the Client entity, with its name and address collections, looks like with the annotations:
    @Entity
    @Indexed
    public class Client extends BaseEntity {
        ...
        @IndexedEmbedded(depth = 1)
        @Fetch(FetchMode.SUBSELECT)
        @OneToMany(mappedBy = "client", fetch = FetchType.LAZY,
                   cascade = {CascadeType.MERGE, CascadeType.REMOVE, CascadeType.REFRESH})
        public Set<Name> getNames() {
            return names;
        }
        ...
        @IndexedEmbedded(depth = 1)
        @Fetch(FetchMode.SUBSELECT)
        @OneToMany(mappedBy = "client", fetch = FetchType.LAZY,
                   cascade = {CascadeType.MERGE, CascadeType.REMOVE, CascadeType.REFRESH})
        public Set<UsAddress> getUsAddresses() {
            return usAddresses;
        }
        ...
    }
This is what the Name entity looks like with its analyzer applied:
    @AnalyzerDef(name = "nameanalyzer",
        charFilters = {
            @CharFilterDef(factory = MappingCharFilterFactory.class, params = {
                @Parameter(name = "mapping", value = "mapping-chars.properties")
            })
        },
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = ASCIIFoldingFilterFactory.class),
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = StopFilterFactory.class, params = {
                @Parameter(name = "words", value = "stoplist.properties"),
                @Parameter(name = "ignoreCase", value = "true")
            }),
            @TokenFilterDef(factory = SynonymFilterFactory.class, params = {
                @Parameter(name = "synonyms", value = "nicknames.txt"),
                @Parameter(name = "ignoreCase", value = "true"),
                @Parameter(name = "expand", value = "false")
            })
        })
    @Entity
    public class Name extends BaseEntity {
        ...
        @Basic
        @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
        @Analyzer(definition = "nameanalyzer")
        @Column(name = "BUSINESS_NAME", nullable = true, insertable = true,
                updatable = true, length = 370, precision = 0)
        public String getBusinessName() {
            return businessName;
        }
        ...
    }
This is what the UsAddress entity looks like with its analyzer applied:
    @AnalyzerDef(name = "usAddressAnalyzer",
        charFilters = {
            @CharFilterDef(factory = MappingCharFilterFactory.class, params = {
                @Parameter(name = "mapping", value = "mapping-chars.properties")
            })
        },
        tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
        filters = {
            @TokenFilterDef(factory = LowerCaseFilterFactory.class),
            @TokenFilterDef(factory = StopFilterFactory.class, params = {
                @Parameter(name = "words", value = "stoplist.properties"),
                @Parameter(name = "ignoreCase", value = "true")
            }),
            @TokenFilterDef(factory = SynonymFilterFactory.class, params = {
                @Parameter(name = "synonyms", value = "street_synonyms.txt"),
                @Parameter(name = "ignoreCase", value = "true"),
                @Parameter(name = "expand", value = "false")
            })
        })
    @Entity
    public class UsAddress extends BaseEntity {
        ...
        @Basic
        @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
        @Analyzer(definition = "usAddressAnalyzer")
        @Column(name = "DELIVERY_LINE", nullable = false, insertable = true,
                updatable = true, length = 256, precision = 0)
        public String getDeliveryLine() {
            return deliveryLine;
        }
        ...
    }
We index only Client, because all our fuzzy searches on the businessName and/or deliveryLine fields should return a list of clients. So I believe the index document for a client holds every businessName value from its names collection under the one indexed businessName field, and likewise for deliveryLine. Suppose a client has the following two names:
    <client>
      <names>
        <name>
          <id>efd2173d-d7d4-4449-b100-92ae373fedb1</id>
          <clientId>ca46c8e1-69a9-4bf9-ab7d-15938f7a459d</clientId>
          <businessName>Tom Raulston Co.</businessName>
        </name>
        <name>
          <id>970cb247-195b-406d-937a-dc8399a8e0e9</id>
          <clientId>ca46c8e1-69a9-4bf9-ab7d-15938f7a459d</clientId>
          <businessName>Mike Jason Incorp</businessName>
        </name>
      </names>
      ...
    </client>
Now if I fuzzy search for 'Tom AND Raulston' or 'Mike AND Jason', I find this client, and that is fine. The problem is that a search for 'Tom AND Jason' also finds this client, even though no business name such as <businessName>Tom Jason</businessName> exists. So my question is: how do I ensure that a fuzzy search like 'Tom AND Jason' does not find this client at all, because that combination does not occur within any single name?
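To make the behaviour concrete, here is a minimal plain-Java sketch of what I think is happening. It does not use Hibernate Search or Lucene at all: the class and method names (FlattenedFieldDemo, matchesFlattened, matchesWithinOneName) are made up for illustration, the tokenizer is a crude stand-in for the real analyzer chain, and fuzziness is modeled as a Levenshtein distance of at most 1, which is an assumption rather than our actual setting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class FlattenedFieldDemo {

    /** Lowercases and splits on non-alphanumerics; a crude stand-in for the real analyzer. */
    static List<String> tokenize(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Classic Levenshtein edit distance, used here to model a fuzzy term match. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = curr;
        }
        return prev[b.length()];
    }

    /** True if at least one token is within maxEdits of the search term. */
    static boolean fuzzyContains(List<String> tokens, String term, int maxEdits) {
        String needle = term.toLowerCase(Locale.ROOT);
        for (String token : tokens) {
            if (editDistance(token, needle) <= maxEdits) {
                return true;
            }
        }
        return false;
    }

    /** Models the current behaviour: tokens of all names end up in one shared field. */
    static boolean matchesFlattened(List<String> names, int maxEdits, String... terms) {
        List<String> allTokens = new ArrayList<>();
        for (String name : names) {
            allTokens.addAll(tokenize(name));
        }
        for (String term : terms) {
            if (!fuzzyContains(allTokens, term, maxEdits)) {
                return false;
            }
        }
        return true;
    }

    /** Models the desired behaviour: every term must match inside the same name. */
    static boolean matchesWithinOneName(List<String> names, int maxEdits, String... terms) {
        for (String name : names) {
            List<String> tokens = tokenize(name);
            boolean allMatch = true;
            for (String term : terms) {
                if (!fuzzyContains(tokens, term, maxEdits)) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Tom Raulston Co.", "Mike Jason Incorp");
        System.out.println(matchesFlattened(names, 1, "Tom", "Jason"));        // true  (what I get today)
        System.out.println(matchesWithinOneName(names, 1, "Tom", "Jason"));    // false (what I want)
        System.out.println(matchesWithinOneName(names, 1, "Tom", "Raulstan")); // true  (fuzziness kept)
    }
}
```

In other words, I am looking for the index-side equivalent of matchesWithinOneName: the terms may be misspelled, but they must all come from the same businessName value.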
I do not want to use a phrase query with slop, because then the fuzziness is lost, and the consumers of our system want fuzziness. Thanks.