
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 6 posts ] 
 Post subject: Using Hibernate Search with an "Unflat Structure"
PostPosted: Tue Dec 15, 2009 6:43 pm 
Newbie

Joined: Tue Dec 15, 2009 3:49 pm
Posts: 6
I would like to use Hibernate Search so that the data in the index is somewhat structured. Below is an example of the structure of classes/data that I am working with. Books contain author entities and author entities contain name entities.

public class Book {

private List<Author> authors;

private String title;
}

public class Author {

private List<Name> names;

}

public class Name {

private String firstName;

private String lastName;

}


XML format of the data...
<book>
<title>My First Book Title</title>
<author>
<name>
<firstname>Sue</firstname>
<lastname>Smith</lastname>
</name>
<name>
<firstname>John</firstname>
<lastname>Doe</lastname>
</name>
</author>
</book>

From my understanding of Hibernate Search, all the attributes of Book will be flattened to one dimension. The data example above will be stored as one entry into the index as something like:

book.title: My First Book Title
book.author.name.firstname: Sue
book.author.name.firstname: John
book.author.name.lastname: Smith
book.author.name.lastname: Doe

When I conduct a search for book.author.name.firstname=Sue and book.author.name.lastname=Doe, this record will return. However, Doe is really John's last name and does not really match what I am looking for in the search. Is there any way I can get these context dependent searches to work?
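The cross-match described above can be reproduced without Lucene. The following is only a toy model of a flattened index entry, not Hibernate Search code; the class name `FlattenedBookDemo` and its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of one flattened index entry: field name -> all values stored under it.
final class FlattenedBookDemo {

    // Flattens {firstName, lastName} pairs into two multi-valued fields.
    // The pairing between a first name and its last name is lost here.
    static Map<String, List<String>> flatten(String title, String[][] names) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("book.title", new ArrayList<>(Arrays.asList(title)));
        doc.put("book.author.name.firstname", new ArrayList<>());
        doc.put("book.author.name.lastname", new ArrayList<>());
        for (String[] name : names) {
            doc.get("book.author.name.firstname").add(name[0]);
            doc.get("book.author.name.lastname").add(name[1]);
        }
        return doc;
    }

    // Per-field AND: each field only has to contain its value somewhere in the entry.
    static boolean matches(Map<String, List<String>> doc, String first, String last) {
        return doc.get("book.author.name.firstname").contains(first)
            && doc.get("book.author.name.lastname").contains(last);
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = flatten("My First Book Title",
            new String[][] {{"Sue", "Smith"}, {"John", "Doe"}});
        System.out.println(matches(doc, "Sue", "Smith")); // true
        System.out.println(matches(doc, "Sue", "Doe"));   // true, although no one is named Sue Doe
    }
}
```

Both conditions are satisfied by the same entry even though they come from different names, which is exactly the unwanted hit.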


 Post subject: Re: Using Hibernate Search with an "Unflat Structure"
PostPosted: Wed Dec 16, 2009 9:59 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

I don't think you need a "more structured" index. Instead you have to explore the Lucene query syntax and the Lucene query API.

Quote:
When I conduct a search for book.author.name.firstname=Sue and book.author.name.lastname=Doe, this record will return. However, Doe is really John's last name and does not really match what I am looking for in the search. Is there any way I can get these context dependent searches to work?


That depends. If you use a MultiFieldQueryParser and your input is "book.author.name.firstname=Sue book.author.name.lastname=Doe", then you will get a hit, but if you use the AND operator ("book.author.name.firstname=Sue AND book.author.name.lastname=Doe") you won't. If you don't have a single search field, but rather a search mask (several input fields), then you can also use BooleanQuery and programmatically build your query.

--Hardy


 Post subject: Re: Using Hibernate Search with an "Unflat Structure"
PostPosted: Tue Jan 12, 2010 2:59 pm 
Newbie

Joined: Tue Dec 15, 2009 3:49 pm
Posts: 6
I do not think using the MultiFieldQueryParser will work, unless I am using it incorrectly.

Here is a snippet of sample code (hard-coded values for simplicity):

String[] fields = {"book.authors.name.firstName", "book.authors.name.lastName"};
String[] queries = {"Derek", "User"};
MultiFieldQueryParser parser = new MultiFieldQueryParser(fields, analyzer);
parser.setDefaultOperator(QueryParser.AND_OPERATOR);
parser.setAllowLeadingWildcard(true);
Query luceneQuery = parser.parse(queries, fields, analyzer);

org.hibernate.Query hibQuery =
fullTextSession.createFullTextQuery(luceneQuery, VicapCase.class);

List<VicapCase> searchResults = hibQuery.list();


I have a book record with two authors (Derek Smith and John User). The above query searches for firstname=Derek and lastname=User. It returns the record below, which it shouldn't, since 'Derek' and 'User' belong to two different author entities.

<book>
<title>Test Book Title</title>
<author>
<name>
<firstname>Derek</firstname>
<lastname>Smith</lastname>
</name>
<name>
<firstname>John</firstname>
<lastname>User</lastname>
</name>
</author>
</book>


 Post subject: Re: Using Hibernate Search with an "Unflat Structure"
PostPosted: Mon Jan 18, 2010 4:02 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi, you're right, this is a known limitation: multiple joined entities are not properly correlated with one another.
Even though OR is the default, it's going to boost the result as if it contained both Sue and Doe, because the Document contains both and there's no notion of both values coming from the same name.

There are some workarounds, but they make the code a bit more complex. For example, in your case I would have it index these terms:
firstname=John
lastname=Doe
firstnameLastname=John_Doe

When someone searches for both firstname and lastname and you have to force the AND operator, you programmatically add a clause on the firstnameLastname field to enforce it.
It's not a very nice limitation; if you have better suggestions, they're welcome.
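A minimal sketch of the combined-field idea, in plain Java and without any Lucene or Hibernate Search calls. The class `CombinedNameField` and its methods are hypothetical; in a real mapping the combined token would have to be produced at indexing time, for instance by a custom bridge.

```java
import java.util.ArrayList;
import java.util.List;

// Models the extra "firstnameLastname" field: one token per name, keeping the pairing.
final class CombinedNameField {

    // Builds the token for one name, e.g. "John_Doe".
    static String combine(String first, String last) {
        return first + "_" + last;
    }

    // All combined tokens that would be indexed for one book.
    static List<String> combinedTokens(String[][] names) {
        List<String> tokens = new ArrayList<>();
        for (String[] name : names) {
            tokens.add(combine(name[0], name[1]));
        }
        return tokens;
    }

    // The extra clause added when both first and last name are searched:
    // it only matches if a single name carries both values.
    static boolean matchesBoth(List<String> tokens, String first, String last) {
        return tokens.contains(combine(first, last));
    }

    public static void main(String[] args) {
        List<String> tokens = combinedTokens(new String[][] {{"Sue", "Smith"}, {"John", "Doe"}});
        System.out.println(matchesBoth(tokens, "John", "Doe")); // true
        System.out.println(matchesBoth(tokens, "Sue", "Doe"));  // false
    }
}
```

The combined token is matched exactly here, so it does not by itself support fuzzy or partial matching on either half of the name.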

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Using Hibernate Search with an "Unflat Structure"
PostPosted: Tue Jan 19, 2010 7:07 pm 
Newbie

Joined: Tue Dec 15, 2009 3:49 pm
Posts: 6
Thanks for your response, Sanne. I have thought about using the approach you have suggested for adding a conglomerate field such as "firstnameLastname." However, I think there will be issues if we want to use any of Lucene's other search features, such as fuzzy searches.

I am still thinking of solutions for this, though it might require abandoning HS.


 Post subject: Re: Using Hibernate Search with an "Unflat Structure"
PostPosted: Wed Jan 20, 2010 6:45 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Whatever technology you choose, you'll always have this problem, as this is not a limitation of HS but of the way fields are managed in an inverted index such as Lucene. What HS really provides is tools to keep entities and indexes in sync and to manage the IndexReader/IndexWriter lifecycle in the most efficient way. It won't interfere in any way with how you perform queries (you can use the "low level APIs" to get a direct reference to the appropriate searcher), and it also gives you full control over how entities are indexed (using the bridges et al).

The hardest part is usually to understand the problem to be able to make best use of these tools, so you're in a good position to handle this as there are several tricks which can be applied and it looks like you've a clear idea of the problem.

One strategy, as suggested by Emmanuel on the mailing list (and in the book), is to actually search for authors and then use getBook() to return the book entity. If you end up implementing these tricks "manually", it will certainly be harder than making use of HS.
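The idea of searching the smaller entity and navigating back to the book can be sketched in plain Java. This is only a model: `NameEntrySearch`, `NameEntry` and `findBooks` are hypothetical names, and it assumes each name is searchable as its own entry with a back-reference to its book.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class NameEntrySearch {

    static final class Book {
        final String title;
        Book(String title) { this.title = title; }
    }

    // One searchable entry per name, with a back-reference to the owning book.
    static final class NameEntry {
        final String firstName;
        final String lastName;
        private final Book book;

        NameEntry(String firstName, String lastName, Book book) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.book = book;
        }

        Book getBook() { return book; }
    }

    // Both conditions are checked inside a single entry, then mapped back to books.
    static Set<Book> findBooks(List<NameEntry> entries, String first, String last) {
        Set<Book> books = new LinkedHashSet<>();
        for (NameEntry entry : entries) {
            if (entry.firstName.equals(first) && entry.lastName.equals(last)) {
                books.add(entry.getBook());
            }
        }
        return books;
    }

    public static void main(String[] args) {
        Book book = new Book("Test Book Title");
        List<NameEntry> entries = new ArrayList<>();
        entries.add(new NameEntry("Derek", "Smith", book));
        entries.add(new NameEntry("John", "User", book));
        System.out.println(findBooks(entries, "Derek", "User").size()); // 0
        System.out.println(findBooks(entries, "John", "User").size());  // 1
    }
}
```

Because each entry holds exactly one first name and one last name, an AND over the two fields can no longer mix values from different people.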

Actually, if you find a good general-purpose strategy usable with Lucene to achieve this, please speak up and I'll implement it! Any suggestion is welcome; AFAIK no other tool can do this :-)

This is also a limitation of most if not all NoSQL databases: unless you can fully denormalize your data, you can't perform all kinds of queries.
Another option is to let Lucene return all values and then filter out the "wrong ones" yourself; it is in fact far more efficient than trying to implement fuzzy search on a database ;-)
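The post-filtering option could look roughly like the sketch below. It assumes the candidate books have already come back from the index query and only shows the in-memory check; `PostFilterDemo`, `hasName` and `filter` are hypothetical names, and the entity classes are simplified stand-ins for the ones in the first post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PostFilterDemo {

    static final class Name {
        final String firstName;
        final String lastName;
        Name(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }
    }

    static final class Author {
        final List<Name> names;
        Author(List<Name> names) { this.names = names; }
    }

    static final class Book {
        final String title;
        final List<Author> authors;
        Book(String title, List<Author> authors) {
            this.title = title;
            this.authors = authors;
        }
    }

    // True only if one single Name carries both values.
    static boolean hasName(Book book, String first, String last) {
        for (Author author : book.authors) {
            for (Name name : author.names) {
                if (name.firstName.equals(first) && name.lastName.equals(last)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Drops candidates that the index matched only across different names.
    static List<Book> filter(List<Book> candidates, String first, String last) {
        List<Book> kept = new ArrayList<>();
        for (Book book : candidates) {
            if (hasName(book, first, last)) {
                kept.add(book);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Author author = new Author(Arrays.asList(new Name("Derek", "Smith"), new Name("John", "User")));
        List<Book> candidates = Arrays.asList(new Book("Test Book Title", Arrays.asList(author)));
        System.out.println(filter(candidates, "Derek", "User").size()); // 0
        System.out.println(filter(candidates, "John", "User").size());  // 1
    }
}
```

The check uses exact equality, so a fuzzy index query would need a correspondingly looser comparison in `hasName`.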

_________________
Sanne
http://in.relation.to/


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.