
All times are UTC - 5 hours [ DST ]



[ 24 posts ]  Page 1 of 2
 Post subject: Conditional indexing, multiple indexes / fields within index
Posted: Wed Jul 13, 2011 10:29 am
Beginner

Joined: Mon Apr 11, 2011 7:56 am
Posts: 38
I'm trying to find the best way to index entities of the same type, based on an entity's property value.

For example, we are currently working on a project to make our (and other) SVN repositories searchable for code. In this project, we have an entity CodeFragment, and each code fragment has the properties code and project (the project in which the code resides).

To search within a specific project, we apply a filter on the project field. With this solution the ranking of results is not scoped to one project, but that is currently not a big issue.
We also give autocompletion suggestions, which should be scoped to the project a user is currently searching. The autocompleter uses the Hibernate Search Lucene index as the source of terms and frequencies for building its own index.
This is where we'd like to fragment the index based on an entity's property value, namely project in our case.

I've looked at the documentation, and found a solution that might work: using a class bridge, which checks the value of the project property, and based on that, prefixes the field name with that value. This results in a (single) index for the CodeFragment entity, but with frequencies and terms scoped to the project the code fragment belongs to, by using different fields for each project.
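
A minimal sketch of the field-prefixing idea, written in plain Java with no Hibernate Search dependency. The class `ProjectScopedTerms`, its method names, and the `<project>.<field>` naming scheme are hypothetical; in a real class bridge the same name computation would happen at the point where the bridge adds the field to the Lucene document.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "one field per project": terms are counted under a
// field name prefixed with the project, so frequencies never mix.
final class ProjectScopedTerms {
    // field name -> (term -> frequency)
    private final Map<String, Map<String, Integer>> frequencies = new HashMap<>();

    // Assumed naming scheme: "<project>.<field>", e.g. "alpha.code".
    static String fieldName(String project, String field) {
        return project + "." + field;
    }

    // Index a code fragment: split on non-word characters and count each
    // term under the project-specific field.
    void index(String project, String code) {
        Map<String, Integer> terms =
            frequencies.computeIfAbsent(fieldName(project, "code"), k -> new HashMap<>());
        for (String term : code.toLowerCase().split("\\W+")) {
            if (!term.isEmpty()) {
                terms.merge(term, 1, Integer::sum);
            }
        }
    }

    // Frequency of a term within one project only; 0 if never seen.
    int frequency(String project, String term) {
        Map<String, Integer> terms = frequencies.getOrDefault(
            fieldName(project, "code"), Collections.<String, Integer>emptyMap());
        return terms.getOrDefault(term, 0);
    }
}
```

With this layout an autocompleter for one project only has to read the terms of that project's field, at the cost of one extra field per project in the single index.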

I'm not sure if this is the best solution. The requirement is that we should be able to get terms and frequencies scoped to the project (entity property) a CodeFragment belongs to. Any other ideas?


 Post subject: Re: Conditional indexing, multiple indexes / fields within index
Posted: Wed Jul 13, 2011 2:17 pm
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2296
Location: Third rock from the Sun
Hi Elmer,
I think you should use sharding: create a sharding policy such that each project has its own index, and then make use of the awesome filter implementation, which actually selects the proper shard:
http://docs.jboss.org/hibernate/stable/ ... lter-shard

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Conditional indexing, multiple indexes / fields within index
Posted: Thu Jul 14, 2011 4:30 am
Beginner

Joined: Mon Apr 11, 2011 7:56 am
Posts: 38
Hmm, how could I have missed that :o
Let's see if I can set this up dynamically in our DSL.
Thanks!


 Post subject: Re: Conditional indexing, multiple indexes / fields within index
Posted: Thu Jul 14, 2011 4:33 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2296
Location: Third rock from the Sun
Almost forgot, there's one catch: creating new shards dynamically is currently not supported, so you'll have to restart the application when adding a new project.

We have plans to introduce dynamic sharding; it's scheduled for the next release but it might slip as it is quite tricky.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Conditional indexing, multiple indexes / fields within index
Posted: Thu Jul 14, 2011 5:41 am
Beginner

Joined: Mon Apr 11, 2011 7:56 am
Posts: 38
That's actually a big catch in our case. Is it a problem, performance-wise, to set the number of shards to a big number if only one shard is searched by each query?
Even then, there is still the problem of keeping a mapping from, in this case, a string value to an integer.
I think I'm gonna try dynamic fields instead of sharding, using a class bridge. That seems more robust to me at the moment.


 Post subject: Re: Conditional indexing, multiple indexes / fields within index
Posted: Thu Jul 14, 2011 6:06 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2296
Location: Third rock from the Sun
Quote:
That's actually a big catch in our case. Is it a problem, performance-wise, to set the number of shards to a big number if only one shard is searched by each query?

Not a problem; you actually get great performance, better than with a single index, assuming all shards are used frequently. An index that is not "warmed up" will not be very responsive because its caches have been invalidated, so the first hit on such an index after some time might be slower, but overall throughput is better.

Quote:
Even then, there is still the problem of keeping a mapping from, in this case, a string value to an integer.

That's easy to solve in your custom sharding implementation: a simple solution could use a properties file listing the project names; a better one could run a query on the database.
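
A minimal sketch of the properties-file approach, assuming a single `projects` entry that lists the project names in a fixed order. The class name `ProjectShardMap`, the property key, and the comma-separated format are all hypothetical; in a custom sharding strategy the returned integer would be used to pick the directory for that project.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: assigns each project name a stable shard number based
// on its position in a comma-separated "projects" property.
final class ProjectShardMap {
    private final Map<String, Integer> shardByProject = new LinkedHashMap<>();

    ProjectShardMap(Properties props) {
        String list = props.getProperty("projects", "");
        for (String name : list.split(",")) {
            String project = name.trim();
            if (!project.isEmpty()) {
                // putIfAbsent keeps the first position if a name is listed twice.
                shardByProject.putIfAbsent(project, shardByProject.size());
            }
        }
    }

    int shardCount() {
        return shardByProject.size();
    }

    // Unknown projects are rejected: in this model the set of shards is fixed
    // when the properties are loaded, so a new project needs a new list entry.
    int shardFor(String project) {
        Integer shard = shardByProject.get(project);
        if (shard == null) {
            throw new IllegalArgumentException("No shard configured for project: " + project);
        }
        return shard;
    }
}
```

Appending new projects to the end of the list keeps the existing shard numbers unchanged, which matters because each number identifies an index that has already been written.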

Quote:
I think I'm gonna try dynamic fields instead of sharding, using a class bridge. That seems more robust to me at the moment.

Sure, very reasonable. If it doesn't work out, let's discuss alternatives. You're also welcome to change whatever you need and send patches to be discussed and integrated.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Conditional indexing, multiple indexes / fields within index
Posted: Fri Jul 15, 2011 7:49 am
Beginner

Joined: Mon Apr 11, 2011 7:56 am
Posts: 38
In the end, I'm implementing it using index sharding after all. It's better to have separate indices for each 'namespace': if namespace A is updated more frequently than the others, caching is more effective because the indices of the other namespaces stay untouched (and thus don't need to warm up again), if I understand correctly. I'll post an update when finished.


 Post subject: Re: Conditional indexing, multiple indexes / fields within index
Posted: Fri Jul 15, 2011 9:14 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2296
Location: Third rock from the Sun
Quote:
In the end, I'm implementing it using index sharding after all. It's better to have separate indices for each 'namespace': if namespace A is updated more frequently than the others, caching is more effective because the indices of the other namespaces stay untouched (and thus don't need to warm up again), if I understand correctly.

Correct.
Quote:
I'll post an update when finished.

Yes, please let us know how it goes; it's very useful for seeing where and what to improve.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Conditional indexing, multiple indexes / fields within index
Posted: Mon Jul 18, 2011 8:33 am
Beginner

Joined: Mon Apr 11, 2011 7:56 am
Posts: 38
I'm still busy implementing it in our DSL, but I was wondering: what about inheritance?
If B : A (B extends A), and A has the sharding strategy, will entity B be sharded the same way as entity A?


 Post subject: Re: Conditional indexing, multiple indexes / fields within index
Posted: Mon Jul 18, 2011 8:49 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2296
Location: Third rock from the Sun
Sharding is coupled to the index name, so it depends on whether the two types are both @Indexed and use the same @Indexed(index = name).

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Conditional indexing, multiple indexes / fields within index
Posted: Mon Jul 18, 2011 9:39 am
Beginner

Joined: Mon Apr 11, 2011 7:56 am
Posts: 38
s.grinovero wrote:
Sharding is coupled to the index name, so it depends on whether the two types are both @Indexed and use the same @Indexed(index = name).

Ok :) So if I search for entities of type A and the namespace declaration (i.e. sharding) is defined on entity A, I also get B's for that particular namespace. This won't work if I search for B's, because my B is indexed using a different name. But that's not a big issue: I can inherit the same sharding strategy from B's superclass(es), and B might even override the property that acts as the namespace identifier (at most 1 namespace at each level of inheritance). Sounds cool!

(assuming that entities are indexed X times, where X is the number of ancestor classes (including the class itself) that have the @Indexed annotation)


 Post subject: Re: Conditional indexing, multiple indexes / fields within index
Posted: Mon Jul 18, 2011 9:45 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2296
Location: Third rock from the Sun
Quote:
So if I search for entities of type A and the namespace declaration (i.e. sharding) is defined on entity A, I also get B's for that particular namespace. This won't work if I search for B's, because my B is indexed using a different name.

Don't mix up physical index configuration and sharding, which are performance options, with semantics. The search will always work the same: we make sure we target the proper shards/indexes and aggregate the needed results (or filter them out) according to the type-safe parameters provided at search time.
Maybe it's best if you write some tests to better understand how it will work ;)

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Conditional indexing, multiple indexes / fields within index
Posted: Wed Jul 20, 2011 8:31 am
Beginner

Joined: Mon Apr 11, 2011 7:56 am
Posts: 38
Quote:
The search will always work the same

Ok :)

Next question: how should I accomplish the following in a sharded environment where I use my namespaces?

Code:
SearchFactory searchFactory = getFullTextSession().getSearchFactory();
DirectoryProvider<?> provider = searchFactory.getDirectoryProviders(entityClass)[0];


How do I get a namespace-specific DirectoryProvider using the custom sharding strategy? Do I need to call the custom strategy's getDirectoryProvidersForQuery method directly?


 Post subject: Re: Conditional indexing, multiple indexes / fields within index
Posted: Wed Jul 20, 2011 9:31 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2296
Location: Third rock from the Sun
Why are you trying to get a reference to the DirectoryProvider?
You just have to run the queries, applying a custom filter if you want to. The filters are passed on to the ShardingStrategy, so your custom strategy can look at the enabled filters, possibly read some of your custom filter parameters, and tell us which directories should be queried by returning the appropriate DirectoryProviders.

A rather complete example is here:
http://docs.jboss.org/hibernate/stable/ ... lter-shard
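
The selection logic described in this reply can be sketched in plain Java, without the Hibernate Search types. `ShardSelector`, the filter name `project`, and the representation of enabled filters as a name-to-parameter map are hypothetical stand-ins; the real strategy receives the enabled filters from the query and returns DirectoryProviders rather than shard names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a filter-aware sharding strategy: shards are
// identified by name, and enabled filters are modelled as name -> parameter.
final class ShardSelector {
    private final List<String> allShards;

    ShardSelector(List<String> allShards) {
        this.allShards = Collections.unmodifiableList(new ArrayList<>(allShards));
    }

    // If a "project" filter is enabled and names a known shard, query only
    // that shard; otherwise fall back to querying every shard.
    List<String> shardsForQuery(Map<String, String> enabledFilters) {
        String project = enabledFilters.get("project");
        if (project != null && allShards.contains(project)) {
            return Collections.singletonList(project);
        }
        return allShards;
    }
}
```

Falling back to all shards for an unknown project is a design choice of this sketch; returning an empty list instead would make a mistyped project name yield no results rather than unscoped ones.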

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Conditional indexing, multiple indexes / fields within index
Posted: Wed Jul 20, 2011 9:36 am
Beginner

Joined: Mon Apr 11, 2011 7:56 am
Posts: 38
s.grinovero wrote:
Why are you trying to get a reference to the DirectoryProvider?

Because there is more than querying :) We also need references to the Lucene directory / readers for MoreLikeThis queries, spelling correction and autocompletion.

So there is currently no easy way to get a DirectoryProvider that depends on the sharding strategy?

(By the way, our custom sharding strategy is almost finished; I fully understand how it works, and it's based on the code in the doc you mentioned.)


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.