 Post subject: DeleteLuceneWork and Sharding strategy
PostPosted: Fri Jul 10, 2009 10:50 am 
Newbie

Joined: Fri Jul 10, 2009 10:48 am
Posts: 2
We are using Hibernate Search 3.2.0 with a custom sharding strategy. We've run into an issue where we cannot update the index, because updates are performed as a delete followed by an add, and our custom sharding strategy keys off a property of the indexed entity other than the entity id.

For an update, a DeleteLuceneWork and an AddLuceneWork are added to the work queue, and the work processor calls our custom sharding strategy to get the directory provider for each. Since the getDirectoryProvidersForDeletion() method does not take a Document (only the entity class and the entity id), there is no way to determine which directory provider to return. Consequently, the delete fails but the add succeeds, and we end up with multiple copies of the entity in the Lucene index.
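To make the problem concrete, here is a rough, simplified sketch of the kind of strategy we use (the class name and the "region" field are invented for illustration):

import java.io.Serializable;
import java.util.Properties;

import org.apache.lucene.document.Document;

import org.hibernate.search.filter.FullTextFilterImplementor;
import org.hibernate.search.store.DirectoryProvider;
import org.hibernate.search.store.IndexShardingStrategy;

// Hypothetical strategy that shards on an indexed "region" field of the entity.
public class RegionShardingStrategy implements IndexShardingStrategy {

    private DirectoryProvider<?>[] providers;

    public void initialize(Properties properties, DirectoryProvider<?>[] providers) {
        this.providers = providers;
    }

    public DirectoryProvider<?>[] getDirectoryProvidersForAllShards() {
        return providers;
    }

    // Additions receive the Lucene Document, so the shard can be chosen
    // from the indexed "region" field.
    public DirectoryProvider<?> getDirectoryProviderForAddition(Class<?> entity, Serializable id, String idInString, Document document) {
        String region = document.get( "region" );
        return providers[ ( region.hashCode() & Integer.MAX_VALUE ) % providers.length ];
    }

    // Deletions only receive the class and the id; the "region" value is not
    // available here, so the right shard cannot be determined. Returning all
    // providers is not acceptable for us because ids are only unique per shard.
    public DirectoryProvider<?>[] getDirectoryProvidersForDeletion(Class<?> entity, Serializable id, String idInString) {
        return providers; // would delete the id from every shard
    }

    public DirectoryProvider<?>[] getDirectoryProvidersForQuery(FullTextFilterImplementor[] filters) {
        return providers;
    }
}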

It seems to me that this problem would be faced by anyone implementing a custom sharding strategy that depends on a value in the indexed entity other than the entity id. There are a few possible solutions I can think of:

a) Use a composite key for the entity id, with one element of the key being the value used to determine sharding (see the sketch after this list).
b) Somehow plug in a custom DeleteLuceneWork class whose constructor takes a Document (currently the DeleteLuceneWork constructor does not take one). There would also need to be an overloaded getDirectoryProvidersForDeletion() method.
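For option (a), a minimal sketch of what such a composite key could look like (OrderId and its fields are made up for illustration). Mapped as the document id with a two-way field bridge, the idInString passed to getDirectoryProvidersForDeletion() would then carry the shard key, e.g. "EU:42", which the strategy could parse on deletion:

import java.io.Serializable;

import javax.persistence.Embeddable;

// Hypothetical composite id: the shard key travels inside the entity id itself,
// so the deletion call can recover it from idInString.
@Embeddable
public class OrderId implements Serializable {

    private String region;  // the value the sharding strategy keys off
    private long sequence;  // only unique within a region/shard

    // getters, setters, equals() and hashCode() omitted for brevity
}

The obvious drawback is that it changes the entity's primary key, which may not be an option for an existing schema.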

Has anyone else had to deal with a similar situation and how did you solve it? Also, is this the correct approach to take or is there some other facility/method of accomplishing this that we may not be aware of?

[edit]I left out one part. By default, getDirectoryProvidersForDeletion() returns all directory providers. We can't rely on that because in our case the entity id is only unique within each shard (the same id could exist in multiple shards), so it would delete the entity from every shard rather than just the shard in question.[/edit]

Thanks and Regards

Will Kimeria


 Post subject: Re: DeleteLuceneWork and Sharding strategy
PostPosted: Wed Jul 15, 2009 3:16 pm 
Newbie

Joined: Fri Jul 10, 2009 10:48 am
Posts: 2
Hello all,

I made a few changes to the Hibernate Search code so that the entity's Document is attached to the DeleteLuceneWork and is available when getDirectoryProvidersForDeletion is called on our custom sharding strategy implementation. This also required adding an overloaded getDirectoryProvidersForDeletion method that takes a Document to the IndexShardingStrategy interface. The code is below. I have three questions:

a) Apart from breaking backwards compatibility due to the change to the IndexShardingStrategy interface, can anyone see any glaring problems with this?
b) Would there be any interest in integrating these changes into Hibernate Search?
c) Is there a better way of accomplishing this (getting hibernate search to work with a custom sharding strategy that depends on a value in the entity other than the id)?


Below is the code from the patch file I created:






Index: trunk/src/main/java/org/hibernate/search/backend/DeleteLuceneWork.java
===================================================================
--- trunk/src/main/java/org/hibernate/search/backend/DeleteLuceneWork.java (revision 17060)
+++ trunk/src/main/java/org/hibernate/search/backend/DeleteLuceneWork.java (working copy)
@@ -3,20 +3,26 @@

import java.io.Serializable;

+import org.apache.lucene.document.Document;
+
/**
* @author Emmanuel Bernard
*/
public class DeleteLuceneWork extends LuceneWork implements Serializable {
-
+
private static final long serialVersionUID = -854604138119230246L;

public DeleteLuceneWork(Serializable id, String idInString, Class entity) {
super( id, idInString, entity );
}

+ public DeleteLuceneWork(Serializable id, String idInString, Class entity, Document document) {
+ super( id, idInString, entity, document );
+ }
+
@Override
public <T> T getWorkDelegate(final WorkVisitor<T> visitor) {
return visitor.getDelegate( this );
}
-
+
}
Index: trunk/src/main/java/org/hibernate/search/backend/impl/lucene/DpSelectionVisitor.java
===================================================================
--- trunk/src/main/java/org/hibernate/search/backend/impl/lucene/DpSelectionVisitor.java (revision 17060)
+++ trunk/src/main/java/org/hibernate/search/backend/impl/lucene/DpSelectionVisitor.java (working copy)
@@ -14,11 +14,11 @@
* Only implementation of WorkVisitor<DpSelectionDelegate>,
* using a visitor/selector pattern for different implementations of addAsPayLoadsToQueue
* depending on the type of LuceneWork.
- *
+ *
* @author Sanne Grinovero
*/
public class DpSelectionVisitor implements WorkVisitor<DpSelectionDelegate> {
-
+
private final AddSelectionDelegate addDelegate = new AddSelectionDelegate();
private final DeleteSelectionDelegate deleteDelegate = new DeleteSelectionDelegate();
private final OptimizeSelectionDelegate optimizeDelegate = new OptimizeSelectionDelegate();
@@ -39,7 +39,7 @@
public DpSelectionDelegate getDelegate(PurgeAllLuceneWork purgeAllLuceneWork) {
return purgeDelegate;
}
-
+
private static class AddSelectionDelegate implements DpSelectionDelegate {

public void addAsPayLoadsToQueue(LuceneWork work,
@@ -54,7 +54,7 @@
}

}
-
+
private static class DeleteSelectionDelegate implements DpSelectionDelegate {

public void addAsPayLoadsToQueue(LuceneWork work,
@@ -62,7 +62,8 @@
DirectoryProvider<?>[] providers = shardingStrategy.getDirectoryProvidersForDeletion(
work.getEntityClass(),
work.getId(),
- work.getIdInString()
+ work.getIdInString(),
+ work.getDocument()
);
for (DirectoryProvider<?> provider : providers) {
queues.addWorkToDpProcessor( provider, work );
@@ -70,7 +71,7 @@
}

}
-
+
private static class OptimizeSelectionDelegate implements DpSelectionDelegate {

public void addAsPayLoadsToQueue(LuceneWork work,
@@ -82,7 +83,7 @@
}

}
-
+
private static class PurgeAllSelectionDelegate implements DpSelectionDelegate {

public void addAsPayLoadsToQueue(LuceneWork work,
Index: trunk/src/main/java/org/hibernate/search/engine/DocumentBuilderIndexedEntity.java
===================================================================
--- trunk/src/main/java/org/hibernate/search/engine/DocumentBuilderIndexedEntity.java (revision 17060)
+++ trunk/src/main/java/org/hibernate/search/engine/DocumentBuilderIndexedEntity.java (working copy)
@@ -322,19 +322,19 @@
}
else if ( workType == WorkType.DELETE || workType == WorkType.PURGE ) {
String idInString = idBridge.objectToString( id );
- queue.add( new DeleteLuceneWork( id, idInString, entityClass ) );
+ queue.add( createDeleteWork( entityClass, entity, id, idInString ) );
}
else if ( workType == WorkType.PURGE_ALL ) {
queue.add( new PurgeAllLuceneWork( entityClass ) );
}
else if ( workType == WorkType.UPDATE || workType == WorkType.COLLECTION ) {
String idInString = idBridge.objectToString( id );
- queue.add( new DeleteLuceneWork( id, idInString, entityClass ) );
+ queue.add( createDeleteWork( entityClass, entity, id, idInString ) );
queue.add( createAddWork( entityClass, entity, id, idInString, false ) );
}
else if ( workType == WorkType.INDEX ) {
String idInString = idBridge.objectToString( id );
- queue.add( new DeleteLuceneWork( id, idInString, entityClass ) );
+ queue.add( createDeleteWork( entityClass, entity, id, idInString ) );
queue.add( createAddWork( entityClass, entity, id, idInString, true ) );
}
else {
@@ -357,6 +357,14 @@
return addWork;
}

+ public DeleteLuceneWork createDeleteWork(Class<T> entityClass, T entity, Serializable id, String idInString) {
+ Map<String, String> fieldToAnalyzerMap = new HashMap<String, String>();
+ Document doc = getDocument( entity, id, fieldToAnalyzerMap );
+ DeleteLuceneWork deleteWork;
+ deleteWork = new DeleteLuceneWork( id, idInString, entityClass, doc );
+ return deleteWork;
+ }
+
/**
* Builds the Lucene <code>Document</code> for a given entity <code>instance</code> and its <code>id</code>.
*
Index: trunk/src/main/java/org/hibernate/search/store/IndexShardingStrategy.java
===================================================================
--- trunk/src/main/java/org/hibernate/search/store/IndexShardingStrategy.java (revision 17060)
+++ trunk/src/main/java/org/hibernate/search/store/IndexShardingStrategy.java (working copy)
@@ -35,6 +35,8 @@
*/
DirectoryProvider<?>[] getDirectoryProvidersForDeletion(Class<?> entity, Serializable id, String idInString);

+ DirectoryProvider<?>[] getDirectoryProvidersForDeletion(Class<?> entity, Serializable id, String idInString, Document document);
+
/**
* return the set of DirectoryProvider(s) where the entities matching the filters are stored
* this optional optimization allows queries to hit a subset of all shards, which may be useful for some datasets
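
With this patch in place, the deletion side of a custom strategy can reuse the same routing logic as additions, roughly along these lines (the "region" field is again just an illustration, and providers is the array received in initialize()):

// Sketch of the new overload in a custom IndexShardingStrategy: the Document
// attached to the DeleteLuceneWork makes the shard key available on deletion.
public DirectoryProvider<?>[] getDirectoryProvidersForDeletion(Class<?> entity, Serializable id, String idInString, Document document) {
    String region = document.get( "region" );
    DirectoryProvider<?> provider = providers[ ( region.hashCode() & Integer.MAX_VALUE ) % providers.length ];
    return new DirectoryProvider<?>[] { provider };
}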

