-->
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 3 posts ] 
Author Message
 Post subject: IS Hibernate Search messing with lucene document ids ?
PostPosted: Sat Apr 24, 2010 3:50 pm 
Newbie

Joined: Fri Apr 11, 2008 11:45 pm
Posts: 7
Hi,

I think this is more a question for the Hibernate Search developers:

I have a Lucene filter and custom scoring code from the spatial contrib that build a HashMap mapping each Lucene docid to a value computed at search time (the geographic distance).

Everything works fine when using Lucene alone, but when Hibernate Search does the indexing the document IDs seem to be mixed up: I no longer get hold of the Lucene index docid, but of the entity's unique ID instead.

Is Hibernate Search changing the Lucene index's document IDs?

Here is the code that populates the HashMap of doc ids with their corresponding computed distance.

Code:

  /**
   * Iterates over the set bits in the given BitSet from the given start to end range, calculating the distance of the
   * documents and determining which are within the distance radius of the central point.
   *
   * @param dataSet LocationDataSet containing the document locations that can be used to calculate the distance each
   * document is from the central point
   * @param originalBitSet BitSet which has bits set identifying which documents should be checked to see if their
   * distance falls within the radius
   * @param start Index in the BitSet that the method will start at
   * @param end Index in the BitSet that the method will stop at
   * @param size Size that the resulting BitSet should be created at (most likely end - start)
   * @param reader IndexReader for checking if the document has been deleted
   * @return IterationResult containing all the results of the method.
   */
  protected IterationResult iterate(LocationDataSet dataSet, BitSet originalBitSet, int start, int end, int size, IndexReader reader) {
    // lat, lng, unit, radius, nextOffset and distanceCalculator are fields of the enclosing class.
    BitSet bitSet = new BitSet(size);

    // Debug aid: print the call stack to see who invokes iterate().
    new Exception("iterate() called from:").printStackTrace();

    Map<Integer, Double> distanceById = new HashMap<Integer, Double>();

    int docId = originalBitSet.nextSetBit(start);

    while (docId != -1 && docId < end) {
      System.out.println("iterationresult docid = " + docId);

      // Skip deleted documents.
      if (reader.isDeleted(docId)) {
        docId = originalBitSet.nextSetBit(docId + 1);
        continue;
      }

      Point point = dataSet.getPoint(docId);
      double distance = distanceCalculator.calculate(lat, lng, point.getX(), point.getY(), unit);
      if (distance < radius) {
        bitSet.set(docId);
        System.out.println("docId + nextOffset = " + (docId + nextOffset));
        // The distance is stored under the offset docid, not under the docid read from the BitSet.
        distanceById.put(docId + nextOffset, distance);
      }

      docId = originalBitSet.nextSetBit(docId + 1);
    }
    return new IterationResult(bitSet, distanceById);
  }





It seems that the doc ID is retrieved with
int docId = originalBitSet.nextSetBit(start);

When Lucene alone does the indexing I get the actual Lucene document ID, but when Hibernate Search does the indexing I get the entity's ID.

How can I modify the above code to make sure that the Lucene document ID is assigned to the docId variable? I need this because the scoring code that is called later on receives the Lucene document ID as the doc parameter of the customScore(int doc, float subQueryScore, float valSrcScore) method, so I can no longer retrieve the values from the HashMap.

I hope I was clear enough ...
Many thanks for helping me ! :)

Cheers,

Alex


 Post subject: Re: IS Hibernate Search messing with lucene document ids ?
PostPosted: Sun Apr 25, 2010 6:02 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Quote:
When Lucene alone is doing the indexing job

What do you mean? Lucene alone doesn't "do the indexing job"; you mean using your own low-level code, right? What does that look like?

Hibernate Search doesn't mess with the Lucene document IDs, but these IDs can change whenever the index is modified and you open a new IndexReader. This is a Lucene limitation. Hibernate Search never reads or writes the docid; to match documents reliably with database entities it stores a separate identifier that doesn't change.
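
To make the point about changing docids concrete, here is a minimal plain-Java simulation. It uses no Lucene or Hibernate Search classes: the list position stands in for the docid, and the names openReader, reopenWithout and the "hotel-…" identifiers are made up for illustration. It only sketches why a map keyed by docid goes stale after a reopen while a map keyed by a stable identifier does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation; no Lucene or Hibernate Search classes are involved.
class StableIdSketch {

    // Hypothetical stand-in for an opened IndexReader: the list position plays the role of the docid.
    static List<String> openReader(String... entityIds) {
        return new ArrayList<>(Arrays.asList(entityIds));
    }

    // Simulates a delete followed by a merge and a reader reopen: later docids shift down.
    static List<String> reopenWithout(List<String> reader, String entityId) {
        List<String> reopened = new ArrayList<>(reader);
        reopened.remove(entityId);
        return reopened;
    }

    public static void main(String[] args) {
        List<String> first = openReader("hotel-7", "hotel-12", "hotel-31");
        double[] distances = {4.2, 0.8, 2.5};

        // Store the same computed values twice: once per docid, once per stable identifier.
        Map<Integer, Double> distanceByDocId = new HashMap<>();
        Map<String, Double> distanceByEntityId = new HashMap<>();
        for (int docId = 0; docId < first.size(); docId++) {
            distanceByDocId.put(docId, distances[docId]);
            distanceByEntityId.put(first.get(docId), distances[docId]);
        }

        List<String> second = reopenWithout(first, "hotel-7");
        int docId = second.indexOf("hotel-31"); // 1 in the new reader, was 2 in the old one

        System.out.println(distanceByDocId.get(docId));                // 0.8, which belongs to hotel-12
        System.out.println(distanceByEntityId.get(second.get(docId))); // 2.5, the right value
    }
}
```

Keying by the stable identifier costs an extra lookup per document, so it is a trade-off rather than a drop-in replacement for docid-keyed maps.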

If you need the docids to stay the same for some time, don't reopen the IndexReader; keep using the same one. While you reuse the same IndexReader you won't see updates to the index, but the docids will be "stable". That's why filters are cached in a scope bound to the IndexReader lifecycle.
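
The idea of a cache scoped to the reader lifecycle can be sketched in plain Java as follows. This is an illustration under assumptions, not Hibernate Search's or Lucene's actual caching code: the Reader class is a hypothetical stand-in for an IndexReader, and the "filter" simply sets every bit.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;

// Plain-Java sketch of a filter result cached per reader instance.
class PerReaderCacheSketch {

    // Hypothetical stand-in for an IndexReader; only its identity and size matter here.
    static final class Reader {
        final int maxDoc;

        Reader(int maxDoc) {
            this.maxDoc = maxDoc;
        }
    }

    // Weak keys let an entry be collected once its reader is no longer referenced.
    private final Map<Reader, BitSet> cache = new WeakHashMap<>();

    // Counts how often the bits were actually computed.
    int computations = 0;

    BitSet bitsFor(Reader reader) {
        BitSet bits = cache.get(reader);
        if (bits == null) {
            bits = new BitSet(reader.maxDoc);
            bits.set(0, reader.maxDoc); // placeholder for the real filter logic
            computations++;
            cache.put(reader, bits);
        }
        return bits;
    }

    public static void main(String[] args) {
        PerReaderCacheSketch filter = new PerReaderCacheSketch();
        Reader reader = new Reader(5);
        filter.bitsFor(reader);
        filter.bitsFor(reader);        // same reader: the cached bits are reused
        filter.bitsFor(new Reader(5)); // "reopened" reader: the bits are computed again
        System.out.println(filter.computations); // 2
    }
}
```

Because the cached bits are tied to one reader instance, the docids they refer to stay valid for exactly as long as that reader is in use.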

_________________
Sanne
http://in.relation.to/


 
 Post subject: Re: IS Hibernate Search messing with lucene document ids ?
PostPosted: Mon Apr 26, 2010 3:49 am 
Newbie

Joined: Fri Apr 11, 2008 11:45 pm
Posts: 7
Thanks for your answer.

The problem is linked to this Lucene issue:

https://issues.apache.org/jira/browse/LUCENE-2190


The Lucene developers pointed it out to me, and after getting the latest version I'm working on a fix for my code.
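
For readers hitting the same symptom: assuming the mismatch comes from per-segment searching, where the scoring callback receives docids relative to the current segment while the distance map is keyed by index-wide docids (the docId + nextOffset keys in the code above), the effect can be reproduced in plain Java. The names docBases and toGlobal are hypothetical helpers for this sketch and are not Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java simulation of segment-relative versus index-wide docids.
class DocBaseSketch {

    // Start offset of each segment, given the number of documents per segment.
    static int[] docBases(int[] segmentSizes) {
        int[] bases = new int[segmentSizes.length];
        int next = 0;
        for (int i = 0; i < segmentSizes.length; i++) {
            bases[i] = next;
            next += segmentSizes[i];
        }
        return bases;
    }

    // Converts a segment-relative docid to an index-wide docid.
    static int toGlobal(int docBase, int segmentDocId) {
        return docBase + segmentDocId;
    }

    public static void main(String[] args) {
        int[] bases = docBases(new int[] {3, 2}); // {0, 3}

        // Filter phase: distances keyed by index-wide docid.
        Map<Integer, Double> distanceById = new HashMap<>();
        distanceById.put(toGlobal(bases[0], 1), 12.0); // segment 0, local doc 1 -> key 1
        distanceById.put(toGlobal(bases[1], 1), 3.5);  // segment 1, local doc 1 -> key 4

        // Scoring phase for segment 1: the callback only sees the local docid 1.
        int localDoc = 1;
        System.out.println(distanceById.get(localDoc));                     // 12.0, the wrong document
        System.out.println(distanceById.get(toGlobal(bases[1], localDoc))); // 3.5, the right one
    }
}
```

Whichever side does the conversion, the filter and the scorer have to agree on one docid space for the lookup to hit the right entry.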

Thanks again :)

Cheers,

Alex


 
© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.