FullTextQuery not filling POJO instance with indexed fields

dimeo · **Joined:** Mon Nov 23, 2009 1:17 pm **Posts:** 5

I've tried searching online and through these forums, but I was unable to find a solution to my problem. Please forgive me if this is a trivial question!

I have objects called "Pages" whose metadata is stored in SQL Server and whose payload fields are indexed in Lucene. A "Bucket" is essentially a set of pages, or a topic. The Page class is annotated as follows:

Code:

@GenericGenerator(
   name="BigIntGenerator",
   strategy="package.BigIntGenerator"
)
@Entity
@Indexed(index="Pages")
@Table(name="Pages")
public class Page implements Cloneable {
   public enum PageType {
      TRAINING, CRAWLED, TRUE_POSITIVE, FALSE_POSITIVE, NO_CONTENT;
      public String getName()  { return name(); }
      public String getLower() { return name().toLowerCase(); }
   }
   
   @Id @GeneratedValue(strategy=GenerationType.AUTO, generator="BigIntGenerator")
   @Column(name="PageID", nullable=false)
   private BigInteger id;
   
   @Column(name="PageType", nullable=false)
   @Enumerated(EnumType.ORDINAL)
   @Field(index=Index.UN_TOKENIZED, store=Store.YES)
   private PageType type = PageType.TRAINING;
   
   @Column(name="URL")
   @Field(index=Index.UN_TOKENIZED, store=Store.YES)
   private String url;
   
   @Column(name="FetchTime")
   @Temporal(TemporalType.TIMESTAMP)
   private Date fetchTime;
   
   @Column(name="FetchResult")
   private Integer fetchResult;
   
   @Column(name="ContentType")
   private String contentType;
   
   @Transient
   @Field(index=Index.UN_TOKENIZED, store=Store.YES)
   @FieldBridge(impl=PrintBridge.class)
   private byte[] print;

   @Column(name="Confidence")
   private float confidence;
   
   @ManyToOne
   @JoinColumn(name="Bucket")
   @Field(index=Index.UN_TOKENIZED, store=Store.YES)
   @FieldBridge(impl=BucketBridge.class)
   private Bucket bucket;

   @Lob
   @Column(name="FeatureVector")
   @Fetch(FetchMode.SELECT)
   private TLongIntHashMap features;

   @Transient
   @Field(index=Index.TOKENIZED, store=Store.COMPRESS)
   private String text;
   
   @Transient
   @Field(index=Index.TOKENIZED, store=Store.YES)
   private String title;
   
   @Transient
   private String[] out;

        ....

I have marked the last 3 fields as "@Transient" so that they exist only in the index. My first question is: am I using Hibernate Search improperly? Should I be storing everything in SQL Server and marking every field as store=Store.NO? I set it up this way because about 4 million pages won't store or index plain text and only need a feature vector (the field "features"), whereas the rest of the pages need both features and plain text. I didn't want to add columns to my DB schema that wouldn't be used by all the pages.

I load all the pages in a Bucket with the following code:

Code:

      FullTextSession s = HibernateUtil.newSession();
      
      BooleanQuery q = new BooleanQuery();
      q.add(new TermQuery(new Term("bucket", bucket.getId())), Occur.MUST);

      FullTextQuery ftq = s.createFullTextQuery(q, Page.class);
      List<?> l = ftq.list();
      HibernateUtil.endSession(s);
      
      for (Object o : l) bucket.add((Page) o);

My second question: when I run the above code, the Page instances only have the fields backed by SQL Server initialized; the index-only fields are null. When I use projection, the indexed fields are successfully fetched from the index, but I'm trying to avoid projection in the cases where I need the fully initialized POJO. Thanks in advance for your help.

sanne.grinovero · **Posted:** Mon Nov 23, 2009 2:32 pm

Hi dimeo,
you can use a Lucene index to store information, but the recommended way is to store it in a reliable database and always keep the option of rebuilding the index from the database.

Every time you load a managed entity, Search helps identify which primary keys are relevant, but the POJO is created from the database fields. Your @Transient fields will therefore never hold a value, because Hibernate, not Search, is in charge of initializing the entity.

Using projections is a good performance optimization to show previews of your results, but it can't be used to fully initialize a managed entity: this wouldn't be as safe.

In case you really don't want to add a column to the schema, and you trust Lucene's index as a store, you could combine standard object loading with projection: load the objects as usual, then copy the projected values into them. I wouldn't recommend it, as I'm always afraid of my index getting corrupted or not being backed up properly.
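
A minimal sketch of that merge step, in case it helps. It uses plain Java only, so the Hibernate part is left out: `ProjectionMerge`, its simplified `Page` stand-in and `merge` are hypothetical names made up for illustration, and the projected rows are assumed to have the shape { id, text, title }, which is what I would expect from something like `ftq.setProjection(FullTextQuery.ID, "text", "title")`. Treat it as a starting point under those assumptions, not as a tested recipe for your mapping.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hibernate Search: copies projected index
// values into entities that were already loaded from the database.
final class ProjectionMerge {

    // Simplified stand-in for the Page entity from the question.
    static final class Page {
        final BigInteger id; // loaded from the database
        final String url;    // loaded from the database
        String text;         // @Transient in the real entity: index only
        String title;        // @Transient in the real entity: index only

        Page(BigInteger id, String url) {
            this.id = id;
            this.url = url;
        }
    }

    // Assumption: each row is { id, text, title }, in that order.
    static void merge(List<Page> pages, List<Object[]> rows) {
        // Index the projected rows by entity id for constant-time lookup.
        Map<BigInteger, Object[]> byId = new HashMap<BigInteger, Object[]>();
        for (Object[] row : rows) {
            byId.put((BigInteger) row[0], row);
        }
        for (Page page : pages) {
            Object[] row = byId.get(page.id);
            if (row == null) {
                continue; // no index entry: leave the index-only fields null
            }
            page.text = (String) row[1];
            page.title = (String) row[2];
        }
    }

    public static void main(String[] args) {
        List<Page> pages = new ArrayList<Page>();
        pages.add(new Page(BigInteger.valueOf(1), "http://example.org/a"));

        List<Object[]> rows = new ArrayList<Object[]>();
        rows.add(new Object[] { BigInteger.valueOf(1), "page body", "Page A" });

        merge(pages, rows);
        System.out.println(pages.get(0).title);
    }
}
```

Pages without a matching row keep null values in the index-only fields, so a missing or stale index entry shows up as missing data rather than an exception. If you project `FullTextQuery.THIS` together with the stored fields, each result row should already contain the managed entity next to its projected values, which would make the lookup map unnecessary.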

dimeo · **Joined:** Mon Nov 23, 2009 1:17 pm **Posts:** 5

Thank you very much for your prompt reply and the "under the hood" explanation. I guess the solution is to just do it the right way and store everything in the DB... hopefully my manager won't be mad at me for yet another architectural change!