I've tried searching online and through these forums, but I was unable to find a solution to my problem. Please forgive me if this is a trivial question!
I have objects called "Pages" whose metadata is stored in SQL Server and whose payload fields are indexed in Lucene. A "Bucket" is essentially a set of pages, or a topic. The class is annotated as follows:
Code:
import java.math.BigInteger;
import java.util.Date;
import javax.persistence.*;
import org.hibernate.annotations.Fetch;
import org.hibernate.annotations.FetchMode;
import org.hibernate.annotations.GenericGenerator;
import org.hibernate.search.annotations.Field;
import org.hibernate.search.annotations.FieldBridge;
import org.hibernate.search.annotations.Index;
import org.hibernate.search.annotations.Indexed;
import org.hibernate.search.annotations.Store;
import gnu.trove.TLongIntHashMap; // Trove 2.x; Trove 3.x uses gnu.trove.map.hash.TLongIntHashMap

@GenericGenerator(
    name="BigIntGenerator",
    strategy="package.BigIntGenerator"
)
@Entity
@Indexed(index="Pages")
@Table(name="Pages")
public class Page implements Cloneable {
    public enum PageType {
        TRAINING, CRAWLED, TRUE_POSITIVE, FALSE_POSITIVE, NO_CONTENT;
        public String getName() { return name(); }
        public String getLower() { return name().toLowerCase(); }
    }

    @Id @GeneratedValue(strategy=GenerationType.AUTO, generator="BigIntGenerator")
    @Column(name="PageID", nullable=false)
    private BigInteger id;

    @Column(name="PageType", nullable=false)
    @Enumerated(EnumType.ORDINAL)
    @Field(index=Index.UN_TOKENIZED, store=Store.YES)
    private PageType type = PageType.TRAINING;

    @Column(name="URL")
    @Field(index=Index.UN_TOKENIZED, store=Store.YES)
    private String url;

    @Column(name="FetchTime")
    @Temporal(TemporalType.TIMESTAMP)
    private Date fetchTime;

    @Column(name="FetchResult")
    private Integer fetchResult;

    @Column(name="ContentType")
    private String contentType;

    @Transient
    @Field(index=Index.UN_TOKENIZED, store=Store.YES)
    @FieldBridge(impl=PrintBridge.class)
    private byte[] print;

    @Column(name="Confidence")
    private float confidence;

    @ManyToOne
    @JoinColumn(name="Bucket")
    @Field(index=Index.UN_TOKENIZED, store=Store.YES)
    @FieldBridge(impl=BucketBridge.class)
    private Bucket bucket;

    @Lob
    @Column(name="FeatureVector")
    @Fetch(FetchMode.SELECT)
    private TLongIntHashMap features;

    // Not mapped to any column: these exist only in the Lucene index (or only in memory)
    @Transient
    @Field(index=Index.TOKENIZED, store=Store.COMPRESS)
    private String text;

    @Transient
    @Field(index=Index.TOKENIZED, store=Store.YES)
    private String title;

    @Transient
    private String[] out;
    ....
I have marked the last three fields as @Transient so that they exist only in the index. My first question: am I using Hibernate Search improperly? Should I store everything in SQL Server and mark every field as store=Store.NO? I set it up this way because about 4 million pages won't store or index plain text and only need a feature vector (the features field), whereas the remaining pages need both the feature vector and the plain text. I didn't want to add columns to my DB schema that not every page would use.
I load all the pages in a Bucket with the following code:
Code:
FullTextSession s = HibernateUtil.newSession();
// Match every Page indexed under this bucket's ID in the "bucket" field
BooleanQuery q = new BooleanQuery();
q.add(new TermQuery(new Term("bucket", bucket.getId())), Occur.MUST);
FullTextQuery ftq = s.createFullTextQuery(q, Page.class);
List<?> l = ftq.list();
HibernateUtil.endSession(s);
// Attach the loaded Page entities to the in-memory Bucket
for (Object o : l) bucket.add((Page) o);
My second question: when I run the above code, only the SQL Server-backed fields of each Page instance are initialized; the index-only fields are null. When I use projection, the indexed fields are fetched from the index successfully, but I'd like to avoid projection in cases where I need a fully initialized POJO.
Thanks in advance for your help.
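For reference, the projection route mentioned above can be reduced to a small merge step. In Hibernate Search 3.x, calling ftq.setProjection(FullTextQuery.THIS, "text", "title") makes list() return one Object[] per hit, with the managed entity in slot 0 (still loaded from the database, so its SQL-backed fields are populated) and the stored field values after it. Copying those values back into the entity would give a fully populated Page from a single query. The sketch below is plain Java so it runs on its own: Page is a stripped-down hypothetical stand-in and mergeProjection is a hypothetical helper, not part of Hibernate Search. It assumes text and title are stored fields, which they are here via Store.COMPRESS and Store.YES.

```java
import java.util.ArrayList;
import java.util.List;

final class ProjectionMerge {

    // Hypothetical stand-in for the real Page entity; only the index-only fields are modeled.
    static class Page {
        String text;   // @Transient + @Field in the real entity
        String title;  // @Transient + @Field in the real entity
    }

    /**
     * Copies projected index values into the entities.
     * Assumes each row has the shape produced by
     * setProjection(FullTextQuery.THIS, "text", "title"):
     * row[0] = entity, row[1] = text, row[2] = title.
     */
    static List<Page> mergeProjection(List<Object[]> rows) {
        List<Page> pages = new ArrayList<>();
        for (Object[] row : rows) {
            Page page = (Page) row[0];
            page.text = (String) row[1];
            page.title = (String) row[2];
            pages.add(page);
        }
        return pages;
    }

    public static void main(String[] args) {
        // Simulated projection result with a single hit
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] { new Page(), "body text", "A title" });

        List<Page> pages = mergeProjection(rows);
        System.out.println(pages.get(0).title); // prints "A title"
        System.out.println(pages.get(0).text);  // prints "body text"
    }
}
```

The loop in the original code would then iterate over the merged list instead of over raw list() results, so callers still receive Page objects rather than Object[] rows.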