I am manually building my Lucene index for Hibernate Search, and it's working correctly. My database structure is the following:
Recording - Transcript < Line < Utterance
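To make the notation concrete: if `-` is read as one-to-one and `<` as one-to-many (an interpretation, not something the post states), the hierarchy could look like the plain-Java sketch below. All class and field names are hypothetical. The "Line" level is modeled as `Turn` because that is the entity the indexing code queries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical POJO sketch of Recording - Transcript < Line < Utterance.
// No Hibernate mappings; it only illustrates the shape of the data.
final class Utterance {
    final int uid;
    Utterance(int uid) { this.uid = uid; }
}

// Assumed to correspond to the "Line" level in the post.
final class Turn {
    final int uid;
    final List<Utterance> utterances = new ArrayList<>();
    Turn(int uid) { this.uid = uid; }
}

final class Transcript {
    final int uid;
    final List<Turn> turns = new ArrayList<>();
    Transcript(int uid) { this.uid = uid; }

    // Total number of utterances reachable from this transcript.
    int utteranceCount() {
        int n = 0;
        for (Turn t : turns) n += t.utterances.size();
        return n;
    }
}

// Assumed one-to-one: each Recording has exactly one Transcript.
final class Recording {
    final Transcript transcript;
    Recording(Transcript transcript) { this.transcript = transcript; }
}
```

At the post's scale, 2 million utterances across 10,000 transcripts averages about 200 utterances per transcript.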
My issue is that writing the index takes a very long time. We currently have around 2 million utterances across 10,000 transcripts to index. I compile the program into an executable JAR that runs on our server. I also want to write it so that it can be stopped and re-run without leaving an incomplete or unreadable index.
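One common way to meet the stop-and-re-run requirement without running a full-text search for every utterance is to checkpoint at transcript granularity. The sketch below is plain Java and is not part of Hibernate Search. It assumes, hypothetically, that transcripts are processed in ascending UID order (for example, by adding `order by UID` to the first query) and that the checkpoint is written only after a transcript's index changes have been committed. The `Checkpoint` class and the checkpoint file are illustrative names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical checkpoint store: remembers the last transcript UID whose
// utterances were fully indexed and committed.
final class Checkpoint {
    private final Path file;

    Checkpoint(Path file) { this.file = file; }

    // Returns the last completed transcript UID, or Integer.MIN_VALUE if none yet.
    int lastCompleted() {
        try {
            if (!Files.exists(file)) return Integer.MIN_VALUE;
            String s = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
            return s.isEmpty() ? Integer.MIN_VALUE : Integer.parseInt(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes a temp file, then moves it over the checkpoint. On file systems that
    // support atomic moves, a crash cannot leave a half-written checkpoint.
    void markCompleted(int transcriptUid) {
        try {
            Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
            Files.write(tmp, Integer.toString(transcriptUid).getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Transcript UIDs still to process, given UIDs sorted in ascending order.
    static List<Integer> remaining(List<Integer> sortedUids, int lastDone) {
        List<Integer> out = new ArrayList<>();
        for (int uid : sortedUids) {
            if (uid > lastDone) out.add(uid);
        }
        return out;
    }
}
```

Under this scheme, a transcript that was interrupted part-way is simply redone on the next run. The existing per-utterance "already indexed?" check could be kept for that one transcript only, instead of for all 2 million utterances.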
Does anyone see a way I can optimize this code so it doesn't take days and days to index my data? Thanks so much. I'm open to any suggestions.
Main Class
Code:
public class Main {
    public static void main(String[] args) {
        LuceneIndexer luceneIndexer = new LuceneIndexer();
        luceneIndexer.addUtteranceToIndex();
    }
}
Indexer Class
Code:
import java.io.File;
import java.util.List;

import org.hibernate.Session;
import org.hibernate.search.FullTextSession;
import org.hibernate.search.Search;

// getSessionFactory(), buildQuery(String) and now() are helper methods not shown in this post.
class LuceneIndexer {

    public LuceneIndexer() {
    }

    @SuppressWarnings("unchecked")
    public void addUtteranceToIndex() {
        Session session = getSessionFactory().openSession();
        FullTextSession fts = Search.createFullTextSession(session);
        File idx = new File("./RecordingSearchIndex"); // not referenced again below
        System.out.println("Index Start Time: " + now());

        List<Integer> transcriptIDs = session.createQuery("select UID from TempTranscript").list();
        System.out.println("Number of Transcripts: " + transcriptIDs.size());

        // For each transcript: load it, then load each of its turns one at a time
        for (Integer transcriptID : transcriptIDs) {
            TempTranscript transcript = (TempTranscript) session
                    .createQuery("from TempTranscript where UID = " + transcriptID)
                    .uniqueResult();
            List<Integer> turnIDs = session
                    .createQuery("select UID from Turn where tempTranscript = " + transcript.getUID())
                    .list();
            System.out.println("Transcript Start Time: " + transcript.getUID() + " : " + now());

            for (Integer turnID : turnIDs) {
                Turn turn = (Turn) session.createQuery("from Turn where UID = " + turnID).uniqueResult();
                List<Utterance> utterances = turn.getUtterances();

                for (Utterance utterance : utterances) {
                    // Search the index for this utterance's UID and skip it if it is
                    // already indexed, so an interrupted run can be restarted
                    org.hibernate.Query query =
                            fts.createFullTextQuery(this.buildQuery(Integer.toString(utterance.getUID())));
                    if (query.list().isEmpty()) {
                        // One transaction per utterance
                        fts.getTransaction().begin();
                        fts.index(utterance);
                        fts.getTransaction().commit();
                    }
                }
            }
            System.out.println("Transcript Completed Time: " + transcript.getUID() + " : " + now());
        }
        session.close();
        System.out.println("Index Completed Time: " + now());
    }
}
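As posted, the innermost loop runs one full-text query and one transaction for every utterance, which comes to roughly 2 million of each. The Hibernate Search documentation describes indexing in batches instead: call `index()` for many entities, then periodically call `flushToIndexes()` and `clear()` on the `FullTextSession`. Later versions also provide a `MassIndexer` via `createIndexer()`. Check both against the Hibernate Search version in use, since the posted code uses the older `Search.createFullTextSession` entry point. The plain-Java sketch below only shows the batching pattern. `BatchIndexer` is a hypothetical helper, and the two callbacks stand in for `fts.index(...)` and for a flush/clear or commit.

```java
import java.util.function.Consumer;

// Hypothetical batching helper: passes each item to an indexing callback and
// runs a flush/commit callback once per batch instead of once per item.
final class BatchIndexer<T> {
    private final int batchSize;
    private final Consumer<T> indexOne;  // stand-in for fts.index(utterance)
    private final Runnable flushBatch;   // stand-in for flushToIndexes() + clear(), or a commit
    private int pending = 0;
    private int flushes = 0;

    BatchIndexer(int batchSize, Consumer<T> indexOne, Runnable flushBatch) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        this.batchSize = batchSize;
        this.indexOne = indexOne;
        this.flushBatch = flushBatch;
    }

    // Indexes one item and flushes when a full batch has accumulated.
    void add(T item) {
        indexOne.accept(item);
        if (++pending == batchSize) flush();
    }

    // Flushes any partial batch; call at the end, and before writing a checkpoint.
    void flush() {
        if (pending == 0) return;
        flushBatch.run();
        pending = 0;
        flushes++;
    }

    int flushCount() { return flushes; }
}
```

With a batch size of 1,000, for example, 2,000,000 utterances would need 2,000 flushes rather than 2,000,000 separate transactions. Clearing the session after each flush also keeps already-indexed entities from piling up in the session's first-level cache.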