Hi Sanne,
Maybe this is just a misunderstanding on my part of how HS is supposed to work.
OK, to refresh you on this: let's say I have 500 docs that get indexed with a master configuration. On the first run the index is empty. A batch index occurs and the master index comes to, say, 163K. I can do multiple batch index runs in the same lifecycle, and the master index still ends up at 163K at the end of the run.
Now I shut down and do a second run, this time starting with the master index left over from the first run, at 163K. I batch index the 500 records again, but at the end of this run the master index has a size of 319K. Almost double! My question is: why? I would have expected the index to be roughly the same size as after the first run.
Here is the test case, adapted for 4.1.1, so you can see whether there is an error in my thinking:
Code:
import java.util.Properties;
import java.util.UUID;

import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

import org.hibernate.Session;
import org.hibernate.search.FullTextSession;
import org.hibernate.search.Search;
import org.junit.Before;
import org.junit.Test;

public class FileLeakTest {

    private EntityManagerFactory emf = null;

    @Before
    public void setup() {
        emf = Persistence.createEntityManagerFactory("cors", getProperties());
    }

    private static Properties getProperties() {
        Properties props = new Properties();
        // In-memory H2 database, recreated on every run
        props.put("hibernate.connection.driver_class", "org.h2.Driver");
        props.put("hibernate.dialect", "org.hibernate.dialect.H2Dialect");
        props.put("hibernate.cache.provider_class", "org.hibernate.cache.NoCacheProvider");
        props.put("hibernate.jdbc.charSet", "UTF-8");
        props.put("hibernate.hbm2ddl.auto", "create-drop");
        props.put("hibernate.connection.url", "jdbc:h2:mem:test");
        props.put("hibernate.connection.username", "sa");
        props.put("hibernate.connection.password", "");
        // Master index on disk, copied to the shared source directory
        props.put("hibernate.search.default.sourceBase", "C:/lucene/fileleaktest/shared");
        props.put("hibernate.search.default.indexBase", "C:/lucene/fileleaktest/master");
        props.put("hibernate.search.default.refresh", "300");
        props.put("hibernate.search.default.directory_provider", "filesystem-master");
        return props;
    }

    public FullTextSession getFulltextSession() {
        EntityManager em = emf.createEntityManager();
        Session session = (Session) em.getDelegate();
        return Search.getFullTextSession(session);
    }

    public void createDocuments(int size) {
        FullTextSession session = getFulltextSession();
        try {
            session.getTransaction().begin();
            for (int i = 0; i < size; i++) {
                HibDocument d = new HibDocument(
                        UUID.randomUUID().toString(),
                        UUID.randomUUID().toString(),
                        UUID.randomUUID().toString());
                session.persist(d);
            }
            session.flush();
            session.getTransaction().commit();
        } finally {
            session.close();
        }
    }

    public void batchIndex() {
        System.out.println("About to reindex");
        FullTextSession session = getFulltextSession();
        try {
            // Tried with and without a transaction: same effect
            session.getTransaction().begin();
            session.createIndexer()
                    .batchSizeToLoadObjects(30)
                    .threadsForSubsequentFetching(4)
                    .threadsToLoadObjects(2)
                    .startAndWait();
            session.getTransaction().commit();
        } catch (InterruptedException e) {
            e.printStackTrace();
        } finally {
            session.close();
        }
    }

    @Test
    public void testForLeaks() {
        // The in-memory database is empty on every run, so the documents are
        // created again each time. Not ideal (a file-based DB could be used
        // instead), but I don't see this as the issue.
        createDocuments(500);
        System.out.println("Created Docs");
        int loops = 5;
        int counter = 0;
        while (counter < loops) {
            try {
                batchIndex();
                // sleep() is static, so call it on Thread directly
                Thread.sleep(5000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            counter++;
        }
        System.out.println("Finito");
    }
}
When I run the above once with an empty index, I see a master size of
Quote:
$ du -h .
163K .
$ ls -l
total 163
----------+ 1 0 mk 38896 Aug 8 14:05 _5.fdt
----------+ 1 0 mk 4004 Aug 8 14:05 _5.fdx
----------+ 1 0 mk 51 Aug 8 14:05 _5.fnm
----------+ 1 0 mk 15536 Aug 8 14:05 _5.frq
----------+ 1 0 mk 1504 Aug 8 14:05 _5.nrm
----------+ 1 0 mk 8500 Aug 8 14:05 _5.prx
----------+ 1 0 mk 991 Aug 8 14:05 _5.tii
----------+ 1 0 mk 78875 Aug 8 14:05 _5.tis
----------+ 1 0 mk 20 Aug 8 14:05 segments.gen
----------+ 1 0 mk 240 Aug 8 14:05 segments_c
----------+ 1 0 mk 0 Aug 8 14:05 write.lock
Now I run the test case a second time; note that the master index from the first run still exists. After it completes I see
Quote:
$ du -h .
319K .
$ ls -l
total 319
----------+ 1 0 mk 38896 Aug 8 14:05 _5.fdt
----------+ 1 0 mk 4004 Aug 8 14:05 _5.fdx
----------+ 1 0 mk 15536 Aug 8 14:05 _5.frq
----------+ 1 0 mk 1504 Aug 8 14:05 _5.nrm
----------+ 1 0 mk 8500 Aug 8 14:05 _5.prx
----------+ 1 0 mk 78875 Aug 8 14:05 _5.tis
----------+ 1 0 mk 38896 Aug 8 14:09 _b.fdt
----------+ 1 0 mk 4004 Aug 8 14:09 _b.fdx
----------+ 1 0 mk 51 Aug 8 14:09 _b.fnm
----------+ 1 0 mk 15547 Aug 8 14:09 _b.frq
----------+ 1 0 mk 1504 Aug 8 14:09 _b.nrm
----------+ 1 0 mk 8500 Aug 8 14:09 _b.prx
----------+ 1 0 mk 1008 Aug 8 14:09 _b.tii
----------+ 1 0 mk 78888 Aug 8 14:09 _b.tis
----------+ 1 0 mk 20 Aug 8 14:09 segments.gen
----------+ 1 0 mk 240 Aug 8 14:09 segments_n
----------+ 1 0 mk 0 Aug 8 14:05 write.lock
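To make the listing above easier to read, here is a small sketch that groups the files by segment prefix and sums their sizes. The class and method names (SegmentSizes, segmentOf, sizeBySegment) are my own for illustration, not anything from HS or Lucene, and it assumes Java 8 or later. The file names and sizes are copied from the second listing.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

public class SegmentSizes {

    // Per-segment files are named "_<name>.<ext>"; everything else
    // (segments.gen, segments_N, write.lock) is grouped under "(other)".
    public static String segmentOf(String fileName) {
        int dot = fileName.indexOf('.');
        if (fileName.startsWith("_") && dot > 1) {
            return fileName.substring(0, dot);
        }
        return "(other)";
    }

    // Sums file sizes (in bytes) per segment prefix.
    public static Map<String, Long> sizeBySegment(Map<String, Long> fileSizes) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> e : fileSizes.entrySet()) {
            totals.merge(segmentOf(e.getKey()), e.getValue(), Long::sum);
        }
        return totals;
    }

    // File names and sizes as shown in the listing after the second run.
    public static Map<String, Long> secondRunListing() {
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("_5.fdt", 38896L);
        files.put("_5.fdx", 4004L);
        files.put("_5.frq", 15536L);
        files.put("_5.nrm", 1504L);
        files.put("_5.prx", 8500L);
        files.put("_5.tis", 78875L);
        files.put("_b.fdt", 38896L);
        files.put("_b.fdx", 4004L);
        files.put("_b.fnm", 51L);
        files.put("_b.frq", 15547L);
        files.put("_b.nrm", 1504L);
        files.put("_b.prx", 8500L);
        files.put("_b.tii", 1008L);
        files.put("_b.tis", 78888L);
        files.put("segments.gen", 20L);
        files.put("segments_n", 240L);
        files.put("write.lock", 0L);
        return files;
    }

    public static void main(String[] args) {
        // Prints one line per group: "(other) 260", "_5 147315", "_b 148398"
        sizeBySegment(secondRunListing())
                .forEach((segment, bytes) -> System.out.println(segment + " " + bytes));
    }
}
```

So the _b files written at 14:09 account for about the size of one full index, and the _5 files from the first run (minus _5.fnm and _5.tii, which are gone) are still sitting next to them with their 14:05 timestamps.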
I would have expected the master index size to stay around 163K, not nearly double. (Bear in mind that I have trimmed this scenario down; I see the same thing with a 10 GB index doubling to 20 GB, which is not ideal.)
Is this expected behaviour from HS, or is this a bug? As I said, my thinking is that if I index 500 records, the index should come out at around the same size every time (assuming no big fluctuation in what is stored for those 500).
Hope that's clearer for you.
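In case it helps with reproducing this, the size check could also be done from inside the test instead of with du. This is only a rough sketch using plain java.io; IndexDirSize and totalBytes are names I made up, and the default path is simply the indexBase from the properties above.

```java
import java.io.File;

public class IndexDirSize {

    // Recursively sums the lengths (in bytes) of all regular files under path.
    // Returns 0 if the path does not exist or cannot be listed.
    public static long totalBytes(File path) {
        if (path.isFile()) {
            return path.length();
        }
        File[] children = path.listFiles();
        if (children == null) {
            return 0L;
        }
        long total = 0L;
        for (File child : children) {
            total += totalBytes(child);
        }
        return total;
    }

    public static void main(String[] args) {
        // Assumption: defaults to the indexBase used in the test case
        File dir = new File(args.length > 0 ? args[0] : "C:/lucene/fileleaktest/master");
        System.out.println(dir + ": " + totalBytes(dir) + " bytes");
    }
}
```

Calling totalBytes(...) after each batchIndex() in the loop would show whether the size changes within a run or only across restarts. The numbers will not match du exactly, since du reports allocated disk blocks rather than the summed file lengths.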
Thanks for the support,
LL