 Post subject: Massive indexing problem
PostPosted: Sun Nov 20, 2011 4:46 pm 
Regular

Joined: Thu Jun 16, 2011 12:03 pm
Posts: 94
Hi all,

When I try to index all my data in MySQL with Hibernate Search I get a "too many connections" exception.

Can anybody tell me how I can call the MassIndexer without hitting this exception?

I read about indexing performance in some books; maybe the indexing has to be done per class. Another option is to tune the MassIndexer.

Which option is best? Does anybody have a working example?

Thanks in advance.

Hibernator.


 Post subject: Re: Massive indexing problem
PostPosted: Mon Nov 21, 2011 9:46 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi Hibernator,
yes, the thread options you pass to the MassIndexer are per class, or more precisely per class hierarchy (i.e. if you have two types, one the parent of the other, they will share the same pipeline). Each mass-indexing pipeline uses the number of threads you specify, so it might be useful to make sure you don't configure too many threads; an option that is easier to implement is to scope the MassIndexer to a smaller group of entities each time.
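
For example, a minimal sketch along these lines (assuming a Hibernate Session named session; the entity type, thread count and batch size are placeholders, not recommendations) scopes each run to one hierarchy and keeps the loader thread count low, so the number of open database connections stays bounded:

Code:
// Hedged sketch, not the poster's actual code: index one entity hierarchy
// per run and keep the loading thread count small so the total number of
// JDBC connections stays low. startAndWait() declares InterruptedException.
FullTextSession fullTextSession = Search.getFullTextSession(session);
fullTextSession.createIndexer(Entity.class)      // one hierarchy per run
        .threadsToLoadObjects(2)                 // fewer loader threads -> fewer connections
        .batchSizeToLoadObjects(25)
        .cacheMode(CacheMode.IGNORE)
        .startAndWait();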

I'm thinking of changing some details to make this simpler; I see a couple of options:
1) Make sure different pipelines don't run concurrently, but in sequence
2) Share the same thread pools across different pipelines
3) Other?

Let me know if you have any suggestions.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Massive indexing problem
PostPosted: Wed Nov 23, 2011 3:56 am 
Regular

Joined: Thu Jun 16, 2011 12:03 pm
Posts: 94
Hi,

I have tested several options, and the following works for an entity with 100,000 rows and lazy="true" in entity.hbm.xml:


Code:
hibernate.cfg.xml:
    <property name="hibernate.search.default.indexwriter.batch.max_buffered_docs">20</property>
    <property name="hibernate.search.default.indexwriter.batch.merge_factor">2</property>
    <property name="hibernate.search.default.exclusive_index_use">true</property>
    <property name="hibernate.search.worker.backend">blackhole</property>



Code:
    sesion.purgeAll(Entity.class);
    sesion.flushToIndexes();
    sesion.getSearchFactory().optimize(Entity.class);
    sesion.clear();

    try {
        sesion.createIndexer(Entity.class)
              .cacheMode(CacheMode.IGNORE)
              .startAndWait();
    } catch (InterruptedException e) {
        // startAndWait() declares InterruptedException; restore the interrupt flag
        Thread.currentThread().interrupt();
    }


This morning I am going to test another entity with more associations... let's see if it works.

Regards,


 Post subject: Re: Massive indexing problem
PostPosted: Wed Nov 23, 2011 5:25 am 
Regular

Joined: Thu Jun 16, 2011 12:03 pm
Posts: 94
Hi all,

If I run the test everything works fine, but if I deploy the .ear in my JBoss it gives an error like this one:

ERROR [org.hibernate.search.batchindexing.IdentifierConsumerEntityProducer] (Hibernate Search: entityloader-1) error during batch indexing: : java.lang.ClassCastException: com.abcde.entity.myEntity_$$_javassist_53 cannot be cast to javassist.util.proxy.ProxyObject
at org.hibernate.proxy.pojo.javassist.JavassistLazyInitializer.getProxy(JavassistLazyInitializer.java:147) [:3.6.6.Final]
at org.hibernate.proxy.pojo.javassist.JavassistProxyFactory.getProxy(JavassistProxyFactory.java:71) [:3.6.6.Final]
at org.hibernate.tuple.entity.AbstractEntityTuplizer.createProxy(AbstractEntityTuplizer.java:631) [:3.6.6.Final]
at org.hibernate.persister.entity.AbstractEntityPersister.createProxy(AbstractEntityPersister.java:3736) [:3.6.6.Final]
at org.hibernate.event.def.DefaultLoadEventListener.createProxyIfNecessary(DefaultLoadEventListener.java:360) [:3.6.6.Final]
at org.hibernate.event.def.DefaultLoadEventListener.proxyOrLoad(DefaultLoadEventListener.java:281) [:3.6.6.Final]
at org.hibernate.event.def.DefaultLoadEventListener.onLoad(DefaultLoadEventListener.java:152) [:3.6.6.Final]
at org.hibernate.impl.SessionImpl.fireLoad(SessionImpl.java:1090) [:3.6.6.Final]
at org.hibernate.impl.SessionImpl.internalLoad(SessionImpl.java:1038) [:3.6.6.Final]
at org.hibernate.type.EntityType.resolveIdentifier(EntityType.java:630) [:3.6.6.Final]
at org.hibernate.type.EntityType.resolve(EntityType.java:438) [:3.6.6.Final]
at org.hibernate.engine.TwoPhaseLoad.initializeEntity(TwoPhaseLoad.java:139) [:3.6.6.Final]
at org.hibernate.loader.Loader.initializeEntitiesAndCollections(Loader.java:982) [:3.6.6.Final]
at org.hibernate.loader.Loader.doQuery(Loader.java:857) [:3.6.6.Final]
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:274) [:3.6.6.Final]
at org.hibernate.loader.Loader.doList(Loader.java:2533) [:3.6.6.Final]
at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2276) [:3.6.6.Final]
at org.hibernate.loader.Loader.list(Loader.java:2271) [:3.6.6.Final]
at org.hibernate.loader.criteria.CriteriaLoader.list(CriteriaLoader.java:119) [:3.6.6.Final]
at org.hibernate.impl.SessionImpl.list(SessionImpl.java:1716) [:3.6.6.Final]
at org.hibernate.impl.CriteriaImpl.list(CriteriaImpl.java:347) [:3.6.6.Final]
at org.hibernate.search.batchindexing.IdentifierConsumerEntityProducer.loadList(IdentifierConsumerEntityProducer.java:141) [:3.4.0.Final]
at org.hibernate.search.batchindexing.IdentifierConsumerEntityProducer.loadAllFromQueue(IdentifierConsumerEntityProducer.java:110) [:3.4.0.Final]
at org.hibernate.search.batchindexing.IdentifierConsumerEntityProducer.run(IdentifierConsumerEntityProducer.java:87) [:3.4.0.Final]
at org.hibernate.search.batchindexing.OptionallyWrapInJTATransaction.run(OptionallyWrapInJTATransaction.java:107) [:3.4.0.Final]
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [:1.6.0_26]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [:1.6.0_26]
at java.lang.Thread.run(Thread.java:662) [:1.6.0_26]


Any ideas?

I didn't find anything that clears this up.

I have javassist in my pom.xml; maybe that is the problem.

EDIT: I also have problems when I have a property marked as @ContainedIn... it issues something like a million SQL statements against the database. Any help?

thanks,


 Post subject: Re: Massive indexing problem
PostPosted: Wed Nov 23, 2011 6:48 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
that looks like you have multiple javassist versions on your classpath. I'd suggest using Tattletale to double-check your classpath.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Massive indexing problem
PostPosted: Wed Nov 23, 2011 8:08 am 
Regular

Joined: Thu Jun 16, 2011 12:03 pm
Posts: 94
OK,

I have changed my pom.xml like this:

Code:
    <dependency>
        <groupId>org.javassist</groupId>
        <artifactId>javassist</artifactId>
        <version>3.14.0-GA</version>
        <scope>test</scope>
    </dependency>

and I think it is going to work that way.

I'll let you know my results!

Thanks in advance,


 Post subject: Re: Massive indexing problem
PostPosted: Wed Nov 23, 2011 9:49 am 
Regular

Joined: Thu Jun 16, 2011 12:03 pm
Posts: 94
Everything seems to work fine; all my hbm files have lazy="true".

thanks,


 Post subject: Re: Massive indexing problem
PostPosted: Thu Nov 24, 2011 3:12 am 
Regular

Joined: Thu Jun 16, 2011 12:03 pm
Posts: 94
Hi,

I have a MySQL (InnoDB) table with 200,000 rows and several associations marked as lazy="true". JBoss runs on a machine with 2.5 GB of memory. I am trying to index all rows, and after 24 hours only 25% is done. The process is getting slower and slower (0.5 documents/second). At this moment I have 80 MB of free memory.

Everything seems to work fine, but the problem is the speed.

Any ideas? The memory? The MassIndexer performance?


 Post subject: Re: Massive indexing problem
PostPosted: Thu Nov 24, 2011 10:05 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
there's a known limitation with MySQL: the database driver is so nice as to ignore our request to scroll, and loads the full result set into memory.
The MassIndexer is designed to open only a single scrollable result set, used to load just the primary keys of the to-be-indexed entities; the other elements are loaded via paged queries.

Look into org.hibernate.search.batchindexing.impl.IdentifierProducer: I'm afraid that the scrollable result in there, when used with MySQL, is going to load all your PKs into memory, so your system is performing badly because it is starved for free memory.

Could you please try using

Code:
setFetchSize(Integer.MIN_VALUE);


instead of the .setFetchSize( 100 )?

Please let me know, so we can improve MySQL compatibility: I've opened https://hibernate.onjira.com/browse/HSEARCH-983 to track this, but I will need your feedback! Thanks.
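
For background, here is a hedged JDBC-level sketch (this is not the IdentifierProducer code; the class, table and column names are made up) of what Connector/J requires before it streams rows instead of buffering the whole result set:

Code:
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Connector/J only streams rows when the statement is forward-only,
// read-only and the fetch size is Integer.MIN_VALUE; any other combination
// makes the driver buffer the complete result set in memory.
public class StreamingIdLoader {
    public void loadIds(Connection connection) throws SQLException {
        Statement stmt = connection.createStatement(
                ResultSet.TYPE_FORWARD_ONLY,
                ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(Integer.MIN_VALUE);    // streaming hint for Connector/J
        ResultSet ids = stmt.executeQuery("select id from my_entity");
        try {
            while (ids.next()) {
                long id = ids.getLong(1);
                // hand each primary key over to the indexing pipeline...
            }
        } finally {
            ids.close();
            stmt.close();
        }
    }
}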

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Massive indexing problem
PostPosted: Fri Nov 25, 2011 3:44 am 
Regular

Joined: Thu Jun 16, 2011 12:03 pm
Posts: 94
Hi,

I'll do the test and I'll let you know my results.

I have seen the scroll problem in MySQL. There is example performance code that indexes one document at a time using scroll, but it didn't work with MySQL, and that's why I decided to use the MassIndexer.

The problem now is to optimize the process.

I'll let you know...

Hibernator.


 Post subject: Re: Massive indexing problem
PostPosted: Fri Nov 25, 2011 8:16 am 
Regular

Joined: Thu Jun 16, 2011 12:03 pm
Posts: 94
Hi,

I have created a new test to try to make this work. As you told me, I have used:


Code:
setFetchSize(Integer.MIN_VALUE);


It does not work; it throws an exception on the scroll. It seems that Integer.MIN_VALUE is not accepted as a fetch size.

I am trying with
Code:
setFetchSize(1);


I'll let you know.

Thanks,


 Post subject: Re: Massive indexing problem
PostPosted: Fri Nov 25, 2011 8:54 am 
Regular

Joined: Thu Jun 16, 2011 12:03 pm
Posts: 94
This is the code that I have:

Code:
    @Test
    public void indexerLearningScroll()
    {
        FullTextSession fulltextsession = Search.getFullTextSession(session);
       
        fulltextsession.purgeAll(EntidadDocumentalEntity.class);
        fulltextsession.flushToIndexes();
        fulltextsession.getSearchFactory().optimize(myEntity.class);
        fulltextsession.clear();
       
        fulltextsession.beginTransaction();
       
        Criteria query = fulltextsession.createCriteria(myEntity.class)
            .setResultTransformer(CriteriaSpecification.DISTINCT_ROOT_ENTITY)
            .setCacheMode(CacheMode.IGNORE)
            .setFetchSize(1)
            .setFlushMode(FlushMode.MANUAL);
       
        ScrollableResults scroll = query.scroll(ScrollMode.FORWARD_ONLY);
       
        int batch = 0;
        scroll.beforeFirst();
        while (scroll.next())
        {
            batch++;
            fulltextsession.index(scroll.get(0));
            if(batch % BATCH_SIZE == 0)
            {
                fulltextsession.flushToIndexes();
                fulltextsession.clear();
            }
        }
        fulltextsession.getTransaction().commit();
   }


How could I see statistics on the speed and the progress?
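
I suppose something like this could be added to the loop above to measure it by hand (LOG_EVERY and the System.out call are just a rough sketch, not part of my real code):

Code:
// Rough sketch: count indexed documents and elapsed time inside the
// existing scroll loop to log an approximate documents/second figure.
final int LOG_EVERY = 1000;                      // illustrative value
long start = System.currentTimeMillis();
int indexed = 0;
while (scroll.next())
{
    fulltextsession.index(scroll.get(0));
    indexed++;
    if (indexed % LOG_EVERY == 0)
    {
        long elapsedSeconds = Math.max(1, (System.currentTimeMillis() - start) / 1000);
        System.out.println("indexed " + indexed + " documents, ~"
                + (indexed / elapsedSeconds) + " docs/s");
    }
    if (indexed % BATCH_SIZE == 0)
    {
        fulltextsession.flushToIndexes();
        fulltextsession.clear();
    }
}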

regards,


 Post subject: Re: Massive indexing problem
PostPosted: Mon Nov 28, 2011 3:56 am 
Regular

Joined: Thu Jun 16, 2011 12:03 pm
Posts: 94
Hi,

I launched the process this weekend and it works with:

Code:
setFetchSize=1


But I had this property in my hibernate.cfg.xml:

<property name="hibernate.search.worker.backend">blackhole</property>

and the index was created but it contained no documents (the blackhole backend discards all indexing work).

So now I have deleted this property and launched the process again... it seems to work fine... I'll let you know.

regards,


 Post subject: Re: Massive indexing problem
PostPosted: Mon Nov 28, 2011 5:51 am 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
I'm wondering why it worked with a value of 1. Possibly it happened because you were going very slowly, so the memory was "just enough".
I'd suggest having your VM log garbage collection activity while you run it again; also consider that when really indexing you're going to need more memory.
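
Something like the standard HotSpot options should be enough (the log file name is just an example):

Code:
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log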

The approach you're using now can be quite a bit slower than using the MassIndexer; in an extreme case I saw a reduction from 20 hours to 3 minutes. It's not always that big an improvement, but it might be worth investigating whether setting the value to 1 would work in the MassIndexer as well.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Massive indexing problem
PostPosted: Mon Nov 28, 2011 6:22 am 
Regular

Joined: Thu Jun 16, 2011 12:03 pm
Posts: 94
Now it seems to be working this way... and it is a little bit slow, but I need the data indexed as soon as possible.

How could I tell the MassIndexer to use setFetchSize(1)?

Thanks,

