s.grinovero wrote:
Hi Stéphane,
A year ago I was also complaining about some performance issues. Once I managed to argue my case with Emmanuel (the project leader) for some patches I wanted to integrate, I became a contributor and, after a while, a committer.
I've since contributed more than 50 bugfixes and improvements, but I have a day job to manage too, so sorry if I'm sometimes slow to answer. I'm very interested in your findings if you share them.
Hey, I'm never complaining when volunteers reply, don't apologize :)
s.grinovero wrote:
If you have code patches for open JIRA issues it would be great if you could attach them. I'm sure Emmanuel will consider them for integration, or give some feedback when he's back (from a much deserved vacation).
I have already sent several patches that helped us a lot:
- http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-5 stops propagating index changes to @ContainedIn entities when the @IndexedEmbedded's index hasn't changed. This one made our code go from O(N) to O(1) when modifying non-indexed properties in @ContainedIn entities. This effectively means we simply cannot use HS without this patch, because if we have N objects in our database, adding a new one will cost N :(
- http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-183 fixes a bug where @IndexedEmbedded entities with no prefix would cause our index to be corrupted and delete random index documents.
- http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-185 enables us to select what we embed; otherwise we end up with corrupted indexes.
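To make the HSEARCH-5 idea concrete, here is a minimal sketch of the check. It is not the patch itself and does not use any Hibernate Search API: the class name, the method name and the sample property names are all hypothetical. The assumption is only that we know which property names feed the embedded index and which properties were modified.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ContainedInPropagationSketch {

    // Hypothetical helper: returns true only when at least one modified
    // property is one that actually ends up in the embedded index.
    static boolean mustReindexContainers(Set<String> indexedProperties, String[] dirtyProperties) {
        for (String dirty : dirtyProperties) {
            if (indexedProperties.contains(dirty)) {
                return true;
            }
        }
        // Nothing indexed has changed, so the containing entities can be left alone.
        return false;
    }

    public static void main(String[] args) {
        // Assumed example: only "name" and "city" are indexed.
        Set<String> indexed = new HashSet<>(Arrays.asList("name", "city"));

        // A change to a non-indexed property needs no propagation.
        System.out.println(mustReindexContainers(indexed, new String[] {"lastLogin"}));

        // A change that touches an indexed property does.
        System.out.println(mustReindexContainers(indexed, new String[] {"city", "lastLogin"}));
    }
}
```

When the check returns false, the N containing entities are never visited, which is where the O(N) to O(1) difference comes from.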
In effect, none of these patches, which I sent a while ago, ever got picked up by HS, and I've had several talks about them with Emmanuel, who had arguments against them. We cannot use HS without those patches, so we have a forked version in house which everyone uses. I have always sent my patches to open source projects, but without any feedback in return there is little motivation for me to keep sending patches in or to modify those patches so that they are accepted. Indeed, a discussion about why they still are not in and what I have to change to get them included would be a very good sign.
s.grinovero wrote:
If you are digging into the Search code, I'd really suggest checking out the trunk, where new features and speed improvements are coming. These patches are about performance, and I have several experiments which are going to be committed during June:
http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-327
http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-218 (partially committed already, discussions and test welcome)
I think you could help with this one, as it's not complex and you can look to the RAMDirectoryProvider for an example of similar code:
http://opensource.atlassian.com/projects/hibernate/browse/HSEARCH-275
https://www.hibernate.org/462.html
I'd be happy to help, but I've looked at them and they do not seem directly related to my problem. My employer allows me to contribute to open source projects in my work time provided the work is tightly linked with our problems at hand. I'm going to try your fixes since they look very useful, but what I need right now is to be able to stop HS indexing while I do my batch upload to make it really asynchronous. I don't need to make it go faster, but to get out of the way ;)
Free time is something I just don't have anymore :(
s.grinovero wrote:
Quote:
When I enable it (with the same 100 batch_size) I get delays when flushing my session while HS is indexing, and my insertion rate drops to 15 entities per second.
You're loading additional data for @IndexedEmbedded right?
Not even in this case!
s.grinovero wrote:
Looks reasonable: you're limited by the delay the database introduces when it has to execute the second query. This doesn't mean you're limited to 15 entities/second, of course. If you start 10 different tests in parallel you're probably going to scale to 150 entities/second, because while your system is waiting for the database to answer it is idle and not burning resources. I exploit this concept to get some impressive numbers for the automatic indexing routines (HSEARCH-218): 12000 entities/second for a simple graph, 4000 entities/second with 7 kinds of collections embedded (on a laptop).
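The latency argument in the quote above can be sketched with a small experiment. This is only an illustration under stated assumptions: Thread.sleep stands in for the time spent waiting on the database, the class and method names are hypothetical, and none of it is Hibernate Search code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class LatencyBoundThroughput {

    // Simulates one entity whose cost is dominated by waiting for the database.
    static void loadOneEntity(long latencyMillis) throws InterruptedException {
        Thread.sleep(latencyMillis);
    }

    // Runs `workers` threads, each loading `entitiesPerWorker` entities one
    // after the other, and returns the wall-clock time in milliseconds.
    static long elapsedMillis(int workers, int entitiesPerWorker, long latencyMillis) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                tasks.add(() -> {
                    for (int i = 0; i < entitiesPerWorker; i++) {
                        loadOneEntity(latencyMillis);
                    }
                    return null;
                });
            }
            long start = System.nanoTime();
            // invokeAll blocks until every task is done; get() surfaces failures.
            for (Future<Void> done : pool.invokeAll(tasks)) {
                done.get();
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int entitiesPerWorker = 20;
        long latencyMillis = 50; // assumed per-entity wait, not a measured value
        for (int workers : new int[] {1, 10}) {
            long elapsed = Math.max(1, elapsedMillis(workers, entitiesPerWorker, latencyMillis));
            double perSecond = workers * entitiesPerWorker * 1000.0 / elapsed;
            System.out.printf("%d worker(s): %.0f entities/second%n", workers, perSecond);
        }
    }
}
```

Because each worker spends nearly all its time waiting, ten workers finish ten times the work in roughly the same wall-clock time. Real database round trips would contend for connections and locks, so the scaling there would be less clean.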
This is a user uploading an Excel spreadsheet where Excel rows map to DB entities. This is J2EE and I don't think it's a wise place to start putting threads, unfortunately. Also, this is running in a single transaction, which I need in case I roll back. I am very interested in your automatic indexing though, since, when we upgrade our web application, reindexing takes about 20 minutes even though our production DB is fairly small (100k records tops). But once again I hope HS is going to manage the threads itself, as it shouldn't be a user problem.
s.grinovero wrote:
You could try "isolating" the performance using the new blackhole in trunk. It doesn't have any practical purpose other than to make this kind of measurement. It's easily backported too:
http://fisheye.labs.jboss.com/viewrep/Hibernate/search/trunk/src/main/java/org/hibernate/search/backend/impl/blackhole/BlackHoleBackendQueueProcessorFactory.java?r=16310
I can try, but I have a real suspicion that it's the frontend bothering me, not the backend ;)
s.grinovero wrote:
To express ideas about new features and improvements, it's better if you share them on the hibernate-dev list, so that we are sure that others will see them too.
Ah, perhaps this is why I had so little feedback from JIRA ;)
I'll look there, thanks a lot for all your answers once again :)