All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 14 posts ] 
Author Message
 Post subject: How I can rebuild index for new entries periodically?
PostPosted: Mon Jul 12, 2010 5:31 am 
Beginner

Joined: Thu Jun 24, 2010 2:30 am
Posts: 23
My boss wants the search index rebuilt every week or month. He is afraid that the listeners will sometimes fail to update the index, because of bugs and so on, so that over time the index becomes incorrect. He therefore wants a periodic rebuild that updates all entries changed since the last build. For example, rebuild the index on the first day of every month, but reindex only the entries updated during the previous month. How can I do it?


 Post subject: Re: How I can rebuild index for new entries periodically?
PostPosted: Mon Jul 12, 2010 6:03 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

if you don't want to rely on the listeners and only want to reindex entities that changed during a certain time frame, you will need an HQL/Criteria query that retrieves those entities. This could be based, for example, on a last-modified timestamp in the entity. One way or another, you need a way to identify the changed entities (unless you reindex everything).
If real-time indexing is not an issue, you can always rebuild the whole index at given intervals. Using the MassIndexer API you can achieve quite amazing indexing speeds. Check the online manual for mass indexing.

--Hardy
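
As an illustration of the timestamp approach described above, here is a minimal sketch in plain Java, not Hibernate code. The `Row` class, its `lastModified` field and the `changedBatches` helper are hypothetical names invented for this example. In a real application the filtering would be done by the HQL/Criteria query (something like a `lastModified >= :since` restriction), and each batch would be passed to `FullTextSession.index()` followed by `flushToIndexes()` and `clear()`, as in the batch indexing pattern of the Hibernate Search manual.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

final class IncrementalReindex {

    /** Hypothetical stand-in for an entity that carries a last-modified timestamp. */
    static final class Row {
        final long id;
        final Instant lastModified;

        Row(long id, Instant lastModified) {
            this.id = id;
            this.lastModified = lastModified;
        }
    }

    /**
     * Returns the ids of all rows modified at or after {@code since},
     * split into batches of at most {@code batchSize} ids.
     */
    static List<List<Long>> changedBatches(List<Row> rows, Instant since, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<Long>> batches = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (Row row : rows) {
            if (row.lastModified.isBefore(since)) {
                continue; // unchanged since the last rebuild, skip it
            }
            current.add(row.id);
            if (current.size() == batchSize) {
                batches.add(current); // a full batch: index it, then flush and clear
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current); // the last, partially filled batch
        }
        return batches;
    }
}
```

The scheduler that runs this weekly or monthly only has to remember the start time of the previous run and pass it as `since`.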


 Post subject: Re: How I can rebuild index for new entries periodically?
PostPosted: Thu Jul 15, 2010 4:58 am 
Beginner

Joined: Thu Jun 24, 2010 2:30 am
Posts: 23
Thank you, Hardy. I tried the MassIndexer, but it fails on big databases. The test DB has several small tables and one huge table with 30 million records. It fails with the following message:

java.lang.OutOfMemoryError: requested 131072000 bytes for GrET in C:\BUILD_AREA\jdk6_17\hotspot\src\share\vm\utilities\growableArray.cpp. Out of swap space?

I tried running it with lower MassIndexer parameters (batchSizeToLoadObjects = threadsToLoadObjects = threadsForSubsequentFetching = 1, CacheMode.IGNORE) to reduce memory usage, but it didn't help. What else can I do to reduce the memory used by Hibernate Search?


 Post subject: Re: How I can rebuild index for new entries periodically?
PostPosted: Thu Jul 15, 2010 5:10 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Hi,

Can you give some more details? The whole idea of the mass indexer is to handle the situation you describe. I know that some people have used it successfully to index millions of records. Which JVM are you using, and with which startup parameters? What are the specs of the machine you are indexing on? What do your annotated entities look like, and how do you start the indexer? Also, what does the full stack trace look like?

--Hardy


 Post subject: Re: How I can rebuild index for new entries periodically?
PostPosted: Thu Jul 15, 2010 5:53 am 
Beginner

Joined: Thu Jun 24, 2010 2:30 am
Posts: 23
The issue doesn't produce a stack trace, only a message from the JVM. Here it is:
Code:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# java.lang.OutOfMemoryError: requested 131072000 bytes for GrET in C:\BUILD_AREA\jdk6_17\hotspot\src\share\vm\utilities\growableArray.cpp. Out of swap space?
#
#  Internal Error (allocation.inline.hpp:39), pid=4560, tid=5348
#  Error: GrET in C:\BUILD_AREA\jdk6_17\hotspot\src\share\vm\utilities\growableArray.cpp
#
# JRE version: 6.0_17-b04
# Java VM: Java HotSpot(TM) Client VM (14.3-b01 mixed mode windows-x86 )
# An error report file with more information is saved as:
# C:\SintecDevel\workspace\OnBoard\hs_err_pid4560.log
#
# If you would like to submit a bug report, please visit:
#   http://java.sun.com/webapps/bugreport/crash.jsp
#


The message contains the JVM and JRE versions. I'm running it on 32-bit Windows XP with 3.5 GB of RAM and a 300 GB hard disk. Before the test I had 2.5 GB of RAM and 250 GB of disk space free.
The test was executed from Eclipse with the '-Xmx1200M' argument. I also found on one of the forums that '-Xss512k' can help, and added it.

The indexing is started with this code:
Code:
// Load the Hibernate configuration and obtain a Hibernate Search session
Configuration config =
   new AnnotationConfiguration().configure( "hibernatealone.cfg.xml" );
config.setNamingStrategy( ImprovedNamingStrategy.INSTANCE );
SessionFactory sessions = config.buildSessionFactory();
Session session = sessions.openSession();
FullTextSession fullTextSession = Search.getFullTextSession( session );
boolean result = false;

// Build the index
Date date = new Date();
System.out.println(
   "Starting to build Lucene index for Hibernate Search: " + date );
try {
   fullTextSession.createIndexer()
      .batchSizeToLoadObjects( 1 )
      .threadsForSubsequentFetching( 1 )
      .threadsToLoadObjects( 1 )
      .cacheMode( CacheMode.IGNORE )
      .startAndWait();
   result = true;
} catch( Exception e ) {
   // Report the failure instead of silently swallowing it
   e.printStackTrace();
   result = false;
}
fullTextSession.getSessionFactory().close();
System.out.println(
      ( result ? "Success" : "Error" ) +
      " building Lucene index for Hibernate Search. " +
      ( new Date().getTime() - date.getTime() ) + " ms" );


hibernatealone.cfg.xml:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE hibernate-configuration PUBLIC
      "-//Hibernate/Hibernate Configuration DTD 3.0//EN"
      "http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd">
<hibernate-configuration>
   <session-factory>
      <property name="hibernate.connection.url">jdbc:mysql://127.0.0.1:3306/tesths?autoReconnect=true</property>
      <property name="hibernate.connection.driver_class">com.mysql.jdbc.Driver</property>
      <property name="hibernate.dialect">org.hibernate.dialect.MySQL5Dialect</property>
      <property name="hibernate.connection.username">test</property>
      <property name="hibernate.connection.password">test</property>      
      <property name="hibernate.show_sql">false</property>
      <!-- property name="hibernate.hbm2ddl.auto">validate</property -->
      <property name="hibernate.cache.use_query_cache">false</property>
      <property name="hibernate.cache.provider_class">org.hibernate.cache.EhCacheProvider</property>
      <property name="hibernate.cache.use_second_level_cache">false</property>                       
      <property name="hibernate.search.default.directory_provider">org.hibernate.search.store.FSDirectoryProvider</property>
      <property name="hibernate.search.default.indexBase">C:\\workspace\\Test\\HibernateSearch\\indexes</property>
      <property name="hibernate.search.default.exclusive_index_use">true</property>             
   
      <mapping class="com.mycompany.Groups" />
      <mapping class="com.mycompany.model.Company" />
      <mapping class="com.mycompany.model.CompanyType" />
      <mapping class="com.mycompany.model.Role" />
      <mapping class="com.mycompany.model.User" />                  
      <mapping class="com.mycompany.model.PrefixCurrency" />
      <mapping class="com.mycompany.model.PrefixLang" />
      <mapping class="com.mycompany.model.PrefixTimezone" />
      <mapping class="com.mycompany.model.DatePatternType" />
      <mapping class="com.mycompany.model.View" />
      <mapping class="com.mycompany.model.ObjectPermission" />
      <mapping class="com.mycompany.model.ObjectEntryPermission" />
      <mapping class="com.mycompany.model.ObjectClass" />
      <mapping class="com.mycompany.model.SpotOrder" />
      <mapping class="com.mycompany.model.ObjectEntry" />      
   </session-factory>
</hibernate-configuration>


There are many indexed classes, but only SpotOrder has 30 million records, and that is where it fails; I can see in the log that indexing fails on this table. Moreover, the other classes have at most 100 records each, so I give here only the code for the classes related to SpotOrder. If you want to see additional classes, say so and I'll post them too. It might be better to attach the sources to the topic, but I don't know how to do that in this forum.
Code:
@MappedSuperclass
public abstract class BaseEntityImpl extends HashCodeValidator
      implements BaseEntity {
   private static final long serialVersionUID = 1645059492245740266L;
   @Override
   @Transient   
   @Fields( {
      @Field( index=Index.TOKENIZED, store=Store.NO ),
      @Field( name="free_search", index=Index.TOKENIZED, store=Store.NO ) } )
   public Class<?> getClassType() {
      return
         ( this instanceof HibernateProxy ) ?
               this.getClass().getSuperclass() : this.getClass();
   }      
   
   @Override
   @Id   @GeneratedValue(strategy = GenerationType.AUTO)
   @Column(precision = 10, unique = true, nullable = false, updatable = false)
   public Long getId() {
      return super.getId();
   }

   @Override
   public void setId(Long id) {
      super.setId(id);
   }
      
}

Code:
@MappedSuperclass
public abstract class CompanyBasedSearchEntity extends BaseEntityImpl
      implements BaseEntity {
   
   private static final long serialVersionUID = 3953368748208085084L;

   @Fields( {
      @Field( index=Index.TOKENIZED, store=Store.NO ),      
      @Field( name="free_search", index=Index.TOKENIZED, store=Store.NO ) } )
   @FieldBridge(impl = CompanyBridge.class)
   @ManyToOne( cascade = { CascadeType.PERSIST, CascadeType.MERGE }, fetch = FetchType.EAGER)
   @org.hibernate.annotations.Cascade({org.hibernate.annotations.CascadeType.SAVE_UPDATE})
   @JoinColumn(name = "company_id", nullable = true  )      
   public Company getCompany(){ return company; };         
   public void setCompany( Company company ){ this.company = company; }   
   protected Company company;
   
}

Code:
@Entity
@Table(name = "spot_orders")
@Cache(usage =  CacheConcurrencyStrategy.READ_WRITE)
@Indexed
public class SpotOrder extends CompanyBasedSearchEntity implements Searchable {

   private static final long serialVersionUID = -1493851936988449752L;
   
   @Fields( {
      @Field( index=Index.TOKENIZED, store=Store.NO ),
      @Field( name="free_search", index=Index.TOKENIZED, store=Store.NO ) } )
   public String getName() { return this.name;   }
   public void setName(String name) { this.name = name; }
   private String name;

}

Code:
@Entity
@Table(name = "users")
@Indexed
@FullTextFilterDefs( {
   @FullTextFilterDef(
         name = "securityFilter", impl = SecurityFilterFactory.class ),
   @FullTextFilterDef(
         name = "companyFilter", impl = CompanyFilterFactory.class )
} )
@Cache(usage =  CacheConcurrencyStrategy.READ_WRITE)
public class User extends CompanyBasedSearchEntity
      implements Searchable, Comparable<User> {
   private static final long                  serialVersionUID   = -559024047L;

   private Boolean                           enabled;
   private Boolean                           status;
   private String                           name;
   private String                           password;
   private PrefixTimezone                     timeZone;
   private String                           defaultPattern;
   private String                           imageUrl = "sampleFace.jpg";

   public User() {}

   public User(Long id) { setId(id); }
   
   @ManyToMany(
      fetch = FetchType.EAGER,
      cascade = {CascadeType.DETACH, CascadeType.MERGE, CascadeType.PERSIST, CascadeType.REFRESH} )
   @JoinTable(
      name = "group_members",
      joinColumns = @JoinColumn(name = "user_id"),
      inverseJoinColumns = @JoinColumn(name = "group_id") )
   @IndexedEmbedded
   public Set<Groups> getGroups() { return groups; }

   public void setGroups(Set<Groups> groups) {
      this.groups = groups;
   }

   private Set<Groups>   groups;

   @Transient
   public Set<Role> getRoles() {
      Set<Role> roles = new HashSet<Role>();
      if (groups != null) {
         for (Groups group : groups) {
            roles.addAll((group.getRoles()));
         }
      }
      return roles;
   };

   @Transient
   public Set<View> getViews() {
      Set<View> views = new HashSet<View>();
      Set<Role> roles = getRoles();
      for (Role role : roles) {
         views.addAll((role.getViews()));
      }
      return views;
   };

   @Transient
   @IndexedEmbedded
   public Set<ObjectPermission> getObjectPermissions() {
      Set<ObjectPermission> perms = new HashSet<ObjectPermission>();
      Set<Role> roles = getRoles();
      for( Role role : roles ) {
         perms.addAll( role.getObjectPermissions().values() );
      }
      return perms;
   };

   public Boolean isEnabled() { return this.enabled; }

   public void setEnabled(final Boolean enabled) {   this.enabled = enabled;   }

   public Boolean isStatus() { return status;   }

   public void setStatus(final Boolean status) { this.status = status;    }

   @Column(length = 45)
   @Fields( {
      @Field( index=Index.TOKENIZED, store=Store.NO ),
      @Field( name="free_search", index=Index.TOKENIZED, store=Store.NO ) } )
   public String getName() { return this.name; }

   public void setName(final String name) {   this.name = name; }

   public String getPassword() {   return this.password; }

   public void setPassword(final String password) { this.password = password; }

   @ManyToOne(cascade = { CascadeType.PERSIST, CascadeType.MERGE }, fetch = FetchType.EAGER)
   @JoinColumn(name = "time_zone_id", nullable = true)
   public PrefixTimezone getTimeZone() {
      return this.timeZone == null ? getCompany().getTimeZone() : this.timeZone;
   }

   public void setTimeZone(final PrefixTimezone timeZone) { this.timeZone = timeZone; }

   @Column(name = "default_pattern")
   @Fields( {
      @Field( index=Index.TOKENIZED, store=Store.NO ),
      @Field( name="free_search", index=Index.TOKENIZED, store=Store.NO ) } )
   public String getDefaultPattern() {   return defaultPattern;   }

   public void setDefaultPattern(String defaultPattern) {
      this.defaultPattern = defaultPattern;
   }

   @Column(name = "image_url")
   @Fields( {
      @Field( index=Index.TOKENIZED, store=Store.NO ),
      @Field( name="free_search", index=Index.TOKENIZED, store=Store.NO ) } )
   public String getImageUrl() {
      return imageUrl;
   }

   public void setImageUrl(String imageUrl) {   this.imageUrl = imageUrl; }

   @Transient
   public Boolean getEnabled() {   return this.enabled; }
   
   @Override
   public int compareTo(User other) {   return name.compareTo(other.getName()); }
   
   @Override
   public String toString(){ return name; }
}


Last edited by igorg on Sun Jul 18, 2010 6:24 am, edited 1 time in total.

 Post subject: Re: How I can rebuild index for new entries periodically?
PostPosted: Thu Jul 15, 2010 5:58 am 
Beginner

Joined: Thu Jun 24, 2010 2:30 am
Posts: 23
I forgot to write that my Hibernate Search version is 3.2.2 and my Hibernate version is 3.5.1-Final.
BTW, here is the log from before the test fails:
Code:
2010-07-15 12:42:51,140 INFO  (                  Version.java:40)     - Hibernate Search 3.2.0.Final
2010-07-15 12:42:51,140 INFO  (                  Version.java:40)     - Hibernate Search 3.2.0.Final
Starting to build Lucene index for Hibernate Search: Thu Jul 15 12:42:51 IDT 2010
2010-07-15 12:42:51,922 INFO  (SimpleIndexingProgressMonitor.java:65)     - Going to reindex 4 entities
2010-07-15 12:42:51,922 INFO  (SimpleIndexingProgressMonitor.java:65)     - Going to reindex 4 entities
2010-07-15 12:42:51,922 INFO  (SimpleIndexingProgressMonitor.java:65)     - Going to reindex 6 entities
2010-07-15 12:42:51,922 INFO  (SimpleIndexingProgressMonitor.java:65)     - Going to reindex 6 entities
2010-07-15 12:42:51,922 INFO  (SimpleIndexingProgressMonitor.java:65)     - Going to reindex 51 entities
2010-07-15 12:42:51,922 INFO  (SimpleIndexingProgressMonitor.java:65)     - Going to reindex 51 entities
2010-07-15 12:42:51,937 INFO  (SimpleIndexingProgressMonitor.java:65)     - Going to reindex 7 entities
2010-07-15 12:42:51,937 INFO  (SimpleIndexingProgressMonitor.java:65)     - Going to reindex 7 entities
2010-07-15 12:43:00,671 INFO  (SimpleIndexingProgressMonitor.java:74)     - 50 documents indexed in 8171 ms
2010-07-15 12:43:00,671 INFO  (SimpleIndexingProgressMonitor.java:74)     - 50 documents indexed in 8171 ms
2010-07-15 12:43:00,671 INFO  (SimpleIndexingProgressMonitor.java:77)     - Indexing speed: 6.119202 documents/second; progress: 73.52941%
2010-07-15 12:43:00,671 INFO  (SimpleIndexingProgressMonitor.java:77)     - Indexing speed: 6.119202 documents/second; progress: 73.52941%
2010-07-15 12:43:19,312 INFO  (SimpleIndexingProgressMonitor.java:65)     - Going to reindex 28830000 entities
2010-07-15 12:43:19,312 INFO  (SimpleIndexingProgressMonitor.java:65)     - Going to reindex 28830000 entities


 Post subject: Re: How I can rebuild index for new entries periodically?
PostPosted: Sun Jul 18, 2010 5:44 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
Which database are you using? If you are using MySQL, there is a known issue with scrollable result sets.
Otherwise I would say that your machine is at the low end of the hardware requirements. 1 GB of heap space is quite low, but it should work.


 Post subject: Re: How I can rebuild index for new entries periodically?
PostPosted: Sun Jul 18, 2010 5:55 am 
Beginner

Joined: Thu Jun 24, 2010 2:30 am
Posts: 23
I've seen the known issue with 'scroll', but my issue is with the index build.


 Post subject: Re: How I can rebuild index for new entries periodically?
PostPosted: Sun Jul 18, 2010 6:01 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
The mass indexer uses a scrolling result set internally.


 Post subject: Re: How I can rebuild index for new entries periodically?
PostPosted: Sun Jul 18, 2010 6:02 am 
Beginner

Joined: Thu Jun 24, 2010 2:30 am
Posts: 23
So what can I do about it? Has MySQL fixed this issue?


 Post subject: Re: How I can rebuild index for new entries periodically?
PostPosted: Sun Jul 18, 2010 7:34 am 
Hibernate Team

Joined: Thu Apr 05, 2007 5:52 am
Posts: 1689
Location: Sweden
I don't think the problem is fixed. I think there are some parameters you can specify on the connection URL. Have a look at the MySQL forum.
If you can, try to run your application against another database to confirm that the problem lies with MySQL.


 Post subject: Re: How I can rebuild index for new entries periodically?
PostPosted: Sun Jul 18, 2010 8:26 am 
Beginner

Joined: Thu Jun 24, 2010 2:30 am
Posts: 23
I've checked the MySQL forum, and they haven't fixed the issue. They don't consider it an issue: "There are very few cases where it makes sense to transfer the entirety of a table to a client. The WHERE clause, and projections are your friends. Make the database do the heavy data lifting, and leave the business logic to your application layer."
Besides that, they wrote that correct use of JDBC prevents the problem: "((com.mysql.jdbc.Statement)statement).enableStreamingResults();". But I can't use it, since my statement is executed by the MassIndexer and not by my code, and I prefer not to override code that I don't understand. Can you suggest any solution or workaround for this problem?

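For reference, here is a rough sketch of what streaming looks like in plain JDBC when you do control the statement. As far as I know, MySQL Connector/J documents a forward-only, read-only statement with a fetch size of `Integer.MIN_VALUE` as its signal to stream rows one at a time, which is the portable equivalent of `enableStreamingResults()`. The class and method names below are made up for this example, and this is not how the MassIndexer creates its statements.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class StreamingStatements {

    /**
     * Creates a statement configured for row-by-row streaming with
     * MySQL Connector/J: forward-only, read-only, fetch size Integer.MIN_VALUE.
     * Other JDBC drivers may reject or ignore this fetch size.
     */
    static Statement createStreaming(Connection connection) throws SQLException {
        Statement statement = connection.createStatement(
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        statement.setFetchSize(Integer.MIN_VALUE);
        return statement;
    }
}
```

With a statement like this the driver should not buffer the whole table in client memory, but the result set has to be read to the end (or closed) before the connection can be used for another query.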

 Post subject: Possible solution
PostPosted: Tue Jul 20, 2010 1:28 am 
Beginner

Joined: Thu Jun 24, 2010 2:30 am
Posts: 23
I think I found the solution. Adding
Code:
&useServerPrepStmts=true&useCursorFetch=true

to the connection string causes results to be streamed by default.
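
A small sketch of how the two parameters combine with the connection URL from hibernatealone.cfg.xml. The `JdbcUrlParams` class is a hypothetical helper written for this post, not part of Hibernate or Connector/J. Two things to keep in mind: inside the XML file every `&` has to be written as `&amp;`, and, as far as I understand the Connector/J documentation, `useCursorFetch` only applies to statements with a positive fetch size, so whether it helps may depend on the fetch size that Hibernate or the MassIndexer sets. That part is worth verifying on your own setup.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class JdbcUrlParams {

    /** Appends the parameters to the URL, using '?' for the first one if the URL has none yet. */
    static String append(String url, Map<String, String> params) {
        StringBuilder result = new StringBuilder(url);
        char separator = url.contains("?") ? '&' : '?';
        for (Map.Entry<String, String> entry : params.entrySet()) {
            result.append(separator).append(entry.getKey()).append('=').append(entry.getValue());
            separator = '&';
        }
        return result.toString();
    }

    /** Escapes '&' so that the URL can be pasted into an XML property element. */
    static String escapeForXml(String url) {
        return url.replace("&", "&amp;");
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("useServerPrepStmts", "true");
        params.put("useCursorFetch", "true");
        String url = append("jdbc:mysql://127.0.0.1:3306/tesths?autoReconnect=true", params);
        System.out.println(url);               // value for programmatic configuration
        System.out.println(escapeForXml(url)); // value for hibernate.connection.url in the XML file
    }
}
```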


 Post subject: Re: How I can rebuild index for new entries periodically?
PostPosted: Wed Jul 21, 2010 7:19 pm 
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi, was your problem solved?

_________________
Sanne
http://in.relation.to/


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.