-->
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 9 posts ] 
 Post subject: Batch data insertion issue. Search isn't indexing all fields
Posted: Fri Feb 24, 2012 8:42 am
Newbie

Joined: Sat Nov 05, 2011 6:10 am
Posts: 12
Hi,
I am having an issue while indexing my data in a batch.
I want to index a list of Article entities, using @IndexedEmbedded on the associations whose data I need in the index.

Here are my beans:

Article.java
Code:
@Entity
@Table(name = "article", catalog = "test")
@Indexed(index="articleText")
@Analyzer(impl = FrenchAnalyzer.class)
public class Article implements java.io.Serializable {
   @Id
   @GeneratedValue(strategy = IDENTITY)
   @Column(name = "id", unique = true, nullable = false)
   @DocumentId
   private Integer id;
   
   @ManyToOne(fetch = FetchType.LAZY)
   @JoinColumn(name = "firstpageid", nullable = false)
   @IndexedEmbedded
   private Page page;

   @Column(name = "heading", length = 300)
   @Field(name= "title", index = Index.YES, store = Store.YES)
   @Boost(2.5f)
   private String heading;
   
   @Column(name = "subheading", length = 300)
   private String subheading;

   @OneToOne(fetch = FetchType.LAZY, mappedBy = "article")   
   @IndexedEmbedded
   private Articlefulltext articlefulltext;
[... bean method ...]


Page.java
Code:
@Entity
@Table(name = "page", catalog = "test")
public class Page implements java.io.Serializable {
   private Integer id;
   @IndexedEmbedded
   private Issue issue;
   @ContainedIn
   private Set<Article> articles = new HashSet<Article>(0);
[... bean method ...]



Articlefulltext.java
Code:
@Entity
@Table(name = "articlefulltext", catalog = "test")
@Analyzer(impl = FrenchAnalyzer.class)
public class Articlefulltext implements java.io.Serializable {
   @GenericGenerator(name = "generator", strategy = "foreign", parameters = @Parameter(name = "property", value = "article"))
   @Id
   @GeneratedValue(generator = "generator")
   @Column(name = "aid", unique = true, nullable = false)
   private int aid;

   @OneToOne(fetch = FetchType.LAZY)
   @PrimaryKeyJoinColumn
   @ContainedIn
   private Article article;
   
   @Column(name = "fulltextcontents", nullable = false)
   @Field(store=Store.YES, index=Index.YES, analyzer = @Analyzer(impl = FrenchAnalyzer.class), bridge= @FieldBridge(impl = FulltextSplitBridge.class))
   private String fulltextcontents;
[... bean method ...]


I set the log4j logging level to DEBUG:
Code:
2012-02-24;15:08:06;Version;[INFO];HSEARCH000034: Hibernate Search 4.0.0.Final
;2012-02-24;15:08:06;ConfigContext;[DEBUG];Setting Lucene compatibility to Version LUCENE_34
;2012-02-24;15:08:06;ConfigContext;[DEBUG];Using default similarity implementation: org.apache.lucene.search.DefaultSimilarity
;2012-02-24;15:08:06;DirectoryProviderHelper;[DEBUG];Initialize index: '/appli/chrusr/web/lucene/articleText'
;2012-02-24;15:08:06;WorkspaceFactory;[DEBUG];Starting workspace for index articleText using an exclusive index strategy
;2012-02-24;15:08:06;AvroSerializationProvider;[INFO];HSEARCH000079: Serialization protocol version 1.0
;2012-02-24;15:08:06;DocumentBuilderIndexedEntity;[DEBUG];Found JPA id and using it as document id
;2012-02-24;15:08:06;DocumentBuilderIndexedEntity;[DEBUG];Found JPA id and using it as document id
;2012-02-24;15:08:06;DocumentBuilderIndexedEntity;[DEBUG];Found JPA id and using it as document id
;2012-02-24;15:08:06;DocumentBuilderIndexedEntity;[DEBUG];Found JPA id and using it as document id
;2012-02-24;15:08:06;DocumentBuilderIndexedEntity;[DEBUG];Found JPA id and using it as document id
;2012-02-24;15:08:06;DocumentBuilderIndexedEntity;[DEBUG];Found JPA id and using it as document id
;2012-02-24;15:08:06;DocumentBuilderIndexedEntity;[DEBUG];Field selection in projections is set to true for entity org.litis.plair.domain.model.Article.
;2012-02-24;15:08:06;FullTextIndexEventListener;[DEBUG];Hibernate Search event listeners activated
;2012-02-24;15:08:06;FullTextIndexEventListener;[DEBUG];Hibernate Search dirty checks enabled
;2012-02-24;15:08:18;JobAltoOCRImport;[DEBUG];Beginning content update, looking in /home/plair/ptiff/xml/
;2012-02-24;15:08:20;JobAltoOCRImport;[DEBUG];done. Successfully added 1 xml files into Mysql for 1 issue(s)



When I look at the resulting Lucene index, I see fields for both the Article and Page objects, but none for Articlefulltext. The data in my database is correct, which means the persist() operation works as expected. I really need some help here, because I don't see what the difference is between my Page and Articlefulltext mappings.


Last edited by t.palfray on Thu Dec 06, 2012 6:01 am, edited 1 time in total.

 Post subject: Re: @IndexedEmbedded doesn't seem to work for one bean
Posted: Mon Feb 27, 2012 10:27 am
Newbie

Joined: Sat Nov 05, 2011 6:10 am
Posts: 12
Here is the batch source code.

Code:
   EntityManager em = null;
   
   @Override
   protected void executeInternal(JobExecutionContext arg0) throws JobExecutionException {
      ApplicationContext ap = null;
      EntityManagerFactory emf = null;
      EntityTransaction tx = null;

      
      try {
         ap = (ApplicationContext) arg0.getScheduler().getContext().get("applicationContext");
         emf = (EntityManagerFactory) ap.getBean("entityManagerFactory", EntityManagerFactory.class);
         em = emf.createEntityManager();
         tx = em.getTransaction();
         

         tx.begin();
         // [... em.persist() calls for entities that aren't Lucene-related, skipped here ...]
         for(File xmlFile : xmlList){
            Reel reel = new Reel(title, reelpath);
            em.persist(reel);
            Article article = new Article();
            // [... set Article fields, skipped here ...]
            Articlefulltext ft = new Articlefulltext();
            // [... set Articlefulltext fields, skipped here ...]
            ft.setArticle(article);
            ft.setFulltextcontents(bufferBlock.toString());
            em.persist(ft); // ft is persisted before article because of FK issues
            em.persist(article); // here the annotations update the Lucene index, but fulltextcontents is not indexed (see my first post)
            if ( nbFileDone % 50 == 0 ) {
               //flush a batch of inserts and release memory:
               em.flush();
               em.clear();
            }
         }
         tx.commit();
      }
      catch(Exception e){
         tx.rollback();
      }
      em.close();
   }


Last edited by t.palfray on Thu Dec 06, 2012 6:03 am, edited 1 time in total.

 Post subject: Re: @IndexedEmbedded doesn't seem to work for one bean
Posted: Wed Feb 29, 2012 10:42 am
Newbie

Joined: Sat Nov 05, 2011 6:10 am
Posts: 12
If I use a MassIndexer, the data is indexed correctly, but I don't want a full index rebuild, as the index holds millions of documents.


Last edited by t.palfray on Thu Dec 06, 2012 6:04 am, edited 2 times in total.

 Post subject: Re: @IndexedEmbedded doesn't seem to work for one bean
Posted: Thu Dec 06, 2012 5:47 am
Newbie

Joined: Sat Nov 05, 2011 6:10 am
Posts: 12
I have rewritten this post; I hope my problem is easier to understand now.


 Post subject: Re: Batch data insertion issue. Search isn't indexing all fields
Posted: Thu Dec 06, 2012 5:54 pm
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi, did you really ask that in February? I'm embarrassed that we didn't notice it.

On your code:
could you try calling session.flushToIndexes() before clearing the session?

http://docs.jboss.org/hibernate/search/4.1/reference/en-US/html_single/#search-batchindex-flushtoindexes

I'm not sure why you would find some fields but not that one. I assume you're sure the field isn't an empty string or null?

And could you test a more recent version of Hibernate Search?

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Batch data insertion issue. Search isn't indexing all fields
Posted: Fri Dec 07, 2012 4:16 am
Newbie

Joined: Sat Nov 05, 2011 6:10 am
Posts: 12
Yes, I asked in February ;)

I put a temporary solution in place using the MassIndexer, but the Lucene index keeps growing, so I cannot use this solution anymore; this is why I'm back!
The example you linked to uses the Hibernate Session, whereas the sample code I gave uses the JPA EntityManager for updating. I'll first try updating Hibernate Search before moving the code from JPA to Hibernate ...

You're right, my fields aren't empty. In fact, my bridge is never called.
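
A bridge that is never called would be consistent with the indexer never reaching the embedded object at all. Below is a minimal plain-Java sketch of one way that can happen, assuming the elided setters in the batch do not also call article.setArticlefulltext(ft): only one side of the bidirectional one-to-one is set, so walking the graph from Article finds null. This is a toy model, not Hibernate Search code; ToyIndexer and its index method are hypothetical names, and whether this is what actually happens in the batch above is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: ToyIndexer is a hypothetical stand-in, not a Hibernate Search API.
public class ToyIndexer {

    static class Article {
        String heading;
        Articlefulltext articlefulltext; // inverse side, like mappedBy = "article"
    }

    static class Articlefulltext {
        Article article;                 // owning side of the one-to-one
        String fulltextcontents;
    }

    // Builds a flat "document" by walking the in-memory object graph from Article.
    static Map<String, String> index(Article article) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", article.heading);
        if (article.articlefulltext != null) {
            // Only reached when the Article side of the association is set.
            doc.put("articlefulltext.fulltextcontents",
                    article.articlefulltext.fulltextcontents);
        }
        return doc;
    }

    public static void main(String[] args) {
        Article article = new Article();
        article.heading = "Some heading";

        Articlefulltext ft = new Articlefulltext();
        ft.fulltextcontents = "Some OCR text";
        ft.article = article;            // one side only, as in the batch code

        // The embedded field is missing from this document.
        System.out.println(index(article).keySet());

        article.articlefulltext = ft;    // set the other side as well
        // Now the embedded field is present.
        System.out.println(index(article).keySet());
    }
}
```

In this toy model the first document contains only title, and the second also contains articlefulltext.fulltextcontents. If the same thing applies to the real batch, it would also fit the observation that a MassIndexer run, which reloads entities from the database, indexes the field correctly.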


 Post subject: Re: Batch data insertion issue. Search isn't indexing all fields
Posted: Fri Dec 07, 2012 4:59 am
Newbie

Joined: Sat Nov 05, 2011 6:10 am
Posts: 12
No change after updating both Hibernate and Hibernate Search to 4.1.1.

Your solution using flushToIndexes is the same as using the MassIndexer... I don't want to use that, because my users won't be able to search the existing index while I rebuild the whole index, and it could take days, since I have millions of documents...

I tried to use the Hibernate Session like this:

Code:
      ApplicationContext ap = null;
      EntityManagerFactory emf = null;
      Transaction tx = null;

      
      try {
         ap = (ApplicationContext) arg0.getScheduler().getContext().get("applicationContext");
         emf = (EntityManagerFactory) ap.getBean("entityManagerFactory", EntityManagerFactory.class);
         em = Search.getFullTextEntityManager(emf.createEntityManager());
         
         HibernateEntityManager hem = em.unwrap(HibernateEntityManager.class);
         Session session = hem.getSession();
         
         FullTextSession fullTextSession = org.hibernate.search.Search.getFullTextSession(session);
         
         tx = fullTextSession.beginTransaction();


But it doesn't change anything ...


 Post subject: Re: Batch data insertion issue. Search isn't indexing all fields
Posted: Sun Dec 09, 2012 10:11 am
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
I assume this mystery was resolved by Hardy on SO?

http://stackoverflow.com/questions/13743915/hibernate-search-indexing-uncompleted-documents

_________________
Sanne
http://in.relation.to/


 Post subject: Re: Batch data insertion issue. Search isn't indexing all fields
Posted: Sun Dec 09, 2012 12:04 pm
Newbie

Joined: Sat Nov 05, 2011 6:10 am
Posts: 12
Yes, I forgot to mention here that Hardy gave me the solution yesterday.
And your comment there answers my other question. Thank you so much!


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.