
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 11 posts ] 
Author Message
 Post subject: What would be the best/fastest way to accomplish this?
PostPosted: Thu Oct 17, 2013 8:30 pm 
Newbie

Joined: Thu Oct 17, 2013 8:18 pm
Posts: 13
We have some indexing servers that read messages off a queue. For each one of these messages we index a document in Solr.

Question #1)
If we have a process that spawns X threads and each thread is pulling messages off a queue, what would be the preferred session context? I'm guessing thread?

Question #2)
When we do bulk indexing we want Hibernate to be as fast as possible. Our processing looks something like the following, where the iterator is a never-ending/blocking stream of ids.

Code:
while (iterator.hasNext()) {

    Session session = HibernateUtils.getSessionFactory().getCurrentSession();
    session.beginTransaction();

    try {
        long primaryId = iterator.next();
        Entity e = (Entity) session.createCriteria(Entity.class)
                .add(Restrictions.eq("id", primaryId))
                .setReadOnly(true)
                .uniqueResult();
        indexEntity(e);
    } finally {
        session.close();
    }

}


I'm pretty sure that opening and closing the transaction is non-optimal. How can I improve this? Thoughts overall?


 Post subject: Re: What would be the best/fastest way to accomplish this?
PostPosted: Fri Oct 18, 2013 11:35 am 
Hibernate Team
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Hi,
being on the Hibernate Search team I'd love to know more about your Solr integration :-)

But let's see your case first:

1#
A thread-scoped context would work fine, but it looks like unnecessary overhead since you're closing it for each operation. You could just as well open a new one.
Opening a Session is not very costly (it's like instantiating a couple of HashMap instances), though for peak performance you could still try to avoid it, as it generates objects which need garbage collection.
Still, you're generating many more objects already, so it might not win you anything.

You could at least replace
Code:
session.close()

with
Code:
session.clear()

to allow Session reuse. Or just open a new Session each time, which saves you the thread-local lookup.
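As an illustrative sketch of the clear()-and-reuse variant: MiniSession below is a made-up stand-in for org.hibernate.Session (not a real Hibernate class), so the shape of the loop can be shown self-contained.

```java
import java.util.ArrayList;
import java.util.List;

public class SessionReuseSketch {

    // Stand-in for org.hibernate.Session: only tracks attached entities.
    static class MiniSession {
        final List<Long> attached = new ArrayList<>();
        void load(long id) { attached.add(id); }  // stands in for the Criteria query
        void clear() { attached.clear(); }        // detach everything, keep the Session open
    }

    /** Reuse one session for all messages, clearing after each one. */
    public static int processWithReuse(long[] ids) {
        MiniSession session = new MiniSession(); // opened once, not per message
        for (long id : ids) {
            session.load(id);
            // ... index the loaded entity here ...
            session.clear(); // instead of session.close(): the Session stays usable
        }
        return session.attached.size(); // how many entities linger after the loop
    }
}
```

The point is simply that clear() empties the persistence context without discarding the Session, so the next message skips the open/close cost.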

2#
The first problem is that you're not committing the TX; you might be leaking them.
You could check whether there are more elements in the queue before actually committing the current transaction (and closing the Session). In that case I would still clear the Session, but you can reuse the same one.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: What would be the best/fastest way to accomplish this?
PostPosted: Fri Oct 18, 2013 12:54 pm 
Newbie

Joined: Thu Oct 17, 2013 8:18 pm
Posts: 13
Quote:
being on the Hibernate Search team I'd love to know more about your Solr integration :-)


Sure. We're currently using Solr and its DataImportHandler to incrementally update our indexes. We've recently integrated Apache Kafka into our infrastructure, so we were going for a more real-time indexing solution by firing off indexing events any time one of our indexable entities changes.

Quote:
A thread-scoped context would work fine, but it looks like unnecessary overhead since you're closing it for each operation. You could just as well open a new one.
Opening a Session is not very costly (it's like instantiating a couple of HashMap instances), though for peak performance you could still try to avoid it, as it generates objects which need garbage collection.
Still, you're generating many more objects already, so it might not win you anything.


Let me try and clarify something. We are spawning a small, controlled number of threads corresponding to the number of partitions in Kafka. So if we have 6 Kafka partitions to read from, then we only spawn 6 threads; we aren't spawning a thread for each event. Here is an example of one of our threads. I've updated the DAO with your suggestions.

Code:
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;

public class KafkaConsumerThread implements Runnable {

  private final KafkaStream stream;

  public KafkaConsumerThread(KafkaStream stream) {
    this.stream = stream;
  }

  @Override
  public void run() {
    ConsumerIterator iterator = stream.iterator();

    while (iterator.hasNext()) {
      try {
        long id = pullIdFromMessage(iterator.next().message());
        Foo foo = fooDao.findById(id);
        fooIndexer.index(foo);
      } catch (Exception e) {
        logger.error("Run Exception", e);
      }
    }
  }

}

public class FooDao extends BaseDao<Foo> {

  @Override
  public Foo findById(long id) {
    Session session = getSession();
    Transaction transaction = session.beginTransaction();

    Foo foo = null;

    try {
      foo = (Foo) session.createCriteria(Foo.class)
              .add(Restrictions.eq("id", id))
              .setReadOnly(true)
              .setMaxResults(1)
              .uniqueResult();

      transaction.commit();
    } catch (Exception e) {
      transaction.rollback();
    } finally {
      session.clear();
    }

    return foo;
  }

}




Quote:
First problem is you're not committing the TX, you might be leaking them?
You could try and see if there are more elements in the queue before actually returning committing the current transaction (and closing the Session). In that case I would still clear the Session, but you can reuse the same one.


So I should always commit the transaction and then use session.clear()? Should I somehow keep a session open for the lifetime of the thread?


Any more suggestions?


 Post subject: Re: What would be the best/fastest way to accomplish this?
PostPosted: Fri Oct 18, 2013 1:19 pm 
Hibernate Team
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Quote:
Let me try and clarify something. We are spawning a small, controlled number of threads corresponding to the number of partitions in Kafka. So if we have 6 Kafka partitions to read from, then we only spawn 6 threads; we aren't spawning a thread for each event. Here is an example of one of our threads. I've updated the DAO with your suggestions.

Right I was assuming something like that.

Quote:
So I should always commit the transaction and then use session.clear()? Should I somehow keep a session open for the lifetime of the thread?

You shouldn't control transactions in your DAO, but control them outside of your while loop: as long as there are [ .hasNext() ] elements to process you stay in the same transaction, but you commit the transaction before returning from _run()_ so as not to leak any resources.

_________________
Sanne
http://in.relation.to/


 Post subject: Re: What would be the best/fastest way to accomplish this?
PostPosted: Fri Oct 18, 2013 2:07 pm 
Newbie

Joined: Thu Oct 17, 2013 8:18 pm
Posts: 13
Quote:
So I should always commit the transaction and then use session.clear()? Should I somehow keep a session open for the lifetime of the thread?

You shouldn't control transactions in your DAO, but control them outside of your while loop: as long as there are [ .hasNext() ] elements to process you stay in the same transaction, but you commit the transaction before returning from _run()_ so as not to leak any resources.

What if that run never ends? Is it ok to keep a transaction open that long?

I'm guessing something like this would be better?

Code:
  @Override
  public void run() {
    ConsumerIterator iterator = stream.iterator();

    Session session = HibernateUtils.getSessionFactory().getCurrentSession();
    Transaction transaction = null;

    try {
      transaction = session.beginTransaction();

      while (iterator.hasNext()) {
        try {
          process((Message) iterator.next().message());
        } catch (Exception e) {
          logger.error("Processing Exception", e);
        }
      }

      transaction.commit();
    } catch (Exception e) {
      logger.error("Run Exception", e);
      if (transaction != null)
        transaction.rollback();
    } finally {
      session.close();
    }
  }


 Post subject: Re: What would be the best/fastest way to accomplish this?
PostPosted: Fri Oct 18, 2013 3:04 pm 
Newbie

Joined: Thu Oct 17, 2013 8:18 pm
Posts: 13
Also, would a StatelessSession be an option here? When should one consider using a StatelessSession?


 Post subject: Re: What would be the best/fastest way to accomplish this?
PostPosted: Fri Oct 18, 2013 3:23 pm 
Hibernate Team
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Yes, your last example looks better.

Quote:
What if that run never ends? Is it ok to keep a transaction open that long?

It might time out. Either set a limit in your code so that you occasionally close and reopen it, or change the timeout configuration. Some TransactionManagers allow you to keep the current TX alive.

Make sure that you do a session.clear() regularly.
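The "set a limit in your code" idea can be as simple as a counter that tells the consumer loop when it is time to commit, clear the Session, and begin a fresh transaction. A minimal sketch (BatchWindow is an invented helper name, not a Hibernate or Kafka API):

```java
// Invented helper: signals the consumer loop when the current transaction
// has processed enough messages and should be committed and restarted.
public class BatchWindow {
    private final int maxMessages;
    private int seen = 0;

    public BatchWindow(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    /** Record one processed message; returns true when the window is full. */
    public boolean recordAndCheck() {
        seen++;
        if (seen >= maxMessages) {
            seen = 0;      // reset for the next transaction
            return true;   // caller should commit, clear(), and beginTransaction now
        }
        return false;
    }
}
```

In the run() loop this would sit next to process(...): when recordAndCheck() returns true, commit the transaction, call session.clear(), and begin a new one, so no single transaction lives long enough to hit the timeout.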

Quote:
Also. Would a StatelessSession be an option here? When should one consider using a StatelessSession?

It would be perfect for your case, but it won't work if you need to load collections or other relations connected to the entity you're loading.

Also feel free to take some inspiration reading:
https://github.com/hibernate/hibernate-search/blob/master/orm/src/main/java/org/hibernate/search/batchindexing/impl/IdentifierConsumerEntityProducer.java

_________________
Sanne
http://in.relation.to/


 Post subject: Re: What would be the best/fastest way to accomplish this?
PostPosted: Fri Oct 18, 2013 5:16 pm 
Newbie

Joined: Thu Oct 17, 2013 8:18 pm
Posts: 13
First off thanks for all of your help.

Quote:
Make sure that you do a session.clear() regularly.


Can you explain why this is needed?

Quote:
It might timeout. Either set a limit in your code to occasionally close and reopen it, or change the timeout configuration. Some TransactionManagers allow you to keep the current TX alive.


What if it's pretty much guaranteed that messages will be coming in at a few per second? Will that keep the connection alive?

Quote:
It would be perfect for your case, but it won't work if you need to load collections or other relations connected to the entity you're loading.


If I do a bunch of joins will that work?

Now, if I go down the route of a StatelessSession, there would be no need to open a transaction outside the DAO, correct? Or should I still open the session outside of the DAO and push it down into the object, since (as far as I know) there is no way to obtain a reference to an already opened StatelessSession like there is for normal sessions via getCurrentSession()?

Thanks


 Post subject: Re: What would be the best/fastest way to accomplish this?
PostPosted: Fri Oct 18, 2013 8:39 pm 
Hibernate Team
Hibernate Team

Joined: Fri Oct 05, 2007 4:47 pm
Posts: 2536
Location: Third rock from the Sun
Quote:
Can you explain why this is needed?

Otherwise the Session will grow, keeping your entities in memory, and will continue growing until you run out of memory (but you'd probably slow down significantly even before that).
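A toy model of the effect (SessionCacheModel is invented for illustration, not a Hibernate class): the Session's first-level cache keeps a strong reference to every entity it loads, so without clear() it grows with every message.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of why session.clear() matters in a long-lived consumer loop.
public class SessionCacheModel {
    private final Map<Long, Object> firstLevelCache = new HashMap<>();

    /** Mimics Session.get(): caches every entity it loads. */
    public Object load(long id) {
        return firstLevelCache.computeIfAbsent(id, k -> new Object());
    }

    public int cachedEntityCount() {
        return firstLevelCache.size();
    }

    /** What session.clear() does conceptually: detach everything at once. */
    public void clear() {
        firstLevelCache.clear();
    }
}
```

After loading N distinct entities the cache holds N objects; clearing drops them all, which is why a periodic clear() keeps a never-ending loop at a bounded memory footprint.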

Quote:
What if it's pretty much guaranteed that messages will be coming in at a few per second? Will that keep the connection alive?

If you just have a few messages per second, simply close the transaction for each one. Any decent transaction manager handles many thousands of transactions per second.

Quote:
If I do a bunch of joins will that work?

Then the StatelessSession might not work. Try it out?

_________________
Sanne
http://in.relation.to/


 Post subject: Re: What would be the best/fastest way to accomplish this?
PostPosted: Sat Oct 19, 2013 10:08 am 
Newbie

Joined: Thu Oct 17, 2013 8:18 pm
Posts: 13
Quote:
If you just have a few messages per second, simply close the transaction for each one. Any decent transaction manager handles many thousands of transactions per second.


Sorry, but what do you mean by transaction manager? Is this in reference to JPA or something? Should I be using one?

Are you saying I should just open and close the transaction for each find? Similar to...

Code:
while (iterator.hasNext()) {

    Session session = HibernateUtils.getSessionFactory().getCurrentSession();
    Transaction txn = session.beginTransaction();

    try {
        long primaryId = iterator.next();
        Entity entity = (Entity) session.createCriteria(Entity.class)
                .add(Restrictions.eq("id", primaryId))
                .setReadOnly(true)
                .uniqueResult();
        indexEntity(entity);
        txn.commit();
    } catch (Exception e) {
        if (txn != null)
            txn.rollback();
    } finally {
        session.close();
    }

}


 Post subject: Re: What would be the best/fastest way to accomplish this?
PostPosted: Tue Oct 22, 2013 8:52 pm 
Newbie

Joined: Thu Oct 17, 2013 8:18 pm
Posts: 13
Bump


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.