-->
These old forums are deprecated now and set to read-only. We are waiting for you on our new forums!
More modern, Discourse-based and with GitHub/Google/Twitter authentication built-in.

All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 16 posts ]  Go to page 1, 2  Next
Author Message
 Post subject: Recursive CTE based sequence generators
PostPosted: Fri Aug 18, 2017 9:48 am 
Newbie

Joined: Wed Dec 16, 2015 8:09 am
Posts: 14
I'm wondering if ever consideration was given on using recursive CTE as sequence generators. On the surface they seem to combine almost all advantages of the best of the existing sequence generators (minimal database round trips due to prefetching, batch support, no impact on non-Hibernate database access code, the column values match the values generated by the sequence) with only limited database support as a disadvantage. The SQL is not as portable as one would like it to be but solving this is exactly the job Hibernate.

I wrote a prototype and it seems to work so far. I would appreciate feedback on the subject.


Top
 Profile  
 
 Post subject: Re: Recursive CTE based sequence generators
PostPosted: Fri Aug 18, 2017 1:07 pm 
Hibernate Team
Hibernate Team

Joined: Thu Sep 11, 2014 2:50 am
Posts: 1628
Location: Romania
There are two problems with this approach:

1. When you need to restart the DB, the sequence number will start from 1 after a restart, right?
2. The identifier generator must work concurrently, so how can you make it available for multiple database connections? Otherwise, each DB connection uses its own generator so you get identifier conflicts.


Top
 Profile  
 
 Post subject: Re: Recursive CTE based sequence generators
PostPosted: Sat Aug 19, 2017 4:54 am 
Newbie

Joined: Wed Dec 16, 2015 8:09 am
Posts: 14
vlad wrote:
1. When you need to restart the DB, the sequence number will start from 1 after a restart, right?

To the best of my knowledge the last value of a database sequence is persistent and not rolled back. This is even the case in the face of the confusingly named CACHE attribute. This actually causes sequence numbers to be preallocated on the server side. This also causes the last cached value to be committed. If you do not use all cached values or restart the DB they are not handed out again but lost/wasted.

https://docs.oracle.com/database/122/SQLRF/CREATE-SEQUENCE.htm#SQLRF01314 wrote:
If a system failure occurs, then all cached sequence values that have not been used in committed DML statements are lost. The potential number of lost values is equal to the value of the CACHE parameter.


https://www.postgresql.org/docs/current/static/sql-createsequence.html wrote:
So, any numbers allocated but not used within a session will be lost when that session ends, resulting in "holes" in the sequence.


https://docs.microsoft.com/en-us/sql/t-sql/statements/create-sequence-transact-sql wrote:
When created with the CACHE option, an unexpected shutdown (such as a power failure) may result in the loss of sequence numbers remaining in the cache.


vlad wrote:
2. The identifier generator must work concurrently, so how can you make it available for multiple database connections? Otherwise, each DB connection uses its own generator so you get identifier conflicts.

True but hi/lo, pooled and pooledlo have the same issue. That's why they are all synchronized. I use a java.util.concurrent.locks.Lock to achieve the same thing. In theory a ConcurrentLinkedQueue could be used in order to avoid the locking and avoid a possible spot for contention. As this has not popped up so far for hi/lo, pooled and pooledlo I assumed this is not an issue.


Top
 Profile  
 
 Post subject: Re: Recursive CTE based sequence generators
PostPosted: Sat Aug 19, 2017 10:58 am 
Hibernate Team
Hibernate Team

Joined: Thu Sep 11, 2014 2:50 am
Posts: 1628
Location: Romania
Quote:
So, any numbers allocated but not used within a session will be lost when that session ends, resulting in "holes" in the sequence.


Gaps in a Sequence values is never an issue. The only thing you care is getting unique values. Check out SQL Antipatterns by Bill Karwin.

Quote:
True but hi/lo, pooled and pooledlo have the same issue. That's why they are all synchronized. I use a java.util.concurrent.locks.Lock to achieve the same thing. In theory a ConcurrentLinkedQueue could be used in order to avoid the locking and avoid a possible spot for contention. As this has not popped up so far for hi/lo, pooled and pooledlo I assumed this is not an issue.


That's not true either. For pooled and pooled-lo, the sequence value is always going to be increasing, even in case of a crash. It does not matter if the in-between which are chosen by Hibernate are lost during a crash because the next sequence call will always provide a greater value, so the sequence is never restarted by a crash. The CTE approach will be.

Using java.util.concurrent for Concurrency Control to emulate a DB is not a good idea either. The Java locks are guaranteed for a single JVM. What if you run N nodes of your application in an auto-scaling fashion? The DB sequence and IDENTITY are guaranteed to work no matter how many application nodes or DB connections you are spawning.


Top
 Profile  
 
 Post subject: Re: Recursive CTE based sequence generators
PostPosted: Sat Aug 19, 2017 11:59 am 
Newbie

Joined: Wed Dec 16, 2015 8:09 am
Posts: 14
I believe I didn't explain the approach well enough. The CTE would only be used to fetch multiple values from a sequence in one database round trip. The values itself would come from a sequence.

If you look at
Code:
WITH RECURSIVE t(n, level_num) AS (
    SELECT NEXT VALUE FOR seq_xxx as n, 1 as level_num
        UNION ALL
    SELECT NEXT VALUE FOR seq_xxx as n, level_num + 1 as level_num
    FROM t
    WHERE level_num <= ?)
SELECT n FROM t;

We're selecting n which the result of NEXT VALUE FOR seq_xxx.

vlad wrote:
Gaps in a Sequence values is never an issue.

That's I'm saying.

vlad wrote:
That's not true either.

They are very much synchronized
https://github.com/hibernate/hibernate-orm/blob/master/hibernate-core/src/main/java/org/hibernate/id/enhanced/PooledOptimizer.java#L69
https://github.com/hibernate/hibernate-orm/blob/master/hibernate-core/src/main/java/org/hibernate/id/enhanced/PooledLoOptimizer.java#L56

vlad wrote:
For pooled and pooled-lo, the sequence value is always going to be increasing, even in case of a crash.

Same for CTE as it uses sequence as well.

vlad wrote:
It does not matter if the in-between which are chosen by Hibernate are lost during a crash because the next sequence call will always provide a greater value, so the sequence is never restarted by a crash.

Same for CTE as it uses sequence as well.

vlad wrote:
The CTE approach will be.

No it won't it's relying on sequences just like pooled and pooled-lo. If sequences were restarted by a crash then pooled and pooled-lo would not work as well.

vlad wrote:
Using java.util.concurrent for Concurrency Control to emulate a DB is not a good idea either. The Java locks are guaranteed for a single JVM. What if you run N nodes of your application in an auto-scaling fashion? The DB sequence and IDENTITY are guaranteed to work no matter how many application nodes or DB connections you are spawning.

I'm not using java.util.concurrent for Concurrency Control to emulate a DB. I'm using java.util.concurrent to control in JVM access to objects shared across multiple threads like pooled and pooled-lo optimizers do.


Top
 Profile  
 
 Post subject: Re: Recursive CTE based sequence generators
PostPosted: Sat Aug 19, 2017 12:51 pm 
Hibernate Team
Hibernate Team

Joined: Thu Sep 11, 2014 2:50 am
Posts: 1628
Location: Romania
But why do you need to fetch the values?

Pooled works with a single sequence call per batch of identifers. Check out this article for an explanation.

And both pooled and pooled-lo are crash safe, meaning you get monotonic increasing values. You can test it if you don't beleive it.


Top
 Profile  
 
 Post subject: Re: Recursive CTE based sequence generators
PostPosted: Sat Aug 19, 2017 1:56 pm 
Newbie

Joined: Wed Dec 16, 2015 8:09 am
Posts: 14
vlad wrote:
But why do you need to fetch the values?

Mostly for batch inserts, so that I don't have to do a database round trip for every id.

vlad wrote:
Pooled works with a single sequence call per batch of identifers. Check out this article for an explanation.

The way I understand pooled is that it forces me to set the INCREMENT BY value on the sequence to how many values I want to have in the "pool". If somebody other than Hibernate is calling NEXT VALUE FOR on the sequence he has to be aware of the algorithm and able to implement it otherwise he'll end up with a lot wasted sequence numbers.

vlad wrote:
And both pooled and pooled-lo are crash safe, meaning you get monotonic increasing values. You can test it if you don't beleive it.

I believe that because I believe that database sequences are crash safe.


Top
 Profile  
 
 Post subject: Re: Recursive CTE based sequence generators
PostPosted: Sat Aug 19, 2017 2:24 pm 
Hibernate Team
Hibernate Team

Joined: Thu Sep 11, 2014 2:50 am
Posts: 1628
Location: Romania
Quote:
The way I understand pooled is that it forces me to set the INCREMENT BY value on the sequence to how many values I want to have in the "pool". If somebody other than Hibernate is calling NEXT VALUE FOR on the sequence he has to be aware of the algorithm and able to implement it otherwise he'll end up with a lot wasted sequence numbers.


Nope. That was the case for hi/lo. The pooled and pooled-lo fixed that because the sequence value is the hi or the low boundary of the identifiers being assigned. If you read the article, there's actually an example with a 3rd party client that has no idea of this algorithm and everything works just fine.


Top
 Profile  
 
 Post subject: Re: Recursive CTE based sequence generators
PostPosted: Sun Aug 20, 2017 7:11 am 
Newbie

Joined: Wed Dec 16, 2015 8:09 am
Posts: 14
vlad wrote:
Nope. That was the case for hi/lo. The pooled and pooled-lo fixed that because the sequence value is the hi or the low boundary of the identifiers being assigned. If you read the article, there's actually an example with a 3rd party client that has no idea of this algorithm and everything works just fine.

I had a look at this picture for pooled-lo Image

The sequence values Hibernate is getting are: 1, 6, 26. For every value Hibernate generates the 5 following values.
The values the external system is getting are: 11, 16, 21. The external system is generating a lot of identifier waste, to generate 3 ids it wastes 12 ids.

I don't understand how the sequence can be made to return values 5 apart other than setting INCREMENT BY to 5 on the sequence. Especially since the external system is getting values 5 apart it really looks to me as if INCREMENT BY is set 5.


Top
 Profile  
 
 Post subject: Re: Recursive CTE based sequence generators
PostPosted: Sun Aug 20, 2017 3:08 pm 
Hibernate Team
Hibernate Team

Joined: Thu Sep 11, 2014 2:50 am
Posts: 1628
Location: Romania
Pooled and pooled-lo work on the fact that the SEQUENCE increments with a gap of N, where N is the amount of values you can generate with a single sequence call.

There's no waste. You can have gaps and that doesn't hurt your application in any way. Actually, you can always have gaps, even with an INCREMENT BY of 1 when you rollback a transaction that already incremented the sequence value (sequences are non-transactional for very good reasons).


Top
 Profile  
 
 Post subject: Re: Recursive CTE based sequence generators
PostPosted: Mon Aug 21, 2017 5:23 am 
Newbie

Joined: Wed Dec 16, 2015 8:09 am
Posts: 14
vlad wrote:
Pooled and pooled-lo work on the fact that the SEQUENCE increments with a gap of N, where N is the amount of values you can generate with a single sequence call.

How is that done? To the best of my knowledge a single sequence call always returns exactly one value. This value being the last generated value incremented by INCREMENT BY.
vlad wrote:
There's no waste. You can have gaps and that doesn't hurt your application in any way.

Doesn't that depend on the amount of waste you have? If you have 90% or more waste you may have to increase the size of the datatype of the column.


Top
 Profile  
 
 Post subject: Re: Recursive CTE based sequence generators
PostPosted: Wed Aug 23, 2017 3:44 am 
Hibernate Team
Hibernate Team

Joined: Thu Sep 11, 2014 2:50 am
Posts: 1628
Location: Romania
With a 32-bit integer, if you start from 0, you have 2147483647 values. If you have an increment size of 10, you can still save 200 million rows.

If you start from the minimum absolute value, then you have 4 billion values available. If you have an increment size of 10, you can still save 400 million rows.

So, it depends on how much data you plan on saving in your tables. If you really have billions of rows, then you probably want to use bigint anyway.


Top
 Profile  
 
 Post subject: Re: Recursive CTE based sequence generators
PostPosted: Wed Aug 23, 2017 6:18 am 
Newbie

Joined: Wed Dec 16, 2015 8:09 am
Posts: 14
vlad wrote:
With a 32-bit integer, if you start from 0, you have 2147483647 values. If you have an increment size of 10, you can still save 200 million rows.

So you have to set the increment by on the sequence?


Top
 Profile  
 
 Post subject: Re: Recursive CTE based sequence generators
PostPosted: Wed Aug 23, 2017 6:49 am 
Hibernate Team
Hibernate Team

Joined: Thu Sep 11, 2014 2:50 am
Posts: 1628
Location: Romania
For pooled and pooled-lo, yes, of course.


Top
 Profile  
 
 Post subject: Re: Recursive CTE based sequence generators
PostPosted: Wed Aug 23, 2017 8:44 am 
Newbie

Joined: Wed Aug 23, 2017 8:28 am
Posts: 1
Hi Vlad

Of course a sequence does not guarantee values without gaps as pointed out previously. But an important point to note is that gaps only occur in exceptional cases. The most common cause is transaction rollbacks. But it can of course also happen on database crashes or when the application by design or because of a crash doesn't persist already fetched values. The hilo/pooled/pooledlo algorithms on the other hand causes a massive amount of wasted identifiers in case the amount of cached values is fairly high and other systems (not using hibernate) are using this sequence.

I definitely prefer the method used by pmm to get sequence ranges. It does not have the above mentioned sequence waste and is also a lot easier to understand.

Last but not least, the definition of a sequence is a lot more self-explaining when using it as it is intended (namely define how much you want the identifier to increment).

Btw. How do you use hilo/pooled/pooledlo when you actually want to use a sequence which increments by another value than 1?


Cheers Thomas


Last edited by thomas-a on Wed Aug 23, 2017 10:16 am, edited 1 time in total.

Top
 Profile  
 
Display posts from previous:  Sort by  
Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 16 posts ]  Go to page 1, 2  Next

All times are UTC - 5 hours [ DST ]


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum

Search for:
© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.