
All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 4 posts ] 
Author Message
 Post subject: Storing an array of doubles in a single column
PostPosted: Wed Mar 04, 2009 2:51 pm 
Newbie

Joined: Wed Mar 04, 2009 1:41 pm
Posts: 3
Hi, from searching the forums I know loads of people have struggled with this, but I can't seem to get it working either.

I have an application that stores historic stock market price data for a range of items. The amount of data available for each item varies, so my plan was to store the data as a double[] in a single column keyed on the item code. In other words, a class like this:

Code:
public class Item {
    private int key;
    private double[] data;
    // ... other stuff ...
}


An initial test run with three points of data worked fine. The Item table was created with the data column as a varbinary(255) on MS SQL Server 2005. I then tried again with 60 points of data and it failed because, of course, the data would be truncated.
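For context on the truncation: a double is 8 bytes, so 60 points need at least 480 bytes of payload before any serialization overhead, which is well past varbinary(255). A minimal sketch of the packing involved (hypothetical helper, using raw ByteBuffer packing rather than whatever Hibernate actually serializes, which would be larger still):

```java
import java.nio.ByteBuffer;

public class DoubleArrayCodec {
    // Pack a double[] into a byte[]: exactly 8 bytes per element.
    static byte[] toBytes(double[] data) {
        ByteBuffer buf = ByteBuffer.allocate(data.length * Double.BYTES);
        for (double d : data) buf.putDouble(d);
        return buf.array();
    }

    // Unpack the byte[] back into a double[].
    static double[] fromBytes(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        double[] out = new double[bytes.length / Double.BYTES];
        for (int i = 0; i < out.length; i++) out[i] = buf.getDouble();
        return out;
    }

    public static void main(String[] args) {
        double[] points = new double[60];
        byte[] packed = toBytes(points);
        System.out.println(packed.length);  // 480 -- already over varbinary(255)
    }
}
```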

To try to get round this I annotated the data field with @Lob which caused the data column to change type to "image" (which I presume is a SQL Server BLOB). Unfortunately I can't seem to insert any data into the table now. The insert just fails with this unhelpful exception:

Code:
javax.transaction.RollbackException: Transaction marked for rollback.                                                                                                               
        at com.sun.enterprise.transaction.JavaEETransactionImpl.commit(JavaEETransactionImpl.java:436) [na:na]                                                                     
        at com.sun.enterprise.transaction.JavaEETransactionManagerSimplified.commit(JavaEETransactionManagerSimplified.java:843) [na:na]


the code doing the insert is:

Code:
try {
    utx.begin();
    EntityManager em = emf.createEntityManager();
    em.persist( item );
    utx.commit();
} catch( Exception e ) {
    logger.error( "Failed to persist item: " + data, e );
    throw new DataStoreException( "Failed to persist item: " + data, e );
}


I realize that storing an array of data in a single column means the data isn't normalized, but I need to store roughly 500 million points of data, and all accesses are whole-item-at-a-time only. I don't think a separate table storing one point of data per row would be quick enough, and a wide table won't work because I don't know how many data points each row will have.
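For a rough sense of scale (back-of-envelope only, using the figures above; the 250 points/item ratio is an assumption for illustration): the raw payload is the same either way, and the two layouts differ mainly in row count.

```java
public class Sizing {
    public static void main(String[] args) {
        long points = 500_000_000L;            // total data points quoted above
        long payload = points * Double.BYTES;  // raw doubles, no row/header overhead
        System.out.println(payload);           // 4,000,000,000 bytes (~3.7 GiB)

        // One row per point (normalized): 500M rows.
        // One row per item at, say, 250 points each: 2M rows.
        System.out.println(points / 250);      // 2,000,000
    }
}
```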

I really hope someone can help because not finding a solution to this problem means going back and hacking on the horrific piece of code that is currently running! :-D


 Post subject:
PostPosted: Wed Mar 04, 2009 3:45 pm 
Newbie

Joined: Tue Jul 08, 2008 11:09 am
Posts: 12
First of all, it's bad practice to store money amounts in floating point types (float, double). Always use integer types (int, long), and pick the smallest one that fits: int is 4 bytes, long is 8. I'm not sure what kind of prices you're storing, but int is probably good enough.
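To make the integer-types suggestion concrete, a minimal sketch (hypothetical helper names) of round-tripping a decimal price through long cents via BigDecimal, which sidesteps binary floating-point representation error:

```java
import java.math.BigDecimal;

public class PriceCents {
    // Convert a decimal price string to integer cents, exactly;
    // longValueExact() throws if the value doesn't fit or has sub-cent digits.
    static long toCents(String price) {
        return new BigDecimal(price).movePointRight(2).longValueExact();
    }

    // Convert integer cents back to a decimal string for display.
    static String fromCents(long cents) {
        return BigDecimal.valueOf(cents).movePointLeft(2).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(toCents("19.99"));  // 1999
        System.out.println(fromCents(1999));   // 19.99
    }
}
```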

Then, you're optimizing prematurely. You're not normalizing because you think it won't perform well enough. Maybe you're right, maybe you're not. Generally it's best to optimize only once you find out that you must. Optimizing usually increases the complexity of the solution, which is exactly why you're posting this message :)

I would simply use a separate table to store the prices: one price per record.

500 million is an awful lot, but you're not saying how many Items you have, and thus what the Items/Prices ratio is: how many prices per item do you store?

But I still think a well-configured database should have no problem querying your separate prices table. That's exactly what a database is made for :) And it keeps your code a lot simpler.
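To sketch what whole-item reads look like against a one-price-per-record table (hypothetical Row shape and column names; in reality the rows would come back from a query ordered by a sequence column):

```java
import java.util.*;

public class PriceRows {
    // One row per data point, as a normalized prices table would hold it
    // (hypothetical columns: item_key, seq, price).
    static final class Row {
        final int itemKey; final int seq; final double price;
        Row(int itemKey, int seq, double price) {
            this.itemKey = itemKey; this.seq = seq; this.price = price;
        }
    }

    // Reassemble one item's rows (already ordered by seq)
    // into the double[] the application works with.
    static double[] toArray(List<Row> rows) {
        double[] out = new double[rows.size()];
        for (int i = 0; i < rows.size(); i++) out[i] = rows.get(i).price;
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
            new Row(7, 0, 101.25), new Row(7, 1, 101.75), new Row(7, 2, 100.5));
        System.out.println(Arrays.toString(toArray(rows)));  // [101.25, 101.75, 100.5]
    }
}
```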


 Post subject:
PostPosted: Wed Mar 04, 2009 4:44 pm 
Newbie

Joined: Wed Mar 04, 2009 1:41 pm
Posts: 3
Thanks for the help. I understand about storing prices in floating point variables and the loss of precision. To simplify the question I cut out some of the other information, but suffice it to say double is the best choice for me.

I've actually discovered a way to store a double[]. I've annotated the field with:

Code:
@Column( columnDefinition="varbinary( max )" )


which makes it SQL Server specific, but I'll live with that for now.

I'm actually running a small speed test for data inserts at the moment (inserting 500,000 rows with 60 points per row, for a total of 30 million data points) and getting about 100 rows a second to a database connected over the network. I would have liked more, but I'm most interested in query speeds, which I'm hoping will be much higher.
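As a quick sanity check on what that insert rate means for the test run (plain arithmetic, using the figures above):

```java
public class InsertEstimate {
    public static void main(String[] args) {
        long rows = 500_000;       // rows in the test run
        long rowsPerSecond = 100;  // observed insert rate over the network
        long seconds = rows / rowsPerSecond;
        System.out.println(seconds + " s, ~" + seconds / 60 + " min");  // 5000 s, ~83 min
    }
}
```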

As for the full shape of the data: each item typically has between about 50 and 250 data points, though it can be more (rarely less). A small database would be about 75,000 items; a (currently) large database, about 1 million.


 Post subject:
PostPosted: Sat Mar 07, 2009 2:37 am 
Newbie

Joined: Tue Mar 03, 2009 2:58 pm
Posts: 10
I would be quite surprised if the overhead of a JOIN is more than the overhead of converting back and forth from the array. At least in Postgres, a large BLOB-like column is stored (transparently) in a separate table anyway. And your design loses the date information for each point, or has to recompute it.

I agree that this looks like premature optimization.


© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.