 Post subject: What do you do when the database is just slow?
PostPosted: Thu Jan 27, 2005 10:20 pm 
Senior

Joined: Sun Oct 26, 2003 5:05 am
Posts: 139
I've been wrestling with performance issues for the last few days now. I know this stuff can be tricky and hard, and that it's not trivial, so I've really done my homework: read some books and read as many posts and articles about performance as I can. But I'm just baffled by these few things about PostgreSQL.

1. Dealing with large table sizes where the database uses sequential scans
2. Dealing with the query planner not choosing the right join order, even when you know the correct one yourself and wish you could just specify it.

Issue #1 is the one that is bothering me the most. For example, let's say I want to query for a list of objects of type A that have hundreds of children of object type B. I want to use a filter on B to arrive at this set of A's. To me, this seems really straightforward: if there is an index from B to A, the query should be fast. But because the distribution is so bad, I'm basically forced to put the filter on A instead to avoid joining to (and occasionally reading in) type B.
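Here's roughly the shape of it, with made-up table and column names:

Code:
    -- children of type B point at their parent A; the filter lives on B
    SELECT DISTINCT a.*
    FROM a
    JOIN b ON b.a_id = a.id
    WHERE b.status = 'active';

    -- the index I'd expect the planner to drive from
    CREATE INDEX b_status_a_id_idx ON b (status, a_id);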

I use Dan Tow's join diagrams (amazing book), since his method seems to work for many queries. Sometimes, though, the only filter is a limit, since the other filters have very bad filter ratios. I have to make up requirements just to get the result sets down to size so that Postgres isn't looping. However, even with a pleasant 0.08 filter ratio on a 32,000k table, the query takes 250-275 milliseconds with all the proper indexes.
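For what it's worth, that 0.08-ratio case boils down to something like this (names invented):

Code:
    CREATE INDEX entries_created_at_idx ON entries (created_at);

    EXPLAIN ANALYZE
    SELECT *
    FROM entries
    WHERE created_at >= now() - interval '7 days';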

Yes, I've even eliminated the indexes and started adding them at random, running EXPLAIN several times. Dan Tow's method makes sense; that's the one that should work. But Postgres is just stupid, or I am. This leads me to believe the filters aren't good enough, but I can't muck with the queries any more without changing the business reason for using them (for example, that 0.08 filter ratio came from getting only the last 7 days of entries, and the 7 days is crucial to the business requirements).

This does alarm me, however. In Dan's book, he mentions that you should only be tuning about 5% of the big queries in the app. I find myself tuning practically everything. I used to think it might be Hibernate being inefficient. I used to think maybe I was fetching too little, or maybe too much. But no, it's just Postgres.

But part of me really wonders why it takes 1.8 seconds to look up a unique 3-column value (which has an implied index) in a table with 80,000 rows. I don't get it. The index is actually being used in this case, too. I dropped the entire database, thinking it might be fudged, and reloaded it, but I still get the same results. I ran VACUUM FULL ANALYZE and the results are the same.
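Concretely, the slow lookup is the moral equivalent of this (names invented; the real table has about 80,000 rows):

Code:
    -- the unique constraint is what creates the implied index
    ALTER TABLE widgets
      ADD CONSTRAINT widgets_natural_key UNIQUE (col_a, col_b, col_c);

    EXPLAIN ANALYZE
    SELECT *
    FROM widgets
    WHERE col_a = 1 AND col_b = 2 AND col_c = 'x';
    -- the plan shows an Index Scan using widgets_natural_key, yet it takes ~1.8s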

I'm not sure how big websites can serve so much content. Heck, even blogs sometimes join and serve quite a bit per page, and I can refresh those pages a lot and get great results, without even being local to where the application is running. Yet I run this stuff on my local machine, with 1 GB of RAM, a P4 3.0 GHz processor and a decent WD 120 GB hard drive, and I get poor performance. How can I feel confident that this stuff is going to perform correctly?

Issue #2 is the really bogus one. It's just frustrating when I take great care calculating the diagrams from Dan Tow's book. I know I'm doing it right, since I got all of the answers correct in all of his exercises. Surely I'm skilled enough to tune my modest application, right? Perhaps I put too much faith in the logic of his techniques, but my best guess is that Postgres' query planner has other plans, choosing join orders that are very different from the optimal one.
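If it helps anyone suggest something: the only way I've read about to pin the join order in Postgres is to write explicit JOINs and drop join_collapse_limit, roughly like this (made-up names, and I may be misreading the docs):

Code:
    SET join_collapse_limit = 1;  -- planner now keeps the textual join order

    SELECT d.id, t.value
    FROM driving_table d
    JOIN middle_table m ON m.d_id = d.id
    JOIN third_table t ON t.m_id = m.id
    WHERE d.created_at >= now() - interval '7 days';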

So, this leads me to one of two things:

1. I'm either an idiot; or
2. It's time to choose a new database

I really respect OSS and love it. Hibernate is fantastic once you get past the learning curve (and perhaps write the special cases down in a log file somewhere, since my memory is just horrible). But when it comes to performance, I'm really worried.

Anyway, thanks for listening. If you have some comments, feel free to share.


 Post subject:
PostPosted: Thu Jan 27, 2005 10:22 pm 
Senior

Joined: Sun Oct 26, 2003 5:05 am
Posts: 139
Feel free to comment here:
http://jroller.com/page/egervari/200501 ... e_troubles


 Post subject:
PostPosted: Thu Jan 27, 2005 10:34 pm 
Hibernate Team

Joined: Tue Aug 26, 2003 12:50 pm
Posts: 5130
Location: Melbourne, Australia
Why not try running your app against a different database (e.g. MySQL), to see if it is really the DB, or just the queries, that is the problem? (It should be quite easy to port your app over, since you are using Hibernate.)
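Roughly all it takes is swapping the dialect and connection settings; for example, with Hibernate 2.x property names (adjust the URL and driver to your own setup):

Code:
    # hibernate.properties -- PostgreSQL
    hibernate.dialect=net.sf.hibernate.dialect.PostgreSQLDialect
    hibernate.connection.driver_class=org.postgresql.Driver
    hibernate.connection.url=jdbc:postgresql://localhost/mydb

    # hibernate.properties -- MySQL
    hibernate.dialect=net.sf.hibernate.dialect.MySQLDialect
    hibernate.connection.driver_class=com.mysql.jdbc.Driver
    hibernate.connection.url=jdbc:mysql://localhost/mydb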


 Post subject:
PostPosted: Thu Jan 27, 2005 11:19 pm 
Senior

Joined: Sun Oct 26, 2003 5:05 am
Posts: 139
Thanks for your response, Gavin. That'll be the first thing I do on the weekend. I still have some last-minute features that have to go in first, but I will definitely do this on the weekend. I don't think my queries are doing any sub-selects behind the scenes, so perhaps it's worth a try.

I do have a lot of transactional content, though, and it's important data (a missing record could potentially cost a client a lot of money). I thought Postgres was the real winner when it came to this. And in its defense, it hasn't failed on that promise: I can insert about 100 or so rows in a second, which is good enough, I suppose, since they usually appear in infrequent batches. I usually do about 50 or so inserts every 5 minutes everywhere else in the application when analyzing the production data.
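One thing I should double-check on my end is whether each of those inserts commits separately; wrapping a batch in a single transaction is supposed to help a lot, i.e. something like (names invented):

Code:
    BEGIN;
    INSERT INTO entries (col_a, col_b) VALUES (1, 'x');
    INSERT INTO entries (col_a, col_b) VALUES (2, 'y');
    -- ...rest of the batch...
    COMMIT;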

But I find it hard to believe there is something wrong with my queries. I'm not a genius or anything, believe me, but my queries are all very straightforward. They just happen to join 8 or 9 tables to get what they need.

Sometimes I know why a query isn't very efficient and I can't do anything about it (like a very large table with poor filters). Other times I do the math the query planner should be doing, and my results on paper come out very, very differently from reality. Of course, the plan is completely different from the one I did on paper too, so no wonder.
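When the row estimates are that far off, the only knob I know of is the per-column statistics target (the default sample is quite small), something like:

Code:
    -- let the planner sample this column more heavily, then re-analyze
    ALTER TABLE entries ALTER COLUMN created_at SET STATISTICS 100;
    ANALYZE entries;

    -- then compare the plan's estimated rows against the actual rows
    EXPLAIN ANALYZE
    SELECT * FROM entries
    WHERE created_at >= now() - interval '7 days';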

But what really freaks me out the most is that I read that many people are using PostgreSQL with tables exceeding 5 million records, and I'm baffled as to how they manage it. Perhaps they are lucky and the business domain gives them 0.0006 filter ratios on obvious join points. I did realize that my query diagrams are not perfectly symmetrical trees (no matter how you draw them) compared to Dan Tow's. I have a few zig-zags every now and then that can make it hard for the query planner to know to drive from one of those tables. The data just has to be this way, though, or I'd have to denormalize it.

Anyway, thanks for responding. Maybe I'll get rich and buy a license to Oracle. I can't wait.

