-->
These old forums are deprecated now and set to read-only. We are waiting for you on our new forums!
More modern, Discourse-based and with GitHub/Google/Twitter authentication built-in.

All times are UTC - 5 hours [ DST ]



Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 4 posts ] 
Author Message
 Post subject: Performance problem with postgres
PostPosted: Thu Jan 27, 2005 1:50 pm 
Senior
Senior

Joined: Sun Oct 26, 2003 5:05 am
Posts: 139
I'm having a pretty severe performance problem with postgres/hibernate fetching where the query takes around 250ms and I'd rather it takes around 20 or 30ms if possible since this query is used the most in the entire system.

Here is the HQL. As you can see, I'm fetching a collection called shipment.statusMessages, which usually contains just 2 or 3 rows. However, this query typically returns 100 or 150 results, so it does add up. Since this pulls up duplicate results, I have to add them to a linked hash set to maintain order but remove the duplicates.

Code:
               Query query = session.createQuery(
                  "select shipment " +
                  "from Shipment shipment " +
                  "   inner join shipment.cargoControlNumber.carrierCode.carrier.persons person " +
                  "   inner join shipment.currentStatus currentStatus " +
                  "   inner join currentStatus.releaseCode releaseCode " +
                  "   left join fetch shipment.currentStatus " +
                  "   left join fetch shipment.statusMessages " +
                  "where " +
                  "   person.id = :personId and " +
                  "   shipment.isPurged = false and " +
                  "   releaseCode.number = :releaseCodeNumber and " +
                  "   currentStatus is not null and " +
                  "   currentStatus.date >= current_date - 7 " +
                  "order by currentStatus.date desc"
               );

               query.setParameter( "personId", personId );
               query.setParameter( "releaseCodeNumber", releaseCodeNumber );

               query.setFirstResult( firstResult );
               query.setMaxResults( maxResults );

               List list = query.list();
               Set distinctShipments = new LinkedHashSet( list );
               return Arrays.asList( distinctShipments.toArray() );


Here is the generated SQL (I replaced the question marks with actual data so I could run it outside of the application). I also took out the stuff in the select clause because it occupies too much space and is messy. It's not very important anyway.

Code:
select [shipment0_.*, .... ]

from shipment shipment0_
   inner join shipment_status shipmentst5_ on shipment0_.current_status_id = shipmentst5_.id
   inner join release_code releasecod6_ on shipmentst5_.release_code_id=releasecod6_.id
   inner join carrier_code carriercod1_ on shipment0_.carrier_code_id=carriercod1_.id
   inner join carrier carrier2_ on carriercod1_.carrier_id=carrier2_.id
   inner join carrier_to_person persons3_ on carrier2_.id=persons3_.carrier_id
   inner join person person4_ on persons3_.person_id=person4_.id
   left outer join shipment_status statusmess7_ on shipment0_.id=statusmess7_.shipment_id

where (person4_.id=255 )
   and(shipment0_.is_purged=false )
   and(releasecod6_.number='4' )
   and(shipmentst5_.id is not null )
   and(shipmentst5_.date>=current_date-7 )

order by shipmentst5_.date desc
limit 100


Using Dan Tow's diagram method, The query should really start with the person since it has 0.0045 filter ration (122 person rows). The shipment table has about 18,000 rows and shipment_status has about 33,000, which is the main reason the query takes so long. The rest of the rows are modest in size and won't get large.

The filter ratio on shipment_status is 0.083, which is very good considering there are 33k rows in that table. I would like to filter those out as fast as possible.

The filter ration on release_code is 0.125, but there are only 8 entries in this table. Ironically, this filter isn't very good because when the number = '4', it will match up to 54% on average of the rows in shipment_status anyway.

The filter on shipment.is_purge is very bad, roughly being 0.98, hence sequencial scans the optimizer is doing.

Here is the explain on this query with the current indexes I'm using.

Code:
Limit  (cost=1090.48..1090.56 rows=30 width=112)
  ->  Sort  (cost=1090.48..1090.56 rows=30 width=112)
        Sort Key: shipmentst5_.date
        ->  Nested Loop Left Join  (cost=615.95..1089.75 rows=30 width=112)
              ->  Nested Loop  (cost=615.95..982.99 rows=17 width=87)
                    ->  Index Scan using person_pkey on person person4_  (cost=0.00..5.61 rows=1 width=4)
                          Index Cond: (id = 255)
                    ->  Hash Join  (cost=615.95..977.21 rows=17 width=91)
                          Hash Cond: ("outer".release_code_id = "inner".id)
                          ->  Hash Join  (cost=614.84..975.28 rows=131 width=91)
                                Hash Cond: ("outer".id = "inner".current_status_id)
                                ->  Index Scan Backward using current_status_date_idx on shipment_status shipmentst5_  (cost=0.00..305.46 rows=10735 width=21)
                                      Index Cond: (date >= (('now'::text)::date - 7))
                                      Filter: (id IS NOT NULL)
                                ->  Hash  (cost=613.86..613.86 rows=393 width=70)
                                      ->  Hash Join  (cost=7.43..613.86 rows=393 width=70)
                                            Hash Cond: ("outer".carrier_code_id = "inner".id)
                                            ->  Seq Scan on shipment shipment0_  (cost=0.00..509.10 rows=17908 width=66)
                                                  Filter: (is_purged = false)
                                            ->  Hash  (cost=7.42..7.42 rows=2 width=8)
                                                  ->  Hash Join  (cost=4.74..7.42 rows=2 width=8)
                                                        Hash Cond: ("outer".id = "inner".carrier_id)
                                                        ->  Seq Scan on carrier carrier2_  (cost=0.00..2.44 rows=44 width=4)
                                                        ->  Hash  (cost=4.74..4.74 rows=2 width=16)
                                                              ->  Hash Join  (cost=3.03..4.74 rows=2 width=16)
                                                                    Hash Cond: ("outer".carrier_id = "inner".carrier_id)
                                                                    ->  Seq Scan on carrier_code carriercod1_  (cost=0.00..1.46 rows=46 width=8)
                                                                    ->  Hash  (cost=3.02..3.02 rows=1 width=8)
                                                                          ->  Index Scan using carrier_to_person_person_id_idx on carrier_to_person persons3_  (cost=0.00..3.02 rows=1 width=8)
                                                                                Index Cond: (255 = person_id)
                          ->  Hash  (cost=1.10..1.10 rows=1 width=4)
                                ->  Seq Scan on release_code releasecod6_  (cost=0.00..1.10 rows=1 width=4)
                                      Filter: ((number)::text = '4'::text)
              ->  Index Scan using shipment_status_shipment_id_release_code_id_ukey on shipment_status statusmess7_  (cost=0.00..6.26 rows=2 width=25)
                    Index Cond: ("outer".id = statusmess7_.shipment_id)


However, I don't think this plan resembles the optimal plan using Dan Tow's method and I don't know how to fix it. The plan should be

person -> carrier_to_person -> carrier -> carrier_code -> shipment -> current_status -> release_code -> shipment_status (the fetch join)

Or, if that is not performant enough, I would probably want this one:

currnet_status -> release_code -> shipment -> carrier_code -> carrier -> carrier_to_person -> person -> shipment_status (the fetch join)

I'm not very good at reading plans in postgresql, but it doesn't look like either of these plans are being used. I see that it's using an index on shipment_status that covers release_code_id and shipment_id. Even though Dan Tow's method does not say to use this index, it speeds the query from 650ms to 250ms.

Can anyone please help me out? I'm an object guy, not a dba. But there is nobody here to ask. Thanks.


Top
 Profile  
 
 Post subject: A suggestion?
PostPosted: Tue Feb 15, 2005 2:35 pm 
Newbie

Joined: Wed Nov 05, 2003 9:10 am
Posts: 6
Your post above contains a lot of ideas I'm not familiar with, but maybe this might be of use.

It looks like there are several nested elements to your comparison. Have you considered using a standard HashSet (to eliminate duplicates) then a Comparator (to do the sort) in two seperate phases? I'm rusty with my O() analyses, but I doubt you need the collection to be sorted at all times but rather just at the end.

I had a similar situation where this got me the needed speed increase.


Top
 Profile  
 
 Post subject: Re: Performance problem with postgres
PostPosted: Tue Feb 15, 2005 4:47 pm 
CGLIB Developer
CGLIB Developer

Joined: Thu Aug 28, 2003 1:44 pm
Posts: 1217
Location: Vilnius, Lithuania
Try to run "vacuum analyse", it collects statistics for optimizer. Probably docs can help too: http://www.postgresql.org/docs/7.4/stat ... joins.html


Top
 Profile  
 
 Post subject:
PostPosted: Tue Apr 12, 2005 11:27 am 
Newbie

Joined: Thu Dec 04, 2003 5:53 am
Posts: 11
Location: Toulouse, France
Your problem is that the inner join clauses are creating explicit inner joins on Postgresql, which forces a certain join order.

What you want is a SQL query that looks like:

1) SELECT a.* from a, b, c where a.fkb = b.pkb and b.fkc = c.pkc
which allows postgresql to determine the appropriate join order (usually correct, especially after vacuum analyze)

instead of:
2) SELECT a.* from a inner join b on a.fkb = b.pkb inner join c on b.fkc = c.pkc
which forces a join order a then b then c (as per the link in the previous post).

Question: is there any way in HQL to ensure that we get (1) above instead of (2)?

Best regards,
Assaf


Top
 Profile  
 
Display posts from previous:  Sort by  
Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 4 posts ] 

All times are UTC - 5 hours [ DST ]


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum

Search for:
© Copyright 2014, Red Hat Inc. All rights reserved. JBoss and Hibernate are registered trademarks and servicemarks of Red Hat, Inc.