Under very high concurrency, every thread that uses Hibernate ends up stuck at:
Name: http-8080-Processor22
State: RUNNABLE
Total blocked: 183,622 Total waited: 49,453
Stack trace:
java.util.HashMap.containsKey(Unknown Source)
org.apache.commons.collections.SequencedHashMap.containsKey(Unknown Source)
org.apache.commons.collections.LRUMap.put(Unknown Source)
org.hibernate.util.SoftLimitMRUCache.put(SoftLimitMRUCache.java:60)
org.hibernate.engine.query.QueryPlanCache.getNativeSQLQueryPlan(QueryPlanCache.java:121)
org.hibernate.impl.AbstractSessionImpl.getNativeSQLQueryPlan(AbstractSessionImpl.java:140)
org.hibernate.impl.AbstractSessionImpl.list(AbstractSessionImpl.java:147)
org.hibernate.impl.SQLQueryImpl.list(SQLQueryImpl.java:164)
There is a known race condition in HashMap when it is modified by several threads without synchronisation. It is well described in
http://blogs.opensymphony.com/plightbo/ ... nfini.html
In this case the threads are stuck in the while loop of containsKey()
(from java.util.HashMap for jdk1.5.0_04):
    public boolean containsKey(Object key) {
        Object k = maskNull(key);
        int hash = hash(k);
        int i = indexFor(hash, table.length);
        Entry e = table[i];
        while (e != null) {
            if (e.hash == hash && eq(k, e.key))
                return true;
            e = e.next;
        }
        return false;
    }
SoftLimitMRUCache does not seem to have enough synchronisation.
The Javadoc for LRUMap says:
* A synchronized version can be obtained with:
* <code>Collections.synchronizedMap( theMapToSynchronize )</code>
* If it will be accessed by multiple threads, you _must_ synchronize access
* to this Map. Even concurrent get(Object) operations produce indeterminate
* behaviour.
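
As a rough illustration of what that Javadoc recommends, here is a minimal sketch that uses only the JDK. BoundedLruMap and SynchronizedLruExample are hypothetical names, and an access-ordered LinkedHashMap stands in for the commons-collections LRUMap; it is not the code Hibernate uses. As with LRUMap, get() on an access-ordered map reorders entries, so even reads have to go through the lock.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for commons-collections LRUMap, built on the JDK only.
class BoundedLruMap<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int maxSize;

    BoundedLruMap(int maxSize) {
        // accessOrder = true: get() moves the entry to the end, so a read is
        // a structural modification and is unsafe without synchronisation.
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the limit is exceeded.
        return size() > maxSize;
    }
}

class SynchronizedLruExample {
    // Every call on the returned map takes the same lock, as the Javadoc advises.
    static <K, V> Map<K, V> newSharedCache(int maxSize) {
        return Collections.synchronizedMap(new BoundedLruMap<K, V>(maxSize));
    }
}
```

Note that iterating over a map returned by Collections.synchronizedMap still requires the caller to hold the map's lock for the whole iteration.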
I have tested under very high concurrent load with the following changes to SoftLimitMRUCache:
- removed methods entries() and softEntries(): they are not used, and they expose the Map's iterators in a dangerous way.
- synchronized methods softSize(), size() and put(); get() was already synchronised.
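
The shape of that change can be sketched roughly as follows. This is not the real SoftLimitMRUCache: SynchronizedMruCacheSketch is a hypothetical, simplified class that keeps only a strong-reference LRU map (again an access-ordered LinkedHashMap as a stand-in) and leaves out the soft-reference map. The point is only that every accessor takes the same monitor and nothing hands out an iterator.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified illustration of the patch; not Hibernate's class.
class SynchronizedMruCacheSketch {
    private final Map<Object, Object> strongCache;

    SynchronizedMruCacheSketch(final int strongReferenceCount) {
        // Assumption: an access-ordered LinkedHashMap replaces LRUMap here.
        this.strongCache = new LinkedHashMap<Object, Object>(16, 0.75f, true) {
            private static final long serialVersionUID = 1L;

            @Override
            protected boolean removeEldestEntry(Map.Entry<Object, Object> eldest) {
                return size() > strongReferenceCount;
            }
        };
    }

    // get() reorders entries, so it must hold the lock as well.
    public synchronized Object get(Object key) {
        return strongCache.get(key);
    }

    // Previously unsynchronised in the report above; now guarded by the same monitor.
    public synchronized Object put(Object key, Object value) {
        return strongCache.put(key, value);
    }

    public synchronized int size() {
        return strongCache.size();
    }

    // Deliberately no entries() method: an iterator handed to callers would
    // be used outside the lock.
}
```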
Testing load: 10 concurrent users, 100,000 requests each, over 90 minutes.
- It will take a lot more testing to be sure I get no more of these.
- Previously I would run about 500,000 requests before it would hit, but the problem would then escalate fast as the busy loop takes all processor cycles.
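
For anyone who wants to reproduce this kind of load outside the web container, a small harness along these lines may help. It is only a sketch: CacheLoadSketch, hammer and newSynchronizedLru are hypothetical names, the key distribution is arbitrary, and it exercises a JDK map rather than Hibernate's QueryPlanCache. A run that does not finish within the timeout is treated as the stuck-thread symptom.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

// Hypothetical load harness; names and parameters are illustrative only.
class CacheLoadSketch {

    // Returns true if all workers finished within the timeout.
    static boolean hammer(final Map<Integer, Integer> cache, int threads,
                          final int requestsPerThread, long timeoutSeconds) {
        // Daemon threads, so a worker spinning in a corrupted map cannot
        // keep the JVM alive after the test gives up.
        ExecutorService pool = Executors.newFixedThreadPool(threads, new ThreadFactory() {
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            }
        });
        for (int t = 0; t < threads; t++) {
            final int seed = t;
            pool.execute(new Runnable() {
                public void run() {
                    for (int i = 0; i < requestsPerThread; i++) {
                        Integer key = Integer.valueOf((seed * 31 + i) % 1024);
                        // Look up, then populate on a miss. The pair is not
                        // atomic, which is acceptable for a cache.
                        if (cache.get(key) == null) {
                            cache.put(key, key);
                        }
                    }
                }
            });
        }
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // A synchronised, size-bounded LRU map to run the harness against.
    static Map<Integer, Integer> newSynchronizedLru(final int maxSize) {
        return Collections.synchronizedMap(
            new LinkedHashMap<Integer, Integer>(16, 0.75f, true) {
                private static final long serialVersionUID = 1L;

                @Override
                protected boolean removeEldestEntry(Map.Entry<Integer, Integer> eldest) {
                    return size() > maxSize;
                }
            });
    }
}
```

Calling hammer(newSynchronizedLru(128), 10, 100000, 600) approximates the thread and request counts above. Passing an unsynchronised map instead may or may not reproduce the hang on a given run, since the race is timing dependent.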