Perryn, Have you any more information on your situation? I am wondering if this is due to the RAC environment of the Oracle database.
I have been experiencing the same behavior. Like you, I have an action (the Index action) that displays a list of items. When the user chooses to delete an item, the request is handled by the Delete action. After the item is deleted, the Delete action redirects the user back to the Index action. Initially the list still includes the deleted item, but a quick refresh by the user causes it to disappear from the list. I get a similar result when adding a new item: after the item is added, the Add action redirects to the Index action, and the new item is not included in the list. This behavior is not consistent, however, and sometimes things work just fine.
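For concreteness, here is roughly the shape of the flow I'm describing. This is just a sketch, not our actual code: the class, method, and result names are made up, Item stands in for our persistent class, and HibernateUtil is the usual session-factory helper from the Hibernate docs (Hibernate 3 style imports; adjust for 2.x).

Code:
// Illustrative sketch only -- not our actual action classes.
import org.hibernate.Session;
import org.hibernate.Transaction;

public class DeleteItemAction {

    // Invoked for the POST that deletes an item.
    public String execute(Long itemId) {
        Session session = HibernateUtil.getSessionFactory().openSession();
        Transaction tx = session.beginTransaction();
        try {
            Item item = (Item) session.get(Item.class, itemId);
            if (item != null) {
                session.delete(item);
            }
            // Delete is committed on whichever node this connection talks to.
            tx.commit();
        } finally {
            session.close();
        }
        // Redirect-After-Post: the browser then issues a fresh GET for the
        // Index action, which runs the list query on a different pooled
        // connection.
        return "redirect:index";
    }
}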
Also like you, the item class is not set up to use the second level cache, nor is the query used by the Index action being cached.
We are using Oracle 9i, but it is a RAC setup.
After discussing this with the DBA, he suggested that this is a result of the RAC environment. With RAC there are multiple nodes, each containing a copy of the database, and any given database connection works directly with just one node. When the delete transaction is committed, the node used by that connection sees the change immediately, but the commit is not broadcast to the other nodes right away. Since there is a redirect between the Delete action and the Index action, the Index action gets its own database connection from the pool, which may not be the same connection the Delete action used. More importantly, the connection for the Index action may be to a different node than the one for the Delete action. The query for the Index action may not see that the item was deleted because the commit has not yet been propagated to that node.
*** Of course, this is JUST A THEORY at this point.
BUT IF THIS IS THE PROBLEM, there is a RAC setting that would eliminate it. There is a configuration parameter named MAX_COMMIT_PROPAGATION_DELAY, which determines the maximum amount of time that may pass before a commit on one node is propagated to the other nodes. It is specified in hundredths of a second and defaults to 700, which means the maximum delay is 7 seconds. If this value is set to zero (or apparently anything less than 100), a commit is immediately propagated to all nodes. This is known as "broadcast on commit."
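If anyone wants to see what their instance is set to, a simple JDBC query against v$parameter should show it. This is just a sketch: the connection URL, user, and password are placeholders, and you need read access to v$parameter.

Code:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CheckCommitPropagationDelay {

    public static void main(String[] args) throws Exception {
        // Old-style driver registration for the Oracle thin driver.
        Class.forName("oracle.jdbc.driver.OracleDriver");
        // Placeholder connection details -- point this at one of your RAC nodes.
        Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521:ORCL1", "user", "password");
        try {
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT value FROM v$parameter WHERE name = ?");
            ps.setString(1, "max_commit_propagation_delay");
            ResultSet rs = ps.executeQuery();
            if (rs.next()) {
                // 700 (the default) means a commit may take up to 7 seconds to
                // become visible on the other nodes; anything under 100 means
                // broadcast-on-commit.
                System.out.println("max_commit_propagation_delay = " + rs.getString(1));
            }
            rs.close();
            ps.close();
        } finally {
            conn.close();
        }
    }
}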
There is, of course, a performance trade-off for broadcast-on-commit. According to the DBA, Oracle recommends against using it, so he is unwilling to consider it. Instead, he is suggesting one of the following changes be made to the code:
(1) A 7-second delay be placed in the code after the delete trx is committed and before the redirect is issued (there is a sketch of this after option (2) below). This would ensure that the commit has been propagated to all other nodes before any subsequent queries are executed for the user (unless the user doesn't actually wait for the response and navigates to another page). I find this unacceptable. Of course, a shorter delay like 1 or 2 seconds would probably be long enough for the commit to propagate most of the time. Perhaps a compromise could be reached, and MAX_COMMIT_PROPAGATION_DELAY could be reduced, but not all the way to zero.
(2) Change the redirect to a forward, so the query to get the list would use the same database connection that was used by the delete transaction. Again, I do not find this acceptable. I consider the Redirect-After-Post (aka Post-Redirect-Get) idiom to be a best practice.
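Just to make option (1) concrete, it would amount to something like the helper below being called between tx.commit() and the redirect in the Delete action sketched earlier. Again this is only a sketch I made up to show the idea; the hard-coded 7-second sleep is exactly what I find unacceptable.

Code:
// Sketch of the DBA's option (1): block the request thread after the commit
// so the other RAC nodes have time to see it before the redirected GET runs.
public final class CommitPropagationDelay {

    // Matches the default MAX_COMMIT_PROPAGATION_DELAY of 700 (7 seconds).
    private static final long MAX_PROPAGATION_MILLIS = 7000;

    private CommitPropagationDelay() {
    }

    public static void waitForPropagation() {
        try {
            Thread.sleep(MAX_PROPAGATION_MILLIS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

The Delete action would call CommitPropagationDelay.waitForPropagation() right after tx.commit() and before returning the redirect, tying every delete to a multi-second pause.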
In either case, I don't like that the RAC environment causes a change in code like this.
Does anyone else have any experience with this? In particular, does the theory sound reasonable? Is this really the reality of a RAC environment? If so, what is the best way to deal with it? Is changing the Oracle setting to broadcast-on-commit really that big of a performance hit, or is it the way to go?
I apologize that this appears to have nothing to do with Hibernate specifically, but my first reaction was that this was a caching issue, too.
-John