Hey,
I have been working with NHibernate for a couple of weeks now. So far we have only used it on a very small scale to save dynamic application data, which of course works just fine.
Now I am also starting to use NHibernate to grab huge amounts of non-changing (static) data from different content implementors. Before, I used XML to fetch about 20,000 objects with about 130 fields each, which took roughly 5-10 seconds.
So I was rather suspicious when my first approach with NHibernate resulted in a shocking 1-minute wait for grabbing the same data from a MySQL DB!
So I dug out the good old profiler and quickly found the problem: NHibernate actually calls GetOrdinal(columnName) for *every single* cell after fetching it from the DB, which, as far as I understand, resolves a column's index from its name on the data reader. Do that close to 3 million times (20,000 rows times 130 columns) and I can easily go and grab a coffee before my application has even started up - mind you, I will have to grab a total of probably 5-10 times that amount of data once the content handling is finished.
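To illustrate the difference outside of NHibernate, here is a minimal IDataReader sketch (the method and variable names are mine, purely for illustration) contrasting the per-cell GetOrdinal lookup the profiler showed with resolving each ordinal once before the row loop:

Code:
using System.Collections.Generic;
using System.Data;

static List<object[]> ReadAll(IDataReader reader, string[] columns)
{
    var rows = new List<object[]>();

    // Resolve each column name to its ordinal exactly once...
    int[] ordinals = new int[columns.Length];
    for (int i = 0; i < columns.Length; i++)
        ordinals[i] = reader.GetOrdinal(columns[i]);

    while (reader.Read())
    {
        var row = new object[columns.Length];
        for (int i = 0; i < columns.Length; i++)
        {
            // ...instead of once per cell, which is what kills us:
            // row[i] = reader.GetValue(reader.GetOrdinal(columns[i]));
            row[i] = reader.GetValue(ordinals[i]);
        }
        rows.Add(row);
    }
    return rows;
}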
After an in-depth look at the code, I was able to add a quick work-around:
Get the schema table (which NHibernate has already fetched), cache the column indices in an array whose entries correspond to those of the suffixedNames array (and, correspondingly, the types array) in AbstractEntityPersister, and change:
Code:
values[i] = type.Hydrate(propertyResultSet, cols, session, obj);
to:
Code:
// cached once per persister; reset whenever the column layout changes
private int[] cachedIndices;
// ...
if (type is NullableType)
{
    if (cachedIndices == null)
    {
        // resolve all column ordinals once from the schema table
        cachedIndices = RecacheColumnIndices(suffixedPropertyColumns, propertyResultSet.GetSchemaTable());
    }
    // read the cell by its cached ordinal instead of by name
    values[i] = ((NullableType)type).NullSafeGet(propertyResultSet, cachedIndices[i]);
}
else
{
    values[i] = type.Hydrate(propertyResultSet, cols, session, obj);
}
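RecacheColumnIndices is not shown above; a rough sketch of what it could look like (simplified to single-column properties, and assuming an index-based NullSafeGet overload has been added alongside it):

Code:
using System;
using System.Collections.Generic;
using System.Data;

// Rough sketch: map each property's (first) suffixed column name to its
// ordinal, using the schema table NHibernate has already fetched.
private int[] RecacheColumnIndices(string[][] suffixedPropertyColumns, DataTable schemaTable)
{
    var ordinals = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
    foreach (DataRow row in schemaTable.Rows)
        ordinals[(string)row["ColumnName"]] = (int)row["ColumnOrdinal"];

    var indices = new int[suffixedPropertyColumns.Length];
    for (int i = 0; i < suffixedPropertyColumns.Length; i++)
        indices[i] = ordinals[suffixedPropertyColumns[i][0]]; // single-column simplification
    return indices;
}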
I made these changes and voilà: from 1 minute right down to 6 seconds.
The whole idea of getting the data from the result set by column name seems very dubious and wasteful to me. I would recommend moving the index caching to an earlier point and fetching all data by those indices, rather than asking the DB for every single cell, or having to use strings for lookups in the first place.
Now that is quite a performance-gain already, but I still found that it could be faster.
I also realized that memory usage increased by a lot. After some more research, I found the cause: for every object an internal array holds the persister's state snapshot from right after the last update, so that only changed cells are submitted during the next update. On top of that, the dirty check makes an awful lot of unnecessary iterations and comparisons over all cells, even right after the first fetch from the DB, when nothing can have changed yet.
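To put a number on it: 20,000 objects times 130 fields means roughly 2.6 million extra object references held just for those snapshots, plus a full comparison pass over them on every flush. Conceptually the dirty check does something like this (a simplified illustration, not NHibernate's actual code):

Code:
using System.Collections.Generic;

// Simplified illustration: a snapshot of every field is kept per entity
// and compared cell by cell, even when nothing was ever modified.
static int[] FindDirty(object[] snapshot, object[] current)
{
    var dirty = new List<int>();
    for (int i = 0; i < current.Length; i++)
    {
        if (!Equals(snapshot[i], current[i]))
            dirty.Add(i);
    }
    return dirty.ToArray();
}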
As a solution to this I would suggest a config option to turn the snapshotting off for data that rarely or never changes.
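What such an option could look like in a mapping file (the attribute name, as well as the class and table names, are made up, purely to illustrate the idea):

Code:
<!-- hypothetical attribute, illustrative only: tell NHibernate to skip
     the state snapshot and dirty check for this read-mostly class -->
<class name="ContentItem" table="content_item" snapshot="false">
  ...
</class>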
I understand that most people never run these kinds of insane queries, which is probably why nobody has ever noticed or profiled this. I, for example, need them because the code region where the data is accessed (and it is accessed a lot) is highly performance- and time-critical, and pushing those queries onto an extra message queue would raise complexity too much to justify it.
If anyone is interested in a complete solution to the fetch problem, I am willing to write a patch with a clean fix (better than the dirty work-around I posted above) for the NHibernate core.
As for the memory problem, I will have a simple working DB API to grab huge chunks of data (with the help of NHibernate) from the DB and easily map them onto any kind of object using XML definitions. I have to define my own XML schema because I want to map several tables in different ways onto a single object, which NHibernate doesn't seem to cover (yet).
That's my first 2 cents, sorry for the long read :)