I need to process big Excel files. My problem is heap space, especially with the XLS format. I need to retrieve the file from the database in chunks.
So far I have the file in the database in chunks of 40 kB. I have an Import table (storing general information about the import, for instance start and end time, data type, etc.) and an ImportData table (containing the chunks of data as BLOBs). There is a one-to-many relation between Import and ImportData:
Code:
<hibernate-mapping>
    <class name="com.company.import.pojos.Import" table="IMPORT_TABLE">
        <id name="id" type="integer">
            <column name="ID" />
            <generator class="some.id.generator.IdGenerator"></generator>
        </id>
        <property name="startTime" type="timestamp">
            <column name="START" />
        </property>
        <property name="endTime" type="timestamp">
            <column name="END" />
        </property>
        <property lazy="false" name="datatype" type="com.company.import.enums.ImportDataType">
            <column name="DATATYPE" />
        </property>
        <bag name="importDataList" table="IMPORT_DATA" lazy="true" cascade="all" inverse="false">
            <key column="IMPORT_TABLE_ID"/>
            <one-to-many class="com.company.import.pojos.ImportData"/>
        </bag>
    </class>
</hibernate-mapping>
<hibernate-mapping>
    <class name="com.company.import.pojos.ImportData" table="IMPORT_DATA">
        <id name="id" type="integer">
            <column name="ID" />
            <generator class="some.id.generator.IdGenerator"></generator>
        </id>
        <property name="importTableID" type="integer">
            <column name="IMPORT_TABLE_ID" />
        </property>
        <property name="data" type="binary">
            <column name="DATA" />
        </property>
        <property name="order" type="integer">
            <column name="ORDER" />
        </property>
    </class>
</hibernate-mapping>
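For reference, the ImportData class behind the second mapping would look roughly like the sketch below. The field names come from the mapping; everything else (types, accessors) is an assumption, and the actual class may differ:
Code:
// Sketch of the mapped POJO, reconstructed from the mapping above.
public class ImportData {
    private Integer id;
    private Integer importTableID; // FK back to IMPORT_TABLE
    private byte[] data;           // one 40 kB chunk of the uploaded file
    private Integer order;         // position of the chunk within the file

    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }
    public Integer getImportTableID() { return importTableID; }
    public void setImportTableID(Integer importTableID) { this.importTableID = importTableID; }
    public byte[] getData() { return data; }
    public void setData(byte[] data) { this.data = data; }
    public Integer getOrder() { return order; }
    public void setOrder(Integer order) { this.order = order; }
}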
I use Hibernate (4.2.3.Final), but even lazy loading is not an option, because as soon as the collection is initialized I have the whole list of ImportData objects in memory, which is exactly what causes the OutOfMemoryError I want to avoid.
Providing more memory is not an option either (the application runs in a container alongside other applications, and several users might process files at the same time, so the heap space problem would eventually happen anyway). Therefore I am looking for a way to always read just the next chunk of data and feed it to the stream that is given to the Aspose API.
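What I have in mind is something like the following sketch: a custom InputStream that scrolls over the ImportData rows of one import and only ever keeps the current chunk in memory. The class name, the HQL and the constructor are assumptions based on the mappings above, and it relies on the JDBC driver not buffering the whole result set:
Code:
import java.io.IOException;
import java.io.InputStream;

import org.hibernate.ScrollMode;
import org.hibernate.ScrollableResults;
import org.hibernate.Session;

/**
 * Sketch of a pull-based stream over the BLOB chunks of one import.
 * Only the current 40 kB chunk is held in memory; the next one is
 * fetched from the scrollable cursor on demand.
 */
public class ChunkedImportInputStream extends InputStream {

    private final ScrollableResults cursor;
    private byte[] current = new byte[0];
    private int pos = 0;

    public ChunkedImportInputStream(Session session, int importId) {
        // Select only the DATA column, ordered by the chunk position.
        // Note: "order" collides with an HQL keyword, so the property
        // may have to be renamed (e.g. to chunkOrder) for this to parse.
        this.cursor = session
                .createQuery("select d.data from ImportData d"
                        + " where d.importTableID = :id order by d.order")
                .setParameter("id", importId)
                .scroll(ScrollMode.FORWARD_ONLY);
    }

    @Override
    public int read() throws IOException {
        while (pos >= current.length) {
            if (!cursor.next()) {
                return -1;                      // no more chunks
            }
            current = (byte[]) cursor.get(0);   // load only the next chunk
            pos = 0;
        }
        return current[pos++] & 0xFF;
    }

    @Override
    public void close() throws IOException {
        cursor.close();
    }
}
Such a stream could then be handed to the Aspose call instead of a fully materialized byte array (e.g. new Workbook(inputStream) if Aspose.Cells is used) — whether Aspose itself then stays within the heap limit for large XLS files is of course a separate question.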