The following problem concerns a web crawler.
Let's say you have an entity class "Page" that represents a page downloaded from the internet:
Code:
import javax.persistence.*;
import java.util.ArrayList;
import java.util.List;

@Entity
class Page
{
    @Id @GeneratedValue
    int id;
    String pageUrl;
    @Lob // holds the whole page contents, too large for a default varchar column
    String pageText;
    @OneToMany(cascade = CascadeType.ALL)
    List<Page> children = new ArrayList<>();

    Page() {} // no-arg constructor required by JPA
    Page(String pageUrl) { this.pageUrl = pageUrl; }
}
As you can see, a web page can link to other web pages, which are stored as its children. The problem is that children can have children of their own, ad infinitum. Currently what I do is:
Code:
Page rootPage = new Page("http://www.msn.com");
entityManager.persist(rootPage);
rootPage = crawl(rootPage);
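The crawl() method itself looks roughly like this (simplified sketch; fetchHtml and extractLinks are placeholders for the actual download and link-extraction code):
Code:
Page crawl(Page page)
{
    page.pageText = fetchHtml(page.pageUrl);         // download the page
    for (String link : extractLinks(page.pageText))  // follow every outgoing link
    {
        Page child = new Page(link);
        page.children.add(child);  // the whole tree stays reachable from the root
        crawl(child);              // recurse, with no depth limit
    }
    return page;
}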
During the crawl, memory usage climbs rapidly until an OutOfMemoryError is thrown.
Any recommendations on how to limit memory usage?
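For example, would periodically flushing and clearing the persistence context help, along these lines? (pagesProcessed is a new counter; the batch size of 100 is arbitrary)
Code:
if (++pagesProcessed % 100 == 0)
{
    entityManager.flush();  // push pending inserts to the database
    entityManager.clear();  // detach managed entities so the GC can reclaim them
}
Or does that accomplish nothing as long as the entire Page tree is still reachable from rootPage?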