Some additional thoughts
Quote:
Sorry, I disagree on that: wall posts on Facebook definitely do not require realtime updates.
Ok, realtime is not the right word. What I mean is that if a User changes a photo or nickname, you have to have a reasonably quick reflection of these changes in all their WallPosts.
Quote:
Finally, the relation Person->[many]WallPost can easily be remapped as WallPost->[one]Person and avoid the problem completely, again by limiting what kind of queries are available. Think about gmail: their search engine on emails is very bad, still I don't think it's fair to say that Google doesn't know how to implement a search engine. I'm pretty sure they face a similar technical limitation due to aggregation and write frequency.
Yes, it can, and I did. I'm personally not familiar with Hadoop or whether it alleviates any of these issues. However, the User is basically @ContainedIn the WallPost. So if the user changes, all of their WallPosts will have to be updated, assuming you index the user data with the WallPost. Obviously, I'm going to have to do some refactoring to change this, as I don't see a way out. But you can see how this kind of sucks. I see the duplication of storing user data with a WallPost as a small price to pay for the performance. But if I refactor the User data out and fetch it separately, I will have to run an extra query that retrieves all the users referenced by the resulting WallPosts and their Comments, and then splice that data back into the result. It just seems like a very ugly way to deal with this.
Caching User objects will alleviate some of this pain, but still.
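To make the caching idea a bit more concrete, here is a minimal sketch in plain Java. BatchCache and its batch loader are made-up names for illustration, not anything from Hibernate or HSearch; the loader stands in for whatever query loads users by id. The idea is that a page of WallPosts costs at most one extra lookup for the users that are not cached yet, and a user is evicted when the nickname or photo changes so the change shows up quickly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical read-through cache that loads all missing keys in one batch. */
class BatchCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    // Assumption: the application supplies a function that loads many values at once.
    private final Function<Set<K>, Map<K, V>> batchLoader;

    BatchCache(Function<Set<K>, Map<K, V>> batchLoader) {
        this.batchLoader = batchLoader;
    }

    /** Returns values for the given keys; the loader is called at most once, for misses only. */
    Map<K, V> getAll(Collection<K> keys) {
        Set<K> missing = new LinkedHashSet<>();
        for (K key : keys) {
            if (!cache.containsKey(key)) {
                missing.add(key);
            }
        }
        if (!missing.isEmpty()) {
            cache.putAll(batchLoader.apply(missing));
        }
        Map<K, V> result = new LinkedHashMap<>();
        for (K key : keys) {
            V value = cache.get(key);
            if (value != null) {
                result.put(key, value);
            }
        }
        return result;
    }

    /** Call when a user changes nickname or photo, so the next read reloads it. */
    void evict(K key) {
        cache.remove(key);
    }

    public static void main(String[] args) {
        Map<Long, String> nicknames = new HashMap<>(); // stands in for the database
        nicknames.put(1L, "alice");
        nicknames.put(2L, "bob");
        List<Set<Long>> loads = new ArrayList<>();     // records each batch load
        BatchCache<Long, String> users = new BatchCache<>(ids -> {
            loads.add(ids);
            Map<Long, String> out = new HashMap<>();
            for (Long id : ids) {
                out.put(id, nicknames.get(id));
            }
            return out;
        });
        System.out.println(users.getAll(Arrays.asList(1L, 2L, 1L)));
        nicknames.put(1L, "alicia");                   // user 1 renames themselves
        users.evict(1L);
        System.out.println(users.getAll(Arrays.asList(1L, 2L)));
        System.out.println("batch loads: " + loads.size());
    }
}
```

This only softens the problem: the splice step is still there, it just hits the database less often.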
I guess if there is no updateable document or workable join concept available, a simpler workaround might be to use a "fetch style join". HSearch could retrieve the associated User document during the query process. This way the User document could stay atomic and reflect its actual state.
The question is how to implement such a fetch operation. HSearch would have to retain a list of @DocumentIds and Entities that are requested this way during the query process, retrieve the associated Documents separately and integrate them with the search result.
Annotations
A developer could just annotate the User reference with @IndexedEmbedded(includePaths={"id"}) to make sure the @DocumentId is stored.
Otherwise, I don't think anything would have to change.
HSearch would have to know which entities should be retrieved this way. This could be part of the FullTextQuery:
Code:
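// Proposed API sketch: setFetchMode(...) is the suggested addition, not an existing FullTextQuery method.
FullTextQuery ftq = fts
    .createFullTextQuery(mj.createQuery(), WallPost.class)
    .setFetchMode("user", FetchMode.JOIN)    // proposed: also fetch the associated User Document
    .setProjection(FullTextQuery.DOCUMENT)   // project the underlying Lucene Document
    .setResultTransformer(new WallPostResultTransformer())
    .setFirstResult(start)
    .setMaxResults(max);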
With setFetchMode specified, HSearch would retrieve the entire associated Document and insert it as the user tuple in the WallPost Document in the case of projection, or as an entity in the case of entity retrieval.
Of course, HSearch would have to make sure that this retrieval goes as efficiently as possible, getting all the associated Users at the end of the query process.
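To illustrate what I mean by collecting the requests and resolving them at the end, here is a rough sketch in plain Java. None of this is HSearch code: DeferredUserFetch, WallPostHit, User and the loadUsersByIds function are all hypothetical names, and the loader stands in for however the User Documents would actually be read from the index. Each hit registers the id of its user while the results are being collected, and a single batch lookup at the end fills in the users. Comments could register their author ids in the same way.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch of a "fetch style join" resolved at the end of a query. */
class DeferredUserFetch {

    /** Stand-in for the User Document or entity. */
    static final class User {
        final long id;
        final String nickname;
        User(long id, String nickname) { this.id = id; this.nickname = nickname; }
    }

    /** Stand-in for one WallPost search hit that only stores the user's id. */
    static final class WallPostHit {
        final long postId;
        final long userId;
        User user; // filled in by resolve()
        WallPostHit(long postId, long userId) { this.postId = postId; this.userId = userId; }
    }

    private final Set<Long> requestedUserIds = new LinkedHashSet<>();
    private final List<WallPostHit> pending = new ArrayList<>();

    /** Called for every hit while the result list is being built. */
    void register(WallPostHit hit) {
        requestedUserIds.add(hit.userId); // a Set, so each user is requested once
        pending.add(hit);
    }

    /** Loads all requested users in one call and splices them into the hits. */
    int resolve(Function<Set<Long>, Map<Long, User>> loadUsersByIds) {
        if (pending.isEmpty()) {
            return 0;
        }
        Map<Long, User> users = loadUsersByIds.apply(new LinkedHashSet<>(requestedUserIds));
        for (WallPostHit hit : pending) {
            hit.user = users.get(hit.userId);
        }
        int distinctUsers = requestedUserIds.size();
        requestedUserIds.clear();
        pending.clear();
        return distinctUsers;
    }

    public static void main(String[] args) {
        DeferredUserFetch fetch = new DeferredUserFetch();
        List<WallPostHit> hits = new ArrayList<>();
        hits.add(new WallPostHit(100L, 7L));
        hits.add(new WallPostHit(101L, 7L));
        hits.add(new WallPostHit(102L, 9L));
        for (WallPostHit hit : hits) {
            fetch.register(hit);
        }
        // Assumption: this lambda stands in for one batch lookup of User Documents.
        int distinct = fetch.resolve(ids -> {
            Map<Long, User> found = new HashMap<>();
            for (Long id : ids) {
                found.put(id, new User(id, "user" + id));
            }
            return found;
        });
        for (WallPostHit hit : hits) {
            System.out.println(hit.postId + " by " + hit.user.nickname);
        }
        System.out.println("distinct users fetched: " + distinct);
    }
}
```

The point is that the number of extra lookups stays at one per query, no matter how many WallPosts share the same User.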