Hi Ben,
thanks for starting this.
The first thing to consider is that the filter is going to be applied on other sessions: the indexer is multi-threaded, so it does not use the current session. You therefore have to provide a means to apply the filtering to a given session, but the API currently isn't exposing the Session instance.
Have a look at
org.hibernate.search.batchindexing.IdentifierProducer, method
loadAllIdentifiers. That method is responsible for producing the identifiers of all objects which are going to be indexed. Note that it streams them rather than loading them all in memory, so that it can cope with huge datasets.
Two things are done: first a count, to know the total number of results to be indexed, then a select on the identifiers.
Only one instance of IdentifierProducer will be created per type, but when indexing multiple types several
IdentifierProducers might be active at the same time.
To properly change the criteria, restrictions must be applied to both the count and the select statements in a consistent way. Please try with the Criteria API first; I guess that to cover the SQL and HQL cases we could add some alternate implementations of IdentifierProducer.
I'd suggest trying
DetachedCriteria, which should cover the use case quite well, but we're open to suggestions.
If you think it's important to also provide SQL and HQL alternatives, then you should make sure that both the count query and the identifier-loading query are provided.
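
To illustrate what I mean by keeping the count and the identifier select consistent, here is a rough plain-Java sketch. It is only a model of the idea and not Hibernate Search code: the class names (FilteredIdentifierSketch, Book, FilteredIdentifiers) are hypothetical, the in-memory list stands in for the database table, and the Predicate stands in for whatever restriction you end up passing (with the Criteria API this would presumably be a DetachedCriteria bound to each worker's own session).

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java stand-in for the idea; none of these types are Hibernate Search API.
class FilteredIdentifierSketch {

    // Hypothetical entity: an id plus one property to filter on.
    static final class Book {
        final long id;
        final boolean published;

        Book(long id, boolean published) {
            this.id = id;
            this.published = published;
        }
    }

    // Holds the restriction once, so the count and the select cannot drift apart.
    static final class FilteredIdentifiers {
        private final List<Book> table;            // stands in for the database table
        private final Predicate<Book> restriction; // stands in for the user's filter

        FilteredIdentifiers(List<Book> table, Predicate<Book> restriction) {
            this.table = table;
            this.restriction = restriction;
        }

        // Step 1: the count, to know how many entities are going to be indexed.
        long count() {
            return table.stream().filter(restriction).count();
        }

        // Step 2: the identifiers, handed out lazily instead of collected in a list.
        Iterator<Long> identifiers() {
            return table.stream().filter(restriction).map(b -> b.id).iterator();
        }
    }

    public static void main(String[] args) {
        List<Book> table = Arrays.asList(
                new Book(1, true), new Book(2, false), new Book(3, true));
        FilteredIdentifiers ids = new FilteredIdentifiers(table, b -> b.published);

        System.out.println("to be indexed: " + ids.count());
        for (Iterator<Long> it = ids.identifiers(); it.hasNext(); ) {
            System.out.println("id: " + it.next());
        }
    }
}
```

The point of the design is that the restriction is supplied in exactly one place and both steps read it from there; an SQL or HQL variant would need the same guarantee, which is harder when the two queries are written by hand.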