I am not sure a simple formula exists. The index size depends on many factors, for example whether you store the indexed data (store=Store.YES) and how the data is indexed, that is, which analyzer you use. Imagine, for example, a synonym analyzer, which adds additional tokens to the token stream.
Since the Lucene index format is documented on the Lucene website, I guess that with some effort one could derive estimates, provided you make some assumptions about your data, but it is certainly not easy.
It is probably best to just build a test index from a representative sample of your data and measure its size.
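If it helps, here is a rough sketch of how one might measure such a test index and scale the result up. It uses only the JDK, not the Lucene API: it sums the file sizes in the index directory and extrapolates linearly by document count. The class name IndexSizeEstimate, its method names and the command-line arguments are made up for this example, and the linear scaling is an assumption. Term dictionaries usually grow more slowly than the document count, so treat the number as a ballpark figure only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of Lucene or Hibernate Search.
public class IndexSizeEstimate {

    // Sums the sizes of all regular files below the given index directory.
    static long directorySizeBytes(Path indexDir) {
        try (Stream<Path> files = Files.walk(indexDir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumption: index size grows roughly linearly with the number of documents.
    static long extrapolate(long sampleIndexBytes, long sampleDocs, long totalDocs) {
        if (sampleDocs <= 0) {
            throw new IllegalArgumentException("sampleDocs must be positive");
        }
        double bytesPerDoc = (double) sampleIndexBytes / sampleDocs;
        return Math.round(bytesPerDoc * totalDocs);
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: IndexSizeEstimate <indexDir> <sampleDocs> <totalDocs>");
            return;
        }
        long sampleBytes = directorySizeBytes(Paths.get(args[0]));
        long estimate = extrapolate(sampleBytes,
                                    Long.parseLong(args[1]),
                                    Long.parseLong(args[2]));
        System.out.println("sample index bytes: " + sampleBytes);
        System.out.println("estimated full index bytes: " + estimate);
    }
}
```

Index the sample with the same analyzers and store settings you plan to use in production, otherwise the measurement tells you little.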
--Hardy