New Lucene Notes
Some notes I picked up listening to a presentation on Lucene 2.9 from Lucid Imagination:
- Numeric Fields
- Use a lower precision to reduce index size.
- NumericField supports this through its precision step setting.
- Range queries get a big speed boost.
- Query Parsing
- Query Rewriting – takes pattern syntax (like *) and expands it into a larger query over the matching terms, which the back end then processes. This should improve wildcard queries.
- There is a new Query Parser framework in the contrib section.
- Developers can more easily create a query parser.
- There is an Advanced Query Parser; for now, the unit tests are the best guide to what it can do.
- Payload Queries
- Payloads are byte arrays associated with a term occurrence in the index.
- For example, associate “Noun” with the word fox.
- You can then query on that payload data.
- Payload parsing can be used to index NLP information at index time for later searching.
- Flexible Indexing – JIRA issue LUCENE-1458
- Due in v3.0.
- Adds strongly typed token streams to assist in indexing.
- Reverse string filters, which make leading ‘*’ wildcard queries practical.
- Arabic support is coming (Light 8 stemmer).
- Persian support.
- Collectors
- HitCollector is deprecated; a basic Collector class replaces it.
- Geospatial support is coming.
- New Term Vector
- Term Vectors are used (or computed on the fly).
- Term Vectors use loads of disk space.
- FieldCache
- Others have hacked it to do database-style joins.
- There are more validation checks in 2.9. Better introspection.
- N-Gram Spell Check
- Bottlenecks Removed – General Improvements
- Lockless String “interning”
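
The reverse string filter note above is terse, so here is a rough sketch of the idea in plain Java. It does not use Lucene's API: the class `ReversedTermIndex` and its methods are made up for illustration, and a `TreeSet` stands in for the term dictionary. The assumption is that terms are stored reversed, so a leading-wildcard query like `*ing` turns into a cheap prefix scan for `gni` instead of a walk over every term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not a Lucene class.
class ReversedTermIndex {
    // Sorted set standing in for a term dictionary that holds reversed terms.
    private final TreeSet<String> reversedTerms = new TreeSet<>();

    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Index time: store the reversed form of each term.
    void add(String term) {
        reversedTerms.add(reverse(term));
    }

    // Query time: "*ing" is answered by passing the suffix "ing".
    List<String> endingWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> matches = new ArrayList<>();
        // tailSet seeks to the first reversed term >= prefix; since the set is
        // sorted, we can stop at the first term that no longer shares the prefix.
        for (String reversed : reversedTerms.tailSet(prefix)) {
            if (!reversed.startsWith(prefix)) {
                break;
            }
            matches.add(reverse(reversed));
        }
        return matches;
    }

    public static void main(String[] args) {
        ReversedTermIndex index = new ReversedTermIndex();
        for (String term : new String[] {"indexing", "searching", "lucene", "ring"}) {
            index.add(term);
        }
        // Matches come back in the sort order of their reversed forms.
        System.out.println(index.endingWith("ing"));
    }
}
```

The trade-off in this sketch is the usual one: extra index space for the reversed terms in exchange for leading-wildcard lookups that behave like ordinary prefix lookups.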