
New Lucene Notes

Some notes I picked up listening to a presentation on Lucene 2.9 from Lucid Imagination:

  1. Numeric Fields
    1. Use a lower precision to reduce index size.
    2. NumericField supports a configurable precision.
    3. Big speed boost to range queries.
  2. Query Parsing
    1. Query rewriting: takes symbolic tokens (such as *) and expands them into a larger query that the back end processes. This should improve wildcard queries.
    2. There is a new Query Parser framework in the contrib section.
    3. Developers can more easily create a query parser.
    4. There is an Advanced Query Parser; for now, the unit tests are the best guide to what it can do.
    5. Payload Queries
      1. Payloads are byte arrays associated with a term in the index.
      2. For example, associate “Noun” with the word “fox”.
      3. You can then query on that payload data.
      4. Payload parsing can be used to index NLP information at index time for later searching.
    6. Flexible Indexing – JIRA 1458
      1. Due in v3.0.
      2. Adds strongly typed token streams to assist in indexing.
    7. Reverse string filters, which help with queries that have a leading ‘*’.
    8. Arabic support is coming (Light 8 stemmer).
    9. Persian support.
  3. Collectors
    1. HitCollector is deprecated; a basic Collector is provided in its place.
  4. Geospatial support is coming.
  5. New Term Vector
    1. Term Vectors are used (or computed on the fly).
    2. Term Vectors use loads of disk space.
  6. FieldCache
    1. Others have hacked it to do database-style joins.
    2. There are more validation checks in 2.9. Better introspection.
  7. N-Gram Spell Check
  8. Bottlenecks Removed – General Improvements
    1. Lockless String “interning”.
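
To make the reverse string filter note (item 2.7) more concrete, here is a minimal sketch of the general idea in plain Java. It does not use Lucene's API. The class and method names are hypothetical, and a sorted set stands in for the term dictionary. The assumption is that terms are also indexed in reversed form, so a leading-wildcard query such as *ing turns into a cheap prefix scan for "gni".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not Lucene's reverse string filter.
public class ReversedWildcardSketch {

    // Index-time step: store every term reversed, in sorted order.
    static TreeSet<String> buildReversedIndex(List<String> terms) {
        TreeSet<String> reversed = new TreeSet<>();
        for (String term : terms) {
            reversed.add(new StringBuilder(term).reverse().toString());
        }
        return reversed;
    }

    // Query-time step: "*suffix" becomes a prefix scan for reverse(suffix).
    static List<String> matchLeadingWildcard(TreeSet<String> reversedIndex, String suffix) {
        String prefix = new StringBuilder(suffix).reverse().toString();
        List<String> hits = new ArrayList<>();
        // tailSet jumps to the first entry >= prefix; stop once the prefix no longer matches.
        for (String candidate : reversedIndex.tailSet(prefix)) {
            if (!candidate.startsWith(prefix)) {
                break;
            }
            // Reverse again to recover the original term.
            hits.add(new StringBuilder(candidate).reverse().toString());
        }
        return hits;
    }

    public static void main(String[] args) {
        TreeSet<String> index = buildReversedIndex(
                Arrays.asList("indexing", "searching", "lucene", "parsing"));
        // Equivalent of the query *ing
        System.out.println(matchLeadingWildcard(index, "ing"));
    }
}
```

The scan only touches the terms that share the reversed prefix, rather than walking the whole dictionary as a naive leading-wildcard match would.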

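The precision notes under Numeric Fields (item 1) can be illustrated with a rough sketch. As I understand it, a numeric value is indexed as several terms, each dropping more low-order bits, and a step setting controls how many bits are dropped between terms. The code below is plain Java, not Lucene's actual encoding. The names trieTerms and precisionStep are used here for illustration, and the sketch ignores details such as sign handling.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Lucene's NumericField encoding.
public class PrecisionStepSketch {

    // Returns one {shift, prefix} pair per indexed term for a 32-bit value.
    // Each term keeps only the bits above 'shift'.
    static List<int[]> trieTerms(int value, int precisionStep) {
        List<int[]> terms = new ArrayList<>();
        for (int shift = 0; shift < 32; shift += precisionStep) {
            terms.add(new int[] { shift, value >>> shift });
        }
        return terms;
    }

    public static void main(String[] args) {
        // A larger step means fewer terms per value, hence a smaller index.
        for (int step : new int[] { 4, 8, 16 }) {
            System.out.println("step=" + step + " -> "
                    + trieTerms(1234567, step).size() + " terms per value");
        }
    }
}
```

Under these assumptions the trade-off is that coarser terms let a range query cover wide spans with few terms, while fewer terms per value keep the index smaller.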