About the concept of norms in Lucene

illu 2010-02-04
When creating a Field, you always have to specify an Index option:
Index.NO
Index.ANALYZED
Index.NOT_ANALYZED

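I understand those three, but I still can't figure out the other two:
Index.NOT_ANALYZED_NO_NORMS
Index.ANALYZED_NO_NORMS
In particular, I don't understand what norms are for.
Could someone who knows please explain?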

Here is the relevant part of the Index source:
/** Expert: Index the field's value without an Analyzer,
     * and also disable the storing of norms.  Note that you
     * can also separately enable/disable norms by calling
     * {@link Field#setOmitNorms}.  No norms means that
     * index-time field and document boosting and field
     * length normalization are disabled.  The benefit is
     * less memory usage as norms take up one byte of RAM
     * per indexed field for every document in the index,
     * during searching.  Note that once you index a given
     * field <i>with</i> norms enabled, disabling norms will
     * have no effect.  In other words, for this to have the
     * above described effect on a field, all instances of
     * that field must be indexed with NOT_ANALYZED_NO_NORMS
     * from the beginning. */
    NOT_ANALYZED_NO_NORMS {
      @Override
      public boolean isIndexed()  { return true;  }
      @Override
      public boolean isAnalyzed() { return false; }
      @Override
      public boolean omitNorms()  { return true;  }   	
    },

    /** Expert: Index the tokens produced by running the
     *  field's value through an Analyzer, and also
     *  separately disable the storing of norms.  See
     *  {@link #NOT_ANALYZED_NO_NORMS} for what norms are
     *  and why you may want to disable them. */
    ANALYZED_NO_NORMS {
      @Override
      public boolean isIndexed()  { return true;  }
      @Override
      public boolean isAnalyzed() { return true;  }
      @Override
      public boolean omitNorms()  { return true;  }   	
    };
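To answer the original question directly: the five options differ only in three flags. Below is a minimal sketch that tabulates them. `IndexOption` is a hypothetical stand-in for Lucene's `Field.Index`, not the real enum. The flag values for the two NO_NORMS constants are copied from the overrides shown above, and the other three follow from their names, except that the `omitNorms` value for `NO` is my assumption (it hardly matters, since an unindexed field has no norms).

```java
public class IndexOptionTable {

    // Hypothetical stand-in for Lucene's Field.Index. The flags mirror
    // isIndexed()/isAnalyzed()/omitNorms(); NO's omitNorms value is assumed.
    enum IndexOption {
        NO(false, false, true),                    // not indexed, so norms are moot
        ANALYZED(true, true, false),
        NOT_ANALYZED(true, false, false),
        NOT_ANALYZED_NO_NORMS(true, false, true),
        ANALYZED_NO_NORMS(true, true, true);

        final boolean indexed;
        final boolean analyzed;
        final boolean omitNorms;

        IndexOption(boolean indexed, boolean analyzed, boolean omitNorms) {
            this.indexed = indexed;
            this.analyzed = analyzed;
            this.omitNorms = omitNorms;
        }

        // Norms are only written for fields that are indexed and do not omit them.
        boolean hasNorms() {
            return indexed && !omitNorms;
        }
    }

    public static void main(String[] args) {
        System.out.printf("%-22s %-8s %-9s %s%n", "option", "indexed", "analyzed", "norms");
        for (IndexOption o : IndexOption.values()) {
            System.out.printf("%-22s %-8b %-9b %b%n", o, o.indexed, o.analyzed, o.hasNorms());
        }
    }
}
```

In real Lucene 2.9/3.x code the choice is made when constructing the field, e.g. `new Field("id", "42", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS)`; the javadoc above also mentions `Field#setOmitNorms` for toggling norms separately.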


Reading the comments didn't really help... 囧
illu 2010-02-05
Still looking for an answer =。=
TonyLian 2010-02-10
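My English isn't great, but it roughly says that these options save some memory compared with the regular ones, and that they only take effect if you use them from the very first time the field is indexed.
If the same Field was first indexed as ANALYZED, switching to ANALYZED_NO_NORMS later won't save any memory.
I also seem to remember the docs describing these two as fairly "low-level" options that aren't recommended for general use (the javadoc marks them "Expert:").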
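The "must be used from the beginning" rule can be pictured with a simplified model: the per-field setting is merged across documents, and once any document has stored norms for a field, the field keeps norms. The `StickyNorms` and `FieldNormsFlag` classes below are hypothetical and are not Lucene's actual code; my recollection is that Lucene 2.9/3.x's per-field metadata applies a similar "once norms are stored, always store" rule, but treat that as an assumption.

```java
import java.util.HashMap;
import java.util.Map;

public class StickyNorms {

    // Hypothetical per-field metadata: whether norms are omitted for a field name.
    static final class FieldNormsFlag {
        boolean omitNorms;

        FieldNormsFlag(boolean omitNorms) {
            this.omitNorms = omitNorms;
        }

        // Merge the setting from a newly added document:
        // if the documents disagree, norms win and are kept for the field.
        void update(boolean docOmitsNorms) {
            if (this.omitNorms != docOmitsNorms) {
                this.omitNorms = false;
            }
        }
    }

    private final Map<String, FieldNormsFlag> fields = new HashMap<>();

    // Record one document's norms choice for a field.
    void addField(String name, boolean omitNorms) {
        FieldNormsFlag flag = fields.get(name);
        if (flag == null) {
            fields.put(name, new FieldNormsFlag(omitNorms));
        } else {
            flag.update(omitNorms);
        }
    }

    boolean omitsNorms(String name) {
        return fields.get(name).omitNorms;
    }

    public static void main(String[] args) {
        StickyNorms index = new StickyNorms();
        index.addField("title", false); // first document: ANALYZED (norms kept)
        index.addField("title", true);  // later document: ANALYZED_NO_NORMS
        System.out.println("title omits norms: " + index.omitsNorms("title")); // false

        index.addField("id", true);     // NOT_ANALYZED_NO_NORMS from the start
        index.addField("id", true);
        System.out.println("id omits norms: " + index.omitsNorms("id"));       // true
    }
}
```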
illu 2010-02-11
The answer is here:
http://forfuture1978.iteye.com/blog/591804
virusswb 2011-10-26
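"Norm" means a standard, a rule, a quota, or a fixed amount.

In the Lucene context it is closest to "a fixed, precomputed amount": a normalization factor stored for each indexed field of each document.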

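If you set boosts on a document and its fields, then at indexing time Lucene combines the document boost, the field boosts and the field-length normalization factor into a single float per field, and stores it (encoded into one byte) as that field's norm.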
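A rough sketch of how those factors combine into one float, assuming the formula of Lucene 2.9/3.x's `DefaultSimilarity`: norm = document boost × field boost(s) × 1/√(number of terms in the field). The class and method names are hypothetical. Lucene additionally encodes the float into a single byte, which loses precision; this sketch skips that step.

```java
public class NormSketch {

    // Hypothetical helper mirroring DefaultSimilarity's length normalization: 1 / sqrt(numTerms).
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Combine the document boost, the boost of every instance of the field
    // in that document, and the length normalization into a single float.
    static float norm(float docBoost, float[] fieldBoosts, int numTerms) {
        float boost = docBoost;
        for (float b : fieldBoosts) {
            boost *= b;
        }
        return boost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // A 4-term title with no boosts: 1 * 1 * 1/sqrt(4)
        System.out.println(norm(1.0f, new float[] {1.0f}, 4)); // 0.5
        // Same title, field boosted 2x, document boosted 1.5x: 1.5 * 2 * 0.5
        System.out.println(norm(1.5f, new float[] {2.0f}, 4)); // 1.5
        // With *_NO_NORMS this per-document value is not stored at all, so neither
        // index-time boosts nor field length can influence the score.
    }
}
```

Note how a shorter field gets a larger norm: with no boosts, a 1-term field gets 1.0 while a 4-term field gets 0.5, which is the "field length normalization" the javadoc says is lost when norms are disabled.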

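If you don't need to boost a field or document, and don't need length normalization for that field either (for example, a short untokenized ID), you can choose NO_NORMS and save some storage space, as well as RAM at search time.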
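To put "some space" in numbers: the javadoc quoted in the question says norms take one byte of RAM per indexed field for every document during searching. A back-of-the-envelope sketch; the document and field counts are made-up example values.

```java
public class NormsMemory {

    // One byte per document for each indexed field that keeps norms
    // (figure taken from the Field.Index javadoc).
    static long normsBytes(long numDocs, int fieldsWithNorms) {
        return numDocs * fieldsWithNorms;
    }

    public static void main(String[] args) {
        long docs = 10_000_000L;            // hypothetical index size
        long all = normsBytes(docs, 5);     // 5 indexed fields, all with norms
        long fewer = normsBytes(docs, 2);   // 3 of them switched to *_NO_NORMS
        // Decimal megabytes (1 MB = 1,000,000 bytes), integer division.
        System.out.println("5 fields with norms: " + all / 1_000_000 + " MB");
        System.out.println("2 fields with norms: " + fewer / 1_000_000 + " MB");
    }
}
```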


For details, see section 2.5.3, "Norms", of Lucene in Action, 2nd Edition.