About the concept of norms in Lucene
illu
2010-02-04
Whenever you create a Field, you have to pick an Index option.
I understand Index.NO, Index.ANALYZED and Index.NOT_ANALYZED, but I still can't figure out the other two, Index.NOT_ANALYZED_NO_NORMS and Index.ANALYZED_NO_NORMS. In particular I don't get the concept of norms or what they are for. If anyone knows, please tell me. Here is the source of Index:

    /** Expert: Index the field's value without an Analyzer,
     * and also disable the storing of norms.  Note that you
     * can also separately enable/disable norms by calling
     * {@link Field#setOmitNorms}.  No norms means that
     * index-time field and document boosting and field
     * length normalization are disabled.  The benefit is
     * less memory usage as norms take up one byte of RAM
     * per indexed field for every document in the index,
     * during searching.  Note that once you index a given
     * field <i>with</i> norms enabled, disabling norms will
     * have no effect.  In other words, for this to have the
     * above described effect on a field, all instances of
     * that field must be indexed with NOT_ANALYZED_NO_NORMS
     * from the beginning. */
    NOT_ANALYZED_NO_NORMS {
      @Override
      public boolean isIndexed()  { return true;  }
      @Override
      public boolean isAnalyzed() { return false; }
      @Override
      public boolean omitNorms()  { return true;  }
    },

    /** Expert: Index the tokens produced by running the
     * field's value through an Analyzer, and also
     * separately disable the storing of norms.  See
     * {@link #NOT_ANALYZED_NO_NORMS} for what norms are
     * and why you may want to disable them. */
    ANALYZED_NO_NORMS {
      @Override
      public boolean isIndexed()  { return true;  }
      @Override
      public boolean isAnalyzed() { return true;  }
      @Override
      public boolean omitNorms()  { return true;  }
    };

After reading the comments I'm basically none the wiser.. 囧
illu
2010-02-05
Anyone have an answer? =。=
TonyLian
2010-02-10
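My English isn't great, but roughly it says these options use a bit less memory than the ordinary ones, and that they only work if you use them from the very first time the field is indexed.
If the same Field is indexed as ANALYZED the first time, then switching to ANALYZED_NO_NORMS later won't save any memory. I also seem to remember the docs saying these two are fairly "low-level" options and not recommended for ordinary use (the Javadoc marks them "Expert").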
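To put the memory point in numbers: the Javadoc quoted above says norms cost one byte of RAM per indexed field for every document during searching. Below is a rough back-of-the-envelope sketch in plain Java. The class name, method name, and document counts are made up for illustration and are not part of Lucene.

```java
// Hypothetical helper, NOT a Lucene class. It only applies the rule stated in
// the Javadoc: one byte of RAM per indexed field (with norms) per document.
final class NormsMemoryEstimate {

    // Returns the approximate number of bytes norms occupy at search time.
    static long estimateBytes(long numDocs, int fieldsWithNorms) {
        if (numDocs < 0 || fieldsWithNorms < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        return numDocs * fieldsWithNorms;
    }

    public static void main(String[] args) {
        long docs = 10_000_000L; // assumed index size, for illustration only

        // Five indexed fields, all keeping norms.
        System.out.println(estimateBytes(docs, 5)); // 50000000

        // Same index, but three of the fields were indexed with *_NO_NORMS
        // from the very beginning, so only two fields still carry norms.
        System.out.println(estimateBytes(docs, 2)); // 20000000
    }
}
```

So for an index of that assumed size, omitting norms on three fields would save roughly 30 MB, and only if those fields never had norms enabled in any document.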
illu
2010-02-11
virusswb
2011-10-26
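"Norm" means a standard, a specification, a quota, or a fixed amount.
In the Lucene context it should be read in the quota / fixed-amount sense. If you set a boost on the document and on its fields, then at indexing time all the boosts that apply to a field are combined and computed into a single floating-point value (which, per the Javadoc quoted above, also covers field length normalization and takes one byte per field per document). When you don't intend to boost a field or a document, you can choose one of the no-norms options and save some storage space. For details, see section 2.5.3 "Norms" of Lucene in Action, 2nd Edition.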
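As a rough illustration of what gets folded into that single value, here is a sketch in plain Java. It assumes the default scoring of Lucene releases from that period, where length normalization is 1 / sqrt(number of terms in the field) and the norm is document boost × field boost × length normalization. The class and method names are hypothetical, this is not Lucene's own code, and it leaves out the lossy step that squeezes the float into one byte.

```java
// Hypothetical sketch, NOT Lucene code. Assumes the default length
// normalization of 1 / sqrt(numTerms) and ignores the one-byte encoding.
final class NormSketch {

    // Shorter fields get a larger factor, longer fields a smaller one.
    static float lengthNorm(int numTerms) {
        if (numTerms <= 0) {
            throw new IllegalArgumentException("numTerms must be positive");
        }
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Combines the index-time boosts and the length factor into one float.
    static float norm(float docBoost, float fieldBoost, int numTerms) {
        return docBoost * fieldBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        // No boosts, 4-term field: only length normalization contributes.
        System.out.println(norm(1.0f, 1.0f, 4));   // 0.5

        // Field boost of 2 on a 100-term field.
        System.out.println(norm(1.0f, 2.0f, 100)); // 0.2
    }
}
```

With norms omitted, none of this is stored, so a match in a short field is no longer favored over a match in a long one and index-time boosts have no effect on the score. That is usually fine for fields such as IDs or tags that are only used for exact matching or filtering.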