Question: what does the second parameter of getTokenStream (TokenSources, used for highlighting) mean? - lucene - lucene enthusiasts


xuehaipeng 2010-07-15

public static org.apache.lucene.analysis.TokenStream getTokenStream(org.apache.lucene.index.TermPositionVector tpv, boolean tokenPositionsGuaranteedContiguous)

Low level API. Returns a token stream, or null if no offset info is available in the index. This can be used to feed the highlighter with a pre-parsed token stream.

In my tests the speeds to recreate 1000 token streams using this method are:
- with TermVector offset-only data stored: 420 milliseconds
- with TermVector offset AND position data stored: 271 milliseconds
(NB: timings for TermVector with position data are based on a tokenizer with contiguous positions, i.e. no overlaps or gaps.)

The cost of not using TermPositionVector to store pre-parsed content, and using an analyzer to re-parse the original content instead:
- reanalyzing the original content: 980 milliseconds

The re-analyze timings will typically vary depending on:
1) The complexity of the analyzer code (timings above were using a stemmer/lowercaser/stopword combo).
2) The number of other fields (Lucene reads ALL fields off the disk when accessing just one document field, which can cost dear!).
3) Use of compression on field storage: this could be faster due to compression (less disk IO) or slower (more CPU burn), depending on the content.

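The above is the API Javadoc, quoted verbatim. What does the second parameter of getTokenStream (the boolean one) mean? I don't really understand it. Any pointers would be appreciated, thanks.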
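As far as I can tell from the highlighter source, tokenPositionsGuaranteedContiguous is a promise made by the caller: "every token in this field sits at its own position, numbered 0, 1, 2, ... with no gaps and no two tokens sharing a position" (the "no overlaps or gaps" in the NB above). When it is true and positions were stored, the token positions can be used directly as indexes into the output array, so no sorting is needed; when it is false, every token is collected and then sorted by start offset. That would explain why the offsets-plus-positions timing (271 ms) beats the offsets-only timing (420 ms). The sketch below is a standalone illustration of those two strategies, not Lucene's actual code; the class, record, and method names are hypothetical, and a plain int[][] layout stands in for TermPositionVector.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Standalone sketch (NOT Lucene's code): the class, record and method names are hypothetical.
public class ContiguousPositionsDemo {

    // One rebuilt token: its text plus start/end character offsets in the original field.
    record Tok(String text, int start, int end) {}

    // Term-vector style input is grouped by term: terms[t] occurs starts[t].length times,
    // at starts[t][i]..ends[t][i], with token position positions[t][i] (positions may be null).
    static List<Tok> rebuild(String[] terms, int[][] starts, int[][] ends,
                             int[][] positions, boolean positionsGuaranteedContiguous) {
        int total = 0;
        for (int[] s : starts) total += s.length;

        if (positionsGuaranteedContiguous && positions != null) {
            // Fast path: trust that positions are exactly 0..total-1 with no gaps or overlaps,
            // so each position is used directly as the array slot and no sort is needed.
            Tok[] ordered = new Tok[total];
            for (int t = 0; t < terms.length; t++) {
                for (int i = 0; i < positions[t].length; i++) {
                    ordered[positions[t][i]] = new Tok(terms[t], starts[t][i], ends[t][i]);
                }
            }
            return Arrays.asList(ordered);
        }

        // Safe path: collect every occurrence, then sort by start offset.
        List<Tok> unsorted = new ArrayList<>(total);
        for (int t = 0; t < terms.length; t++) {
            for (int i = 0; i < starts[t].length; i++) {
                unsorted.add(new Tok(terms[t], starts[t][i], ends[t][i]));
            }
        }
        unsorted.sort(Comparator.comparingInt(Tok::start));
        return unsorted;
    }

    public static void main(String[] args) {
        // Field text "the quick the fox", stored grouped by term as a term vector would.
        String[] terms = {"fox", "quick", "the"};
        int[][] starts = {{14}, {4}, {0, 10}};
        int[][] ends = {{17}, {9}, {3, 13}};
        int[][] pos = {{3}, {1}, {0, 2}};
        System.out.println(rebuild(terms, starts, ends, pos, true));
        System.out.println(rebuild(terms, starts, ends, pos, false));

        // Gapped positions (e.g. "the" removed as a stopword, position increments kept):
        // only positions 1 and 3 remain, total = 2, so slot 3 does not exist.
        String[] gTerms = {"fox", "quick"};
        int[][] gStarts = {{14}, {4}};
        int[][] gEnds = {{17}, {9}};
        int[][] gPos = {{3}, {1}};
        try {
            rebuild(gTerms, gStarts, gEnds, gPos, true);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("contiguous=true breaks on gapped positions: " + e.getMessage());
        }
        System.out.println(rebuild(gTerms, gStarts, gEnds, gPos, false));
    }
}
```

For contiguous input both calls return the same order; with gapped positions only the sorting path works (and with overlapping positions the fast path would overwrite tokens). So passing true looks safe only when the analyzer emits exactly one token per position with no gaps; if in doubt, false costs a sort but should always give offset order.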
