How to measure index-writing performance for full-text search built on Lucene
stta04
2008-08-25
How can I test the index-writing performance of full-text search built on Lucene? What is the general approach to this kind of test?
wsnet
2008-09-22
I once came across a "Benchmarking Indexing" report from abroad. Here it is for reference:
1 Hardware environment
- Dedicated machine for indexing: yes
- CPU: dual-processor, dual-core Xeon CPU 3.00GHz; hyperthreading ON for 8 virtual cores
- RAM: 8GB
- Drive configuration: Dell EMC AX150 storage array, fibre channel

2 Software environment
- Lucene version: 2.3.1
- Java version: Java(TM) SE Runtime Environment (build 1.6.0_02-b05)
- Java VM: Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_02-b05, mixed mode)
- OS version: Linux OpenSUSE 10.2 (64-bit X86-64)
- Location of index: filesystem, on attached storage

Lucene indexing variables
- Number of source documents: 6,404,464
- Total file size of source documents: 141GB (full text only; the metadata, i.e. title, author(s), abstract, keywords and journal name, comes in addition to this)
- Average file size of source documents: 22KB + metadata (see above)
- Source documents storage location: filesystem
- File type of source documents: text (PDFs converted to text, then gzipped)
- Parser(s) used, if any: none, but the text files were gzipped and had to be un-gzipped by the Java application, which also did the indexing
- Analyzer(s) used: StandardAnalyzer
- Number of fields per document: 24
- Type of fields: all text; 20 stored; 3 indexed and tokenized with term vectors (full text [not stored], title, abstract); 10 stored with no parsing
- Index persistence: FSDirectory
- Index size: 83GB
- Number of terms: 143,298,010

Figures
- Time taken (average of at least 3 indexing runs): 20.5 hours
- Time taken per 1000 docs indexed: 11.5 seconds
- Memory consumption: -Xms4000m -Xmx6000m
- Query speed (average time per query, e.g. simple one-term or phrase queries, not measuring any overhead outside Lucene): <0.01s
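The report's headline numbers follow a simple method: index the whole corpus from scratch, time it, repeat at least three times, and normalize by document count. Below is a minimal Java sketch of that measurement. It assumes a hypothetical `IndexTask` hook (not part of Lucene) that you would implement by opening a fresh index, calling Lucene's `IndexWriter.addDocument` for each source document, and closing the writer. The class and method names are illustrative only, and the dummy task merely stands in for real indexing work.

```java
import java.util.Locale;

/** Hypothetical indexing-throughput harness (a sketch, not the report's code). */
public class IndexingBenchmark {

    /**
     * Hypothetical hook for one complete indexing run. A real implementation
     * would start from an empty index, pass every source document to
     * Lucene's IndexWriter.addDocument(...), close the writer, and return
     * the number of documents indexed.
     */
    public interface IndexTask {
        long run() throws Exception;
    }

    /** Seconds per 1000 documents, i.e. the report's "time taken per 1000 docs". */
    public static double secondsPer1000Docs(double totalSeconds, long docCount) {
        if (docCount <= 0) {
            throw new IllegalArgumentException("docCount must be positive");
        }
        return totalSeconds / docCount * 1000.0;
    }

    /** Times `runs` full indexing runs and averages their seconds-per-1000-docs. */
    public static double averageSecondsPer1000Docs(IndexTask task, int runs) throws Exception {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        double sum = 0.0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();               // monotonic clock for elapsed time
            long docs = task.run();
            double seconds = (System.nanoTime() - start) / 1e9;
            sum += secondsPer1000Docs(seconds, docs);
        }
        return sum / runs;
    }

    public static void main(String[] args) throws Exception {
        // Re-derive the report's figure from its totals: 20.5 hours, 6,404,464 docs.
        double reported = secondsPer1000Docs(20.5 * 3600, 6_404_464L);
        System.out.printf(Locale.ROOT, "Report: %.2f s per 1000 docs%n", reported);

        // Dummy stand-in for a real Lucene run: some busy work that claims 10,000 docs.
        IndexTask dummy = () -> {
            long acc = 0;
            for (int i = 0; i < 1_000_000; i++) {
                acc += i % 7;
            }
            return acc >= 0 ? 10_000L : 0L;
        };
        double avg = averageSecondsPer1000Docs(dummy, 3);  // at least 3 runs, as in the report
        System.out.printf(Locale.ROOT, "Dummy: %.6f s per 1000 docs (mean of 3 runs)%n", avg);
    }
}
```

Running `main` re-derives the report's normalized figure from its totals (20.5 hours for 6,404,464 documents is about 11.5 s per 1000 docs). `System.nanoTime` is used instead of `System.currentTimeMillis` because it is intended for measuring elapsed time.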