Question about indexing PDF documents with Lucene

xxwang1984 2008-07-11

Environment: lucene2.0 + pdf.0.7.3 + je-analysis-1.4.0.jar + eclipse3.2
// Index the document LuceneInActionCH.pdf
File indexDir = new File("C:\\index"); // directory where the index files are stored
File dataDir = new File("C:\\file"); // directory containing the files to index

Analyzer analyzer = new MMAnalyzer();
IndexWriter writer = new IndexWriter(indexDir, analyzer, true);

Document doc = LucenePDFDocument.getDocument(new File("C:\\file\\LuceneInActionCH.pdf"));

writer.close();
The index is generated successfully; the index file _1.cfs is 18 KB.
// Search
QueryParser queryParser = null;
Query query = null;
IndexSearcher indexSearcher = null;
Hits hits = null;
String queryStr = null;

queryParser = new QueryParser("contents", new MMAnalyzer());
queryStr = "使用";
query = queryParser.parse(queryStr);
indexSearcher = new IndexSearcher("c:\\index");
hits = indexSearcher.search(query);

But the query returns no results: hits.length() is 0.
Can anyone tell me what might be causing this? It's urgent, thanks!

fys124974704 2008-07-11

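Do you get results when you search for something else? I suspect something went wrong when you built the index. The code you posted is incomplete, so I can't tell where the problem is.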

xxwang1984 2008-07-11

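I tried other keywords. The keyword "web", for example, is found, but Chinese terms don't seem to work. The word-segmentation jar I'm using is je-analysis-1.4.0.jar, which was recommended to me and is widely used.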

xxwang1984 2008-07-11

I've already posted the key code. What other code do you need to see?

fys124974704 2008-07-11

Look at this part of your code:
Document doc = LucenePDFDocument.getDocument(new File("C:\\file\\LuceneInActionCH.pdf"));

writer.close();
Isn't this missing between those two statements?
writer.addDocument(doc);

xxwang1984 2008-07-11

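Sorry, I left that line out of the post, but it is in the actual program. The index file is 18 KB. I also opened the index with the lukeall index viewer: there are no Chinese terms in it, only English words. I don't know why.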

fys124974704 2008-07-11

My guess is that something went wrong when you added the fields to the document. Apart from that, I can't think of any other possibility.

xxwang1984 2008-07-11

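When indexing a PDF document, I never call doc.add(new Field(**)) explicitly.
The code uses LucenePDFDocument's getDocument function, which returns a Lucene
Document directly from a PDF file, with Fields such as path, url, modified, contents, summary, creator, and uid.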

fys124974704 2008-07-11

new Field("content", content, Field.Store.YES, Field.Index.TOKENIZED)
This is how I usually write it.

kiki 2008-09-02

Download Luke and open the generated index with it; that will make things perfectly clear.
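
Luke shows what ended up in the index. A complementary check, sketched below, is whether the text extracted from the PDF contains any Chinese characters at all before it reaches the analyzer. This is only a minimal sketch: the class name HanCheck and the method countHan are made up for illustration, the sample strings stand in for the real extracted text, and it relies on Character.UnicodeScript, so it assumes Java 7 or later rather than the JDK of the original environment.

```java
public class HanCheck {
    /** Counts the code points in text that belong to the Han (CJK ideograph) script. */
    public static int countHan(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                count++;
            }
            // Advance by one or two chars, depending on the code point.
            i += Character.charCount(cp);
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical samples; in practice, pass in the text extracted from the PDF.
        // "\u4f7f\u7528" is the query string from this thread, written as Unicode
        // escapes so the source file's encoding cannot affect the result.
        System.out.println(countHan("\u4f7f\u7528 Lucene")); // prints 2
        System.out.println(countHan("web search"));          // prints 0
    }
}
```

If the count is 0 for text that should be Chinese, the problem probably lies in the PDF text extraction rather than in the analyzer or the query.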
