[lucene] Can anyone help me figure out why searches for Chinese text return nothing?
aaa641521
2011-08-25
I recently started using Lucene and wrote a test program, but searches for Chinese text never return any hits. I can't figure out why and would appreciate some pointers:

    public class LuceneDemo {
        public static String index_dir = "D:/luceneIndex";

        public static void main(String[] args) throws IOException, Exception {
            new LuceneDemo().createIndex("D:/WB/doc");
            new LuceneDemo().readIndex();
        }

        /**
         * Builds an index over every file in a folder.
         *
         * @throws IOException
         */
        public void createIndex(String path) throws IOException {
            // Directory where the index is stored
            File indexDir = new File(index_dir);
            // Directory whose files are to be indexed
            File dataDir = new File(path);
            Analyzer analyzer = new IKAnalyzer();
            if (!dataDir.exists()) {
                System.out.println("The directory does not exist");
            }
            if (dataDir.isDirectory()) {
                if (indexDir.exists()) {
                    indexDir.delete();
                }
                IndexWriter indexWriter = new IndexWriter(
                        FSDirectory.open(indexDir),
                        new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer));
                long startTime = System.currentTimeMillis();
                indexFile(indexWriter, dataDir);
                indexWriter.close();
                long endTime = System.currentTimeMillis();
                System.out.println((endTime - startTime) / 1000.0 + "s used!");
            }
        }

        private void indexFile(IndexWriter indexWriter, File file) throws IOException {
            if (file.canRead()) {
                if (file.isDirectory()) { // a directory: recurse into its entries
                    File[] files = file.listFiles();
                    if (files != null) {
                        for (int i = 0; i < files.length; i++) {
                            indexFile(indexWriter, files[i]);
                        }
                    }
                } else {
                    System.out.println("add " + file.getCanonicalPath());
                    Document document = new Document();
                    document.add(new Field("contents", new FileReader(file)));
                    document.add(new Field("path", file.getPath(),
                            Field.Store.YES, Field.Index.NOT_ANALYZED));
                    indexWriter.addDocument(document);
                }
            }
        }

        private void readIndex() throws CorruptIndexException, IOException, ParseException {
            IndexReader indexReader = IndexReader.open(
                    FSDirectory.open(new File(index_dir)), true);
            IndexSearcher searcher = new IndexSearcher(indexReader);
            searcher.setSimilarity(new IKSimilarity());
            // The search term; also used in the summary line printed below
            String queryString = "黑龙江";
            Query query = IKQueryParser.parse("contents", queryString);
            TopDocs topDocs = searcher.search(query, 5);
            System.out.println("Found " + topDocs.totalHits
                    + " hit(s) for '" + queryString + "'");
            // Print the results
            ScoreDoc[] scoreDocs = topDocs.scoreDocs;
            for (int i = 0; i < scoreDocs.length; i++) {
                Document doc = searcher.doc(scoreDocs[i].doc);
                System.out.println(doc.toString());
            }
        }
    }
The program reports 0 hits for '黑龙江'. But the files in the folder I indexed do contain the word 黑龙江 (Heilongjiang), so why can't the search find it?
RobustTm
2011-08-26
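The line document.add(new Field("contents", new FileReader(file))); may be the problem, which would mean the index itself is being built incorrectly.
Try Luke, the index inspection tool, to see what actually ended up in the index.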
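One possible reason that FileReader line could go wrong (an assumption, since the thread does not confirm the cause): FileReader decodes the file with the JVM's default charset, so if the documents were saved in a different encoding, the Chinese text would be garbled before the analyzer ever sees it. The sketch below uses only the JDK, not Lucene, and shows the effect in isolation. The class name CharsetReaderDemo and the helpers openReader and readAll are hypothetical, ISO-8859-1 stands in for "some wrong charset", and the search term is written with Unicode escapes to keep the source ASCII-only.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class CharsetReaderDemo {

    // Hypothetical helper: open a file with an explicit charset instead of
    // relying on the platform default that FileReader uses.
    public static Reader openReader(File file, Charset charset) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset));
    }

    // Hypothetical helper: drain a Reader into a String so the decoded
    // text can be inspected. Closes the reader when done.
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = reader) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // The search term from the question, as Unicode escapes.
        String term = "\u9ED1\u9F99\u6C5F";

        // Write the term to a temporary file encoded as UTF-8.
        File file = File.createTempFile("charset-demo", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(), term.getBytes(StandardCharsets.UTF_8));

        // Decode the same bytes with the matching and a mismatched charset.
        String right = readAll(openReader(file, StandardCharsets.UTF_8));
        String wrong = readAll(openReader(file, StandardCharsets.ISO_8859_1));

        System.out.println("decoded as UTF-8 contains term: " + right.contains(term));
        System.out.println("decoded as ISO-8859-1 contains term: " + wrong.contains(term));
    }
}
```

Running it prints true for the UTF-8 line and false for the ISO-8859-1 line. If the encoding does turn out to be the culprit, a Reader opened this way with the files' real charset could be passed to new Field("contents", reader) in place of new FileReader(file). Checking the indexed terms in Luke would confirm or rule this out.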