[lucene] Can someone take a look: why can't Chinese text be found when searching?

aaa641521 2011-08-25

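I've just started working with Lucene and wrote a test program, but Chinese text is never found no matter what I try, and I can't figure out why. I'd appreciate any pointers: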

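import java.io.File;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer;
import org.wltea.analyzer.lucene.IKQueryParser;
import org.wltea.analyzer.lucene.IKSimilarity;

// Assumes Lucene 3.x (3.1+) and IK Analyzer 3.2.x, inferred from the APIs used
public class LuceneDemo {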

    public static String index_dir = "D:/luceneIndex";

    public static void main(String[] args) throws Exception {
        new LuceneDemo().createIndex("D:/WB/doc");
        new LuceneDemo().readIndex();
    }

    /**
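     * Builds an index over all files in the given folder (recursively).
     *
     * @param path the folder whose files will be indexed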
     * @throws IOException
     */
    public void createIndex(String path) throws IOException {
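        // Directory where the index is stored
        File indexDir = new File(index_dir);

        // Directory whose files will be indexed
        File dataDir = new File(path);

        Analyzer analyzer = new IKAnalyzer();

        if (!dataDir.exists()) {
            System.out.println("The directory does not exist");
            return;
        }
        if (dataDir.isDirectory()) {
            // File.delete() cannot remove a non-empty directory, so an old
            // index would survive; OpenMode.CREATE overwrites it instead.
            IndexWriterConfig config = new IndexWriterConfig(
                    Version.LUCENE_CURRENT, analyzer);
            config.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
            IndexWriter indexWriter = new IndexWriter(
                    FSDirectory.open(indexDir), config);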
            long startTime = System.currentTimeMillis();
            indexFile(indexWriter, dataDir);

            indexWriter.close();
            long endTime = System.currentTimeMillis();

            System.out.println((endTime - startTime) / 1000.0 + "s used!");
        }
    }

    private void indexFile(IndexWriter indexWriter, File file)
            throws IOException {
        if (file.canRead()) {
            if (file.isDirectory()) { // if it is a directory, recurse into it
                File[] files = file.listFiles();
                if (files != null) {
                    for (int i = 0; i < files.length; i++) {
                        indexFile(indexWriter, files[i]);
                    }
                }
            } else {
                System.out.println("add " + file.getCanonicalPath());
                Document document = new Document();
                document.add(new Field("contents", new FileReader(file)));
                document.add(new Field("path", file.getPath(), Field.Store.YES,
                        Field.Index.NOT_ANALYZED));
                indexWriter.addDocument(document);
            }
        }
    }

    private void readIndex() throws CorruptIndexException,
            IOException, ParseException {
        IndexReader indexReader = IndexReader.open(
                FSDirectory.open(new File(index_dir)), true);
        IndexSearcher searcher = new IndexSearcher(indexReader);
        searcher.setSimilarity(new IKSimilarity());
        // The keyword to search for
        String queryString = "黑龙江";
        Query query = IKQueryParser.parse("contents", queryString);
        TopDocs topDocs = searcher.search(query, 5);
        System.out.println("Found " + topDocs.totalHits
                + " hit(s) containing '" + queryString + "'");
        // Print the results
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        for (int i = 0; i < scoreDocs.length; i++) {
            Document doc = searcher.doc(scoreDocs[i].doc);
            System.out.println(doc.toString());
        }
        searcher.close();
        indexReader.close();
    }
}

 

The program reports 0 hits for '黑龙江', yet the files in the folder I indexed do contain the word 黑龙江. Why isn't it found?

RobustTm 2011-08-26
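The line document.add(new Field("contents", new FileReader(file))); may be where the problem is: the index itself was probably not built correctly.
Try inspecting the index with Luke, the Lucene index viewer.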
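A quick way to test this suspicion without Lucene at all: FileReader always decodes with the JVM's default charset (on a Chinese-locale Windows machine that is typically GBK), so if the documents were saved as UTF-8, the text handed to IKAnalyzer would already be garbled and 黑龙江 would never end up as a term in the index. The standalone sketch below assumes the files are UTF-8; the class name EncodingCheck and the in-memory sample bytes are made up for illustration. It decodes the same UTF-8 bytes once as UTF-8 and once as GBK (Java 8+).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows what happens when UTF-8 bytes are decoded
// with the wrong charset, as FileReader may do with the platform default.
public class EncodingCheck {

    // The keyword from the question, written as Unicode escapes so the
    // encoding of this source file itself does not matter.
    static final String KEYWORD = "\u9ed1\u9f99\u6c5f";

    // Reads the whole stream with an explicitly chosen charset
    // (FileReader offers no such choice before Java 11; it uses the JVM default).
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the indexed files were saved as UTF-8
        byte[] fileBytes = KEYWORD.getBytes(StandardCharsets.UTF_8);

        String asUtf8 = readAll(new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8);
        String asGbk = readAll(new ByteArrayInputStream(fileBytes), Charset.forName("GBK"));

        System.out.println("UTF-8 decode contains keyword: " + asUtf8.contains(KEYWORD)); // true
        System.out.println("GBK decode contains keyword: " + asGbk.contains(KEYWORD));   // false
    }
}
```

If this matches your situation, one possible fix in indexFile is to pass new InputStreamReader(new FileInputStream(file), "UTF-8") instead of new FileReader(file), substituting whatever encoding your files actually use. After rebuilding the index, Luke should then show readable Chinese terms for the contents field.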