My search returns duplicate records - Other - Lucene Enthusiasts (lucene爱好者)

[Other] My search returns duplicate records

lijie250 2009-11-10

Is the problem that the index I generated contains duplicates, or that I'm not filtering at query time?

bit6211 2009-11-10

It probably happens when the index is created.

lijie250 2009-11-10

bit6211 wrote:

It probably happens when the index is created.

Here is part of my code. I use IKAnalyzer for Chinese word segmentation:

    Analyzer analyzer = new IKAnalyzer();
    FSDirectory directory = null;
    IndexWriter writer = null;
    HttpServletRequest request = Utils.getCurrentRequest();
    try
    {
        directory = FSDirectory.getDirectory(request.getSession().getServletContext().getRealPath(NEWS_INDEX_PATH));
        writer = new IndexWriter(directory, analyzer, IndexWriter.MaxFieldLength.LIMITED);
        // Load the ID of the last indexed news item and fetch the items to index.
        int lastNewsID = readNewsLastID();
        List<News> listNews = newsDAO.getIndexNews(lastNewsID);
        if (listNews == null) {
            log.error("no new news");
            return false;
        }
        int currentMaxID = 0;
        for (News news : listNews) {
            Document doc = new Document();
            doc.add(new Field("NewsID", news.getNID().toString(), Field.Store.YES, Field.Index.ANALYZED_NO_NORMS));
            doc.add(new Field("ClassName", news.getClassName(), Field.Store.YES, Field.Index.ANALYZED_NO_NORMS));
            doc.add(new Field("Title", news.getTitle(), Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("Content", Utils.htmlToStr(news.getContent()), Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("PermanentLink", news.getPermanentLink(), Field.Store.YES, Field.Index.ANALYZED_NO_NORMS));
            doc.add(new Field("CreateTime", news.getCreateTime(), Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("Hits", "" + news.getHits(), Field.Store.YES, Field.Index.ANALYZED_NO_NORMS));
            doc.add(new Field("CommentCount", "" + news.getCommentCount(), Field.Store.YES, Field.Index.ANALYZED_NO_NORMS));
            doc.add(new Field("UserID", "" + news.getUserID(), Field.Store.YES, Field.Index.ANALYZED_NO_NORMS));
            doc.add(new Field("UserName", news.getUserName(), Field.Store.YES, Field.Index.ANALYZED_NO_NORMS));
            doc.add(new Field("IsPass", "" + news.getIsPass(), Field.Store.YES, Field.Index.ANALYZED_NO_NORMS));
            // addDocument always appends; it does not replace an existing document with the same NewsID.
            writer.addDocument(doc);
            currentMaxID = news.getNID();
        }
        //writer.optimize();
        writeNewsLastID(currentMaxID);
        writer.close();
        return true;
    }

What I want is incremental indexing. Is there something wrong with the way I'm building the index?
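One thing that might be worth checking in the snippet above, separately from how documents are added: currentMaxID starts at 0 and is written back unconditionally. If getIndexNews ever returns an empty (non-null) list, or rows that are not sorted by ascending ID, the stored last ID could move backwards and the next run would index old news again. Whether that can happen depends on newsDAO, which is not shown, so this is only an assumption. A minimal sketch of a watermark that never moves backwards (the Watermark class and nextLastId method are hypothetical names, not from the post):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class Watermark {
    /**
     * Returns the ID to persist after indexing a batch: the largest ID seen,
     * but never smaller than the previously stored one.
     */
    static int nextLastId(int previousLastId, List<Integer> indexedIds) {
        int max = previousLastId;
        for (int id : indexedIds) {
            max = Math.max(max, id);
        }
        return max;
    }

    public static void main(String[] args) {
        // Unsorted batch: the maximum wins, not the last element.
        System.out.println(nextLastId(120, Arrays.asList(121, 125, 123))); // prints 125
        // Empty batch: the stored ID is kept instead of being reset to 0.
        System.out.println(nextLastId(125, Collections.<Integer>emptyList())); // prints 125
    }
}
```

In the posted method this would correspond to starting currentMaxID at lastNewsID rather than 0.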

bit6211 2009-11-10

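Did the index already contain documents with the same content? If you didn't delete or overwrite them when you indexed this time, searches will return duplicate results.

Delete the existing index files, build a fresh index (just once), and try searching again.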

lijie250 2009-11-10

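Yes, that's right. I added the documents directly without deleting anything. Isn't incremental adding supported, though? My Lucene version is 2.4.1.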

bit6211 2009-11-10

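Incremental adding is supported. In Solr (which uses Lucene underneath), you only need to define a primary key for the index (the uniqueKey in the schema configuration file); a new document with the same key then automatically replaces the old one.
I would expect Lucene to have the same mechanism, except that you have to do it by hand. I haven't actually tried it myself, so see whether you can find some documentation on it.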

luckaway 2009-11-10

indexWriter.updateDocument(new Term(Constant.FIELD_ID, id), document);
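updateDocument deletes any documents that contain the given term and then adds the new document. If nothing matches new Term(Constant.FIELD_ID, id), it simply performs an add.

In fact, writer.addDocument(doc) goes through the same update path internally:

writer.addDocument(doc) == writer.updateDocument(null, doc)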
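To make the difference concrete, here is a toy in-memory model. It is not Lucene code: TinyIndex is a hypothetical class in which add mirrors addDocument (always appends) and update mirrors updateDocument (delete everything matching the key, then append). With real Lucene, the key field would normally also be indexed un-analyzed (Field.Index.NOT_ANALYZED in 2.4) so that the Term matches the stored ID exactly.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class TinyIndex {
    // Each document is a {key, body} pair.
    private final List<String[]> docs = new ArrayList<String[]>();

    /** Like addDocument: always appends, even if the key already exists. */
    void add(String key, String body) {
        docs.add(new String[] {key, body});
    }

    /** Like updateDocument: removes every document with this key, then appends. */
    void update(String key, String body) {
        for (Iterator<String[]> it = docs.iterator(); it.hasNext();) {
            if (it.next()[0].equals(key)) {
                it.remove();
            }
        }
        add(key, body);
    }

    /** Number of documents currently stored under the key. */
    int count(String key) {
        int n = 0;
        for (String[] doc : docs) {
            if (doc[0].equals(key)) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        TinyIndex appended = new TinyIndex();
        appended.add("42", "first version");
        appended.add("42", "second version");
        System.out.println(appended.count("42")); // prints 2: the duplicate from the question

        TinyIndex updated = new TinyIndex();
        updated.update("42", "first version");  // no match, so this behaves like an add
        updated.update("42", "second version"); // replaces the earlier document
        System.out.println(updated.count("42")); // prints 1
    }
}
```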

lijie250 2009-11-10

In reply to luckaway:

I'll give this a try when I get back.

kernaling.wong 2009-11-10

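You could also approach it this way: when adding a record to Lucene, first use an IndexReader to check whether a document with the same key already exists. If it does, delete the original Document and then add the new one, or simply use updateDocument. That way no duplicate Documents can appear.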

luckaway 2009-11-10

An IndexReader cannot see index data that has not been flushed to disk yet!

You really should define a field that works like Solr's primary key!
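The pitfall with checking through a reader before adding can be illustrated with another toy model (again hypothetical, not Lucene's implementation): documents sit in a writer buffer until a flush, and the reader-style lookup only sees what has been flushed. If the same key arrives twice in one batch, the lookup misses the first copy and a duplicate slips in. Replacing by key on the writer side, which is roughly what updateDocument provides, avoids this.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BufferedIndex {
    private final List<String> flushed = new ArrayList<String>(); // visible to readers
    private final List<String> buffer = new ArrayList<String>();  // not yet visible

    /** Reader-style lookup: only flushed keys are visible. */
    boolean readerSees(String key) {
        return flushed.contains(key);
    }

    /** Check-then-add based on the reader view. */
    void addIfReaderMisses(String key) {
        if (!readerSees(key)) {
            buffer.add(key);
        }
    }

    /** Update by key: drops the key from flushed and buffered docs, then adds it once. */
    void updateByKey(String key) {
        flushed.removeAll(Collections.singleton(key));
        buffer.removeAll(Collections.singleton(key));
        buffer.add(key);
    }

    /** Makes buffered documents visible. */
    void flush() {
        flushed.addAll(buffer);
        buffer.clear();
    }

    int count(String key) {
        return Collections.frequency(flushed, key);
    }

    public static void main(String[] args) {
        BufferedIndex checked = new BufferedIndex();
        checked.addIfReaderMisses("42");
        checked.addIfReaderMisses("42"); // reader still cannot see the buffered copy
        checked.flush();
        System.out.println(checked.count("42")); // prints 2

        BufferedIndex keyed = new BufferedIndex();
        keyed.updateByKey("42");
        keyed.updateByKey("42"); // replaces the buffered copy
        keyed.flush();
        System.out.println(keyed.count("42")); // prints 1
    }
}
```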
