Strange problem when highlighting Chinese text with Lucene Highlighter 2.0

baidongli 2008-05-09
The highlighted Chinese text disappears!


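For example, the indexed text is: 中华人民共和国
Search query: 中华
But the result I get back is 人民共和国; the matched term 中华 is missing!
I am using Lucene 2.2.0 + Highlighter 2.0, and the Chinese analyzer is org.mira.lucene.analysis.IK_CAnalyzer().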
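
To make the expected result concrete: the whole title should come back, with only the matched term wrapped in markup. The sketch below is not Lucene's Highlighter code. It is a plain-Java illustration of offset-based highlighting, and the class name OffsetHighlightSketch and the method highlight are hypothetical. It copies the text between matches through unchanged and merges overlapping spans. Whether IK_CAnalyzer actually emits overlapping tokens for this title is an assumption that would need checking against the real token stream.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OffsetHighlightSketch {

    /**
     * Wraps each matched span of text in pre/post markup and copies all other
     * text through unchanged. Each span is {startOffset, endOffset}, with
     * endOffset exclusive. Overlapping spans are merged into one.
     */
    public static String highlight(String text, int[][] spans, String pre, String post) {
        // Sort a copy by start offset so spans are processed left to right.
        int[][] sorted = spans.clone();
        Arrays.sort(sorted, (a, b) -> Integer.compare(a[0], b[0]));

        // Merge spans that overlap the previous one, so no character is
        // emitted twice and none is skipped.
        List<int[]> merged = new ArrayList<>();
        for (int[] s : sorted) {
            int[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
            if (last != null && s[0] < last[1]) {
                last[1] = Math.max(last[1], s[1]);
            } else {
                merged.add(new int[] {s[0], s[1]});
            }
        }

        StringBuilder out = new StringBuilder();
        int copied = 0; // everything before this offset is already in 'out'
        for (int[] m : merged) {
            out.append(text, copied, m[0]);                          // unmatched gap
            out.append(pre).append(text, m[0], m[1]).append(post);   // matched span
            copied = m[1];
        }
        out.append(text.substring(copied)); // tail after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String title = "中华人民共和国";
        // The query term 中华 covers offsets [0, 2) of the title.
        System.out.println(highlight(title, new int[][] {{0, 2}}, "<b>", "</b>"));
        // prints "<b>中华</b>人民共和国"
    }
}
```

A quick check along the same lines: strip the markup from the real highlighter's output and compare it with the stored field value. If they differ, text is being lost during highlighting rather than during rendering.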

The code fragment is as follows:

TermPositionVector tpv;
String fragmentSeparator = "...";
String title;
TokenStream tokenStream;
User user;
String price;
// Open the index reader once instead of once per hit, and close it after the loop.
IndexReader reader = IndexReader.open(IndexDomain.NEW_GOODS_INDEX_PATH);
for (int i = j - searchForm.getPageSize(); i < j && i < hits.length(); i++) {
    good = new Goods();
    doc = hits.doc(i);
    good.setId(new Long(doc.get(GoodsDocs.FIELD_GOODS_ID)));

    /*
     * Highlight the title
     */
    title = doc.get(GoodsDocs.FIELD_GOODS_NAME);
    if (searchForm.getGoodsName() != null && !searchForm.getGoodsName().equals("")) {
        int maxNumberFragmentsRequired = 5;
        // Rebuild the token stream from the term vector stored for this field.
        tpv = (TermPositionVector) reader.getTermFreqVector(hits.id(i), GoodsDocs.FIELD_GOODS_NAME);
        tokenStream = TokenSources.getTokenStream(tpv);
        title = highlighter.getBestFragments(tokenStream, title, maxNumberFragmentsRequired, fragmentSeparator);
    }
    good.setTradeName(title);
    // Create a new User per hit so the goods do not all share one object.
    user = new User();
    user.setName(doc.get(GoodsDocs.FIELD_GOODS_OWNER_NAME));
    good.setUser(user);
    price = doc.get(GoodsDocs.FIELD_GOODS_PRICE);
    if (price == null) price = "0";
    good.setPrice(new Double(price));
    good.setAreaCode(new Long(doc.get(GoodsDocs.FIELD_GOODS_AREA_CODE)));
    good.setAplipayEmail(doc.get(GoodsDocs.FIELD_ORIGINAL_ALIPAY_EMAIL));

    ThumbAttachment att = new ThumbAttachment();
    att.setDetail(doc.get(GoodsDocs.FIELD_GOODS_IMAGE));
    good.setGoodsDefaultImg(att);

    ids.add(good);
}
reader.close();



This is the highlighting formatter class:

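import org.apache.lucene.search.highlight.Formatter;
import org.apache.lucene.search.highlight.TokenGroup;

public class BoldColofulFormatter implements Formatter {

    // Wraps the text of a token group in a styled span when the group scored a hit.
    public String highlightTerm(String originalText, TokenGroup tokenGroup) {
        if (tokenGroup.getTotalScore() <= 0) {
            // No query term matched in this group: return the text unchanged.
            return originalText;
        }
        // CSS fixed: font-weight (not text-decoration) makes text bold,
        // "background-color" was misspelled, and the colour needs a leading '#'.
        return "<span style='font-weight:bold;background-color:#fff;color:#00f;'>" + originalText + "</span>";
    }

}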
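
The markup logic of a formatter like this can be exercised without an index. The sketch below is a stand-alone helper, not part of Lucene: the class HighlightMarkupSketch and the methods wrap and escapeHtml are hypothetical names, and the float argument stands in for TokenGroup.getTotalScore(). It uses font-weight, background-color and a '#'-prefixed colour, which are the valid CSS forms. The HTML escaping is an addition that is not in the original post; skip it if the text is already escaped elsewhere.

```java
public class HighlightMarkupSketch {

    static final String OPEN =
        "<span style='font-weight:bold;background-color:#fff;color:#00f;'>";
    static final String CLOSE = "</span>";

    /** Escapes the characters that are special in HTML text and attributes. */
    public static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /**
     * Returns the escaped text, wrapped in the highlight span only when the
     * score is positive (mirrors the totalScore check in the formatter).
     */
    public static String wrap(String text, float totalScore) {
        String safe = escapeHtml(text);
        if (totalScore <= 0) {
            return safe;
        }
        return OPEN + safe + CLOSE;
    }

    public static void main(String[] args) {
        System.out.println(wrap("中华", 1.0f));     // matched term: wrapped
        System.out.println(wrap("人民共和国", 0f));  // unmatched text: unchanged
    }
}
```

Since wrap always returns the text it was given, any text still missing from the page after this step would have been lost before the formatter was called.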