A strange problem when highlighting Chinese text with Lucene Highlighter 2.0
baidongli
2008-05-09
The highlighted Chinese text has disappeared!!!!
For example, the text is 中华人民共和国. When I search for 中华, the result I get back is 人民共和国, and 中华 is gone!

I'm using Lucene 2.2.0 + Highlighter 2.0, and the Chinese analyzer is org.mira.lucene.analysis.IK_CAnalyzer().

The code snippet is as follows:

    TermPositionVector tpv;
    String fragmentSeparator = "...";
    String title;
    TokenStream tokenStream;
    User user = new User();
    String price;
    for (int i = j - searchForm.getPageSize(); i < j && i < hits.length(); i++) {
        good = new Goods();
        doc = hits.doc(i);
        good.setId(new Long(doc.get(GoodsDocs.FIELD_GOODS_ID)));
        /*
         * Highlight the title
         */
        title = doc.get(GoodsDocs.FIELD_GOODS_NAME);
        if (searchForm.getGoodsName() != null && !searchForm.getGoodsName().equals("")) {
            int maxNumberFragmentsRequired = 5;
            tpv = (TermPositionVector) IndexReader.open(IndexDomain.NEW_GOODS_INDEX_PATH)
                    .getTermFreqVector(hits.id(i), GoodsDocs.FIELD_GOODS_NAME);
            tokenStream = TokenSources.getTokenStream(tpv);
            title = highlighter.getBestFragments(tokenStream, title, maxNumberFragmentsRequired, fragmentSeparator);
        }
        good.setTradeName(title);
        user.setName(doc.get(GoodsDocs.FIELD_GOODS_OWNER_NAME));
        good.setUser(user);
        price = doc.get(GoodsDocs.FIELD_GOODS_PRICE);
        if (price == null) price = "0";
        good.setPrice(new Double(price));
        good.setAreaCode(new Long(doc.get(GoodsDocs.FIELD_GOODS_AREA_CODE)));
        good.setAplipayEmail(doc.get(GoodsDocs.FIELD_ORIGINAL_ALIPAY_EMAIL));
        ThumbAttachment att = new ThumbAttachment();
        att.setDetail(doc.get(GoodsDocs.FIELD_GOODS_IMAGE));
        good.setGoodsDefaultImg(att);
        ids.add(good);
    }

This is the highlight formatter class:

    public class BoldColofulFormatter implements Formatter {
        public String highlightTerm(String arg0, TokenGroup arg1) {
            if (arg1.getTotalScore() <= 0) {
                return arg0;
            }
            return "<span style='text-decoration:bold;backgound-color:#fff;color:00f;'>" + arg0 + "</span>";
        }
    }
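
One way to narrow this down might be to check the formatting step separately from the token stream. The following is only a minimal, Lucene-free sketch of that check, not the Highlighter's actual implementation. `ScoredToken`, `HighlightSketch`, `formatTerm` and `highlight` are hypothetical names invented for illustration, and the offsets for 中华 (0 to 2) and 人民共和国 (2 to 7) are assumed by hand rather than produced by IK_CAnalyzer. The inline style is written with valid CSS property names (`font-weight`, `background-color`, `#00f`), so it differs from the string in the post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token with character offsets and a match score.
// This is NOT a Lucene class; it only mimics the data a formatter would see.
final class ScoredToken {
    final int start;   // inclusive start offset into the original text
    final int end;     // exclusive end offset into the original text
    final float score; // greater than 0 means the token matched the query

    ScoredToken(int start, int end, float score) {
        this.start = start;
        this.end = end;
        this.score = score;
    }
}

final class HighlightSketch {
    // Same decision as the formatter in the post: unmatched text is returned
    // unchanged, matched text is wrapped in a styled span.
    static String formatTerm(String original, float totalScore) {
        if (totalScore <= 0) {
            return original;
        }
        return "<span style='font-weight:bold;background-color:#fff;color:#00f;'>"
                + original + "</span>";
    }

    // Rebuilds the text from tokens sorted by start offset. Text between
    // tokens is copied verbatim, so nothing is dropped when offsets are valid.
    static String highlight(String text, List<ScoredToken> tokens) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (ScoredToken t : tokens) {
            if (t.start < last || t.end < t.start || t.end > text.length()) {
                continue; // skip overlapping or out-of-range tokens
            }
            out.append(text, last, t.start);
            out.append(formatTerm(text.substring(t.start, t.end), t.score));
            last = t.end;
        }
        out.append(text.substring(last));
        return out.toString();
    }

    public static void main(String[] args) {
        // The 7-character example string from the post, written with Unicode
        // escapes so the file compiles regardless of source encoding.
        String text = "\u4e2d\u534e\u4eba\u6c11\u5171\u548c\u56fd";
        List<ScoredToken> tokens = new ArrayList<>();
        tokens.add(new ScoredToken(0, 2, 1.0f)); // first two characters, matched
        tokens.add(new ScoredToken(2, 7, 0.0f)); // remaining five, not matched
        System.out.println(highlight(text, tokens));
    }
}
```

If the matched term survives in a check like this but is missing from the real output, the formatter is probably not what drops it. The next things to inspect would then be the raw HTML returned by `getBestFragments` and the offsets of the tokens produced by `TokenSources.getTokenStream(tpv)`.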