Grouping and paginating search results with Lucene for Java

Author: 小檀  Date: 2022-07-27 05:21:17

Grouping search results with GroupingSearch
Package org.apache.lucene.search.grouping Description

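This module groups Lucene search results by collecting documents that share the value of a specified single-valued field. For example, grouping on the "author" field puts all documents with the same "author" value into one group.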

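Grouping requires a few inputs:

1. groupField: the field to group on. For example, if you group on the "author" field, every book in a group has the same author. Documents that lack this field are collected into one separate group.

2. groupSort: how the groups themselves are sorted.

3. topNGroups: how many groups to keep. For example, 10 keeps only the top 10 groups.

4. groupOffset: which slice of the top groups to retrieve. For example, 3 skips the first three groups and returns 7 (assuming topNGroups is 10). This is useful for paging, for example when each page shows only 5 groups.

5. withinGroupSort: how documents inside each group are sorted. Note that this is independent of groupSort.

6. withinGroupOffset: which slice of the top documents inside each group to retrieve.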
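
To make the interplay of these parameters concrete, here is a minimal plain-Java model. It does not use Lucene and is not how Lucene implements grouping; the class GroupingModel, the Hit type, and the docsPerGroup parameter are hypothetical names introduced only for this sketch. It assumes both groupSort and withinGroupSort are "by score, descending" and that each group is ranked by its best hit.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the grouping parameters; hypothetical, not Lucene code. */
class GroupingModel {

    /** A hypothetical search hit: the group field value plus a relevance score. */
    static final class Hit {
        final String author;
        final double score;

        Hit(String author, double score) {
            this.author = author;
            this.score = score;
        }
    }

    /**
     * Groups hits by author, sorts hits inside each group by score, ranks
     * groups by their best hit, then applies the group and document offsets.
     * docsPerGroup (an assumption of this model) is the number of documents
     * returned per group after withinGroupOffset has been skipped.
     */
    static Map<String, List<Hit>> group(List<Hit> hits, int topNGroups, int groupOffset,
                                        int docsPerGroup, int withinGroupOffset) {
        // groupField: collect hits that share the same author value.
        Map<String, List<Hit>> byAuthor = new LinkedHashMap<>();
        for (Hit h : hits) {
            byAuthor.computeIfAbsent(h.author, k -> new ArrayList<>()).add(h);
        }

        // withinGroupSort: best-scoring hit first inside every group.
        Comparator<Hit> byScoreDesc = Comparator.comparingDouble((Hit h) -> h.score).reversed();
        List<List<Hit>> groups = new ArrayList<>(byAuthor.values());
        for (List<Hit> g : groups) {
            g.sort(byScoreDesc);
        }

        // groupSort: each group is represented by its top hit.
        groups.sort((a, b) -> Double.compare(b.get(0).score, a.get(0).score));

        // topNGroups bounds the ranked groups; groupOffset skips the leading ones.
        Map<String, List<Hit>> result = new LinkedHashMap<>();
        int lastGroup = Math.min(topNGroups, groups.size());
        for (int i = groupOffset; i < lastGroup; i++) {
            List<Hit> g = groups.get(i);
            int from = Math.min(withinGroupOffset, g.size());
            int to = Math.min(from + docsPerGroup, g.size());
            result.put(g.get(0).author, new ArrayList<>(g.subList(from, to)));
        }
        return result;
    }
}
```

With twelve authors, topNGroups = 10 and groupOffset = 3, the model returns groups 4 through 10 of the ranking, which is the "7 groups" case described in item 4.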

Grouping search results with GroupingSearch is fairly simple.

The GroupingSearch API documentation describes the class as follows:

Convenience class to perform grouping in a non distributed environment.

In other words, it is meant for grouping when the search is not distributed.

WARNING: This API is experimental and might change in incompatible ways in the next release.

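This article uses Lucene 4.3.1.

Some important methods:

GroupingSearch: setCaching(int maxDocsToCache, boolean cacheScores): caches the first-pass results for reuse in the second pass, limited by document count
GroupingSearch: setCachingInMB(double maxCacheRAMMB, boolean cacheScores): caches the first-pass results for reuse in the second pass, limited by memory in MB
GroupingSearch: setGroupDocsLimit(int groupDocsLimit): sets how many documents are returned per group; if not set, one document is returned per group
GroupingSearch: setGroupSort(Sort groupSort): sets how the groups are sorted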

Example code:

1. First, the code that builds the index

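import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer; // third-party IK Analyzer (package name assumed)

public class IndexHelper {
    private Directory directory;

    // Lazily create one in-memory directory and reuse it for every writer and reader.
    public Directory getDirectory() {
        if (directory == null) {
            directory = new RAMDirectory();
        }
        return directory;
    }

    private IndexWriterConfig getConfig() {
        return new IndexWriterConfig(Version.LUCENE_43, new IKAnalyzer(true));
    }

    private IndexWriter getIndexWriter() throws IOException {
        return new IndexWriter(getDirectory(), getConfig());
    }

    public IndexSearcher getIndexSearcher() throws IOException {
        return new IndexSearcher(DirectoryReader.open(getDirectory()));
    }

    /**
     * Create index for group test
     * @param id
     * @param author
     * @param content
     */
    public void createIndexForGroup(int id, String author, String content) {
        Document document = new Document();
        document.add(new IntField("id", id, Field.Store.YES));
        // StringField is indexed as a single token, which suits a grouping field.
        document.add(new StringField("author", author, Field.Store.YES));
        document.add(new TextField("content", content, Field.Store.YES));
        // A writer is opened and closed per document to keep the example short.
        try (IndexWriter indexWriter = getIndexWriter()) {
            indexWriter.addDocument(document);
            indexWriter.commit();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}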

2. Grouping:

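import java.io.IOException;

import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.lucene.search.grouping.GroupDocs;
import org.apache.lucene.search.grouping.GroupingSearch;
import org.apache.lucene.search.grouping.TopGroups;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.Version;
import org.wltea.analyzer.lucene.IKAnalyzer; // third-party IK Analyzer (package name assumed)

public class GroupTest {

    public void group(IndexSearcher indexSearcher, String groupField, String content) throws IOException, ParseException {
        GroupingSearch groupingSearch = new GroupingSearch(groupField);
        groupingSearch.setGroupSort(new Sort(SortField.FIELD_SCORE));
        groupingSearch.setFillSortFields(true);
        groupingSearch.setCachingInMB(4.0, true);
        groupingSearch.setAllGroups(true);
        //groupingSearch.setAllGroupHeads(true);
        groupingSearch.setGroupDocsLimit(10);

        QueryParser parser = new QueryParser(Version.LUCENE_43, "content", new IKAnalyzer(true));
        Query query = parser.parse(content);

        // groupOffset = 0, groupLimit = 1000: return up to the first 1000 groups
        TopGroups<BytesRef> result = groupingSearch.search(indexSearcher, query, 0, 1000);

        System.out.println("Total hits: " + result.totalHitCount);
        System.out.println("Groups returned: " + result.groups.length);

        for (GroupDocs<BytesRef> groupDocs : result.groups) {
            // groupValue is null for the group of documents that lack the group field
            String groupName = groupDocs.groupValue == null ? "(no value)" : groupDocs.groupValue.utf8ToString();
            System.out.println("Group: " + groupName);
            System.out.println("Hits in group: " + groupDocs.totalHits);

            //System.out.println("groupDocs.scoreDocs.length:" + groupDocs.scoreDocs.length);
            for (ScoreDoc scoreDoc : groupDocs.scoreDocs) {
                System.out.println(indexSearcher.doc(scoreDoc.doc));
            }
        }
    }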

3. A simple test:

    public static void main(String[] args) throws IOException, ParseException {
        IndexHelper indexHelper = new IndexHelper();
        indexHelper.createIndexForGroup(1, "红薯", "开源中国");
        indexHelper.createIndexForGroup(2, "红薯", "开源社区");
        indexHelper.createIndexForGroup(3, "红薯", "代码设计");
        indexHelper.createIndexForGroup(4, "红薯", "设计");
        indexHelper.createIndexForGroup(5, "觉先", "Lucene开发");
        indexHelper.createIndexForGroup(6, "觉先", "Lucene实战");
        indexHelper.createIndexForGroup(7, "觉先", "开源Lucene");
        indexHelper.createIndexForGroup(8, "觉先", "开源solr");

        indexHelper.createIndexForGroup(9, "散仙", "散仙开源Lucene");
        indexHelper.createIndexForGroup(10, "散仙", "散仙开源solr");
        indexHelper.createIndexForGroup(11, "散仙", "开源");
        GroupTest groupTest = new GroupTest();

        groupTest.group(indexHelper.getIndexSearcher(), "author", "开源");
    }
}

4. Test results:

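Two ways to paginate
Lucene offers two ways to paginate results:

1. Paginate the search results directly. This works well when the result set is small. The core of the paging code looks like this: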

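ScoreDoc[] sd = XXX;
// Index of the first hit on the current page
int begin = pageSize * (currentPage - 1);
// Index just past the last hit on the current page
int end = Math.min(begin + pageSize, sd.length);
// end never exceeds sd.length, so no further bound check is needed
for (int i = begin; i < end; i++) {
    // process sd[i] here
}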
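
The window arithmetic above can be isolated into a small helper, sketched here in plain Java. PageWindow and its methods are hypothetical names introduced for illustration, not part of Lucene. The sketch assumes pages are numbered from 1 and that `available` is the length of the ScoreDoc array actually returned by the search.

```java
/** Sketch of the page-window arithmetic; a hypothetical helper, not a Lucene class. */
final class PageWindow {
    final int begin; // index of the first hit on the page (inclusive)
    final int end;   // index just past the last hit on the page (exclusive)

    private PageWindow(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    /**
     * Computes the slice of a result array that belongs to a 1-based page.
     * Both bounds are clamped to `available`, so a page past the end of the
     * results yields an empty window instead of an out-of-range index.
     */
    static PageWindow of(int currentPage, int pageSize, int available) {
        if (currentPage < 1 || pageSize < 1 || available < 0) {
            throw new IllegalArgumentException("page and size must be >= 1, available >= 0");
        }
        long start = (long) pageSize * (currentPage - 1); // long avoids int overflow
        int begin = (int) Math.min(start, available);
        int end = (int) Math.min(start + pageSize, available);
        return new PageWindow(begin, end);
    }

    /** Number of top hits the search has to collect for this page to be complete. */
    static int hitsNeeded(int currentPage, int pageSize) {
        return Math.multiplyExact(currentPage, pageSize);
    }
}
```

The hitsNeeded value illustrates why this approach suits small result sets: reaching page k means collecting the top k * pageSize hits and discarding all but the last page.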

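2. Use searchAfter(...)

Lucene provides five overloads of this method; use whichever fits your needs. The main parameters are:

ScoreDoc after: the last ScoreDoc of the previous page, that is, the element at index length - 1 of the previous results

Query query: the query to run

int n: the number of results returned per call, i.e. the page size

A simple usage example: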

// A Map can hold whatever the caller needs from the search
Map<String, Object> resultMap = new HashMap<String, Object>();
ScoreDoc after = null;
Query query = XX;
TopDocs td = search.searchAfter(after, query, size);

// Total number of hits
resultMap.put("num", td.totalHits);

ScoreDoc[] sd = td.scoreDocs;
for (ScoreDoc scoreDoc : sd) {
    // usual per-hit processing
}
// The last ScoreDoc on this page (index length - 1); skip when the page is empty
if (sd.length > 0) {
    after = sd[sd.length - 1];
}
// Keep "after" for the next search, which starts the next page
resultMap.put("after", after);

return resultMap;
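
The cursor idea behind this second approach can be modelled in plain Java. The sketch below is not Lucene code: SearchAfterModel and Hit are hypothetical names, and the ranking it assumes (higher score first, ties broken by lower document number) is a simplification of relevance ordering. It shows the loop a caller would run: request a page, remember its last hit, pass that hit as the cursor for the next page, and stop on an empty page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java model of cursor-style paging; hypothetical, not Lucene code. */
final class SearchAfterModel {

    /** A hypothetical hit: document number plus score. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Assumed ranking: higher score first, then lower document number. */
    static final Comparator<Hit> RANK = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
    };

    /** Returns up to n hits ranked strictly after the cursor; null means "first page". */
    static List<Hit> searchAfter(Hit after, List<Hit> allHits, int n) {
        List<Hit> ranked = new ArrayList<>(allHits);
        ranked.sort(RANK);
        List<Hit> page = new ArrayList<>();
        for (Hit h : ranked) {
            if (page.size() == n) {
                break;
            }
            if (after == null || RANK.compare(h, after) > 0) {
                page.add(h);
            }
        }
        return page;
    }

    /** Walks every page, using the last hit of each page as the next cursor. */
    static List<List<Hit>> allPages(List<Hit> allHits, int n) {
        List<List<Hit>> pages = new ArrayList<>();
        Hit after = null;
        while (true) {
            List<Hit> page = searchAfter(after, allHits, n);
            if (page.isEmpty()) {
                break; // no more results, so there is no new cursor to take
            }
            pages.add(page);
            after = page.get(page.size() - 1);
        }
        return pages;
    }
}
```

Unlike the first approach, each request here returns only one page of hits, and the cursor rather than a page number decides where the page starts, so the caller has to carry the cursor between requests.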

Tags: Java, Lucene
