Term&Phrase Suggester
# 一、Term Suggester
# 1、概念
搜索推荐或者搜索建议,不光是根据用户输入的文本,找到对应的内容,还要根据用户输入的文本进行一定的纠错、改错。
搜索引擎中这样的功能,在Elasticsearch中是通过Suggester API实现的,它的原理是通过将输入的文本分解为Token,然后在索引的字典里查找相似的Term并返回。
"suggest": {
"term-suggester": {
"text": "汉包",
"term": {
"suggest_mode": "title_completion",
"field": "body"
}
}
}
- Suggester就是一种特殊类型的搜索,其中“text”是调用时候提供的文本,也就是用户页面上输入的文本
- 用户输入的文本如果是一个错误的拼写,会到指定的字段“body”上搜索,当无法搜索的结果时,返回建议的词。
# 2、案例
数据准备
DELETE articles
POST articles/_bulk
{ "index" : { } }
{ "body": "lucene is very cool"}
{ "index" : { } }
{ "body": "Elasticsearch builds on top of lucene"}
{ "index" : { } }
{ "body": "Elasticsearch rocks"}
{ "index" : { } }
{ "body": "elastic is the company behind ELK stack"}
{ "index" : { } }
{ "body": "Elk stack rocks"}
{ "index" : {} }
{ "body": "elasticsearch is rock solid"}
查询
POST /articles/_search
{
"size": 1,
"query": {
"match": {
"body": "lucen rock"
}
},
"suggest": {
"term-suggestion": {
"text": "lucen rock",
"term": {
"suggest_mode": "missing",
"field": "body"
}
}
}
}
suggest_mode有如下几种:
- Missing - 如索引中已经存在,就不提供建议
- Popular - 推荐出现频率更加高的词
- Always - 无论是否存在,都提供建议
{
"took" : 20,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.5904956,
"hits" : [
{
"_index" : "articles",
"_type" : "_doc",
"_id" : "QPlkjogB1hYm8koHEygM",
"_score" : 1.5904956,
"_source" : {
"body" : "elasticsearch is rock solid"
}
}
]
},
"suggest" : {
"term-suggestion" : [
{
"text" : "lucen",
"offset" : 0,
"length" : 5,
"options" : [
{
"text" : "lucene",
"score" : 0.8,
"freq" : 2
}
]
},
{
"text" : "rock",
"offset" : 6,
"length" : 4,
"options" : [ ]
}
]
}
}
由服务端返回的结果可以发现,lucen给出了推荐的内容lucene,而rock并没有给出推荐,因为rock在搜索中已经存在了。
popular
POST /articles/_search
{
"suggest": {
"term-suggestion": {
"text": "lucen rock",
"term": {
"suggest_mode": "popular",
"field": "body"
}
}
}
}
{
"took" : 32,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"suggest" : {
"term-suggestion" : [
{
"text" : "lucen",
"offset" : 0,
"length" : 5,
"options" : [
{
"text" : "lucene",
"score" : 0.8,
"freq" : 2
}
]
},
{
"text" : "rock",
"offset" : 6,
"length" : 4,
"options" : [
{
"text" : "rocks",
"score" : 0.75,
"freq" : 2
}
]
}
]
}
}
由于是popular,所以推荐出现频率更加高的词,除了lucen进行了推荐,rock推荐了出现频率更高的rocks。
# 三、Phrase Suggester
Phrase Suggester在Term Suggester上增加了一些额外的逻辑:
- Suggest Mode: missing、popular、always
- Max Erros:最多可以拼错的Terms数
- Confidence:限制返回结果数
POST /articles/_search
{
"suggest": {
"my-suggestion": {
"text": "lucne and elasticsear rock hello world ",
"phrase": {
"field": "body",
"max_errors":2,
"confidence":0,
"direct_generator":[{
"field":"body",
"suggest_mode":"always"
}],
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
}
}
# 参考
上次更新: 2024/06/29, 15:13:44