单字符串多字段查询

# 一、Dis Max Query

# 1、基础案例

首先来看一个简单的查询案例：

PUT /blogs/_doc/1
{
    "title": "Quick brown rabbits",
    "body":  "Brown rabbits are commonly seen."
}

PUT /blogs/_doc/2
{
    "title": "Keeping pets healthy",
    "body":  "My quick brown fox eats rabbits on a regular basis."
}

POST /blogs/_search
{
    "query": {
        "bool": {
            "should": [
                { "match": { "title": "Brown fox" }},
                { "match": { "body":  "Brown fox" }}
            ]
        }
    }
}

由于数据一中title和body字段都含有Brown，而数据二只有body字段含有brown fox，should的查询过程如下：

查询should语句中的子查询
加和子查询的评分
乘以匹配语句的总数
除以所有语句的总数

通过这样的算分流程，得到数据一的匹配程度更高，但是如果我们仅仅想要的某一个字段匹配最高，比如在这个例子中，数据二的body字段匹配程度最高。

# 2、Dis Max Query

POST blogs/_search
{
    "query": {
        "dis_max": {
            "queries": [
                { "match": { "title": "Quick pets" }},
                { "match": { "body":  "Quick pets" }}
            ]
        }
    }
}

数据一只有title字段能够匹配Quick，但是Quick在title字段匹配程度高。
数据二title字段和body字段能够匹配pets，但是pets在两个字短匹配程度相对较低。

这种情况的竞争就是当字段之间相互竞争，又相互关联的时候，我们只想要查询出最佳字段，也就是我们最匹配的字段。

# 二、Multi Match

# 1、基础语法结构

{
   "query": {
        "multi_match": {
            "query":  "barking dogs",
            "type":   "most_fields",
            "fields": [ "title", "title.std" ]
        }
    }
}

# 2、案例

DELETE /titles

PUT /titles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english"
      }
    }
  }
}

POST titles/_bulk
{ "index": { "_id": 1 }}
{ "title": "My dog barks" }
{ "index": { "_id": 2 }}
{ "title": "I see a lot of barking dogs on the road " }


GET titles/_search
{
  "query": {
    "match": {
      "title": "barking dogs"
    }
  }
}

在title字段的匹配上，id为1的语句整体要简短，同时采用了英文分词器，barking dogs会被分成bark和dogs两个单词，所以id为1的语句匹配程度更高。

DELETE /titles
PUT /titles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "english",
        "fields": {"std": {"type": "text","analyzer": "standard"}}
      }
    }
  }
}

POST titles/_bulk
{ "index": { "_id": 1 }}
{ "title": "My dog barks" }
{ "index": { "_id": 2 }}
{ "title": "I see a lot of barking dogs on the road " }

GET /titles/_search
{
   "query": {
        "multi_match": {
            "query":  "barking dogs",
            "type":   "most_fields",
            "fields": [ "title", "title.std" ]
        }
    }
}

GET /titles/_search
{
   "query": {
        "multi_match": {
            "query":  "barking dogs",
            "type":   "most_fields",
            "fields": [ "title^10", "title.std" ] // 权重
        }
    }
}

通过为title字段增加一个子字段std，这个字段采用标准的分词器，来对算分进行综合的影响。

# 参考

Elasticsearch 核心技术与实战 (opens new window)

#Elasticsearch

上次更新: 2024/06/29, 15:13:44