Elasticsearch Hands-On Primer (Part 1) - Alibaba Cloud Developer Community


2022-05-13


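root object

The root object of a mapping JSON contains properties (the field definitions), metadata fields (_id, _source, _type), settings (such as analyzer), and other settings. The examples in this article use ES 6.x syntax, where the mapping is nested under a type name (my_index in the first example, _doc in the later ones).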

PUT my_index
{
  "mappings": {
    "my_index": {
      "properties": {
        "my_field1": {
          "type": "integer"
        },
        "my_field2": {
          "type": "float"
        },
        "my_field3": {
          "type": "scaled_float",
          "scaling_factor": 100
        }
      }
    }
  }
}

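meta-fields (metadata fields)

_all

A document indexed into ES usually contains several fields. ES automatically concatenates the values of all these fields into a single string, stores that string as the _all field, and indexes it. When a user later searches without specifying a field, the query is matched against _all.

Note that _all is deprecated as of ES 6.0, where it can no longer be enabled on newly created indices, and it was removed in ES 7.0. The copy_to parameter, covered below, is the replacement.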
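The following minimal Python sketch shows what _all does conceptually. It is not how ES builds the field internally. Lowercasing plus whitespace splitting stands in for the real analyzer, and the helper names build_all_field and search_all are hypothetical.

```python
def build_all_field(doc: dict) -> str:
    """Concatenate the values of all top-level fields into one string,
    roughly what the pre-6.0 _all field contained."""
    return " ".join(str(value) for value in doc.values() if value is not None)


def search_all(docs: dict, term: str) -> list:
    """Return the ids of documents whose _all text contains `term`.
    Lowercasing plus whitespace splitting stands in for the real analyzer."""
    term = term.lower()
    return [
        doc_id
        for doc_id, doc in docs.items()
        if term in build_all_field(doc).lower().split()
    ]


docs = {
    "1": {"first_name": "John", "last_name": "Smith", "age": 25},
    "2": {"first_name": "Jane", "last_name": "Doe", "age": 31},
}

print(build_all_field(docs["1"]))  # John Smith 25
print(search_all(docs, "smith"))   # ['1']
```

The query never names a field. Every value in the document is reachable through the single combined string, which is also why _all increased index size.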

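_field_names

The _field_names field records the names of all fields in a document that hold a non-null value. Querying it with a field name returns every document that contains that field with a non-null value.

Example: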

# Example documents
PUT my_index/_doc/1
{
  "title": "This is a document"
}
PUT my_index/_doc/2?refresh=true
{
  "title": "This is another document",
  "body": "This document has a body"
}
GET my_index/_search
{
  "query": {
    "terms": {
      "_field_names": [ "title" ] 
    }
  }
}
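The semantics of the query above can be sketched in Python, assuming documents are plain dicts and only top-level fields are considered. The helper names are hypothetical. As in ES, a field counts as absent when its value is null, [] or [null].

```python
def has_value(value) -> bool:
    """ES treats null, [] and [null] as 'no value'."""
    if value is None:
        return False
    if isinstance(value, list):
        return any(item is not None for item in value)
    return True


def field_names(doc: dict) -> set:
    """Top-level field names that hold a value (what _field_names records)."""
    return {name for name, value in doc.items() if has_value(value)}


def terms_on_field_names(docs: dict, names: list) -> list:
    """Ids of documents that contain at least one of the given fields."""
    wanted = set(names)
    return [doc_id for doc_id, doc in docs.items() if wanted & field_names(doc)]


docs = {
    "1": {"title": "This is a document"},
    "2": {"title": "This is another document", "body": "This document has a body"},
    "3": {"title": None, "body": []},
}

print(terms_on_field_names(docs, ["title"]))  # ['1', '2']
print(terms_on_field_names(docs, ["body"]))   # ['2']
```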

Disabling it:

PUT tweets
{
  "mappings": {
    "_doc": {
      "_field_names": {
        "enabled": false
      }
    }
  }
}

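_id

The unique identifier of a document.

_index

The index that the document is stored in. ES also supports searching across multiple indices at once. See the official documentation for details.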

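_routing

The value used to route a document to a shard. By default it is the document's _id, and the target shard is computed as:

shard_num = hash(_routing) % num_primary_shards

A custom routing value can be supplied as shown below. A document indexed with a custom routing value must be retrieved with the same routing value. Otherwise the request may go to a different shard.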

PUT my_index/_doc/1?routing=user1&refresh=true 
{
  "title": "This is a document"
}
GET my_index/_doc/1?routing=user1
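Here is a minimal Python sketch of the routing formula above. Two simplifications apply. First, zlib.crc32 stands in for the Murmur3 hash that ES actually uses. Second, recent ES versions add a routing-factor refinement to support index splitting, which this sketch ignores. The helpers shard_for and route are hypothetical.

```python
import zlib
from typing import Optional


def shard_for(routing: str, num_primary_shards: int) -> int:
    """shard_num = hash(_routing) % num_primary_shards.
    zlib.crc32 is a stand-in hash; ES itself uses Murmur3."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def route(doc_id: str, num_primary_shards: int, routing: Optional[str] = None) -> int:
    """Without an explicit routing value, the document's _id is used."""
    return shard_for(routing if routing is not None else doc_id, num_primary_shards)


# Documents routed with the same value always land on the same shard.
print(route("1", 5, routing="user1") == route("2", 5, routing="user1"))  # True
# A different primary shard count can give a different target shard.
print(route("1", 5), route("1", 3))
```

Because the result depends on num_primary_shards, the number of primary shards is fixed when an index is created.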

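_source

This metadata field stores the original JSON body of the document, and it is what gets returned to the user. For example, a document of type user may contain 100 fields while the front end needs only a few of them, so we use _source filtering to return just those fields. This is not the same as a query filter. Source filtering only trims the returned fields and does not change which documents match. A filter restricts which documents match, without affecting relevance scores.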
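In a search request, source filtering is written as "_source": ["name", "age"] or as "_source": {"includes": [...], "excludes": [...]}. The Python sketch below mimics that trimming on a plain dict. It supports only exact top-level names, whereas ES also accepts wildcards and dotted paths. The helper filter_source and the sample user document are hypothetical.

```python
from typing import Iterable, Optional


def filter_source(source: dict,
                  includes: Optional[Iterable[str]] = None,
                  excludes: Optional[Iterable[str]] = None) -> dict:
    """Return a trimmed copy of a document's _source.
    Only exact top-level names are supported in this sketch."""
    keep = set(includes) if includes is not None else None
    drop = set(excludes or ())
    return {k: v for k, v in source.items()
            if (keep is None or k in keep) and k not in drop}


user = {"name": "Tom", "age": 30, "email": "tom@example.com", "password_hash": "..."}

print(filter_source(user, includes=["name", "age"]))  # {'name': 'Tom', 'age': 30}
print(filter_source(user, excludes=["password_hash"]))
```

The function only reshapes a document that has already matched. It never decides whether the document matches, which is the difference from a query filter.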

Disabling it:

PUT tweets
{
  "mappings": {
    "_doc": {
      "_source": {
        "enabled": false
      }
    }
  }
}

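_type

This field identifies a document's type, which is a logical partition. When field values are indexed in the underlying Lucene, they are all stored as opaque bytes. Lucene itself has no concept of types, so ES stores the type as an ordinary field of each document: _type.

ES uses _type to filter documents by type. All types in one index are actually stored together. As a result, fields with the same name in different types of the same index map to the same underlying field and must have the same mapping.

_uid

Deprecated in ES 6.0. It combined _type and _id into a single value.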

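mapping-parameters

First, ES 5 allows a single index to contain multiple types. ES 6 still accepts multi-type indices created in ES 5, but a new index may contain only one type. Types are deprecated in ES 7 and removed entirely in ES 8.

Second, early on we compared an ES index to a MySQL database and a type to a table. That analogy is wrong. In MySQL, columns in different tables are physically unrelated and each occupies its own storage. In ES, the name field of type=Student and the name field of type=Teacher may be stored in exactly the same field. In other words, a type is a logical partition, not a physical one.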

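copy_to

copy_to essentially lets us define our own _all field. The values of several fields are copied into one target field, and later queries can search that target field directly.

What problem does it solve? Suppose we search for the value "John Smith", but the document stores the name in two fields, first_name and last_name. The search then becomes a cross-field search. Once relevance scoring is applied (TF-IDF in older versions, BM25 since ES 5.0), the results may not be what we expect. Copying both fields into one field with copy_to avoids this.

Example (with "operator": "and", both John and Smith must appear in full_name):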

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "first_name": {
          "type": "text",
          "copy_to": "full_name" 
        },
        "last_name": {
          "type": "text",
          "copy_to": "full_name" 
        },
        "full_name": {
          "type": "text"
        }
      }
    }
  }
}
PUT my_index/_doc/1
{
  "first_name": "John",
  "last_name": "Smith"
}
GET my_index/_search
{
  "query": {
    "match": {
      "full_name": { 
        "query": "John Smith",
        "operator": "and"
      }
    }
  }
}
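Here is a minimal Python sketch of the behaviour in the example above. The copy_to dict maps each source field to its target, which is a hypothetical shape chosen for illustration. The "indexed" dict represents the indexed fields, not _source: in ES, the copy_to target is not added to the stored _source. Lowercasing plus whitespace splitting stands in for the standard analyzer.

```python
def apply_copy_to(doc: dict, copy_to: dict) -> dict:
    """Build the indexed view of `doc`: each source field listed in copy_to
    also contributes its value to the target field. `doc` itself is not modified."""
    indexed = dict(doc)
    for field, target in copy_to.items():
        if field in doc:
            existing = indexed.get(target, [])
            values = existing if isinstance(existing, list) else [existing]
            indexed[target] = values + [doc[field]]
    return indexed


def match_and(field_value, query: str) -> bool:
    """match query with operator=and: every query token must appear in the field."""
    values = field_value if isinstance(field_value, list) else [field_value]
    tokens = {tok.lower() for v in values for tok in str(v).split()}
    return all(tok.lower() in tokens for tok in query.split())


doc = {"first_name": "John", "last_name": "Smith"}
copy_to = {"first_name": "full_name", "last_name": "full_name"}
indexed = apply_copy_to(doc, copy_to)

print(indexed["full_name"])                           # ['John', 'Smith']
print(match_and(indexed["full_name"], "John Smith"))  # True
print(match_and(doc["first_name"], "John Smith"))     # False
```

Neither name field alone satisfies the AND query. The combined field does, which is the cross-field problem copy_to solves.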

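Dynamic mapping

ES uses the mapping to describe the data type of each field in a document. Earlier we stored data in ES directly without declaring field types. This works because ES infers types: the default dynamic mapping decides each field's data type and how it is analyzed.

null          --> no field is added
true / false  --> boolean
123           --> long
123.123       --> float
1999-11-11    --> date (if the string passes date detection)
"hello world" --> text, with a keyword sub-field
Object        --> object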
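The inference rules above can be approximated in Python as follows. This is a sketch, not the ES implementation. ES 5 and later map JSON floating-point numbers to float. The real default date formats ("strict_date_optional_time" and "yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z") are replaced here by a simple regex, and the helper infer_type is hypothetical.

```python
import re

# Simplification of ES's default date detection: only plain
# yyyy-MM-dd / yyyy/MM/dd strings are recognized here.
DATE_RE = re.compile(r"^\d{4}[-/]\d{2}[-/]\d{2}$")


def infer_type(value):
    """Approximate the ES field type dynamic mapping would pick for a JSON value.
    Returns None when no field would be added."""
    if value is None:
        return None
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        # Non-date strings become text (with a keyword sub-field in ES).
        return "date" if DATE_RE.match(value) else "text"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null element.
        for item in value:
            item_type = infer_type(item)
            if item_type is not None:
                return item_type
        return None
    raise TypeError(f"not a JSON value: {value!r}")


sample = {"a": None, "b": True, "c": 123, "d": 123.123,
          "e": "1999-11-11", "f": "hello world", "g": {"x": 1}}
print({k: infer_type(v) for k, v in sample.items()})
```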

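Customizing the dynamic mapping policy

true: map unknown fields dynamically (the default)
false: ignore unknown fields (they stay in _source but are not indexed)
strict: reject the document with an error when it contains an unknown field

Example: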

PUT /my_index
{
  "mappings": {
    "_doc": {
      "dynamic": "strict"
    }
  }
}
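The three policies can be sketched in Python as a function that returns the field mappings after indexing one document. Everything here is hypothetical. simple_type is a deliberately rough type guess, and ValueError stands in for the error ES returns under strict.

```python
def simple_type(value) -> str:
    """Very rough type guess, just enough for this sketch."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, dict):
        return "object"
    return "text"


def index_with_policy(properties: dict, doc: dict, dynamic: str = "true") -> dict:
    """Return the mapping properties after indexing `doc` under a dynamic policy.
    dynamic is "true", "false" or "strict", mirroring the mapping setting."""
    if dynamic not in ("true", "false", "strict"):
        raise ValueError(f"unknown dynamic setting: {dynamic}")
    updated = dict(properties)
    for field, value in doc.items():
        if field in updated:
            continue  # already mapped
        if dynamic == "strict":
            raise ValueError(f"unknown field [{field}] rejected by strict mapping")
        if dynamic == "true" and value is not None:  # a null never adds a field
            updated[field] = simple_type(value)
        # dynamic == "false": the field stays in _source but is not mapped or indexed
    return updated


props = {"name": "text"}
doc = {"name": "Tom", "age": 30}
print(index_with_policy(props, doc, "true"))   # {'name': 'text', 'age': 'long'}
print(index_with_policy(props, doc, "false"))  # {'name': 'text'}
```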

Disabling date detection (with it off, the create field below is mapped as text instead of date):

PUT my_index
{
  "mappings": {
    "_doc": {
      "date_detection": false
    }
  }
}
PUT my_index/_doc/1 
{
  "create": "2015/09/02"
}

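Customizing the date detection formats (create_date below is detected as a date because it matches MM/dd/yyyy):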

PUT my_index
{
  "mappings": {
    "_doc": {
      "dynamic_date_formats": ["MM/dd/yyyy"]
    }
  }
}
PUT my_index/_doc/1
{
  "create_date": "09/25/2015"
}
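As a rough Python analogue, the custom pattern MM/dd/yyyy can be hand-translated to the strptime format "%m/%d/%Y". That translation and the helper detect_date are assumptions for illustration. In this sketch, as with dynamic_date_formats, the configured list is the only one consulted, so strings in other layouts are not detected as dates.

```python
from datetime import datetime

# Hand translation of the ES pattern "MM/dd/yyyy" into strptime syntax.
DYNAMIC_DATE_FORMATS = ["%m/%d/%Y"]


def detect_date(value: str, formats=DYNAMIC_DATE_FORMATS) -> bool:
    """True if the string parses with any configured format, i.e. it would be
    dynamically mapped as a date rather than as text."""
    for fmt in formats:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False


print(detect_date("09/25/2015"))  # True
print(detect_date("2015/09/25"))  # False
```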

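Enabling numeric detection (numeric strings are then mapped as numbers: my_float becomes float and my_integer becomes long):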

PUT my_index
{
  "mappings": {
    "_doc": {
      "numeric_detection": true
    }
  }
}
PUT my_index/_doc/1
{
  "my_float":   "1.0", 
  "my_integer": "1" 
}
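Here is a minimal Python sketch of how numeric detection classifies string values. It tries an integer parse first, then a floating-point parse, which reproduces the mapping results in the example above. Python's int() and float() only approximate what ES accepts (for example, float() also accepts "nan"). The helper detect_numeric is hypothetical.

```python
def detect_numeric(value: str):
    """With numeric_detection enabled, guess how a JSON string is mapped:
    'long' for integer strings, 'float' for other numeric strings, otherwise None
    (the string is then mapped as text)."""
    try:
        int(value)
        return "long"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return None


print(detect_numeric("1.0"))    # float
print(detect_numeric("1"))      # long
print(detect_numeric("hello"))  # None
```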
