TEXT-type index columns use the SingleWord analyzer by default: Chinese text is split into single characters, English text into whole words, matching is case-insensitive, and words are not further broken into sub-words. For example, the field value "Xiaomi/小米redmi note 7 pro 红米索尼4800万智能手机" is tokenized into the terms "xiaomi", "小", "米", "redmi", "note", "7", "pro", "红", "米", "索", "尼", "4800", "万", "智", "能", "手", "机", and an inverted index is built over them. Where can I obtain this tokenization result?
You can inspect tokenization results through Elasticsearch's `_analyze` API, which runs a piece of text through an analyzer and returns the resulting tokens. Note that `singleword` is not a built-in Elasticsearch analyzer; the request below assumes a custom analyzer registered under that name on `my_index`. (The built-in `standard` analyzer behaves very similarly: it lowercases, keeps alphanumeric runs whole, and splits CJK text into single characters.) For example:
```
GET /my_index/_analyze
{
  "text": "Xiaomi/小米redmi note 7 pro 红米索尼4800万智能手机",
  "analyzer": "singleword"
}
```
The response looks similar to this (offsets and positions are illustrative):
```
{
  "tokens": [
    { "token": "xiaomi", "start_offset": 0,  "end_offset": 6,  "type": "word",   "position": 0 },
    { "token": "小",     "start_offset": 7,  "end_offset": 8,  "type": "word",   "position": 1 },
    { "token": "米",     "start_offset": 8,  "end_offset": 9,  "type": "word",   "position": 2 },
    { "token": "redmi",  "start_offset": 9,  "end_offset": 14, "type": "word",   "position": 3 },
    { "token": "note",   "start_offset": 15, "end_offset": 19, "type": "word",   "position": 4 },
    { "token": "7",      "start_offset": 20, "end_offset": 21, "type": "number", "position": 5 },
    { "token": "pro",    "start_offset": 22, "end_offset": 25, "type": "word",   "position": 6 },
    { "token": "红",     "start_offset": 26, "end_offset": 27, "type": "word",   "position": 7 },
    { "token": "米",     "start_offset": 27, "end_offset": 28, "type": "word",   "position": 8 },
    { "token": "索",     "start_offset": 28, "end_offset": 29, "type": "word",   "position": 9 },
    { "token": "尼",     "start_offset": 29, "end_offset": 30, "type": "word",   "position": 10 },
    { "token": "4800",   "start_offset": 30, "end_offset": 34, "type": "number", "position": 11 },
    { "token": "万",     "start_offset": 34, "end_offset": 35, "type": "word",   "position": 12 },
    { "token": "智",     "start_offset": 35, "end_offset": 36, "type": "word",   "position": 13 },
    { "token": "能",     "start_offset": 36, "end_offset": 37, "type": "word",   "position": 14 },
    { "token": "手",     "start_offset": 37, "end_offset": 38, "type": "word",   "position": 15 },
    { "token": "机",     "start_offset": 38, "end_offset": 39, "type": "word",   "position": 16 }
  ]
}
```
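If you don't have an Elasticsearch cluster at hand, the behaviour described in the question (whole lowercase words for letters and digits, single characters for Chinese, punctuation dropped) can also be approximated locally. This is a minimal sketch of my own, a regex-based approximation rather than the actual SingleWord implementation:

```python
import re

def singleword_tokenize(text):
    """Approximate SingleWord-style tokenization:
    - runs of ASCII letters/digits become one lowercase token,
    - each CJK ideograph becomes its own token,
    - everything else (punctuation, whitespace) is dropped."""
    pattern = re.compile(r'[A-Za-z0-9]+|[\u4e00-\u9fff]')
    return [m.group(0).lower() for m in pattern.finditer(text)]

print(singleword_tokenize("Xiaomi/小米redmi note 7 pro 红米索尼4800万智能手机"))
# → ['xiaomi', '小', '米', 'redmi', 'note', '7', 'pro', '红', '米', '索', '尼', '4800', '万', '智', '能', '手', '机']
```

This reproduces the term list from the question; real analyzers additionally report offsets, positions, and token types, as in the `_analyze` response above.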