elasticsearch-jdbc: Syncing MySQL to Elasticsearch in Depth - Alibaba Cloud Developer Community

2019-07-09

Summary: An in-depth walkthrough of syncing MySQL to Elasticsearch with elasticsearch-jdbc.

1. How do you synchronize data from MySQL to Elasticsearch?

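Converting rows to JSON one by one is clearly impractical, so you need either a third-party tool or your own implementation. The core requirement is that inserts, deletes, updates, and queries stay consistent between the two systems.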

2. What methods are there for syncing MySQL with Elasticsearch, and how do they compare?

The best-known tools in this area are:

1) elasticsearch-jdbc. Strictly speaking it is no longer a plugin; it has become a standalone third-party tool. https://github.com/jprante/elasticsearch-jdbc
2) The elasticsearch-river-mysql plugin. https://github.com/scharron/elasticsearch-river-mysql
3) go-mysql-elasticsearch (by the Chinese developer siddontang). https://github.com/siddontang/go-mysql-elasticsearch

Comparison of tools 1-3:

go-mysql-elasticsearch is still under development and not yet stable.
Why choose elasticsearch-jdbc over the elasticsearch-river-mysql plugin?
(Reference: http://stackoverflow.com/questions/23658534/using-elasticsearch-river-mysql-to-stream-data-from-mysql-database-to-elasticsea)

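1) Generality: elasticsearch-jdbc is more general-purpose. As its name says, it reads from any JDBC source, not only MySQL.
2) Maintenance: elasticsearch-jdbc is very active on GitHub. Its latest release at the time of writing, 2.3.3.0 (May 28, 2016), supports Elasticsearch 2.3.3.
By contrast, elasticsearch-river-mysql has not been updated since December 13, 2012.
Given all this, elasticsearch-jdbc is the obvious choice for syncing MySQL to Elasticsearch.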

Shortcomings of elasticsearch-jdbc (as reported by others):

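1) siddontang, the author of go-mysql-elasticsearch, wrote on his blog:
"elasticsearch-river-jdbc is very powerful, but it does not handle incremental updates well. It requires the source table to only ever grow and never shrink, which is almost impossible to guarantee in a real project."
http://www.jianshu.com/p/05cff717563c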

2) The blogger leotse90 points out another shortcoming of elasticsearch-jdbc: deletes (physical deletes) are not synchronized!
http://leotse90.com/2015/11/11/ElasticSearch与MySQL数据同步以及修改表结构/

As of June 16, 2016 I have not tested this myself, so I won't comment on it.

3. How do you use elasticsearch-jdbc? Does it need to be installed?

3.1 Differences from earlier versions

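elasticsearch-jdbc 2.3.2.0 does not need to be installed; you simply unzip it and run it. The Elasticsearch version used in the tests below is also 2.3.2.
Operating system: CentOS release 6.6 (Final)
You may be wondering how earlier versions differ. Quite a lot. From what I have gathered, the main differences are: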

1) Early 1.x versions were plugins and had to be installed.
2) The configuration format also differs.

3.2 Using elasticsearch-jdbc (sync method 1)

Prerequisites:
1) Elasticsearch 2.3.2 is installed and tested.
2) MySQL is installed and supports inserts, deletes, updates, and queries.
The test database is test and the table is cc:

mysql> select * from cc;
+----+------------+
| id | name       |
+----+------------+
|  1 | laoyang    |
|  2 | dluzhang   |
|  3 | dlulaoyang |
+----+------------+
3 rows in set (0.00 sec)

Step 1: Download the tool.
URL: http://xbib.org/repository/org/xbib/elasticsearch/importer/elasticsearch-jdbc/2.3.2.0/elasticsearch-jdbc-2.3.2.0-dist.zip

Step 2: Copy the archive to the CentOS machine. Any path will do; I put it in the root directory and unzipped it: unzip elasticsearch-jdbc-2.3.2.0-dist.zip

Step 3: Set the environment variable.

[root@5b9dbaaa148a /]# vi /etc/profile 
export JDBC_IMPORTER_HOME=/elasticsearch-jdbc-2.3.2.0

Apply the change to the current shell:

[root@5b9dbaaa148a /]# source /etc/profile
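The scripts below build the Java classpath from $JDBC_IMPORTER_HOME/lib. If the variable is unset or points at the wrong directory, the importer fails with a less obvious Java error, so a quick check after sourcing /etc/profile can help. This is a minimal sketch. The helper name check_importer_home is hypothetical, and it assumes only the bin/ and lib/ layout that the scripts in this article reference.

```shell
#!/bin/sh
# Sketch: sanity-check JDBC_IMPORTER_HOME before running the importer.
# check_importer_home is a hypothetical helper, not part of elasticsearch-jdbc.
check_importer_home() {
  # Fail if the variable is unset or empty.
  if [ -z "${JDBC_IMPORTER_HOME:-}" ]; then
    echo "JDBC_IMPORTER_HOME is not set" >&2
    return 1
  fi
  # The scripts in this article use $JDBC_IMPORTER_HOME/bin and $JDBC_IMPORTER_HOME/lib.
  for sub in bin lib; do
    if [ ! -d "$JDBC_IMPORTER_HOME/$sub" ]; then
      echo "missing directory: $JDBC_IMPORTER_HOME/$sub" >&2
      return 1
    fi
  done
  # lib/ must contain the jars that java -cp "${lib}/*" picks up.
  if ! ls "$JDBC_IMPORTER_HOME"/lib/*.jar >/dev/null 2>&1; then
    echo "no .jar files in $JDBC_IMPORTER_HOME/lib" >&2
    return 1
  fi
  return 0
}
```

For example, from /odbc_es you could run:
check_importer_home && ./mysql_import_es.sh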

Step 4: Configure and run. For details, see:

https://github.com/jprante/elasticsearch-jdbc

1) Create the folder odbc_es under the root directory, so that it looks like this:

[root@5b9dbaaa148a /]# ll /odbc_es/ 
drwxr-xr-x 2 root root 4096 Jun 16 03:11 logs 
-rwxrwxrwx 1 root root 542 Jun 16 04:03 mysql_import_es.sh

2) Create the script mysql_import_es.sh with the following content:

[root@5b9dbaaa148a odbc_es]# cat mysql_import_es.sh
#!/bin/sh
# Settings passed to the importer (explained here because JSON itself does not allow comments):
#   elasticsearch.cluster  cluster name; must match cluster.name in /usr/local/elasticsearch/config/elasticsearch.yml
#   url                    MySQL database address
#   user / password        MySQL user name and password
#   index / type           the new index and type to create in Elasticsearch
bin=$JDBC_IMPORTER_HOME/bin
lib=$JDBC_IMPORTER_HOME/lib
echo '{
  "type" : "jdbc",
  "jdbc" : {
    "elasticsearch.autodiscover" : true,
    "elasticsearch.cluster" : "my-application",
    "url" : "jdbc:mysql://10.8.5.101:3306/test",
    "user" : "root",
    "password" : "123456",
    "sql" : "select * from cc",
    "elasticsearch" : {
      "host" : "10.8.5.101",
      "port" : 9300
    },
    "index" : "myindex",
    "type" : "mytype"
  }
}' | java \
  -cp "${lib}/*" \
  -Dlog4j.configurationFile=${bin}/log4j2.xml \
  org.xbib.tools.Runner \
  org.xbib.tools.JDBCImporter

3) Make mysql_import_es.sh executable.

[root@5b9dbaaa148a odbc_es]# chmod a+x mysql_import_es.sh

4) Run mysql_import_es.sh:

[root@5b9dbaaa148a odbc_es]# ./mysql_import_es.sh
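The script above hard-codes the MySQL password. The sketch below shows one way to pass connection details through environment variables instead. Several parts of it are hypothetical: the function name make_jdbc_config and the variables MYSQL_URL, MYSQL_USER, MYSQL_PASSWORD, ES_CLUSTER, ES_HOST, ES_PORT, ES_INDEX and ES_TYPE. The values are pasted into the JSON without escaping, so they must not contain double quotes or backslashes.

```shell
#!/bin/sh
# Sketch: print the importer's JSON settings, taking connection details from
# environment variables instead of hard-coding them in the script.
# make_jdbc_config and the MYSQL_* / ES_* variable names are hypothetical.
# Values are inserted without JSON escaping: avoid " and \ in them.
make_jdbc_config() {
  # Refuse to build a config without a database URL and password.
  if [ -z "${MYSQL_URL:-}" ] || [ -z "${MYSQL_PASSWORD:-}" ]; then
    echo "MYSQL_URL and MYSQL_PASSWORD must be set" >&2
    return 1
  fi
  # Unquoted EOF: the ${...} expansions and defaults below are filled in by the shell.
  cat <<EOF
{
  "type" : "jdbc",
  "jdbc" : {
    "elasticsearch.autodiscover" : true,
    "elasticsearch.cluster" : "${ES_CLUSTER:-my-application}",
    "url" : "${MYSQL_URL}",
    "user" : "${MYSQL_USER:-root}",
    "password" : "${MYSQL_PASSWORD}",
    "sql" : "select * from cc",
    "elasticsearch" : {
      "host" : "${ES_HOST:-127.0.0.1}",
      "port" : ${ES_PORT:-9300}
    },
    "index" : "${ES_INDEX:-myindex}",
    "type" : "${ES_TYPE:-mytype}"
  }
}
EOF
}
```

Its output can take the place of the echo in mysql_import_es.sh:
make_jdbc_config | java -cp "${lib}/*" -Dlog4j.configurationFile=${bin}/log4j2.xml org.xbib.tools.Runner org.xbib.tools.JDBCImporter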

Step 5: Verify that the data was synchronized.
Query Elasticsearch:

[root@5b9dbaaa148a odbc_es]# curl -XGET 'http://10.8.5.101:9200/myindex/mytype/_search?pretty'
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 8,
    "successful" : 8,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "myindex",
      "_type" : "mytype",
      "_id" : "AVVXKgeEun6ksbtikOWH",
      "_score" : 1.0,
      "_source" : {
        "id" : 1,
        "name" : "laoyang"
      }
    }, {
      "_index" : "myindex",
      "_type" : "mytype",
      "_id" : "AVVXKgeEun6ksbtikOWI",
      "_score" : 1.0,
      "_source" : {
        "id" : 2,
        "name" : "dluzhang"
      }
    }, {
      "_index" : "myindex",
      "_type" : "mytype",
      "_id" : "AVVXKgeEun6ksbtikOWJ",
      "_score" : 1.0,
      "_source" : {
        "id" : 3,
        "name" : "dlulaoyang"
      }
    } ]
  }
}

If the output contains the fields and values from the MySQL table, as above, the sync succeeded.
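Checking the hits by eye works for three rows, but on a larger table comparing counts is quicker. Counts also matter for another reason. The documents above received auto-generated _id values, so running the import a second time would likely index every row again and the counts would drift apart. The elasticsearch-jdbc README describes selecting a column as _id to control document IDs; check it for your version. The sketch below uses two hypothetical helpers and assumes the {"count":N,...} response of the _count endpoint.

```shell
#!/bin/sh
# Sketch: compare the MySQL row count with the Elasticsearch document count.
# es_count_from_json and check_sync_count are hypothetical helpers.
es_count_from_json() {
  # Read a _count response on stdin (compact or ?pretty) and print the number after "count".
  sed -n 's/.*"count" *: *\([0-9][0-9]*\).*/\1/p' | head -n 1
}

check_sync_count() {
  # $1 = MySQL row count, $2 = Elasticsearch document count
  if [ "$1" = "$2" ]; then
    echo "OK: $1 rows in MySQL, $2 docs in Elasticsearch"
    return 0
  fi
  echo "MISMATCH: $1 rows in MySQL, $2 docs in Elasticsearch" >&2
  return 1
}
```

A possible usage against the test setup:
mysql_n=$(mysql -h 10.8.5.101 -u root -p -N -e 'select count(*) from test.cc')
es_n=$(curl -s 'http://10.8.5.101:9200/myindex/_count' | es_count_from_json)
check_sync_count "$mysql_n" "$es_n"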

4. elasticsearch-jdbc sync method 2

[root@5b9dbaaa148a odbc_es]# cat mysql_import_es_simple.sh
#!/bin/sh
bin=$JDBC_IMPORTER_HOME/bin
lib=$JDBC_IMPORTER_HOME/lib
  java \
  -cp "${lib}/*" \
  -Dlog4j.configurationFile=${bin}/log4j2.xml \
  org.xbib.tools.Runner \
  org.xbib.tools.JDBCImporter statefile.json

[root@5b9dbaaa148a odbc_es]# cat statefile.json
{
  "type" : "jdbc",
  "jdbc" : {
    "elasticsearch.autodiscover" : true,
    "elasticsearch.cluster" : "my-application",
    "url" : "jdbc:mysql://10.8.5.101:3306/test",
    "user" : "root",
    "password" : "123456",
    "sql" : "select * from cc",
    "elasticsearch" : {
      "host" : "10.8.5.101",
      "port" : 9300
    },
    "index" : "myindex_2",
    "type" : "mytype_2"
  }
}

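The script and the JSON settings live in separate files. Instead of piping the JSON in, the script passes statefile.json to the importer as a command-line argument.
To run it, execute ./mysql_import_es_simple.sh from the directory that contains statefile.json (here /odbc_es), since the file name is given as a relative path.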
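Because statefile.json is passed as a relative path, Java resolves it against the current working directory, not the script's own directory. The sketch below resolves the file relative to a given directory and fails early if it is missing. resolve_statefile is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: resolve statefile.json relative to a given directory (e.g. the script's own),
# so the importer finds it no matter where the script is launched from.
# resolve_statefile is a hypothetical helper.
resolve_statefile() {
  # $1 = directory that should contain statefile.json
  dir=$(cd "$1" 2>/dev/null && pwd) || return 1
  if [ ! -f "$dir/statefile.json" ]; then
    echo "statefile.json not found in $dir" >&2
    return 1
  fi
  # Print the absolute path for the importer.
  printf '%s\n' "$dir/statefile.json"
}
```

In mysql_import_es_simple.sh the path could then be computed first:
statefile=$(resolve_statefile "$(dirname "$0")") || exit 1
The last argument of the java command then becomes "$statefile" instead of statefile.json.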

5. Equivalent queries in MySQL and Elasticsearch

Goal: look up the name for id=3 in table cc.
1) SQL query in MySQL:

mysql> select * from cc where id=3;
+----+------------+
| id | name       |
+----+------------+
|  3 | dlulaoyang |
+----+------------+
1 row in set (0.00 sec)

2) Elasticsearch query:

[root@5b9dbaaa148a odbc_es]# curl http://10.8.5.101:9200/myindex/mytype/_search?pretty -d '
{
"filter" : { "term" : { "id" : "3" } }
}'
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 8,
    "successful" : 8,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "myindex",
      "_type" : "mytype",
      "_id" : "AVVXKgeEun6ksbtikOWJ",
      "_score" : 1.0,
      "_source" : {
        "id" : 3,
        "name" : "dlulaoyang"
      }
    } ]
  }
}
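The top-level "filter" key used above worked on this 2.3.2 setup, but newer Elasticsearch releases no longer accept it. As far as I know, a bool query with a filter clause works from 2.x onward. Like a SQL WHERE clause, it does an exact match and does not contribute to scoring. This is a sketch with a hypothetical helper, term_filter_query, that prints that request body. The value is inserted verbatim, so pass numbers bare and strings with their JSON quotes.

```shell
#!/bin/sh
# Sketch: build the Elasticsearch body equivalent to "select * from cc where <field>=<value>".
# term_filter_query is a hypothetical helper; $2 must already be a JSON value
# (3 for a number, '"dlulaoyang"' for a string).
term_filter_query() {
  # $1 = field name, $2 = JSON value
  printf '{"query":{"bool":{"filter":{"term":{"%s":%s}}}}}\n' "$1" "$2"
}
```

Possible usage on the 2.x setup from this article (Elasticsearch 6.0 and later also require -H 'Content-Type: application/json'):
curl 'http://10.8.5.101:9200/myindex/mytype/_search?pretty' -d "$(term_filter_query id 3)"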

Common errors
Error logs are written to /odbc_es/logs.
Sample log output:

[root@5b9dbaaa148a logs]# tail -f jdbc.log
04:03:39,570org.xbib.elasticsearch.helper.client.BaseTransportClient after auto-discovery connected to [{5b9dbaaa148a}{aksn2ErNRlWjUECnp_8JmA}{10.8.5.101}{10.8.5.101:9300}{master=true}]

Bug 1: 02:46:23,894importer.jdbc error while processing request: cluster state is RED and not YELLOW, from here on, everything will fail!

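Cause:
You created an index with replicas but you had only one node in the cluster. One way to solve this problem is by allocating them on a second node. Another way is by turning replicas off.
In other words, on a single node the replica shards can never be assigned, because Elasticsearch does not place a replica on the same node as its primary. Strictly speaking, unassigned replicas alone leave a cluster YELLOW. RED means at least one primary shard is unassigned, so if turning replicas off does not help, check the cluster health and the Elasticsearch node's own logs.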

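Solutions:
Option 1: allow the replicas to be allocated to a second node, i.e. add another node to the cluster.
Option 2: turn replicas off (very practical), as follows: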

curl -XPUT 'localhost:9200/_settings' -d '
{
  "index" : {
    "number_of_replicas" : 0
  }
}'
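The importer gives up when the cluster is RED, so checking health before each run turns a failed import into a clear message. The _cluster/health endpoint returns a JSON object whose "status" field is green, yellow or red. This minimal sketch parses that field with sed instead of a JSON tool, so it assumes "status" appears once, as it does in that response. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: gate the import on cluster health.
# cluster_status and require_yellow_or_green are hypothetical helpers.
cluster_status() {
  # Read a _cluster/health response on stdin and print the value of "status".
  sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p' | head -n 1
}

require_yellow_or_green() {
  # The importer needs at least YELLOW; anything else (red, empty) is an error.
  health=$(cluster_status)
  case "$health" in
    green|yellow) echo "cluster status: $health"; return 0 ;;
    *) echo "cluster status: ${health:-unknown}" >&2; return 1 ;;
  esac
}
```

Possible usage before an import run:
curl -s 'http://10.8.5.101:9200/_cluster/health' | require_yellow_or_green && ./mysql_import_es.sh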

Bug 2: 13:00:37,137importer.jdbc error while processing request: no cluster nodes available, check settings {autodiscover=false, client.transport.ignore_cluster_name=false, client.transport.nodes_sampler_interval=5s, client.transport.ping_timeout=5s, cluster.name=elasticsearch,
org.elasticsearch.client.transport.NoNodeAvailableException: no cluster nodes available, check

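Solution:
Set the cluster name in the importer settings, as in the script above:

"elasticsearch.cluster":"my-application"

The value must match cluster.name in /usr/local/elasticsearch/config/elasticsearch.yml. Without it, the importer uses the default name elasticsearch (note cluster.name=elasticsearch in the log above). Because client.transport.ignore_cluster_name=false, it then ignores the node running under a different cluster name, and you get "no cluster nodes available".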
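You can catch this mismatch before running the importer by comparing the two names. The sketch below uses hypothetical helpers and only understands a plain top-level "cluster.name: value" line. If that line is absent or commented out, it falls back to elasticsearch, the default cluster name.

```shell
#!/bin/sh
# Sketch: compare cluster.name in elasticsearch.yml with the importer's elasticsearch.cluster.
# yml_cluster_name and check_cluster_name are hypothetical helpers.
yml_cluster_name() {
  # $1 = path to elasticsearch.yml
  if [ ! -r "$1" ]; then
    echo "cannot read $1" >&2
    return 1
  fi
  # First uncommented "cluster.name:" line; strip the key, trailing spaces
  # and optional surrounding double quotes.
  cn=$(sed -n 's/^[[:space:]]*cluster\.name[[:space:]]*:[[:space:]]*//p' "$1" |
    head -n 1 | sed 's/[[:space:]]*$//; s/^"\(.*\)"$/\1/')
  # Elasticsearch's default cluster name applies when none is configured.
  printf '%s\n' "${cn:-elasticsearch}"
}

check_cluster_name() {
  # $1 = path to elasticsearch.yml, $2 = "elasticsearch.cluster" value from the importer JSON
  actual=$(yml_cluster_name "$1") || return 1
  if [ "$actual" != "$2" ]; then
    echo "cluster name mismatch: elasticsearch.yml has '$actual', importer uses '$2'" >&2
    return 1
  fi
  return 0
}
```

Possible usage:
check_cluster_name /usr/local/elasticsearch/config/elasticsearch.yml my-application
The running cluster also reports its name in the cluster_name field of curl http://10.8.5.101:9200/.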

References:
http://stackoverflow.com/questions/11944915/getting-an-elasticsearch-cluster-to-green-cluster-setup-on-os-x

Author: 铭毅天下
Please credit the source when reposting. Original article: http://blog.csdn.net/laoyang360/article/details/51694519
