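I was happily testing the much-praised PXC (Percona XtraDB Cluster) when, only a few days in, I ran into a bug. The key error message was [ERROR] WSREP: FSM: no such a transition REPLICATING -> REPLICATING, and the text that followed it in the log suggested that a bug had been hit. This post describes the problem and how it was resolved.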
1. Symptoms
The following was captured in the MySQL error log:
2018-01-26T15:00:00.736954+08:00 2109 [Warning] WSREP: Percona-XtraDB-Cluster doesn't recommend using SERIALIZABLE isolation with pxc_strict_mode = PERMISSIVE
2018-01-26T15:00:00.737164+08:00 2116 [Warning] WSREP: Percona-XtraDB-Cluster doesn't recommend using SERIALIZABLE isolation with pxc_strict_mode = PERMISSIVE
2018-01-26T15:00:00.738619+08:00 2113 [Warning] WSREP: Percona-XtraDB-Cluster doesn't recommend using SERIALIZABLE isolation with pxc_strict_mode = PERMISSIVE
2018-01-26T15:00:00.738717+08:00 2112 [Warning] WSREP: Percona-XtraDB-Cluster doesn't recommend use of DML command on a table (S70.T7048) without an explicit primary key with pxc_strict_mode = PERMISSIVE
2018-01-26T15:00:00.738809+08:00 2112 [Warning] Event Scheduler: [root@localhost][S70.EVT_T7048] Percona-XtraDB-Cluster doesn't recommend use of DML command on a table (S70.T7048) without an explicit primary key with pxc_strict_mode = PERMISSIVE
2018-01-26T15:00:00.739163+08:00 2111 [Warning] WSREP: Percona-XtraDB-Cluster doesn't recommend using SERIALIZABLE isolation with pxc_strict_mode = PERMISSIVE
2018-01-26T15:00:00.739465+08:00 2114 [Warning] WSREP: Percona-XtraDB-Cluster doesn't recommend using SERIALIZABLE isolation with pxc_strict_mode = PERMISSIVE
2018-01-26T15:00:00.741091+08:00 2110 [ERROR] WSREP: FSM: no such a transition REPLICATING -> REPLICATING
07:00:00 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
Please help us make Percona XtraDB Cluster better by reporting any
bugs at https://bugs.launchpad.net/percona-xtradb-cluster
key_buffer_size=67108864
read_buffer_size=2097152
max_used_connections=0
max_threads=1001
thread_count=14
connection_count=0
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 18512159 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7f6f7800e110
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7f6fe0059abf thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x3b)[0xf3de9b]
/usr/sbin/mysqld(handle_fatal_signal+0x471)[0x7adfc1]
/lib64/libpthread.so.0(+0xf370)[0x7f71318fb370]
/lib64/libc.so.6(gsignal+0x37)[0x7f712fa051d7]
/lib64/libc.so.6(abort+0x148)[0x7f712fa068c8]
/usr/lib64/galera3/libgalera_smm.so(_ZN6galera3FSMINS_9TrxHandle5StateENS1_10TransitionENS_10EmptyGuardENS_11EmptyActionEE8shift_toES2_+0x1c1)[0x7f711d7b80b1]
/usr/lib64/galera3/libgalera_smm.so(_ZN6galera13ReplicatorSMM9replicateEPNS_9TrxHandleEP14wsrep_trx_meta+0x1c8)[0x7f711d7abdf8]
/usr/lib64/galera3/libgalera_smm.so(galera_replicate+0xdf)[0x7f711d7c143f]
/usr/sbin/mysqld(_Z15wsrep_replicateP3THD+0x897)[0xde0ee7]
/usr/sbin/mysqld(_Z14ha_prepare_lowP3THDb+0x76)[0x829206]
/usr/sbin/mysqld(_Z15ha_commit_transP3THDbb+0x19f)[0x828a1f]
/usr/sbin/mysqld(_Z17trans_commit_stmtP3THD+0x38)[0xdb37b8]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x825)[0xcfd155]
/usr/sbin/mysqld(_ZN13sp_instr_stmt9exec_coreEP3THDPj+0x50)[0xc7bb40]
/usr/sbin/mysqld(_ZN12sp_lex_instr23reset_lex_and_exec_coreEP3THDPjb+0x374)[0xc7d774]
/usr/sbin/mysqld(_ZN12sp_lex_instr29validate_lex_and_execute_coreEP3THDPjb+0xbb)[0xc7e15b]
/usr/sbin/mysqld(_ZN13sp_instr_stmt7executeEP3THDPj+0x330)[0xc7f510]
/usr/sbin/mysqld(_ZN7sp_head7executeEP3THDb+0x53b)[0xc7737b]
/usr/sbin/mysqld(_ZN7sp_head17execute_procedureEP3THDP4ListI4ItemE+0x7a7)[0xc7b017]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THDb+0x7b51)[0xd04481]
/usr/sbin/mysqld(_ZN13sp_instr_stmt9exec_coreEP3THDPj+0x50)[0xc7bb40]
/usr/sbin/mysqld(_ZN12sp_lex_instr23reset_lex_and_exec_coreEP3THDPjb+0x374)[0xc7d774]
/usr/sbin/mysqld(_ZN12sp_lex_instr29validate_lex_and_execute_coreEP3THDPjb+0xbb)[0xc7e15b]
/usr/sbin/mysqld(_ZN13sp_instr_stmt7executeEP3THDPj+0x330)[0xc7f510]
/usr/sbin/mysqld(_ZN7sp_head7executeEP3THDb+0x53b)[0xc7737b]
/usr/sbin/mysqld(_ZN7sp_head17execute_procedureEP3THDP4ListI4ItemE+0x7a7)[0xc7b017]
/usr/sbin/mysqld(_ZN14Event_job_data7executeEP3THDb+0xabb)[0xdd9ccb]
/usr/sbin/mysqld(_ZN19Event_worker_thread3runEP3THDP28Event_queue_element_for_exec+0x188)[0xe95cd8]
/usr/sbin/mysqld(event_worker_thread+0x57)[0xe95da7]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0xfc2394]
/lib64/libpthread.so.0(+0x7dc5)[0x7f71318f3dc5]
/lib64/libc.so.6(clone+0x6d)[0x7f712fac776d]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7f6f64106460): is an invalid pointer
Connection ID (thread ID): 2110
Status: NOT_KILLED
You may download the Percona XtraDB Cluster operations manual by visiting
http://www.percona.com/software/percona-xtradb-cluster/. You may find information
in the manual which will help you identify the cause of the crash.
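When a log is this noisy, it can help to pull out just the two signals that matter here: the FSM transition error and the tables flagged for lacking an explicit primary key. The following is a minimal sketch, not an official Percona tool; the function names count_fsm_errors and tables_without_pk are made up for illustration, and the patterns are copied from the log lines above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for scanning a MySQL/PXC error log.
# The match strings are taken from the log excerpt above.

# Print how many lines report the Galera FSM transition error.
count_fsm_errors() {
    local log_file="$1"
    # -c counts matching lines, -F matches a fixed string.
    # grep exits non-zero when the count is 0, so ignore that status.
    grep -c -F 'WSREP: FSM: no such a transition' -- "$log_file" || true
}

# Print the distinct tables that pxc_strict_mode warned about
# (DML on a table without an explicit primary key).
tables_without_pk() {
    local log_file="$1"
    grep -o -E 'on a table \([^)]+\) without an explicit primary key' -- "$log_file" \
        | sed -E 's/^on a table \(([^)]+)\).*/\1/' \
        | sort -u
}
```

For example, count_fsm_errors /var/log/mysqld.log and tables_without_pk /var/log/mysqld.log could be run against the log file that mysqld_safe reports in the output below; adjust the path if the error log lives elsewhere.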
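2. Analysis
Judging from the log above, a MySQL bug had been triggered. Worse still, after I lowered some parameter values the instance would not start at all.
That is no fun: when a parameter is wrong in an Oracle database, you can correct it and the instance still comes up, whereas here the instance simply refused to start.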
# systemctl start mysql@bootstrap
Job for mysql@bootstrap.service failed because the control process exited with error code. See "systemctl status mysql@bootstrap.service" and "journalctl -xe" for details.
# systemctl status mysql@bootstrap.service
● mysql@bootstrap.service - Percona XtraDB Cluster with config /etc/sysconfig/mysql.bootstrap
Loaded: loaded (/usr/lib/systemd/system/mysql@.service; disabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Tue 2018-01-25 13:35:27 CST; 7s ago
Process: 19792 ExecStopPost=/usr/bin/mysql-systemd stop-post (code=exited, status=0/SUCCESS)
Process: 19761 ExecStop=/usr/bin/mysql-systemd stop (code=exited, status=2)
Process: 18679 ExecStartPost=/usr/bin/mysql-systemd start-post $MAINPID (code=exited, status=1/FAILURE)
Process: 18678 ExecStart=/usr/bin/mysqld_safe --basedir=/usr ${EXTRA_ARGS} (code=exited, status=0/SUCCESS)
Process: 18635 ExecStartPre=/usr/bin/mysql-systemd start-pre (code=exited, status=0/SUCCESS)
Main PID: 18678 (code=exited, status=0/SUCCESS)
Jan 26 15:05:27 db-50 mysql-systemd[18679]: ERROR! mysqld_safe with PID 18678 has already exited: FAILURE
Jan 26 15:05:27 db-50 systemd[1]: mysql@bootstrap.service: control process exited, code=exited status=1
Jan 26 15:05:27 db-50 mysql-systemd[19761]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Jan 26 15:05:27 db-50 mysql-systemd[19761]: ERROR! mysql already dead
Jan 26 15:05:27 db-50 systemd[1]: mysql@bootstrap.service: control process exited, code=exited status=2
Jan 26 15:05:27 db-50 mysql-systemd[19792]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Jan 26 15:05:27 db-50 mysql-systemd[19792]: WARNING: mysql may be already dead
Jan 26 15:05:27 db-50 systemd[1]: Failed to start Percona XtraDB Cluster with config /etc/sysconfig/mysql.bootstrap.
Jan 26 15:05:27 db-50 systemd[1]: Unit mysql@bootstrap.service entered failed state.
Jan 26 15:05:27 db-50 systemd[1]: mysql@bootstrap.service failed.
# journalctl -xe
Jan 26 13:35:48 db-50 su[18607]: (to root) robinson on pts/0
Jan 26 13:35:48 db-50 systemd[1]: Starting Cleanup of Temporary Directories...
-- Subject: Unit systemd-tmpfiles-clean.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit systemd-tmpfiles-clean.service has begun starting up.
Jan 26 15:05:12 db-50 su[18607]: pam_unix(su-l:session): session opened for user root by robin(uid=0)
Jan 26 15:05:12 db-50 systemd[1]: Started Cleanup of Temporary Directories.
-- Subject: Unit systemd-tmpfiles-clean.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit systemd-tmpfiles-clean.service has finished starting up.
--
-- The start-up result is done.
Jan 26 13:36:03 db-50 polkitd[533]: Registered Authentication Agent for unix-process:18629:1141138560 (system bus name :1.51491 [/usr/bin/pkttyagent --notify-fd 5 --fallback], object
Jan 26 13:36:03 db-50 systemd[1]: Starting Percona XtraDB Cluster with config /etc/sysconfig/mysql.bootstrap...
-- Subject: Unit mysql@bootstrap.service has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit mysql@bootstrap.service has begun starting up.
Jan 26 15:05:12 db-50 mysql-systemd[18679]: State transfer in progress, setting sleep higher
Jan 26 15:05:12 db-50 mysqld_safe[18678]: 2018-02-27T05:35:18.095889Z mysqld_safe Logging to '/var/log/mysqld.log'.
Jan 26 15:05:12 db-50 mysqld_safe[18678]: 2018-02-27T05:35:18.098340Z mysqld_safe Logging to '/var/log/mysqld.log'.
Jan 26 15:05:12 db-50 mysqld_safe[18678]: 2018-02-27T05:35:18.122684Z mysqld_safe Starting mysqld daemon with databases from /u02/pxcdata
Jan 26 15:05:12 db-50 mysqld_safe[18678]: 2018-02-27T05:35:18.135690Z mysqld_safe WSREP: Running position recovery with --log_error='/u02/pxcdata/wsrep_recovery.vZdxS7' --pid-file='/
Jan 26 15:05:27 db-50 mysql-systemd[18679]: /usr/bin/mysql-systemd: line 140: kill: (18678) - No such process
Jan 26 15:05:27 db-50 mysql-systemd[18679]: ERROR! mysqld_safe with PID 18678 has already exited: FAILURE
Jan 26 15:05:27 db-50 systemd[1]: mysql@bootstrap.service: control process exited, code=exited status=1
Jan 26 15:05:27 db-50 mysql-systemd[19761]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Jan 26 15:05:27 db-50 mysql-systemd[19761]: ERROR! mysql already dead
Jan 26 15:05:27 db-50 systemd[1]: mysql@bootstrap.service: control process exited, code=exited status=2
Jan 26 15:05:27 db-50 mysql-systemd[19792]: WARNING: mysql pid file /var/run/mysqld/mysqld.pid empty or not readable
Jan 26 15:05:27 db-50 mysql-systemd[19792]: WARNING: mysql may be already dead
Jan 26 15:05:27 db-50 systemd[1]: Failed to start Percona XtraDB Cluster with config /etc/sysconfig/mysql.bootstrap.
-- Subject: Unit mysql@bootstrap.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- Author : Leshami
-- Unit mysql@bootstrap.service has failed.
-- Blog : http://blog.csdn.net/leshami
-- The result is failed.
Jan 26 15:05:27 db-50 systemd[1]: Unit mysql@bootstrap.service entered failed state.
Jan 26 15:05:27 db-50 systemd[1]: mysql@bootstrap.service failed.
Jan 26 15:05:27 db-50 polkitd[533]: Unregistered Authentication Agent for unix-process:18629:1141138560 (system bus name :1.51491, object path /org/freedesktop/PolicyKit1/Authenticat
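3. Resolution
A quick Google search showed that the official fix is to upgrade to 5.7.20-29.24.
Because the cluster was originally installed with yum, you can either uninstall and reinstall, or upgrade in place with yum update (back up the configuration file before upgrading).
After the upgrade, I copied the configuration file back to its original path, and the restart then succeeded.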
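The upgrade steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the configuration path /etc/my.cnf and the package name Percona-XtraDB-Cluster-57 are not given in this post and must be checked against the actual installation, and the helper names run and upgrade_pxc are hypothetical. By default the script only prints the commands (DRY_RUN=1) so that they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: back up the config, upgrade with yum, restore the config, restart.
# CONF and PKG are assumptions, not values taken from this post.
CONF="${CONF:-/etc/my.cnf}"
PKG="${PKG:-Percona-XtraDB-Cluster-57}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = execute them

# Either echo the command (dry run) or execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

upgrade_pxc() {
    local backup="${CONF}.bak"
    run cp -p "$CONF" "$backup"            # back up the configuration file
    run yum -y update "$PKG"               # upgrade in place via yum
    run cp -p "$backup" "$CONF"            # copy the configuration back
    run systemctl start mysql@bootstrap    # same unit used earlier in this post
}
```

Calling upgrade_pxc with the defaults prints the four commands; setting DRY_RUN=0 executes them, which needs root privileges and should only be done after confirming the package name and configuration path.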