标签
PostgreSQL , pg_locks , pg_stat_activity , 锁监控 , 谁堵塞了谁
背景
在数据库中,通过锁以及多版本并发控制可以保护数据的一致性,例如A正在查询数据,B就无法对A访问的对象执行DDL。A正在更新某条记录,B就不能删除或更新这条记录。
锁是数据库自动管理的,同时数据库还提供了AD LOCK或者LOCK语法,允许用户自己控制锁。
例如AD lock的应用可以参考如下:
《PostgreSQL 使用advisory lock实现行级读写堵塞》
《PostgreSQL 无缝自增ID的实现 - by advisory lock》
《PostgreSQL 使用advisory lock或skip locked消除行锁冲突, 提高几十倍并发更新效率》
当然,如果应用程序逻辑设计不慎,就可能导致严重的锁等待,或者死锁的产生。
如果你发现SQL请求大多数时候处于等待锁的状态,那么可能出现了业务逻辑的问题。
如何检查或监控锁等待呢?
PostgreSQL提供了两个视图
1. pg_locks展示锁信息,每一个被锁或者等待锁的对象一条记录。
2. pg_stat_activity,每个会话一条记录,显示会话状态信息。
我们通过这两个视图可以查看锁,锁等待情况。同时可以了解发生锁冲突的情况。
pg_stat_activity.query反映的是当前正在执行或请求的SQL,而同一个事务中以前已经执行的SQL不能在pg_stat_activity中显示出来。
所以如果你发现两个会话发生了冲突,但是他们的pg_stat_activity.query没有冲突的话,那就有可能是他们之间的某个事务之前的SQL获取的锁与另一个事务当前请求的QUERY发生了锁冲突。
如果追踪详细的锁冲突信息:
1. 可以通过lock trace跟踪锁等待的详细信息,
《PostgreSQL Developer Options (debug, trace, system table mod and so on...) 详解》
2. 通过数据库日志(开启lock_timeout, log_lockwait参数)(csvlog)跟踪锁等待信息,
3. 或者通过数据库日志(开启log_statements='all',SQL审计)追踪事务中所有的SQL (csvlog),分析事务之间的锁冲突。
4. 通过SQL查看持锁,等锁的事务状态。
锁的释放时机:
大多数锁要等待事务结束后释放,某些轻量级锁(数据库自动控制)是随用随释放的。
查看当前事务锁等待、持锁信息的SQL
这条SQL非常有用,建议DBA珍藏。
with
t_wait as
(
select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted,
a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a.transactionid,a.fastpath,
b.state,b.query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name
from pg_locks a,pg_stat_activity b where a.pid=b.pid and not a.granted
),
t_run as
(
select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted,
a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a.transactionid,a.fastpath,
b.state,b.query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name
from pg_locks a,pg_stat_activity b where a.pid=b.pid and a.granted
),
t_overlap as
(
select r.* from t_wait w join t_run r on
(
r.locktype is not distinct from w.locktype and
r.database is not distinct from w.database and
r.relation is not distinct from w.relation and
r.page is not distinct from w.page and
r.tuple is not distinct from w.tuple and
r.virtualxid is not distinct from w.virtualxid and
r.transactionid is not distinct from w.transactionid and
r.classid is not distinct from w.classid and
r.objid is not distinct from w.objid and
r.objsubid is not distinct from w.objsubid and
r.pid <> w.pid
)
),
t_unionall as
(
select r.* from t_overlap r
union all
select w.* from t_wait w
)
select locktype,datname,relation::regclass,page,tuple,virtualxid,transactionid::text,classid::regclass,objid,objsubid,
string_agg(
'Pid: '||case when pid is null then 'NULL' else pid::text end||chr(10)||
'Lock_Granted: '||case when granted is null then 'NULL' else granted::text end||' , Mode: '||case when mode is null then 'NULL' else mode::text end||' , FastPath: '||case when fastpath is null then 'NULL' else fastpath::text end||' , VirtualTransaction: '||case when virtualtransaction is null then 'NULL' else virtualtransaction::text end||' , Session_State: '||case when state is null then 'NULL' else state::text end||chr(10)||
'Username: '||case when usename is null then 'NULL' else usename::text end||' , Database: '||case when datname is null then 'NULL' else datname::text end||' , Client_Addr: '||case when client_addr is null then 'NULL' else client_addr::text end||' , Client_Port: '||case when client_port is null then 'NULL' else client_port::text end||' , Application_Name: '||case when application_name is null then 'NULL' else application_name::text end||chr(10)||
'Xact_Start: '||case when xact_start is null then 'NULL' else xact_start::text end||' , Query_Start: '||case when query_start is null then 'NULL' else query_start::text end||' , Xact_Elapse: '||case when (now()-xact_start) is null then 'NULL' else (now()-xact_start)::text end||' , Query_Elapse: '||case when (now()-query_start) is null then 'NULL' else (now()-query_start)::text end||chr(10)||
'SQL (Current SQL in Transaction): '||chr(10)||
case when query is null then 'NULL' else query::text end,
chr(10)||'--------'||chr(10)
order by
( case mode
when 'INVALID' then 0
when 'AccessShareLock' then 1
when 'RowShareLock' then 2
when 'RowExclusiveLock' then 3
when 'ShareUpdateExclusiveLock' then 4
when 'ShareLock' then 5
when 'ShareRowExclusiveLock' then 6
when 'ExclusiveLock' then 7
when 'AccessExclusiveLock' then 8
else 0
end ) desc,
(case when granted then 0 else 1 end)
) as lock_conflict
from t_unionall
group by
locktype,datname,relation,page,tuple,virtualxid,transactionid::text,classid,objid,objsubid ;
如果觉得写SQL麻烦,可以将它创建为视图
create view v_locks_monitor as
with
t_wait as
(
select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted,
a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a.transactionid,a.fastpath,
b.state,b.query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name
from pg_locks a,pg_stat_activity b where a.pid=b.pid and not a.granted
),
t_run as
(
select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted,
a.objid,a.objsubid,a.pid,a.virtualtransaction,a.virtualxid,a.transactionid,a.fastpath,
b.state,b.query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name
from pg_locks a,pg_stat_activity b where a.pid=b.pid and a.granted
),
t_overlap as
(
select r.* from t_wait w join t_run r on
(
r.locktype is not distinct from w.locktype and
r.database is not distinct from w.database and
r.relation is not distinct from w.relation and
r.page is not distinct from w.page and
r.tuple is not distinct from w.tuple and
r.virtualxid is not distinct from w.virtualxid and
r.transactionid is not distinct from w.transactionid and
r.classid is not distinct from w.classid and
r.objid is not distinct from w.objid and
r.objsubid is not distinct from w.objsubid and
r.pid <> w.pid
)
),
t_unionall as
(
select r.* from t_overlap r
union all
select w.* from t_wait w
)
select locktype,datname,relation::regclass,page,tuple,virtualxid,transactionid::text,classid::regclass,objid,objsubid,
string_agg(
'Pid: '||case when pid is null then 'NULL' else pid::text end||chr(10)||
'Lock_Granted: '||case when granted is null then 'NULL' else granted::text end||' , Mode: '||case when mode is null then 'NULL' else mode::text end||' , FastPath: '||case when fastpath is null then 'NULL' else fastpath::text end||' , VirtualTransaction: '||case when virtualtransaction is null then 'NULL' else virtualtransaction::text end||' , Session_State: '||case when state is null then 'NULL' else state::text end||chr(10)||
'Username: '||case when usename is null then 'NULL' else usename::text end||' , Database: '||case when datname is null then 'NULL' else datname::text end||' , Client_Addr: '||case when client_addr is null then 'NULL' else client_addr::text end||' , Client_Port: '||case when client_port is null then 'NULL' else client_port::text end||' , Application_Name: '||case when application_name is null then 'NULL' else application_name::text end||chr(10)||
'Xact_Start: '||case when xact_start is null then 'NULL' else xact_start::text end||' , Query_Start: '||case when query_start is null then 'NULL' else query_start::text end||' , Xact_Elapse: '||case when (now()-xact_start) is null then 'NULL' else (now()-xact_start)::text end||' , Query_Elapse: '||case when (now()-query_start) is null then 'NULL' else (now()-query_start)::text end||chr(10)||
'SQL (Current SQL in Transaction): '||chr(10)||
case when query is null then 'NULL' else query::text end,
chr(10)||'--------'||chr(10)
order by
( case mode
when 'INVALID' then 0
when 'AccessShareLock' then 1
when 'RowShareLock' then 2
when 'RowExclusiveLock' then 3
when 'ShareUpdateExclusiveLock' then 4
when 'ShareLock' then 5
when 'ShareRowExclusiveLock' then 6
when 'ExclusiveLock' then 7
when 'AccessExclusiveLock' then 8
else 0
end ) desc,
(case when granted then 0 else 1 end)
) as lock_conflict
from t_unionall
group by
locktype,datname,relation,page,tuple,virtualxid,transactionid::text,classid,objid,objsubid ;
例子
postgres=# create table locktest(id int primary key, info text);
CREATE TABLE
postgres=# insert into locktest values (1,'a');
INSERT 0 1
会话A
postgres=# begin;
BEGIN
postgres=# update locktest set info='a' where id=1;
UPDATE 1
postgres=# select * from locktest ;
id | info
----+------
1 | a
(1 row)
会话B
postgres=# begin;
BEGIN
postgres=# select * from locktest ;
id | info
----+------
1 | a
(1 row)
会话C
postgres=# begin;
BEGIN
postgres=# insert into locktest values (2,'test');
INSERT 0 1
会话D
postgres=# begin;
BEGIN
postgres=# truncate locktest ;
waiting......
会话E
postgres=# select * from locktest ;
waiting......
会话F
postgres=# \x
Expanded display is on.
postgres=# select * from v_locks_monitor ;
-[ RECORD 1 ]-+------------------------------------------------------------------------------------------------------------------------------------------------------
locktype | relation
datname | postgres
relation | locktest
page |
tuple |
virtualxid |
transactionid |
classid |
objid |
objsubid |
string_agg | Pid: 23043 +
| Granted: false , Mode: AccessExclusiveLock , FastPath: false , VirtualTransaction: 4/1450064 , Session_State: active +
| Username: postgres , Database: postgres , Client_Addr: NULL , Client_Port: -1 , Application_Name: psql +
| Xact_Start: 2017-05-21 21:43:43.735829+08 , Query_Start: 2017-05-21 21:43:50.965797+08 , Xact_Elapse: 00:01:11.919991 , Query_Elapse: 00:01:04.690023+
| Query: truncate locktest ; +
| -------- +
| Pid: 40698 +
| Granted: true , Mode: RowExclusiveLock , FastPath: false , VirtualTransaction: 6/1031925 , Session_State: idle in transaction +
| Username: postgres , Database: postgres , Client_Addr: NULL , Client_Port: -1 , Application_Name: psql +
| Xact_Start: 2017-05-21 21:43:15.173798+08 , Query_Start: 2017-05-21 21:43:24.338804+08 , Xact_Elapse: 00:01:40.482022 , Query_Elapse: 00:01:31.317016+
| Query: insert into locktest values (2,'test'); +
| -------- +
| Pid: 17515 +
| Granted: true , Mode: RowExclusiveLock , FastPath: false , VirtualTransaction: 3/5671759 , Session_State: idle in transaction +
| Username: postgres , Database: postgres , Client_Addr: NULL , Client_Port: -1 , Application_Name: psql +
| Xact_Start: 2017-05-21 21:42:19.199124+08 , Query_Start: 2017-05-21 21:42:47.820125+08 , Xact_Elapse: 00:02:36.456696 , Query_Elapse: 00:02:07.835695+
| Query: select * from locktest ; +
| -------- +
| Pid: 17515 +
| Granted: true , Mode: RowExclusiveLock , FastPath: false , VirtualTransaction: 3/5671759 , Session_State: idle in transaction +
| Username: postgres , Database: postgres , Client_Addr: NULL , Client_Port: -1 , Application_Name: psql +
| Xact_Start: 2017-05-21 21:42:19.199124+08 , Query_Start: 2017-05-21 21:42:47.820125+08 , Xact_Elapse: 00:02:36.456696 , Query_Elapse: 00:02:07.835695+
| Query: select * from locktest ; +
| -------- +
| Pid: 40698 +
| Granted: true , Mode: RowExclusiveLock , FastPath: false , VirtualTransaction: 6/1031925 , Session_State: idle in transaction +
| Username: postgres , Database: postgres , Client_Addr: NULL , Client_Port: -1 , Application_Name: psql +
| Xact_Start: 2017-05-21 21:43:15.173798+08 , Query_Start: 2017-05-21 21:43:24.338804+08 , Xact_Elapse: 00:01:40.482022 , Query_Elapse: 00:01:31.317016+
| Query: insert into locktest values (2,'test'); +
| -------- +
| Pid: 40199 +
| Granted: true , Mode: AccessShareLock , FastPath: false , VirtualTransaction: 5/1029276 , Session_State: idle in transaction +
| Username: postgres , Database: postgres , Client_Addr: NULL , Client_Port: -1 , Application_Name: psql +
| Xact_Start: 2017-05-21 21:43:01.745129+08 , Query_Start: 2017-05-21 21:43:05.928125+08 , Xact_Elapse: 00:01:53.910691 , Query_Elapse: 00:01:49.727695+
| Query: select * from locktest ; +
| -------- +
| Pid: 17515 +
| Granted: true , Mode: AccessShareLock , FastPath: false , VirtualTransaction: 3/5671759 , Session_State: idle in transaction +
| Username: postgres , Database: postgres , Client_Addr: NULL , Client_Port: -1 , Application_Name: psql +
| Xact_Start: 2017-05-21 21:42:19.199124+08 , Query_Start: 2017-05-21 21:42:47.820125+08 , Xact_Elapse: 00:02:36.456696 , Query_Elapse: 00:02:07.835695+
| Query: select * from locktest ; +
| -------- +
| Pid: 40199 +
| Granted: true , Mode: AccessShareLock , FastPath: false , VirtualTransaction: 5/1029276 , Session_State: idle in transaction +
| Username: postgres , Database: postgres , Client_Addr: NULL , Client_Port: -1 , Application_Name: psql +
| Xact_Start: 2017-05-21 21:43:01.745129+08 , Query_Start: 2017-05-21 21:43:05.928125+08 , Xact_Elapse: 00:01:53.910691 , Query_Elapse: 00:01:49.727695+
| Query: select * from locktest ; +
| -------- +
| Pid: 17515 +
| Granted: true , Mode: AccessShareLock , FastPath: false , VirtualTransaction: 3/5671759 , Session_State: idle in transaction +
| Username: postgres , Database: postgres , Client_Addr: NULL , Client_Port: -1 , Application_Name: psql +
| Xact_Start: 2017-05-21 21:42:19.199124+08 , Query_Start: 2017-05-21 21:42:47.820125+08 , Xact_Elapse: 00:02:36.456696 , Query_Elapse: 00:02:07.835695+
| Query: select * from locktest ; +
| -------- +
| Pid: 24781 +
| Granted: false , Mode: AccessShareLock , FastPath: false , VirtualTransaction: 7/1025270 , Session_State: active +
| Username: postgres , Database: postgres , Client_Addr: NULL , Client_Port: -1 , Application_Name: psql +
| Xact_Start: 2017-05-21 21:44:20.725834+08 , Query_Start: 2017-05-21 21:44:20.725834+08 , Xact_Elapse: 00:00:34.929986 , Query_Elapse: 00:00:34.929986+
| Query: select * from locktest ;
处理方法
1. 前面的锁查询SQL,已经清晰的显示了每一个发生了锁等待的对象,按锁的大小排序,要快速解出这种状态,terminate最大的锁对应的PID即可。
postgres=# select pg_terminate_backend(23043);
-[ RECORD 1 ]--------+--
pg_terminate_backend | t
会话D
FATAL: terminating connection due to administrator command
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.
干掉23043后,大家都清净了
postgres=# select * from v_locks_monitor ;
(0 rows)
Greenplum
如果是Greenplum,由于版本问题,SQL语句略微不一样,如下:
with
t_wait as
(
select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted,
a.objid,a.objsubid,a.pid,a.transactionid,a.mppsessionid,a.mppiswriter,a.gp_segment_id,
b.procpid,b.sess_id,b.waiting_reason,b.current_query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name
from pg_locks a,pg_stat_activity b where a.mppsessionid=b.sess_id and not a.granted
),
t_run as
(
select a.mode,a.locktype,a.database,a.relation,a.page,a.tuple,a.classid,a.granted,
a.objid,a.objsubid,a.pid,a.transactionid,a.mppsessionid,a.mppiswriter,a.gp_segment_id,
b.procpid,b.sess_id,b.waiting_reason,b.current_query,b.xact_start,b.query_start,b.usename,b.datname,b.client_addr,b.client_port,b.application_name
from pg_locks a,pg_stat_activity b where a.mppsessionid=b.sess_id and a.granted
),
t_overlap as
(
select r.* from t_wait w join t_run r on
(
r.locktype is not distinct from w.locktype and
r.database is not distinct from w.database and
r.relation is not distinct from w.relation and
r.page is not distinct from w.page and
r.tuple is not distinct from w.tuple and
r.transactionid is not distinct from w.transactionid and
r.classid is not distinct from w.classid and
r.objid is not distinct from w.objid and
r.objsubid is not distinct from w.objsubid and
r.mppsessionid <> w.mppsessionid
)
),
t_unionall as
(
select r.* from t_overlap r
union all
select w.* from t_wait w
)
select locktype,datname,relation::regclass,page,tuple,textin(xidout(transactionid)),classid::regclass,objid,objsubid,
string_agg(
'Gp_Segment_Id: '||case when gp_segment_id is null then 'NULL' else gp_segment_id::text end||chr(10)||
'MppIsWriter: '||case when mppiswriter is null then 'NULL' when mppiswriter is true then 'TRUE' else 'FALSE' end||chr(10)||
'MppSessionId: '||case when mppsessionid is null then 'NULL' else mppsessionid::text end||chr(10)||
'ProcPid: '||case when procpid is null then 'NULL' else procpid::text end||chr(10)||
'Pid: '||case when pid is null then 'NULL' else pid::text end||chr(10)||
'Lock_Granted: '||case when granted is null then 'NULL' when granted is true then 'TRUE' else 'FALSE' end||' , Mode: '||case when mode is null then 'NULL' else mode::text end||' , Waiting_Reason: '||case when waiting_reason is null then 'NULL' else waiting_reason::text end||chr(10)||
'Username: '||case when usename is null then 'NULL' else usename::text end||' , Database: '||case when datname is null then 'NULL' else datname::text end||' , Client_Addr: '||case when client_addr is null then 'NULL' else client_addr::text end||' , Client_Port: '||case when client_port is null then 'NULL' else client_port::text end||' , Application_Name: '||case when application_name is null then 'NULL' else application_name::text end||chr(10)||
'Xact_Start: '||case when xact_start is null then 'NULL' else xact_start::text end||' , Query_Start: '||case when query_start is null then 'NULL' else query_start::text end||' , Xact_Elapse: '||case when (now()-xact_start) is null then 'NULL' else (now()-xact_start)::text end||' , Query_Elapse: '||case when (now()-query_start) is null then 'NULL' else (now()-query_start)::text end||chr(10)||
'SQL (Current SQL in Transaction): '||chr(10)||
case when current_query is null then 'NULL' else current_query::text end,
chr(10)||'--------'||chr(10)
order by
( case mode
when 'INVALID' then 0
when 'AccessShareLock' then 1
when 'RowShareLock' then 2
when 'RowExclusiveLock' then 3
when 'ShareUpdateExclusiveLock' then 4
when 'ShareLock' then 5
when 'ShareRowExclusiveLock' then 6
when 'ExclusiveLock' then 7
when 'AccessExclusiveLock' then 8
else 0
end ) desc,
(case when granted then 0 else 1 end)
) as lock_conflict
from t_unionall
group by
locktype,datname,relation::regclass,page,tuple,textin(xidout(transactionid)),classid::regclass,objid,objsubid ;
测试
-[ RECORD 1 ]-+------------------------------------------------------------------------------------------------------------------------------------------------------
locktype | relation
datname | postgres
relation | locktest
page |
tuple |
textin |
classid |
objid |
objsubid |
lock_conflict | Gp_Segment_Id: -1
| MppIsWriter: TRUE
| MppSessionId: 46
| ProcPid: 100310
| Pid: 100310
| Lock_Granted: TRUE , Mode: ExclusiveLock , Waiting_Reason: NULL
| Username: dege.zzz , Database: postgres , Client_Addr: 127.0.0.1/32 , Client_Port: 51220 , Application_Name: psql
| Xact_Start: 2017-05-22 14:59:50.067908+08 , Query_Start: 2017-05-22 15:00:01.568904+08 , Xact_Elapse: 00:00:37.858031 , Query_Elapse: 00:00:26.357035
| SQL (Current SQL in Transaction):
| <IDLE> in transaction
| --------
| Gp_Segment_Id: -1
| MppIsWriter: TRUE
| MppSessionId: 47
| ProcPid: 112053
| Pid: 112053
| Lock_Granted: FALSE , Mode: ExclusiveLock , Waiting_Reason: lock
| Username: dege.zzz , Database: postgres , Client_Addr: 127.0.0.1/32 , Client_Port: 51518 , Application_Name: psql
| Xact_Start: 2017-05-22 15:00:06.994012+08 , Query_Start: 2017-05-22 15:00:19.6+08 , Xact_Elapse: 00:00:20.931927 , Query_Elapse: 00:00:08.325939
| SQL (Current SQL in Transaction):
| update locktest set info='b' where id=2;
| --------
| Gp_Segment_Id: 0
| MppIsWriter: TRUE
| MppSessionId: 46
| ProcPid: 100310, master的pid
| Pid: 111641, segment的pid
| Lock_Granted: TRUE , Mode: RowExclusiveLock , Waiting_Reason: NULL
| Username: dege.zzz , Database: postgres , Client_Addr: 127.0.0.1/32 , Client_Port: 51220 , Application_Name: psql
| Xact_Start: 2017-05-22 14:59:50.067908+08 , Query_Start: 2017-05-22 15:00:01.568904+08 , Xact_Elapse: 00:00:37.858031 , Query_Elapse: 00:00:26.357035
| SQL (Current SQL in Transaction):
| <IDLE> in transaction
关注gp_segment_id=-1的,长时间等待,杀掉procpid即可。
postgres=# select pg_terminate_backend(100310);
-[ RECORD 1 ]--------+--
pg_terminate_backend | t
参考
https://www.postgresql.org/docs/9.6/static/view-pg-locks.html
https://www.postgresql.org/docs/9.6/static/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW
https://www.postgresql.org/docs/9.6/static/mvcc.html
《PostgreSQL Developer Options (debug, trace, system table mod and so on...) 详解》
《PostgreSQL 使用advisory lock实现行级读写堵塞》
《PostgreSQL 无缝自增ID的实现 - by advisory lock》
《PostgreSQL 使用advisory lock或skip locked消除行锁冲突, 提高几十倍并发更新效率》
《Compare PostgreSQL and Oracle dead lock detect and transaction》