PG异常无法启动的问题:could not read file "pg_logical/replorigin_checkpoint": Success

简介: 问题描述 新安装不久的PostgreSQL数据库,断电后重启,查看日志如下 2019-01-08 08:44:19.989 UTC [7493] LOG: database system was interrupted; last known up at 2018-12-24 10:56:28 UTC 2019-01-08 08:44:19.

问题描述

新安装不久的PostgreSQL数据库,断电后重启,查看日志如下

2019-01-08 08:44:19.989 UTC [7493] LOG:  database system was interrupted; last known up at 2018-12-24 10:56:28 UTC
2019-01-08 08:44:19.989 UTC [7493] PANIC:  could not read file "pg_logical/replorigin_checkpoint": Success
2019-01-08 08:44:19.989 UTC [7491] LOG:  startup process (PID 7493) was terminated by signal 6: Aborted
2019-01-08 08:44:19.989 UTC [7491] LOG:  aborting startup due to startup process failure
2019-01-08 08:44:20.011 UTC [7491] LOG:  database system is shut down

其中致命的错误是这句。

2019-01-08 08:44:19.989 UTC [7493] PANIC: could not read file "pg_logical/replorigin_checkpoint": Success

这个提示很明显,与文件replorigin_checkpoint有关。
于是找到这个文件,在数据库正常启动的情况下,尝试通过删除和修改的方式重现该问题。

无论是删除,还是修改,在数据库重启的情况下都会被重建。无法复现错误。

之所以如此,是因为一直认为该文件是在数据库启动时重建的。
直到意外的,在关闭数据库后清空文件内容,才成功复现了该问题。

注意:pg_logical/replorigin_checkpoint,文件是在数据库关闭时重建的,并且不能删除,必须清空文件内容。


问题已经清楚了,那么我们还有两个问题需要解决

  1. 文件的作用,以及数据库相应的处理流程?
  2. 如何避免该问题?

1.文件的作用

网上对于这部分的资料非常少,好在PG是开源的数据库,结合代码得到了如下的这些信息。

  1. 与逻辑复制有关,Logical replication(逻辑复制)。
  2. 通过名称pg_logical/replorigin_checkpoint 查找源代码,出现在如下两个函数中。

截取StartupReplicationOrigin()函数的一段代码

if (readBytes != sizeof(magic))
     {
         if (readBytes < 0)
             ereport(PANIC,
                     (errcode_for_file_access(),
                      errmsg("could not read file \"%s\": %m",
                             path)));
         else
             ereport(PANIC,
                     (errcode(ERRCODE_DATA_CORRUPTED),
                      errmsg("could not read file \"%s\": read %d of %zu",
                             path, readBytes, sizeof(magic))));
     }

代码中出现了我们日志中的错误 could not read file....,: Success
错误的原因是内容的大小与预设值不一致。与我们之前的测试结果类似,只有更改内容才会报错。

CheckPointReplicationOrigin()

其中,截取CheckPointReplicationOrigin函数的代码

 {
     const char *tmppath = "pg_logical/replorigin_checkpoint.tmp";
     const char *path = "pg_logical/replorigin_checkpoint";
     int         tmpfd;
     int         i;
     uint32      magic = REPLICATION_STATE_MAGIC;
     pg_crc32c   crc;
 
     if (max_replication_slots == 0)
         return;
 
     INIT_CRC32C(crc);
 
     /* make sure no old temp file is remaining */
     if (unlink(tmppath) < 0 && errno != ENOENT)
         ereport(PANIC,
                 (errcode_for_file_access(),
                  errmsg("could not remove file \"%s\": %m",
                         tmppath)));
 
     /*
      * no other backend can perform this at the same time, we're protected by
      * CheckpointLock.
      */
     tmpfd = OpenTransientFile(tmppath,
                               O_CREAT | O_EXCL | O_WRONLY | PG_BINARY);
     if (tmpfd < 0)
         ereport(PANIC,
                 (errcode_for_file_access(),
                  errmsg("could not create file \"%s\": %m",
                         tmppath)));
 
     /* write magic */
     errno = 0;
     if ((write(tmpfd, &magic, sizeof(magic))) != sizeof(magic))
     {
         /* if write didn't set errno, assume problem is no disk space */
         if (errno == 0)
             errno = ENOSPC;
         ereport(PANIC,
                 (errcode_for_file_access(),
                  errmsg("could not write to file \"%s\": %m",
                         tmppath)));
     }
     COMP_CRC32C(crc, &magic, sizeof(magic));
 
     /* prevent concurrent creations/drops */
     LWLockAcquire(ReplicationOriginLock, LW_SHARED);
 
     /* write actual data */
     for (i = 0; i < max_replication_slots; i++)
     {
         ReplicationStateOnDisk disk_state;
         ReplicationState *curstate = &replication_states[i];
         XLogRecPtr  local_lsn;
 
         if (curstate->roident == InvalidRepOriginId)
             continue;
 
         /* zero, to avoid uninitialized padding bytes */
         memset(&disk_state, 0, sizeof(disk_state));
 
         LWLockAcquire(&curstate->lock, LW_SHARED);
 
         disk_state.roident = curstate->roident;
 
         disk_state.remote_lsn = curstate->remote_lsn;
         local_lsn = curstate->local_lsn;
 
         LWLockRelease(&curstate->lock);
 
         /* make sure we only write out a commit that's persistent */
         XLogFlush(local_lsn);
 
         errno = 0;
         if ((write(tmpfd, &disk_state, sizeof(disk_state))) !=
             sizeof(disk_state))
         {
             /* if write didn't set errno, assume problem is no disk space */
             if (errno == 0)
                 errno = ENOSPC;
             ereport(PANIC,
                     (errcode_for_file_access(),
                      errmsg("could not write to file \"%s\": %m",
                             tmppath)));
         }
 
         COMP_CRC32C(crc, &disk_state, sizeof(disk_state));
     }
 
     LWLockRelease(ReplicationOriginLock);
 
     /* write out the CRC */
     FIN_CRC32C(crc);
     errno = 0;
     if ((write(tmpfd, &crc, sizeof(crc))) != sizeof(crc))
     {
         /* if write didn't set errno, assume problem is no disk space */
         if (errno == 0)
             errno = ENOSPC;
         ereport(PANIC,
                 (errcode_for_file_access(),
                  errmsg("could not write to file \"%s\": %m",
                         tmppath)));
     }
 
     if (CloseTransientFile(tmpfd))
         ereport(PANIC,
                 (errcode_for_file_access(),
                  errmsg("could not close file \"%s\": %m",
                         tmppath)));
 
     /* fsync, rename to permanent file, fsync file and directory */
     durable_rename(tmppath, path, PANIC);
 }
  1. 当参数max_replication_slots设置为0时,不会创建和修改文件,而是直接返回了。
if (max_replication_slots == 0)
         return;

参数max_replication_slots的设置,与 logical decoding有关

Before you can use logical decoding, you must set wal_level to logical and max_replication_slots
to at least 1.

在PostgreSQL 9.5.16 Documentation的文档中,我们找到如下描述的内容。

Replication slots provide an automated way to ensure that the master does not remove WAL segments until they have been received by all standbys, and that the master does not remove rows which could cause a recovery conflict even when the standby is disconnected.

复制槽提供了一种自动化的方法,以确保主服务器在所有备用服务器接收到WAL段之前不会删除它们,并且主服务器不会删除可能导致恢复冲突的行,即使在备用服务器断开连接时也是如此。

In lieu of using replication slots, it is possible to prevent the removal of old WAL segments using wal_keep_segments, or by storing the segments in an archive using archive_command. However, these methods often result in retaining more WAL segments than required, whereas replication slots retain only the number of segments known to be needed. An advantage of these methods is that they bound the space requirement for pg_xlog; there is currently no way to do this using replication slots.

Similarly, hot_standby_feedback and vacuum_defer_cleanup_age provide protection against relevant rows being removed by vacuum, but the former provides no protection during any time period when the standby is not connected, and the latter often needs to be set to a high value to provide adequate protection. Replication slots overcome these disadvantages.

接下来的两段,是在说明,在没有使用复制槽的时候,如何避免上面提到的两个问题

  1. 确保主服务器在所有备用服务器接收到WAL段之前不会删除它们
  2. 主服务器不会删除可能导致恢复冲突的行

很显然,使用复制槽能够更好的避免这些问题。

而我们通过源码,能够发现,将复制槽中内容,持续写入到文件的过程

/* write actual data */
     for (i = 0; i < max_replication_slots; i++)
     {
         ReplicationStateOnDisk disk_state;
         ReplicationState *curstate = &replication_states[i];
         XLogRecPtr  local_lsn;
 
         if (curstate->roident == InvalidRepOriginId)
             continue;
 
         /* zero, to avoid uninitialized padding bytes */
         memset(&disk_state, 0, sizeof(disk_state));
 
         LWLockAcquire(&curstate->lock, LW_SHARED);
 
         disk_state.roident = curstate->roident;
 
         disk_state.remote_lsn = curstate->remote_lsn;
         local_lsn = curstate->local_lsn;
 
         LWLockRelease(&curstate->lock);
 
         /* make sure we only write out a commit that's persistent */
         XLogFlush(local_lsn);
 
         errno = 0;
         if ((write(tmpfd, &disk_state, sizeof(disk_state))) !=
             sizeof(disk_state))
         {
             /* if write didn't set errno, assume problem is no disk space */
             if (errno == 0)
                 errno = ENOSPC;
             ereport(PANIC,
                     (errcode_for_file_access(),
                      errmsg("could not write to file \"%s\": %m",
                             tmppath)));
         }
 
         COMP_CRC32C(crc, &disk_state, sizeof(disk_state));
     }

2.如何避免该问题

避免该问题需要从两方面考虑

第一个,如果我们需要使用逻辑复制,也就是需要使用logical decoding,那么必须设置参数max_replication_slots大于1
当出现突然断电的情况,pg_logical/replorigin_checkpoint只创建了文件,没有来得及写数据的情况下,仍然会出现这种问题。
所以可能的解决办法是,当数据库启动异常时,将文件删除,确保数据库能够正常启动。

第二个,当我们不需要使用逻辑复制,那么只需要将参数max_replication_slots设置为0,则数据库自动不创建该文件,从而避免该问题。

相关文章
|
关系型数据库 数据库
Harbor断电重启postgres报错 could not locate a valid checkpoint record
Harbor断电重启postgres报错 could not locate a valid checkpoint record
449 0
|
测试技术
pg_rewind实例--could not find previous WAL record at %X/%X
pg_rewind实例--could not find previous WAL record at %X/%X
136 0
|
Oracle 前端开发 关系型数据库
log file sync 和 log file parallel write等待事件的区别和联系
log file parallel write 和log file sync这两个等待事件关系密切,很多人对这两个等待事件有一些误解,我们先来看看Oracle官方文档的解释:
191 0
|
SQL 弹性计算 关系型数据库
为什么pg_basebackup或pg_start_backup好像hang住确没有开始拷贝文件 - checkpoint 的几种调度(checkpoint_completion_target)
标签 PostgreSQL , checkpoint , 调度 , lazy , immediate , pg_start_backup , pg_basebackup 背景 PostgreSQL支持在线全量备份与增量归档备份。在线全量备份实际上就是拷贝文件,增量备份则分为两种,一种是基于BLOCK lsn变化的BLOCK即增量备份,另一种是基于WAL的持续归档文件备份。 全量备份通常
1612 0
|
SQL Oracle 关系型数据库
ORACLE等待事件: log file parallel write
log file parallel write概念介绍 log file parallel write 事件是LGWR进程专属的等待事件,发生在LGWR将日志缓冲区(log_buffer)中的重做日志信息写入联机重做日志文件组的成员文件,LGWR在该事件上等待该写入过程的完成。
1627 0
0322理解db file parallel read等待事件2
[20180322]理解db file parallel read等待事件2.txt --//上个星期的学习:http://blog.itpub.net/267265/viewspace-2151973/ https://docs.
1156 0
|
关系型数据库 Oracle Linux
0316理解db file parallel read等待事件
[20180316]理解db file parallel read等待事件.txt --//一直对db file parallel read等待事件不理解,因为在实际系统中很少遇到这样的等待事件.
1212 0