The error message:
Failed to kill processes for segment /home/gpuser/gpdb/data/gpseg-1: ([Errno 3] No such process)
This error appears when I initialize Greenplum or run gpstop -a, but everything still works correctly.
I can also connect with psql postgres, create tables, insert rows, and so on.
Please help me!
I am writing this simple question in English for practice. Thank you!
jczhang@jczhang:~/gpdb$ gpstart -a
20160126:11:23:07:002880 gpstart:jczhang:jczhang-[INFO]:-Starting gpstart with args: -a
20160126:11:23:07:002880 gpstart:jczhang:jczhang-[INFO]:-Gathering information and validating the environment...
20160126:11:23:07:002880 gpstart:jczhang:jczhang-[INFO]:-Greenplum Binary Version: 'postgres (Greenplum Database) 4.3.99.00 build dev'
20160126:11:23:07:002880 gpstart:jczhang:jczhang-[INFO]:-Greenplum Catalog Version: '301601051'
20160126:11:23:07:002880 gpstart:jczhang:jczhang-[INFO]:-Starting Master instance in admin mode
20160126:11:23:09:002880 gpstart:jczhang:jczhang-[INFO]:-Obtaining Greenplum Master catalog information
20160126:11:23:09:002880 gpstart:jczhang:jczhang-[INFO]:-Obtaining Segment details from master...
20160126:11:23:10:002880 gpstart:jczhang:jczhang-[INFO]:-Setting new master era
20160126:11:23:10:002880 gpstart:jczhang:jczhang-[INFO]:-Master Started...
20160126:11:23:10:002880 gpstart:jczhang:jczhang-[INFO]:-Shutting down master
20160126:11:23:12:002880 gpstart:jczhang:jczhang-[INFO]:-Commencing parallel segment instance startup, please wait...
........
20160126:11:23:20:002880 gpstart:jczhang:jczhang-[INFO]:-Process results...
20160126:11:23:20:002880 gpstart:jczhang:jczhang-[INFO]:-----------------------------------------------------
20160126:11:23:20:002880 gpstart:jczhang:jczhang-[INFO]:- Successful segment starts = 2
20160126:11:23:20:002880 gpstart:jczhang:jczhang-[INFO]:- Failed segment starts = 0
20160126:11:23:20:002880 gpstart:jczhang:jczhang-[INFO]:- Skipped segment starts (segments are marked down in configuration) = 0
20160126:11:23:20:002880 gpstart:jczhang:jczhang-[INFO]:-----------------------------------------------------
20160126:11:23:20:002880 gpstart:jczhang:jczhang-[INFO]:-
20160126:11:23:20:002880 gpstart:jczhang:jczhang-[INFO]:-Successfully started 2 of 2 segment instances
20160126:11:23:20:002880 gpstart:jczhang:jczhang-[INFO]:-----------------------------------------------------
20160126:11:23:20:002880 gpstart:jczhang:jczhang-[INFO]:-Starting Master instance jczhang directory /home/jczhang/gpdb/data/gpseg-1
20160126:11:23:21:002880 gpstart:jczhang:jczhang-[INFO]:-Command pg_ctl reports Master jczhang instance active
20160126:11:23:21:002880 gpstart:jczhang:jczhang-[INFO]:-No standby master configured. skipping...
20160126:11:23:21:002880 gpstart:jczhang:jczhang-[INFO]:-Database successfully started
The above is my startup log.
Below are the processes shown by ps x:
2986 ? Ss 0:00 /home/jczhang/gpdb/build/gpdb.master/bin/postgres -D /home/jczhang/gpdb/data/gpseg0 -p 40000 -b 2 -z 2 --silent-mode=true -i -M mirrorless -C 0
2987 ? Ss 0:00 /home/jczhang/gpdb/build/gpdb.master/bin/postgres -D /home/jczhang/gpdb/data/gpseg1 -p 40001 -b 3 -z 2 --silent-mode=true -i -M mirrorless -C 1
2988 ? Ss 0:00 postgres: port 40001, logger process
2989 ? Ss 0:00 postgres: port 40000, logger process
2996 ? Ss 0:00 postgres: port 40001, stats collector process
2997 ? Ss 0:00 postgres: port 40001, writer process
2998 ? Ss 0:00 postgres: port 40001, checkpoint process
2999 ? S 0:00 postgres: port 40001, sweeper process
3000 ? Ss 0:00 postgres: port 40000, stats collector process
3001 ? Ss 0:00 postgres: port 40000, writer process
3002 ? Ss 0:00 postgres: port 40000, checkpoint process
3003 ? S 0:00 postgres: port 40000, sweeper process
3015 ? Ss 0:00 /home/jczhang/gpdb/build/gpdb.master/bin/postgres -D /home/jczhang/gpdb/data/gpseg-1 -p 5432 -b 1 -z 2 --silent-mode=true -i -M master -C -1 -x 0 -E
3016 ? Ss 0:00 postgres: port 5432, master logger process
3019 ? Ss 0:00 postgres: port 5432, stats collector process
3020 ? Ss 0:00 postgres: port 5432, writer process
3021 ? Ss 0:00 postgres: port 5432, checkpoint process
3022 ? S 0:00 postgres: port 5432, seqserver process
3023 ? S 0:00 postgres: port 5432, ftsprobe process
3024 ? S 0:00 postgres: port 5432, sweeper process
You can see that the master process (PID 3015) is running.
Next, I shut down the cluster:
20160126:11:24:17:003062 gpstop:jczhang:jczhang-[INFO]:-Starting gpstop with args: -a
20160126:11:24:17:003062 gpstop:jczhang:jczhang-[INFO]:-Gathering information and validating the environment...
20160126:11:24:17:003062 gpstop:jczhang:jczhang-[INFO]:-Obtaining Greenplum Master catalog information
20160126:11:24:17:003062 gpstop:jczhang:jczhang-[INFO]:-Obtaining Segment details from master...
20160126:11:24:17:003062 gpstop:jczhang:jczhang-[INFO]:-Greenplum Version: 'postgres (Greenplum Database) 4.3.99.00 build dev'
20160126:11:24:17:003062 gpstop:jczhang:jczhang-[INFO]:-There are 0 connections to the database
20160126:11:24:17:003062 gpstop:jczhang:jczhang-[INFO]:-Commencing Master instance shutdown with mode='smart'
20160126:11:24:17:003062 gpstop:jczhang:jczhang-[INFO]:-Master host=jczhang
20160126:11:24:17:003062 gpstop:jczhang:jczhang-[INFO]:-Commencing Master instance shutdown with mode=smart
20160126:11:24:17:003062 gpstop:jczhang:jczhang-[INFO]:-Master segment instance directory=/home/jczhang/gpdb/data/gpseg-1
20160126:11:24:18:003062 gpstop:jczhang:jczhang-[INFO]:-Attempting forceful termination of any leftover master process
20160126:11:24:18:003062 gpstop:jczhang:jczhang-[INFO]:-Terminating processes for segment /home/jczhang/gpdb/data/gpseg-1
20160126:11:24:18:003062 gpstop:jczhang:jczhang-[ERROR]:-Failed to kill processes for segment /home/jczhang/gpdb/data/gpseg-1: ([Errno 3] No such process)
20160126:11:24:18:003062 gpstop:jczhang:jczhang-[INFO]:-No standby master host configured
20160126:11:24:18:003062 gpstop:jczhang:jczhang-[INFO]:-Commencing parallel segment instance shutdown, please wait...
20160126:11:24:18:003062 gpstop:jczhang:jczhang-[INFO]:-0.00% of jobs completed
20160126:11:24:28:003062 gpstop:jczhang:jczhang-[INFO]:-100.00% of jobs completed
20160126:11:24:28:003062 gpstop:jczhang:jczhang-[INFO]:-----------------------------------------------------
20160126:11:24:28:003062 gpstop:jczhang:jczhang-[INFO]:- Segments stopped successfully = 2
20160126:11:24:28:003062 gpstop:jczhang:jczhang-[INFO]:- Segments with errors during stop = 0
20160126:11:24:28:003062 gpstop:jczhang:jczhang-[INFO]:-----------------------------------------------------
20160126:11:24:28:003062 gpstop:jczhang:jczhang-[INFO]:-Successfully shutdown 2 of 2 segment instances
20160126:11:24:28:003062 gpstop:jczhang:jczhang-[INFO]:-Database successfully shutdown with no errors reported
20160126:11:24:28:003062 gpstop:jczhang:jczhang-[INFO]:-Cleaning up leftover gpmmon process
20160126:11:24:28:003062 gpstop:jczhang:jczhang-[INFO]:-No leftover gpmmon process found
20160126:11:24:28:003062 gpstop:jczhang:jczhang-[INFO]:-Cleaning up leftover gpsmon processes
20160126:11:24:28:003062 gpstop:jczhang:jczhang-[INFO]:-No leftover gpsmon processes on some hosts. not attempting forceful termination on these hosts
20160126:11:24:28:003062 gpstop:jczhang:jczhang-[INFO]:-Cleaning up leftover shared memory
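Note the ERROR line in the output above.
Was the process already gone before gpstop tried to kill it?
I looked at the gpstop script. The error comes from trying to kill a process that no longer exists, which means the database had already been stopped normally.
gpstop first tries to stop the database normally, and then attempts the kill regardless of whether the stop succeeded.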
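To see where "[Errno 3] No such process" comes from, here is a minimal sketch that reproduces it outside Greenplum. It assumes a POSIX system; the function name kill_exited_child is made up for this illustration and is not part of gpstop. It starts a short-lived child process, waits for it to exit, and then sends a signal to the PID that is no longer there.

```python
import errno
import os
import signal
import subprocess
import sys


def kill_exited_child():
    """Start a short-lived child, let it exit, then try to kill it.

    Returns the errno of the failed kill, or None if the kill
    unexpectedly succeeded.
    """
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()  # the child has exited and been reaped; its PID is gone
    try:
        os.kill(child.pid, signal.SIGKILL)
    except OSError as exc:
        # ESRCH is the "No such process" error code
        return exc.errno
    return None


result = kill_exited_child()
print(result == errno.ESRCH)
```

The failure here is the same kind of error that gpstop logs: the signal was sent to a PID that had already gone away, so nothing was left to kill.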
######
def _stop_master(self,masterOnly=False):
    ''' shutsdown the master '''
    self.conn = dbconn.connect(self.dburl, utility=True)
    self._stop_master_checks()
    self.conn.close()
    e = GpEraFile(self.master_datadir, logger=get_logger_if_verbose())
    e.end_era()
    logger.info("Commencing Master instance shutdown with mode=%s" % self.mode)
    logger.info("Master segment instance directory=%s" % self.master_datadir)
    cmd=gp.MasterStop("stopping master", self.master_datadir, mode=self.mode, timeout=self.timeout)
    try:
        cmd.run(validateAfter=True)
    except:
        # Didn't stop in timeout or pg_ctl failed. So try kill
        (succeeded,mypid,file_datadir)=pg.ReadPostmasterTempFile.local("Read master tmp file", self.dburl.pgport).getResults()
        if succeeded and file_datadir == self.master_datadir:
            if unix.check_pid(mypid):
                logger.info("Failed to shutdown master with pg_ctl.")
                logger.info("Sending SIGQUIT signal...")
                os.kill(mypid,signal.SIGQUIT)
                time.sleep(5)
                # Still not gone... try SIGABRT
                if unix.check_pid(mypid):
                    logger.info("Sending SIGABRT signal...")
                    os.kill(mypid,signal.SIGABRT)
                    time.sleep(5)
                if not unix.check_pid(mypid):
                    # Clean up files
                    lockfile="/tmp/.s.PGSQL.%s" % self.dburl.pgport
                    if os.path.exists(lockfile):
                        logger.info("Clearing segment instance lock files")
                        os.remove(lockfile)
    logger.info('Attempting forceful termination of any leftover master process')
    (succeeded,mypid,file_datadir)=pg.ReadPostmasterTempFile.local("Read master tmp file", self.dburl.pgport).getResults()
    unix.kill_9_segment_processes(self.master_datadir, self.dburl.pgport, mypid)
    logger.debug("Successfully shutdown the Master instance in admin mode")
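The error is caused by gpstop's own logic: gpstop first shuts down the master normally, and then tries to kill the master.
Even when the normal shutdown has succeeded, gpstop still attempts the kill. So whenever the master shuts down cleanly, this ERROR is reported.
You can see this in the gpstop code.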
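For comparison, here is a hedged sketch of a kill helper that treats "process already gone" as a normal outcome instead of an error. This is not gpstop's code and not a patch for it; kill_if_running is a hypothetical name used only for illustration, and the sketch assumes a POSIX system.

```python
import errno
import os
import signal
import subprocess
import sys


def kill_if_running(pid, sig=signal.SIGKILL):
    """Send sig to pid (hypothetical helper, not part of gpstop).

    Returns True if the signal was delivered, False if the process
    had already exited. Any other failure is re-raised.
    """
    try:
        os.kill(pid, sig)
    except OSError as exc:
        if exc.errno == errno.ESRCH:
            return False  # nothing left to kill: not treated as an error
        raise
    return True


# Case 1: the child has already exited, like a cleanly stopped master.
child = subprocess.Popen([sys.executable, "-c", "pass"])
child.wait()
already_gone = not kill_if_running(child.pid)

# Case 2: the child is still running, so the signal is delivered.
sleeper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
delivered = kill_if_running(sleeper.pid)
sleeper.wait()

print(already_gone, delivered)
```

With handling like this, the first case would be something to log at INFO level at most. In the gpstop output above, the same situation shows up as an ERROR line even though the shutdown completed successfully.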
It seems that the process with the PID gpstop tries to kill no longer exists. This is probably not a bug.