On-disk layout used by the snapshot operation:
/hbase/.snapshots
    /.tmp                <---- working directory
    /[snapshot name]     <----- completed snapshot
Layout of a snapshot once it has completed:
/hbase/.snapshots/[snapshot name]
    .snapshotinfo                  <--- Description of the snapshot
    .tableinfo                     <--- Copy of the tableinfo
    /.logs
        /[server_name]
            /... [log files]
        ...
    /[region name]                 <---- All the region's information
        .regioninfo                <---- Copy of the HRegionInfo
        /[column family name]
            /[hfile name]          <--- name of the hfile in the real region
            ...
        ...
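To make the layout concrete, here is a minimal sketch that builds these paths with `java.nio.file`. The class `SnapshotLayout` and its method names are hypothetical, not HBase's own utilities, and the per-snapshot subdirectory under `.tmp` is an assumption about how the working directory is organised.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper mirroring the layout above; not an HBase class. */
class SnapshotLayout {
    static final String SNAPSHOT_DIR = ".snapshots";
    static final String TMP_DIR = ".tmp";

    /** Directory of a finished snapshot: [root]/.snapshots/[name] */
    static Path completedDir(Path root, String name) {
        return root.resolve(SNAPSHOT_DIR).resolve(name);
    }

    /** Directory used while the snapshot is being built (assumed: [root]/.snapshots/.tmp/[name]). */
    static Path workingDir(Path root, String name) {
        return root.resolve(SNAPSHOT_DIR).resolve(TMP_DIR).resolve(name);
    }

    /** Location of one hfile entry inside a completed snapshot. */
    static Path hfile(Path root, String name, String region, String family, String hfile) {
        return completedDir(root, name).resolve(region).resolve(family).resolve(hfile);
    }

    public static void main(String[] args) {
        Path root = Paths.get("/hbase");
        System.out.println(workingDir(root, "snap1"));
        System.out.println(hfile(root, "snap1", "region-a", "cf", "hfile-1"));
    }
}
```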
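Basic steps of a snapshot:
1. Take a lock before starting; add and delete operations are not allowed while it is held;
2. Create the designated directory on HDFS and write the relevant metadata into it;
3. Flush the data in the memstore to HFiles;
4. Create reference pointers to the HFiles.
The rough code flow follows.
HBaseAdmin is where the snapshot is initiated: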
public void snapshot(final String snapshotName, final TableName tableName, SnapshotDescription.Type type) throws IOException, SnapshotCreationException, IllegalArgumentException {
SnapshotDescription.Builder builder = SnapshotDescription.newBuilder();
builder.setTable(tableName.getNameAsString());
builder.setName(snapshotName);
builder.setType(type);
snapshot(builder.build());
}
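Take a snapshot and wait for the server to complete it (blocking). Only one snapshot should be taken at a time on an HBase instance, otherwise the results may be undefined (you can ask several HBase clusters to snapshot at the same time, but a single cluster should run only one at a time).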
public void snapshot(SnapshotDescription snapshot) throws IOException, SnapshotCreationException, IllegalArgumentException {
// actually take the snapshot
SnapshotResponse response = takeSnapshotAsync(snapshot);
MasterRpcServices: triggers a snapshot asynchronously:
`master.snapshotManager.takeSnapshot(snapshot);`
SnapshotManager: how a snapshot is taken depends on the table state, disabled or enabled:
if (assignmentMgr.getTableStateManager().isTableState(snapshotTable, ZooKeeperProtos.Table.State.ENABLED)) {
LOG.debug("Table enabled, starting distributed snapshot.");
snapshotEnabledTable(snapshot);
LOG.debug("Started snapshot: " + ClientSnapshotDescriptionUtils.toString(snapshot));
}
// For disabled table, snapshot is created by the master
else if (assignmentMgr.getTableStateManager().isTableState(snapshotTable, ZooKeeperProtos.Table.State.DISABLED)) {
LOG.debug("Table is disabled, running snapshot entirely on master.");
snapshotDisabledTable(snapshot);
LOG.debug("Started snapshot: " + ClientSnapshotDescriptionUtils.toString(snapshot));
}
private synchronized void snapshotEnabledTable(SnapshotDescription snapshot) throws HBaseSnapshotException {
// setup the snapshot
prepareToTakeSnapshot(snapshot);
// Take the snapshot of the enabled table
EnabledTableSnapshotHandler handler = new EnabledTableSnapshotHandler(snapshot, master, this);
snapshotTable(snapshot, handler);
}
This is the path for a table in the enabled state: prepareToTakeSnapshot(snapshot) does the preparation, and snapshotTable(snapshot, handler) then starts the snapshot with an EnabledTableSnapshotHandler.
Setup before the snapshot starts: check whether a snapshot is already running, or a restore is already in progress, for the same table.
// make sure we aren't already running a snapshot
if (isTakingSnapshot(snapshot)) {
SnapshotSentinel handler = this.snapshotHandlers.get(snapshotTable);
throw new SnapshotCreationException("Rejected taking " + ClientSnapshotDescriptionUtils.toString(snapshot) + " because we are already running another snapshot " + (handler != null ? ("on the same table " + ClientSnapshotDescriptionUtils.toString(handler.getSnapshot())) : "with the same name"), snapshot);
}
// make sure we aren't running a restore on the same table
if (isRestoringTable(snapshotTable)) {
SnapshotSentinel handler = restoreHandlers.get(snapshotTable);
throw new SnapshotCreationException("Rejected taking " + ClientSnapshotDescriptionUtils.toString(snapshot) + " because we are already have a restore in progress on the same snapshot " + ClientSnapshotDescriptionUtils.toString(handler.getSnapshot()), snapshot);
}
try {
// delete the working directory, since we aren't running the snapshot. Likely leftovers
// from a failed attempt.
fs.delete(workingDir, true);
// recreate the working directory for the snapshot
if (!fs.mkdirs(workingDir)) {
throw new SnapshotCreationException("Couldn't create working directory (" + workingDir + ") for snapshot", snapshot);
}
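The same three checks (no running snapshot, no running restore, a fresh working directory) can be modelled outside HBase. The sketch below is a simplified stand-in using plain maps and the local filesystem. `SnapshotPrep` and all of its members are hypothetical names, and it throws `IllegalStateException` where the original throws `SnapshotCreationException`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Simplified model of the pre-snapshot checks; every name here is hypothetical. */
class SnapshotPrep {
    // table name -> name of the snapshot currently running on it
    private final Map<String, String> runningSnapshots = new ConcurrentHashMap<>();
    // table name -> name of the snapshot currently being restored onto it
    private final Map<String, String> runningRestores = new ConcurrentHashMap<>();

    void markSnapshotRunning(String table, String snapshot) { runningSnapshots.put(table, snapshot); }
    void markRestoreRunning(String table, String snapshot) { runningRestores.put(table, snapshot); }

    /** Rejects conflicting work, then recreates an empty working directory. */
    synchronized void prepare(String table, String snapshot, Path workingDir) throws IOException {
        // Reject if the same table, or the same snapshot name, is already being snapshotted.
        if (runningSnapshots.containsKey(table) || runningSnapshots.containsValue(snapshot)) {
            throw new IllegalStateException("already running another snapshot: " + snapshot);
        }
        if (runningRestores.containsKey(table)) {
            throw new IllegalStateException("restore in progress on table " + table);
        }
        deleteRecursively(workingDir);        // likely leftovers from a failed attempt
        Files.createDirectories(workingDir);  // throws IOException if it cannot be created
    }

    private static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order so that children come before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }
}
```

Marking `prepare` as `synchronized` mirrors the `synchronized` modifier on snapshotEnabledTable, so two callers cannot pass the checks at the same time.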
Once setup is done, the snapshot is carried out by the given handler:
handler.prepare();
this.executorService.submit(handler);
this.snapshotHandlers.put(TableName.valueOf(snapshot.getTable()), handler);
...
TakeSnapshotHandler is where the snapshot is actually processed:
1. Write the snapshot description to the .snapshotinfo file
FsPermission perms = FSUtils.getFilePermissions(fs, fs.getConf(), HConstants.DATA_FILE_UMASK_KEY);
Path snapshotInfo = new Path(workingDir, SnapshotDescriptionUtils.SNAPSHOTINFO_FILE);
try {
FSDataOutputStream out = FSUtils.create(fs, snapshotInfo, perms, true);
try {
snapshot.writeTo(out);
} finally {
out.close();
}
}
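The write-then-always-close pattern in step 1 can also be written with try-with-resources. The sketch below does so against the local filesystem rather than HDFS. `SnapshotInfoWriter` is a hypothetical name, and the description is passed as raw bytes because the protobuf type is not available outside HBase.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical local-filesystem analogue of writing the .snapshotinfo file. */
class SnapshotInfoWriter {
    static final String SNAPSHOTINFO_FILE = ".snapshotinfo";

    /** Writes the serialized description to [workingDir]/.snapshotinfo, replacing any old copy. */
    static Path write(Path workingDir, byte[] description) throws IOException {
        Path info = workingDir.resolve(SNAPSHOTINFO_FILE);
        // try-with-resources closes the stream even if write() throws,
        // which is what the explicit try/finally above achieves.
        try (OutputStream out = Files.newOutputStream(info)) {
            out.write(description);
        }
        return info;
    }
}
```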
2. Copy the table descriptor:
snapshotManifest.addTableDescriptor(this.htd);
3. Get the table's regions and their region server locations:
List<Pair<HRegionInfo, ServerName>> regionsAndLocations;
if (TableName.META_TABLE_NAME.equals(snapshotTable)) {
regionsAndLocations = new MetaTableLocator().getMetaRegionsAndLocations(server.getZooKeeper());
} else {
regionsAndLocations = MetaTableAccessor.getTableRegionsAndLocations(server.getZooKeeper(), server.getConnection(), snapshotTable, false);
}
4. Run the snapshot using the region and location information obtained above
// run the snapshot
snapshotRegions(regionsAndLocations);
Start the snapshot procedure. The snapshot is started on the region servers, which is where the detailed work of the operation is done:
// start the snapshot on the RS
Procedure proc = coordinator.startProcedure(this.monitor, this.snapshot.getName(), this.snapshot.toByteArray(),
Lists.newArrayList(regionServers));
if (proc == null) {
String msg = "Failed to submit distributed procedure for snapshot '" + snapshot.getName() + "'";
LOG.error(msg);
throw new HBaseSnapshotException(msg);
}
Wait for the snapshot to complete:
proc.waitForCompleted();
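The start-then-wait shape of startProcedure and waitForCompleted can be illustrated in-process with a `CountDownLatch`. This is only an analogy: the excerpt does not show how HBase coordinates the region servers, and `DistributedSnapshotSketch`, `runOnAll` and the timeout parameter are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** In-process analogy of "start on every member, then wait for all of them". */
class DistributedSnapshotSketch {
    /** Runs one task per server and blocks until all have reported or the timeout expires. */
    static List<String> runOnAll(List<String> servers, long timeoutMillis) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(servers.size());
        List<String> finished = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, servers.size()));
        try {
            for (String server : servers) {
                pool.submit(() -> {
                    try {
                        finished.add(server); // stand-in for the per-server snapshot work
                    } finally {
                        done.countDown();     // report completion even if the work failed
                    }
                });
            }
            // Counterpart of waitForCompleted(): block until every member has reported.
            if (!done.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("timed out waiting for snapshot members");
            }
            return finished;
        } finally {
            pool.shutdownNow();
        }
    }
}
```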
Treat offline regions as disabled:
// Take the offline regions as disabled
for (Pair<HRegionInfo, ServerName> region : regions) {
HRegionInfo regionInfo = region.getFirst();
if (regionInfo.isOffline() && (regionInfo.isSplit() || regionInfo.isSplitParent())) {
LOG.info("Take disabled snapshot of offline region=" + regionInfo);
snapshotDisabledRegion(regionInfo);
}
}
5. Collect the region information and server names, which are used to verify that the snapshot is valid
// extract each pair to separate lists
Set<String> serverNames = new HashSet<String>();
for (Pair<HRegionInfo, ServerName> p : regionsAndLocations) {
if (p != null && p.getFirst() != null && p.getSecond() != null) {
HRegionInfo hri = p.getFirst();
if (hri.isOffline() && (hri.isSplit() || hri.isSplitParent()))
continue;
serverNames.add(p.getSecond().toString());
}
}
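Both loops apply the same predicate, "offline and split (or split parent)", in opposite directions: the first selects those regions for a disabled-style snapshot, the second excludes them when collecting server names. A minimal sketch with a hypothetical `Region` stand-in, whose `split` flag plays the role of `isSplit() || isSplitParent()`:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the two region loops above. */
class RegionPartition {
    /** Stand-in for a (region, server) pair; fields are illustrative only. */
    static final class Region {
        final String name;
        final String server;   // may be null when no location is known
        final boolean offline;
        final boolean split;   // stands for isSplit() || isSplitParent()

        Region(String name, String server, boolean offline, boolean split) {
            this.name = name;
            this.server = server;
            this.offline = offline;
            this.split = split;
        }

        boolean isOfflineSplit() {
            return offline && split;
        }
    }

    /** Regions to snapshot as if the table were disabled. */
    static List<String> offlineSplitRegions(List<Region> regions) {
        List<String> result = new ArrayList<>();
        for (Region r : regions) {
            if (r.isOfflineSplit()) {
                result.add(r.name);
            }
        }
        return result;
    }

    /** Servers expected to have taken part, skipping unknown locations and offline split regions. */
    static Set<String> serversToVerify(List<Region> regions) {
        Set<String> servers = new LinkedHashSet<>();
        for (Region r : regions) {
            if (r == null || r.server == null || r.isOfflineSplit()) {
                continue;
            }
            servers.add(r.server);
        }
        return servers;
    }
}
```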
6. Flush the in-memory state and write the snapshot manifest to the directory
// flush the in-memory state, and write the single manifest
status.setStatus("Consolidate snapshot: " + snapshot.getName());
snapshotManifest.consolidate();
7. Verify that the snapshot is valid
// verify the snapshot is valid
status.setStatus("Verifying snapshot: " + snapshot.getName());
verifier.verifySnapshot(this.workingDir, serverNames);
8. Complete the snapshot by moving the working directory to its final location
// complete the snapshot, atomically moving from tmp to .snapshot dir.
completeSnapshot(this.snapshotDir, this.workingDir, this.fs);
msg = "Snapshot " + snapshot.getName() + " of table " + snapshotTable + " completed";
status.markComplete(msg);
LOG.info(msg);
metricsSnapshot.addSnapshot(status.getCompletionTimestamp() - status.getStartTime());
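The point of building under the working directory and moving at the end is that readers never observe a half-written snapshot. A local-filesystem sketch of that publish step follows, using `Files.move` with `ATOMIC_MOVE`. `SnapshotCompletion` is a hypothetical name, and HDFS rename semantics are not modelled here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical local analogue of completeSnapshot(snapshotDir, workingDir, fs). */
class SnapshotCompletion {
    /** Publishes a finished working directory by renaming it to its final location. */
    static void complete(Path workingDir, Path snapshotDir) throws IOException {
        // Assumes snapshotDir has a parent directory, as in the layout shown earlier.
        Files.createDirectories(snapshotDir.getParent());
        // ATOMIC_MOVE: the move happens as one filesystem operation, or fails with an exception.
        Files.move(workingDir, snapshotDir, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

An atomic move only works within a single file store, which is one reason to keep the working directory under the same parent as the completed snapshots.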