In-place update in WiredTiger

本文涉及的产品
RDS MySQL DuckDB 分析主实例,基础系列 4核8GB
RDS AI 助手,专业版
RDS MySQL DuckDB 分析主实例,集群系列 4核8GB
简介:

There is a great new feature in the release note of MongoDB 3.5.12.

Faster In-place Updates in WiredTiger

This work brings improvements to in-place update workloads for users running the WiredTiger engine, especially for updates to large documents. Some workloads may see a reduction of up to 7x in disk utilization (from 24 MB/s to 3 MB/s) as well as a 20% improvement in throughput.

I thought wiredtiger has impeletementd the delta page feature introduced in the bw-tree paper, that is, writing pages that are deltas from previously written pages. But after I read the source code, I found it's a totally diffirent idea, in-place update only impacted the in-meomry and journal format, the on disk layout of data is not changed.

I will explain the core of the in-place update implementation.

MongoDB introduced mutable bson to descirbe document update as incremental(delta) update.

Mutable BSON provides classes to facilitate the manipulation of existing BSON objects or the construction of new BSON objects from scratch in an incremental fashion.

Suppose you have a very large document, see 1MB

{
   _id: ObjectId("59097118be4a61d87415cd15"),
   name: "ZhangYoudong",
   birthday: "xxxx",
   fightvalue: 100,
   xxx: .... // many other fields
}

If the fightvalue is changed from 100 to 101, you can use a DamageEvent to describe the update, it just tells you the offset、size、content(kept in another array) of the change.

struct DamageEvent {
    typedef uint32_t OffsetSizeType;
    // Offset of source data (in some buffer held elsewhere).
    OffsetSizeType sourceOffset;

    // Offset of target data (in some buffer held elsewhere).
    OffsetSizeType targetOffset;

    // Size of the damage region.
    size_t size;
};

So if you have many small changes for a document, you will have DamageEvent array, MongoDB add a new storage interface to support inserting DamageEvent array (DamageVector).

bool WiredTigerRecordStore::updateWithDamagesSupported() const {
    return true;
}

StatusWith<RecordData> WiredTigerRecordStore::updateWithDamages(
    OperationContext* opCtx,
    const RecordId& id,
    const RecordData& oldRec,
    const char* damageSource,
    const mutablebson::DamageVector& damages) {

}

WiredTiger added a new update type called WT_UPDATE_MODIFIED to support MongoDB, when a WT_UPDATE_MODIFIED update happened, wiredTiger first logged a change list which is transformed from DamageVector into journal, then kept the change list in memory associated with the original record.

When the record is read, wiredTiger will first read the original record, then apply every operation in change list, returned the final record to the client.

So the core for in-place update:

  1. WiredTiger support delta update in memory and journal, so the IO of writing journal will be greatly reduced for large document.
  2. WiredTiger's data layout is kept unchanged, so the IO of writing data is not changed.
相关文章
|
JavaScript 小程序 前端开发
【手把手教教学物联网项目】01 视频大纲
《手把手教教学物联网项目》是一系列视频教程,旨在引导初学者掌握物联网技术。视频涵盖物联网基础,如物联网概述、架构和技术;STM32微控制器的介绍、编程及外设使用;网关开发,涉及ESP8266和ESP32;物联网通信协议如TCP、MQTT、Modbus等;物联网总线协议如单总线、CAN、IIC和SPI;OLED显示原理与驱动;MQTT服务器搭建;物联网云平台介绍,包括阿里云平台的使用;微信小程序开发入门及前端VUE项目实践。此外,教程还涉及UniAPP和SpringBoot后台开发,最后通过“智能取餐柜”项目将理论知识付诸实践。视频可在B站找到,适合学生、爱好者和开发人员学习物联网技术。
1148 12
【手把手教教学物联网项目】01 视频大纲
|
Java BI 开发工具
静态代码自动扫描p3c的使用
静态代码自动扫描p3c的使用
1270 0
|
Oracle JavaScript Java
JDK的版本迭代特性(JDK9 - JDK20)
JDK的版本迭代特性(JDK9 - JDK20)
|
7月前
|
前端开发 数据可视化 JavaScript
惊喜! Github 10k+ star 的国产流程图框架,LogicFlow 能解你的图编辑痛点?
LogicFlow 是一款高效、灵活的流程图编辑框架,支持可视化渲染、自定义节点、插件扩展及前端执行。适用于审批流、ER 图、低代码平台等多种场景,具备清晰架构与活跃社区,助力开发者快速实现专业流程图编辑与执行。
1133 1
|
缓存 前端开发 数据可视化
Webpack Bundle Analyzer:深入分析与优化你的包
Webpack Bundle Analyzer是一款可视化工具,帮助分析Webpack构建结果,找出占用空间较大的模块以便优化。首先需安装Webpack和Webpack Bundle Analyzer,接着在`webpack.config.js`中配置插件。运行Webpack后,会在`dist`目录生成`report.html`,展示交互式图表分析包大小分布。为优化可采用代码分割、Tree Shaking、压缩插件、加载器优化、模块懒加载、代码预热、提取公共库、使用CDN、图片优化、利用缓存、避免重复模块、使用Source Maps、优化字体和图标、避免全局样式污染以及优化HTML输出等策略。
758 3
|
Java Spring
Javaweb之SpringBootWeb案例之AOP通知类型的详细解析
Javaweb之SpringBootWeb案例之AOP通知类型的详细解析
262 0
|
存储 安全 Java
Java 注解详解
在 Java 编程中,注解(Annotation)是一种元数据,它提供了关于程序代码的额外信息。注解不直接影响程序的执行,但可以在运行时提供有关程序的信息,或者让编译器执行额外的检查。 本文将详细介绍 Java 注解的基本概念、内置注解和自定义注解的创建与使用。
225 0
|
存储 缓存 运维
基础篇丨链路追踪(Tracing)其实很简单
基础篇丨链路追踪(Tracing)其实很简单
基础篇丨链路追踪(Tracing)其实很简单
|
存储 NoSQL 数据库
MongoDB Sharded cluster架构原理
为什么需要Sharded cluster? MongoDB目前3大核心优势:『灵活模式』+ 『高可用性』 + 『可扩展性』,通过json文档来实现灵活模式,通过复制集来保证高可用,通过Sharded cluster来保证可扩展性。 当MongoDB复制集遇到下面的业务场景时,你就需要考虑使用Sh
42739 155
|
监控 负载均衡 前端开发
微服务架构之链路追踪原理
微服务架构是一个分布式架构,它按业务划分服务单元,一个分布式系统往往有很多个服务单元。由于服务单元数量众多,业务的复杂性,如果出现了错误和异常,很难去定位。主要体现在,一个请求可能需要调用很多个服务,而内部服务的调用复杂性,决定了问题难以定位。
1282 0