mongoDB Indexing Advice

简介:
mongoDB建索引的几个建议:
1. 建立索引的前提是查询条件,建立完之后explain查看有索引和无索引的开销对比.能建立唯一的索引尽量唯一.
如:
> for (i=0;i<10000;i++) {
... db.userinfo.insert({"firstname" : "zhou" + i,"lastname" : "digoal" + i,"age" : i,"city" : "HangZhou"})
... }
> db.userinfo.ensureIndex({"firstname" : 1,"lastname" : 1,"age" : 1})
# 当查询条件中含有驱动列时优化器将选择索引查询
> db.userinfo.find({"firstname" : "zhou100"},{"city" : 1}).explain()
{
        "cursor" : "BtreeCursor firstname_1_lastname_1_age_1",
        "nscanned" : 1,
        "nscannedObjects" : 1,
        "n" : 1,
        "millis" : 0,
        "indexBounds" : {
                "firstname" : [
                        [
                                "zhou100",
                                "zhou100"
                        ]
                ],
                "lastname" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ],
                "age" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ]
        }
}
# 当查询条件中不包含驱动列时,不走索引.
> db.userinfo.find({"lastname" : "digoal100"},{"city" : 1}).explain()                                    
{
        "cursor" : "BasicCursor",
        "nscanned" : 10000,
        "nscannedObjects" : 10000,
        "n" : 1,
        "millis" : 5,
        "indexBounds" : {

        }
}
# 当然,你可以强制使用索引,不过millis就不是5了,增加到16
> db.userinfo.find({"lastname" : "digoal100"},{"city" : 1}).hint({"firstname" : 1,"lastname" : 1,"age" : 1}).explain()
{
        "cursor" : "BtreeCursor firstname_1_lastname_1_age_1",
        "nscanned" : 1,
        "nscannedObjects" : 1,
        "n" : 1,
        "millis" : 16,
        "indexBounds" : {
                "firstname" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ],
                "lastname" : [
                        [
                                "digoal100",
                                "digoal100"
                        ]
                ],
                "age" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ]
        }
}
> db.userinfo.find({"lastname" : "digoal100"},{"city" : 1}).hint({"firstname" : 1,"lastname" : 1,"age" : 1})
{ "_id" : ObjectId("4d266637c119c7060b212fd6"), "city" : "HangZhou" }

2. 索引一定要建立在选择性好的列上面,对于选择性不好的列建索引效果不好.(对于选择性不好的列建索引需要考虑复合索引)
如下:当使用选择性不好的索引时,查询时间分别是8ms,26ms,而不使用索引时 分别是6ms,9ms,因此选择性不好的索引千万别建。
> db.userinfo.dropIndex({"city" : 1})                                                                       
{ "nIndexesWas" : 3, "ok" : 1 }
> db.userinfo.find({"city" : "HangZhou"}).explain()                                                                   
{
        "cursor" : "BasicCursor",
        "nscanned" : 10002,
        "nscannedObjects" : 10002,
        "n" : 10002,
        "millis" : 6,
        "indexBounds" : {

        }
}
> db.userinfo.find({"city" : "HangZhou","age" : 27}).explain()
{
        "cursor" : "BasicCursor",
        "nscanned" : 10002,
        "nscannedObjects" : 10002,
        "n" : 3,
        "millis" : 9,
        "indexBounds" : {

        }
}
> db.userinfo.ensureIndex({"city" : 1}) 
> db.userinfo.find({"city" : "HangZhou"}).explain()
{
        "cursor" : "BtreeCursor city_1",
        "nscanned" : 10002,
        "nscannedObjects" : 10002,
        "n" : 10002,
        "millis" : 8,
        "indexBounds" : {
                "city" : [
                        [
                                "HangZhou",
                                "HangZhou"
                        ]
                ]
        }
}
> db.userinfo.find({"city" : "HangZhou","age" : 27}).explain()
{
        "cursor" : "BtreeCursor city_1",
        "nscanned" : 10002,
        "nscannedObjects" : 10002,
        "n" : 3,
        "millis" : 26,
        "indexBounds" : {
                "city" : [
                        [
                                "HangZhou",
                                "HangZhou"
                        ]
                ]
        }
}
# 使用复合索引替代选择性不好的索引,返回时间= 0 ms
> db.userinfo.ensureIndex({"city" : 1,"age" : 1})             
> db.userinfo.find({"city" : "HangZhou","age" : 27}).explain()
{
        "cursor" : "BtreeCursor city_1_age_1",
        "nscanned" : 3,
        "nscannedObjects" : 3,
        "n" : 3,
        "millis" : 0,
        "indexBounds" : {
                "city" : [
                        [
                                "HangZhou",
                                "HangZhou"
                        ]
                ],
                "age" : [
                        [
                                27,
                                27
                        ]
                ]
        }
}

3. 当索引有多个可选时,mongoDB对索引的选择会是怎么样呢?
> db.userinfo.getIndexes()
[
        {
                "name" : "_id_",
                "ns" : "test.userinfo",
                "key" : {
                        "_id" : 1
                }
        },
        {
                "_id" : ObjectId("4d267377c119c7060b215687"),
                "ns" : "test.userinfo",
                "key" : {
                        "city" : 1,
                        "age" : 1
                },
                "name" : "city_1_age_1"
        },
        {
                "_id" : ObjectId("4d2677aac119c7060b215688"),
                "ns" : "test.userinfo",
                "key" : {
                        "firstname" : 1,
                        "lastname" : 1,
                        "age" : 1
                },
                "name" : "firstname_1_lastname_1_age_1"
        }
]
# 同样的结果,同样的索引,按照firstname和lastname排序,消耗的时间一个是0ms  另一个是18ms,这个有点纳闷
> db.userinfo.find({"city" : "HangZhou","age" : 27}).sort({"firstname" :1}).explain()
{
        "cursor" : "BtreeCursor city_1_age_1",
        "nscanned" : 3,
        "nscannedObjects" : 3,
        "n" : 3,
        "scanAndOrder" : true,
        "millis" : 18,
        "indexBounds" : {
                "city" : [
                        [
                                "HangZhou",
                                "HangZhou"
                        ]
                ],
                "age" : [
                        [
                                27,
                                27
                        ]
                ]
        }
}
> db.userinfo.find({"city" : "HangZhou","age" : 27}).sort({"lastname" :1}).explain() 
{
        "cursor" : "BtreeCursor city_1_age_1",
        "nscanned" : 3,
        "nscannedObjects" : 3,
        "n" : 3,
        "scanAndOrder" : true,
        "millis" : 0,
        "indexBounds" : {
                "city" : [
                        [
                                "HangZhou",
                                "HangZhou"
                        ]
                ],
                "age" : [
                        [
                                27,
                                27
                        ]
                ]
        }
}
# 偶尔也会走另一个索引{"firstname" :1,"lastname" :1,"age" :1}
# 强制使用另一个索引的开销18MS
> db.userinfo.find({"city" : "HangZhou","age" : 27}).sort({"firstname" : 1}).hint({"firstname" :1,"lastname" :1,"age" :1}).explain>
{
        "cursor" : "BtreeCursor firstname_1_lastname_1_age_1",
        "nscanned" : 3,
        "nscannedObjects" : 3,
        "n" : 3,
        "millis" : 18,
        "indexBounds" : {
                "firstname" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ],
                "lastname" : [
                        [
                                {
                                        "$minElement" : 1
                                },
                                {
                                        "$maxElement" : 1
                                }
                        ]
                ],
                "age" : [
                        [
                                27,
                                27
                        ]
                ]
        }
}
# 强制全表扫描的开销是9ms
> db.userinfo.find({"city" : "HangZhou","age" : 27}).sort({"firstname" : 1}).hint({"$natural":1}).explain()
{
        "cursor" : "BasicCursor",
        "nscanned" : 10002,
        "nscannedObjects" : 10002,
        "n" : 3,
        "scanAndOrder" : true,
        "millis" : 9,
        "indexBounds" : {

        }
}
# 看出来mongoDB是索引优先啊。

4. 确保索引可以放入内存。
当索引的大小不能被pin在内存中,那是比较悲剧的事情。返回bytes
> db.userinfo.totalIndexSize()
1089536

5. 权衡数据读写操作比例,索引多了对写就有影响.(MySQL的MyISAM引擎有一个比较好的参数DELAY_KEY_WRITE可以延时写索引,一定程度上缓解了索引对写带来的负担)

6. MongoDB's $ne or $nin operator's aren't efficient with indexes.
  • When excluding just a few documents, it's better to retrieve extra rows from MongoDB and do the exclusion on the client side.
      加入遇到这种情况,建议要么删除INDEX要么用HINT指定全表扫描.

7.  As your collection grows, you’ll need to create indexes for any large sorts your queries are doing. If you call sort on a key that is not indexed, MongoDB needs to pull all of that data into memory to sort it. Thus, there’s a limit on the amount you can sort without an index: you can’t sort a terabyte of data in memory. Once your collection is too big to sort in memory, MongoDB will just return an error for queries that try. Indexing the sort allows MongoDB to pull the sorted data in order, allowing you to sort any amount of data without running out of memory.

注意事项:
1. explain不处理insert操作,但是实际上insert已经操作,如:
> db.userinfo.insert({"firstname" : "zhou" ,"lastname" : "digoal" , "age" : 27 , "city" : "HangZhou"}).explain()
Fri Jan  7 09:44:01 TypeError: db.userinfo.insert({firstname:"zhou", lastname:"digoal", age:27, city:"HangZhou"}) has no properties (shell):0
> db.userinfo.insert({"firstname" : "zhou" ,"lastname" : "digoal" , "age" : 27 , "city" : "HangZhou"})          
> db.userinfo.find({"firstname" : "zhou"})
{ "_id" : ObjectId("4d266fe1c119c7060b215684"), "firstname" : "zhou", "lastname" : "digoal", "age" : 27, "city" : "HangZhou" }
{ "_id" : ObjectId("4d266fedc119c7060b215685"), "firstname" : "zhou", "lastname" : "digoal", "age" : 27, "city" : "HangZhou" }

其他:
1. explain output 
  • cursor: the value for cursor can be either BasicCursor or BtreeCursor. The second of these indicates that the given query is using an index.
  • nscanned: he number of documents scanned.
  • n: the number of documents returned by the query. You want the value of n to be close to the value of nscanned. What you want to avoid is doing a collection scan, that is, where every document in the collection is accessed. This is the case whennscanned is equal to the number of documents in the collection.
  • millis: the number of milliseconds require to complete the query. This value is useful for comparing indexing strategies, indexed vs. non-indexed queries, etc.
目录
相关文章
|
5月前
|
NoSQL MongoDB 数据库
数据库数据恢复—MongoDB数据库数据恢复案例
MongoDB数据库数据恢复环境: 一台操作系统为Windows Server的虚拟机上部署MongoDB数据库。 MongoDB数据库故障: 工作人员在MongoDB服务仍然开启的情况下将MongoDB数据库文件拷贝到其他分区,数据复制完成后将MongoDB数据库原先所在的分区进行了格式化操作。 结果发现拷贝过去的数据无法使用。管理员又将数据拷贝回原始分区,MongoDB服务仍然无法使用,报错“Windows无法启动MongoDB服务(位于 本地计算机 上)错误1067:进程意外终止。”
|
5月前
|
缓存 NoSQL Linux
在CentOS 7系统中彻底移除MongoDB数据库的步骤
以上步骤完成后,MongoDB应该会从您的CentOS 7系统中被彻底移除。在执行上述操作前,请确保已经备份好所有重要数据以防丢失。这些步骤操作需要一些基本的Linux系统管理知识,若您对某一步骤不是非常清楚,请先进行必要的学习或咨询专业人士。在执行系统级操作时,推荐在实施前创建系统快照或备份,以便在出现问题时能够恢复到原先的状态。
459 79
|
5月前
|
存储 NoSQL MongoDB
MongoDB数据库详解-针对大型分布式项目采用的原因以及基础原理和发展-卓伊凡|贝贝|莉莉
MongoDB数据库详解-针对大型分布式项目采用的原因以及基础原理和发展-卓伊凡|贝贝|莉莉
279 8
MongoDB数据库详解-针对大型分布式项目采用的原因以及基础原理和发展-卓伊凡|贝贝|莉莉
|
4月前
|
运维 NoSQL 容灾
告别运维噩梦:手把手教你将自建 MongoDB 平滑迁移至云数据库
程序员为何逃离自建MongoDB?扩容困难、运维复杂、高可用性差成痛点。阿里云MongoDB提供分钟级扩容、自动诊断与高可用保障,助力企业高效运维、降本增效,实现数据库“无感运维”。
|
8月前
|
NoSQL MongoDB 数据库
数据库数据恢复——MongoDB数据库服务无法启动的数据恢复案例
MongoDB数据库数据恢复环境: 一台Windows Server操作系统虚拟机上部署MongoDB数据库。 MongoDB数据库故障: 管理员在未关闭MongoDB服务的情况下拷贝数据库文件。将MongoDB数据库文件拷贝到其他分区后,对MongoDB数据库所在原分区进行了格式化操作。格式化完成后将数据库文件拷回原分区,并重新启动MongoDB服务。发现服务无法启动并报错。

推荐镜像

更多