In this episode we discuss our study notes on the partitioning chapter of Designing Data-Intensive Applications.
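🔴 This episode leans technical, and we use a lot of English for technical terms. Some listeners have told us that mixing Chinese and English makes the show harder to follow, so we ask for your understanding if that bothers you. Corrections are welcome if anything is inaccurate or out of date.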
# Show Notes
- 📕 Designing Data-Intensive Applications
- What is partitioning?
- A partition is a division of a logical database or its constituent elements into distinct independent parts.
- Main reason: scalability - the query load can be distributed across many processors.
- YouTube / Vitess scaling story
- Single MySQL → Add read replicas → Writes can't catch up → Partition
- How to partition?
- Partitioning by Key Range (e.g., Bigtable)
- Assign a continuous range of keys to each partition
- Pro: range scans are easy, good data locality
- Con: certain access patterns (e.g., timestamp keys) can lead to hot spots
- Con: finding split points and managing rebalancing is hard
- Partitioning by Hash
- A good hash function distributes keys uniformly
- Con: no easy range queries
- Cassandra does KKV (partitioning key, sort key, value)
- Hot spots: 3% of Twitter's Servers Dedicated to Justin Bieber
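The two schemes above can be sketched in a few lines of Python. This is only an illustration under assumptions: the split points, the partition count of 4, and the function names `range_partition` and `hash_partition` are made up for the example and are not taken from Bigtable, Cassandra, or the book.

```python
import bisect
import hashlib

# Hypothetical split points: keys below "g" go to partition 0,
# "g" up to (not including) "n" go to partition 1, and so on.
SPLIT_POINTS = ["g", "n", "t"]


def range_partition(key: str) -> int:
    """Key-range partitioning: binary-search the sorted split points."""
    return bisect.bisect_right(SPLIT_POINTS, key)


def hash_partition(key: str, num_partitions: int = 4) -> int:
    """Hash partitioning: a stable hash of the key, modulo the partition count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# Neighbouring keys stay together under range partitioning,
# which is what makes range scans cheap.
print(range_partition("apple"), range_partition("apricot"))

# Under hash partitioning the same neighbours may land anywhere,
# so a range query has to ask every partition.
print(hash_partition("apple"), hash_partition("apricot"))
```

Note that hashing spreads different keys evenly but does nothing for a single very hot key: every request for that key still lands on one partition.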
- Secondary indexes: Local index
- Efficient writes, expensive reads (scatter/gather across all partitions)
- Elasticsearch
- Secondary indexes: Global index
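To make the "efficient writes, expensive reads" trade-off of a local index concrete, here is a minimal in-memory sketch. Everything in it is hypothetical (the `Partition` class, the `write` and `query` helpers, integer primary keys, and the `color` field); it is not how Elasticsearch or any real store is implemented.

```python
from collections import defaultdict


class Partition:
    """One partition: its rows plus an index covering only those rows."""

    def __init__(self, field):
        self.field = field                   # the indexed column (assumption)
        self.rows = {}                       # primary key -> row
        self.local_index = defaultdict(set)  # indexed value -> primary keys

    def put(self, key, row):
        old = self.rows.get(key)
        if old is not None:
            # Drop the stale index entry when a row is overwritten.
            self.local_index[old[self.field]].discard(key)
        self.rows[key] = row
        self.local_index[row[self.field]].add(key)

    def find(self, value):
        return set(self.local_index.get(value, ()))


def write(partitions, key, row):
    # A write touches exactly one partition and its local index.
    partitions[key % len(partitions)].put(key, row)


def query(partitions, value):
    # A read by secondary key must ask every partition (scatter/gather).
    result = set()
    for partition in partitions:
        result |= partition.find(value)
    return result


partitions = [Partition("color") for _ in range(3)]
write(partitions, 1, {"color": "red"})
write(partitions, 2, {"color": "red"})
write(partitions, 3, {"color": "blue"})
print(query(partitions, "red"))
```

A global index flips the trade-off: the index itself is partitioned by the indexed value, so a read goes to one index partition, while a single write may have to update index partitions on several nodes.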
- Rebalancing partitions
- Move loads to other nodes
- Fixed number of partitions
- New node steals partitions from every existing node
- Notion: 480 partitions
- Dynamic partitioning
- 📈: split partition into 2
- 📉: merge 2 partitions into 1
- Fixed number of partitions per node
- Operations: fully automatic (dangerous) / semi-automatic / fully manual (tedious)
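The "fixed number of partitions" strategy can be sketched as follows. This is a toy version under assumptions: `add_node` and the node names are hypothetical, the assignment is a plain dict, and 480 is used only because it is the partition count mentioned above. The number of partitions never changes; only the partition-to-node assignment does.

```python
from collections import Counter, defaultdict


def add_node(assignment, new_node):
    """Reassign a fair share of partitions to `new_node`.

    `assignment` maps partition id -> node name and is updated in place.
    Returns the list of partition ids that were moved.
    """
    by_node = defaultdict(list)
    for pid, node in sorted(assignment.items()):
        by_node[node].append(pid)

    # Fair share for the new node once it has joined the cluster.
    target = len(assignment) // (len(by_node) + 1)

    moved = []
    while len(moved) < target:
        # Always steal from whichever existing node holds the most partitions,
        # so every existing node ends up giving away a few.
        donor = max(by_node, key=lambda n: len(by_node[n]))
        pid = by_node[donor].pop()
        assignment[pid] = new_node
        moved.append(pid)
    return moved


# 480 partitions spread over 4 nodes, then a 5th node joins.
assignment = {pid: f"node-{pid % 4}" for pid in range(480)}
moved = add_node(assignment, "node-4")
print(len(moved), Counter(assignment.values()))
```

Only the moved partitions have to be copied between machines; keys never change partition, which is what makes this scheme simpler than rehashing everything with `hash mod N`.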
- Request Routing
- 3 approaches: nodes talk to each other, separate routing tier, smart client
- Separate coordination service such as ZooKeeper
- Notes by xg
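As a rough illustration of the "separate routing tier" approach from the request routing notes, the sketch below keeps a partition-to-node map and updates it when told about a reassignment. It does not use any real ZooKeeper client API: `RoutingTier`, `partition_for`, and `on_assignment_change` are hypothetical names, and the callback simply stands in for whatever notification a coordination service would deliver.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition id with a stable hash (assumption: hash partitioning)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class RoutingTier:
    """Forwards each request to the node that currently owns the key's partition."""

    def __init__(self, partition_to_node):
        # Local copy of the authoritative mapping held by the coordination service.
        self.partition_to_node = dict(partition_to_node)

    def on_assignment_change(self, partition_id, node):
        # Stand-in for a change notification after rebalancing.
        self.partition_to_node[partition_id] = node

    def route(self, key):
        pid = partition_for(key, len(self.partition_to_node))
        return pid, self.partition_to_node[pid]


tier = RoutingTier({0: "node-a", 1: "node-b", 2: "node-a", 3: "node-b"})
pid, node = tier.route("user:42")
tier.on_assignment_change(pid, "node-c")  # the partition moved during rebalancing
print(tier.route("user:42"))
```

The same lookup could live inside every node or inside a smart client; the three approaches differ mainly in who holds the mapping and how it is kept up to date.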
# Contact
What is Eng Cafe?
What programmers talk about over coffee