作者: 啦啦啦啦啦
一.背景
印象里 2022 年初的时候,有小伙伴在年初的时候。 asktug 提出 TiDB 未来是否会出现多租户功能的问题,没想到这么快就已经有了 GA 了。资源控制的特点(Resource Control)可以说是 TiDB 7.1 其中最亮眼的功能之一使 TiDB 具有隔离多租户的能力。资源控制特性的合理利用可以减少集群数量,降低运维难度和管理成本。近期因为 MySQL 数据归档问题忙了好几个晚上,现有的 TiDB 集群平均负荷不高,因此产生了多套 MySQL 业务迁移到一套 TiDB 集群的想法,但又怕多个业务会产生资源竞争,影响核心业务。Resource Control 可以解决这个问题,借此机会对这个特性进行简单的测试。
二、测试环境准备
1.硬件配置和集群拓扑规划
Role
Host
Ports
alertmanager
10.0.0.1
9093/9094
grafana
10.0.0.1
3000
pd
10.0.0.1
2379/2380
tidb
10.0.0.1
4000/10080
tikv
10.0.0.2
20160/20180
tikv
10.0.0.3
20160/20180
tikv
10.0.0.4
20160/20180
由于硬件条件有限, 使用 4 配置为台云主机 8C 32G 200G 普通 SAS 硬盘。但本次测试仅针对资源控制的特点进行测试,对硬件的要求并不是特别严格。TiDB 7.1 默认情况下,基于资源组配额的资源组流量控制和请求调度参数将被打开,因此参数没有特别调整。
2. 资源控制配置
1.估计集群容量
由于它是一个测试集群,没有真正的业务流量,我们使用基于硬件部署的估计容量,而不是根据实际负载更准确地估计容量。
MySQL [(none)]> CALIBRATE RESOURCE WORKLOAD TPCC;+-------+| QUOTA |+-------+| 69768 |+-------+1 row in set (0.01 sec)MySQL [(none)]> CALIBRATE RESOURCE WORKLOAD OLTP_WRITE_ONLY;+-------+| QUOTA |+-------+| 73184 |+-------+1 row in set (0.00 sec)MySQL [(none)]> CALIBRATE RESOURCE WORKLOAD OLTP_READ_WRITE;+-------+| QUOTA |+-------+| 29772 |+-------+1 row in set (0.01 sec)MySQL [(none)]> CALIBRATE RESOURCE WORKLOAD OLTP_READ_ONLY;+-------+| QUOTA |+-------+| 13970 |+-------+1 row in set (0.00 sec)
当然,dashboard也可以直观地看到
2.创建资源组
创建cc1
限额为每秒的资源组 10000 RU,并允许资源组的应用超额占用资源,设置绝对优先级HIGH
。
CREATE RESOURCE GROUP IF NOT EXISTS cc1 RU_PER_SEC = 10000 BURSTABLE HIGH PRIORITY = HIGH;
创建cc2
限额为每秒的资源组 2000 RU,当系统资源充足时,资源组的应用不允许超额占用资源,设置绝对优先级LOW
。
CREATE RESOURCE GROUP IF NOT EXISTS cc2 RU_PER_SEC = 2000 PRIORITY = LOW;
3.绑定资源组
将用户绑定到资源组
MySQL [(none)]> CREATE USER "cc1"@" IDENTIFIED BY 'test' RESOURCE GROUP cc1;Query OK, 0 rows affected (0.54 sec)MySQL [(none)]> CREATE USER "cc2""" IDENTIFIED BY 'test' RESOURCE GROUP cc2;Query OK, 0 rows affected (0.02 sec)MySQL [(none)]> grant all on cctest1.* to "cc1"@";Query OK, 0 rows affected (0.28 sec)MySQL [(none)]> grant all on cctest2.* to "cc2""";Query OK, 0 rows affected (0.13 sec)
三、性能测试
1.未进行资源控制
首先关闭资源控制
SET GLOBAL tidb_enable_resource_control = 'OFF';
生成 10 张 1kw表选择 oltp_read_write 测试场景
使用sysbench对库ctest1进行压测,模拟业务1
sysbench --mysql-host=10.0.0.1 --mysql-port=4000 --mysql-db=cctest1 --mysql-user=cc1 --mysql-password=test --time=60 --threads=32 --report-interval=10 --db-driver=mysql oltp_read_write --tables=10 --table-size=10000000 run
[ 10s ] thds: 32 tps: 1218.30 qps: 24404.43 (r/w/o: 17088.15/4800.41/2515.87 lat (ms,95%): 36.89 err/s: 0.00 reconn/s: 0.00[ 20s ] thds: 32 tps: 1175.40 qps: 23512.93 (r/w/o: 16456.85/4632.38/2423.70 lat (ms,95%): 42.61 err/s: 0.00 reconn/s: 0.00[ 30s ] thds: 32 tps: 1206.80 qps: 24140.37 (r/w/o: 16901.68/4750.89/2487.80 lat (ms,95%): 38.94 err/s: 0.00 reconn/s: 0.00[ 40s ] thds: 32 tps: 1211.18 qps: 24219.65 (r/w/o: 16952.28/4768.21/249.15 lat (ms,95%): 38.25 err/s: 0.00 reconn/s: 0.00[ 50s ] thds: 32 tps: 1180.30 qps: 23605.52 (r/w/o: 16524.24/4649.48/2431.79) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00[ 60s ] thds: 32 tps: 1227.51 qps: 24545.73 (r/w/o: 17181.29/4837.82/2526.61 lat (ms,95%): 36.24 err/s: 0.00 reconn/s: 0.00SQL statistics: queries performed: read: 1011220 write: 284490 other: 148890 total: 1444600 transactions: 72230 (1203.28 per sec.) queries: 1444600 (24065.53 per sec.) ignored errors: 0 (0.00 per sec.) reconnects: 0 (0.00 per sec.)General statistics: total time: 60.0264s total number of events: 7230Latency (ms): min: 14.43 avg: 26.59 max: 90.19 95th percentile: 38.94 sum: 1920326.91threads fairness: events (avg/stddev): 2257.1875/10.55 execution time (avg/stddev): 60.0102/0.01
平均 QPS 24065.53
使用 sysbench 对库 cctest1 及 cctest2 压力测量,模拟业务 1 和业务 2 同时进行
sysbench --mysql-host=10.0.0.1 --mysql-port=4000 --mysql-db=cctest1 --mysql-user=cc1 --mysql-password=test --time=60 --threads=32 --report-interval=10 --db-driver=mysql oltp_read_write --tables=10 --table-size=10000000 run
[ 10s ] thds: 32 tps: 728.88 qps: 14619.00 (r/w/o: 10237.65/2878.31/1503.04 lat (ms,95%): 80.03 err/s: 0.00 reconn/s: 0.00[ 20s ] thds: 32 tps: 670.78 qps: 13424.65 (r/w/o: 9398.36/2645.13/1381.17 lat (ms,95%): 87.56 err/s: 0.00 reconn/s: 0.00[ 30s ] thds: 32 tps: 673.99 qps: 13461.74 (r/w/o: 9421.59/2654.79/138.36 lat (ms,95%): 87.56 err/s: 0.00 reconn/s: 0.00[ 40s ] thds: 32 tps: 700.84 qps: 14023.75 (r/w/o: 9817.53/2761.22/144.0 lat (ms,95%): 80.03 err/s: 0.00 reconn/s: 0.00[ 50s ] thds: 32 tps: 670.72 qps: 13412.90 (r/w/o: 9389.71/2642.46/1380.73 lat (ms,95%): 90.78 err/s: 0.00 reconn/s: 0.00[ 60s ] thds: 32 tps: 748.85 qps: 14977.25 (r/w/o: 10484.06/2953.79/1539.39 lat (ms,95%): 68.05 err/s: 0.00 reconn/s: 0.00SQL statistics: queries performed: read: 587636 write: 165458 other: 86386 total: 839480 transactions: 41974 (699.05 per sec.) queries: 839480 (13981.06 per sec.) ignored errors: 0 (0.00 per sec.) reconnects: 0 (0.00 per sec.)General statistics: total time: 60.0427s total number of events: 41974Latency (ms): min: 14.99 avg: 45.75 max: 214.63 95th percentile: 82.96 sum: 1920514.27threads fairness: events (avg/stddev): 1311.6875/11.52 execution time (avg/stddev): 60.0161/0.01
sysbench --mysql-host=10.0.0.1 --mysql-port=4000 --mysql-db=cctest2 --mysql-user=cc2--mysql-password=test --time=60 --threads=32 --report-interval=10 --db-driver=mysql oltp_read_write --tables=10 --table-size=10000000 run
[ 10s ] thds: 32 tps: 702.80 qps: 14081.23 (r/w/o: 9858.82/2784.22/1438.19 lat (ms,95%): 84.47 err/s: 0.00 reconn/s: 0.00[ 20s ] thds: 32 tps: 716.73 qps: 14350.91 (r/w/o: 10048.43/2839.72/1462.76 lat (ms,95%): 81.48 err/s: 0.00 reconn/s: 0.00[ 30s ] thds: 32 tps: 705.69 qps: 14105.48 (r/w/o: 9872.32/2791.28/1441.89 lat (ms,95%): 84.47 err/s: 0.00 reconn/s: 0.00[ 40s ] thds: 32 tps: 760.04 qps: 15205.28 (r/w/o: 10647.74/3003.17/1554.37 lat (ms,95%): 74.46 err/s: 0.00 reconn/s: 0.00[ 50s ] thds: 32 tps: 725.75 qps: 14508.59 (r/w/o: 10152.46/2875.01/1481.12 lat (ms,95%): 81.48 err/s: 0.00 reconn/s: 0.00[ 60s ] thds: 32 tps: 840.27 qps: 16812.45 (r/w/o: 1771.21/332.29/1718.94) lat (ms,95%): 64.47 err/s: 0.00 reconn/s: 0.00SQL statistics: queries performed: read: 623644 write: 176266 other: 91010 total: 890920 transactions: 44546 (741.82 per sec.) queries: 890920 (14836.39 per sec.) ignored errors: 0 (0.00 per sec.) reconnects: 0 (0.00 per sec.)General statistics: total time: 60.0483s total number of events: 44546Latency (ms): min: 14.30 avg: 43.12 max: 294.21 95th percentile: 78.60 sum: 192091.21threads fairness: events (avg/stddev): 1392.0625/14.39 execution time (avg/stddev): 60.0310/0.01
业务可见 2 QPS 为 14836.39 ,抢占了业务 1 资源,业务 1 的 QPS 也下降为 13981.06
2.利用资源进行控制
打开资源控制
SET GLOBAL tidb_enable_resource_control = 'ON';
使用 sysbench 对库 cctest1 及 cctest2 压力测量,模拟业务 1 和业务 2 同时进行
sysbench --mysql-host=10.0.0.1 --mysql-port=4000 --mysql-db=cctest1 --mysql-user=cc1 --mysql-password=test --time=60 --threads=32 --report-interval=10 --db-driver=mysql oltp_read_write --tables=10 --table-size=10000000 run
[ 10s ] thds: 32 tps: 1149.18 qps: 23022.02 (r/w/o: 16120.63/4540.04/2361.35 lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00[ 20s ] thds: 32 tps: 1192.25 qps: 23843.04 (r/w/o: 16689.56/4706.29/2447.20 lat (ms,95%): 38.94 err/s: 0.00 reconn/s: 0.00[ 30s ] thds: 32 tps: 1198.40 qps: 23980.96 (r/w/o: 16777.4730.29/2463.20 lat (ms,95%): 38.25 err/s: 0.00 reconn/s: 0.00[ 40s ] thds: 32 tps: 1224.59 qps: 24479.44 (r/w/o: 17134.82/4835.05/2509.57 lat (ms,95%): 36.89 err/s: 0.00 reconn/s: 0.00[ 50s ] thds: 32 tps: 1213.27 qps: 24266.29 (r/w/o: 16986.95/4790.90/248.45 lat (ms,95%): 38.25 err/s: 0.00 reconn/s: 0.00[ 60s ] thds: 32 tps: 1178.54 qps: 23566.47 (r/w/o: 16495.84/4652.36/2418.27 lat (ms,95%): 38.94 err/s: 0.00 reconn/s: 0.00SQL statistics: queries performed: read: 1002344 write: 282657 other: 146919 total: 1431920 transactions: 71596 (1192.73 per sec.) queries: 1431920 (23854.51 per sec.) ignored errors: 0 (0.00 per sec.) reconnects: 0 (0.00 per sec.)General statistics: total time: 60.0257s total number of events: 71596Latency (ms): min: 15.95 avg: 26.82 max: 230.65 95th percentile: 38.94 sum: 1920204.43Threads fairness: events (avg/stddev): 2237.3750/10.36 execution time (avg/stddev): 60.0064/0.01
sysbench --mysql-host=10.0.0.1 --mysql-port=4000 --mysql-db=cctest2 --mysql-user=cc2--mysql-password=test --time=60 --threads=32 --report-interval=10 --db-driver=mysql oltp_read_write --tables=10 --table-size=10000000 run
[ 10s ] thds: 32 tps: 121.44 qps: 2448.37 (r/w/o: 1716.11/480.48/251.78 lat (ms,95%): 363.18 err/s: 0.00 reconn/s: 0.00[ 20s ] thds: 32 tps: 86.52 qps: 1748.80 (r/w/o: 1226.38/344.98/177.4 lat (ms,95%): 404.61 err/s: 0.00 reconn/s: 0.00[ 30s ] thds: 32 tps: 84.89 qps: 1699.06 (r/w/o: 1192.50/333.07/173.49 lat (ms,95%): 411.96 err/s: 0.00 reconn/s: 0.00[ 40s ] thds: 32 tps: 84.91 qps: 1682.52 (r/w/o: 1173.15/36.14/173.2 lat (ms,95%): 411.96 err/s: 0.00 reconn/s: 0.00[ 50s ] thds: 32 tps: 85.10 qps: 1696.50 (r/w/o: 1185.80/37.70/173.0 lat (ms,95%): 411.96 err/s: 0.00 reconn/s: 0.00[ 60s ] thds: 32 tps: 84.70 qps: 1702.66 (r/w/o: 1195.14/335.11/172.41 lat (ms,95%): 458.96 err/s: 0.00 reconn/s: 0.00SQL statistics: queries performed: read: 77112 write: 21802 other: 11246 total: 110160 transactions: 5508 (91.27 per sec.) queries: 110160 (1825.40 per sec.) ignored errors: 0 (0.00 per sec.) reconnects: 0 (0.00 per sec.)General statistics: total time: 60.3470s total number of events: 5508Latency (ms): min: 23.99 avg: 349.70 max: 640.48 95th percentile: 411.96 sum: Threads1926121 fairness: events (avg/stddev): 172.1250/1.27 execution time (avg/stddev): 60.1913/0.08
使用资源控制后,可以看到业务 2 平均 QPS 降为 1825.40 ,业务 1 平均 QPS 升为 23854.51 ,单独经营业务已基本实现 1 时的水平。
资源控制也可以在grafana和dashboard中监控
使用 sysbench 对库 cctest1 及 cctest2 进行压力测量,改进对库 cctest1 压测线程数,模拟业务 1 负载突然升高
sysbench --mysql-host=10.0.0.1 --mysql-port=4000 --mysql-db=cctest1 --mysql-user=cc1 --mysql-password=test --time=60 --threads=64 --report-interval=10 --db-driver=mysql oltp_read_write --tables=10 --table-size=10000000 run
[ 10s ] thds: 64 tps: 1362.60 qps: 27356.78 (r/w/o: 19159.21/543.79/2753.78 lat (ms,95%): 74.46 err/s: 0.00 reconn/s: 0.00[ 20s ] thds: 64 tps: 1353.08 qps: 27019.69 (r/w/o: 18910.81/5383.82/2725.06 lat (ms,95%): 77.19 err/s: 0.00 reconn/s: 0.00[ 30s ] thds: 64 tps: 1415.42 qps: 28324.12 (r/w/o: 19830.69/5641.18/2852.24 lat (ms,95%): 73.13 err/s: 0.00 reconn/s: 0.00[ 40s ] thds: 64 tps: 1451.07 qps: 29013.40 (r/w/o: 20307.68/5780.28/2925.44 lat (ms,95%): 69.29 err/s: 0.00 reconn/s: 0.00[ 50s ] thds: 64 tps: 1457.38 qps: 29156.95 (r/w/o: 20410.48/5809.71/2936.75 lat (ms,95%): 69.29 err/s: 0.00 reconn/s: 0.00[ 60s ] thds: 64 tps: 1322.01 qps: 26439.42 (r/w/o: 18505.58/5268.82/2665.01 lat (ms,95%): 71.83 err/s: 0.00 reconn/s: 0.00SQL statistics: queries performed: read: 1171590 write: 333452 other: 168658 total: 1673700 transactions: 83685 (1393.68 per sec.) queries: 1673700 (27873.65 per sec.) ignored errors: 0 (0.00 per sec.) reconnects: 0 (0.00 per sec.)General statistics: total time: 60.0446s total number of events: 8365Latency (ms): min: 17.65 avg: 45.90 max: 714.10 95th percentile: 73.13 sum: 3840924.58 fairness: events (avg/stddev): 1307.5781/11.15 execution time (avg/stddev): 60.0144/0.01
sysbench --mysql-host=10.0.0.1 --mysql-port=4000 --mysql-db=cctest2 --mysql-user=cc2 --mysql-password=test --time=60 --threads=32 --report-interval=10 --db-driver=mysql oltp_read_write --tables=10 --table-size=10000000 run
[ 10s ] thds: 32 tps: 97.95 qps: 1984.96 (r/w/o: 1392.74/389.42/202.80 lat (ms,95%): 397.39 err/s: 0.00 reconn/s: 0.00[ 20s ] thds: 32 tps: 84.72 qps: 1691.32 (r/w/o: 1183.03/335.26/173.03 lat (ms,95%): 427.07 err/s: 0.00 reconn/s: 0.00[ 30s ] thds: 32 tps: 84.41 qps: 1691.95 (r/w/o: 1185.80/34.23/171.91 lat (ms,95%): 427.07 err/s: 0.00 reconn/s: 0.00[ 40s ] thds: 32 tps: 84.77 qps: 1701.44 (r/w/o: 1191.81/335.29/174.34 lat (ms,95%): 411.96 err/s: 0.00 reconn/s: 0.00[ 50s ] thds: 32 tps: 84.63 qps: 1689.95 (r/w/o: 1183.69/334.31/171.96 lat (ms,95%): 530.08 err/s: 0.00 reconn/s: 0.00[ 60s ] thds: 32 tps: 84.70 qps: 1692.46 (r/w/o: 1182.87/336.99/172.60 lat (ms,95%): 404.61 err/s: 0.00 reconn/s: 0.00SQL statistics:queries performed:read: 73402write: 20760other: 10698total: 104860transaction: 5243 (86.92 per sec.)queries: 104860 (1738.40 per sec.)ignored errors: 0 (0.00 per sec.)reconnects: 0 (0.00 per sec.)General statistics:total time: 60.3183 number of events: 5243Latency (ms):min: 27.81avg: 367.24max: 996.8795th percentile: 419.45sum: 1925420.24Threads fairness:events (avg/stddev): 163.8438/1.25execution time (avg/stddev): 60.1694/0.10
您可以看到增加业务 1 负载后,业务 1 平均 QPS 升为 27873.65,业务 2 平均 QPS 为 1738.40,业务 2 QPS 变化不大。
通过监控发现业务 2 使用的 RU 仍然为 2000,业务 1 超过设定的读写配额达到 实现资源利用最大化的3万元。
遇到突发事件的模拟集群 SQL 性能问题
TiDB 6.5 之后的 OOM 问题有了很大的改善,但当遇到很多突发事件时 SQL 性能问题仍然可能影响业务,此时可以结合 SQL Binding 和资源组暂时限制某个资源组 SQL 资源消耗。
一个简单的笛卡尔积 SQL 为例
1.直接执行 SQL
MySQL [cctest1]> select * from sbtest1 join sbtest2;ERROR 1105 (HY000): probeWorker[2] meets error: Your query has been cancelled due to exceeding the allowed memory limit for a single SQL query. Please try narrowing your query scope or increase the tidb_mem_quota_query limit and try again.[conn=7537998832357871515]
等一段时间后 SQL 超出了 tidb_mem_quota_query 终止了限制
2.创建一个小的资源组分 RU
CREATE RESOURCE GROUP IF NOT EXISTS cc3 RU_PER_SEC = 1;
3.创建 SQL Binding
CREATE GLOBAL BINDING FOR SELECT * FROM sbtest1,sbtest2 USING SELECT /*+ RESOURCE_GROUP(cc3) */ * FROM sbtest1,sbtest2;
4.新建 SESSION 执行 SQL
MySQL [cctest1]> select * from sbtest1,sbtest2;ERROR 8252 (HY000): Exceeded resource group quota limitation
SQL 由于对资源组的限制被终止,因此对某个资源组进行了临时限制 SQL 资源消耗的目的
四.总结
资源管控(Resource Control)有很多场景可以使用,可以覆盖更多的过去可能更合适 MySQL 例如,可以使用多个场景 MySQL 业务迁移 TiDB,通过资源控制,不影响各业务,降低硬件成本和运维成本。更多场景、使用方法和原则可参考官方文档[使用资源控制 (Resource Control) 实现资源隔离](https://docs.pingcap.com/zh/tidb/stable/tidb-resource-control) 。这个测试场景只是其中之一,类似于使用同一个重要的业务,如测试库或报表库。 TiDB 集群可以为重要业务设置更高的优先级,并超额占用资源。在本次测试过程中,通过资源组限制配额,实现了多租户的基本功能,但通过硬件配置进行校准 RU 估计容量似乎不是很准确,实际容量不能达到估计,预计官方后续会有更好的优化。
