当前位置: 首页 > 图灵资讯 > 技术篇> TiDB 7.1 资源管控特性试用

TiDB 7.1 资源管控特性试用

来源:图灵教育
时间:2023-07-06 15:33:24

作者: 啦啦啦啦啦

一.背景

印象里 2022 年初的时候,有小伙伴在年初的时候。 asktug 提出 TiDB 未来是否会出现多租户功能的问题,没想到这么快就已经有了 GA 了。资源控制的特点(Resource Control)可以说是 TiDB 7.1 其中最亮眼的功能之一使 TiDB 具有隔离多租户的能力。资源控制特性的合理利用可以减少集群数量,降低运维难度和管理成本。近期因为 MySQL 数据归档问题忙了好几个晚上,现有的 TiDB 集群平均负荷不高,因此产生了多套 MySQL 业务迁移到一套 TiDB 集群的想法,但又怕多个业务会产生资源竞争,影响核心业务。Resource Control 可以解决这个问题,借此机会对这个特性进行简单的测试。

二、测试环境准备

1.硬件配置和集群拓扑规划

Role

Host

Ports

alertmanager

10.0.0.1

9093/9094

grafana

10.0.0.1

3000

pd

10.0.0.1

2379/2380

tidb

10.0.0.1

4000/10080

tikv

10.0.0.2

20160/20180

tikv

10.0.0.3

20160/20180

tikv

10.0.0.4

20160/20180

由于硬件条件有限, 使用 4 配置为台云主机 8C 32G 200G 普通 SAS 硬盘。但本次测试仅针对资源控制的特点进行测试,对硬件的要求并不是特别严格。TiDB 7.1 默认情况下,基于资源组配额的资源组流量控制和请求调度参数将被打开,因此参数没有特别调整。

2. 资源控制配置

1.估计集群容量

由于它是一个测试集群,没有真正的业务流量,我们使用基于硬件部署的估计容量,而不是根据实际负载更准确地估计容量。

MySQL [(none)]> CALIBRATE RESOURCE WORKLOAD TPCC;+-------+| QUOTA |+-------+| 69768 |+-------+1 row in set (0.01 sec)MySQL [(none)]> CALIBRATE RESOURCE WORKLOAD OLTP_WRITE_ONLY;+-------+| QUOTA |+-------+| 73184 |+-------+1 row in set (0.00 sec)MySQL [(none)]> CALIBRATE RESOURCE WORKLOAD OLTP_READ_WRITE;+-------+| QUOTA |+-------+| 29772 |+-------+1 row in set (0.01 sec)MySQL [(none)]> CALIBRATE RESOURCE WORKLOAD OLTP_READ_ONLY;+-------+| QUOTA |+-------+| 13970 |+-------+1 row in set (0.00 sec)

当然,dashboard也可以直观地看到

TiDB 7.1 资源管控特性试用_mysql

2.创建资源组

创建cc1限额为每秒的资源组 10000 RU,并允许资源组的应用超额占用资源,设置绝对优先级HIGH

CREATE RESOURCE GROUP IF NOT EXISTS cc1 RU_PER_SEC = 10000 BURSTABLE HIGH PRIORITY = HIGH;

创建cc2限额为每秒的资源组 2000 RU,当系统资源充足时,资源组的应用不允许超额占用资源,设置绝对优先级LOW

CREATE RESOURCE GROUP IF NOT EXISTS cc2 RU_PER_SEC = 2000 PRIORITY = LOW;

3.绑定资源组

将用户绑定到资源组

MySQL [(none)]> CREATE USER "cc1"@" IDENTIFIED BY 'test' RESOURCE GROUP cc1;Query OK, 0 rows affected (0.54 sec)MySQL [(none)]> CREATE USER "cc2""" IDENTIFIED BY 'test' RESOURCE GROUP cc2;Query OK, 0 rows affected (0.02 sec)MySQL [(none)]> grant all on cctest1.* to "cc1"@";Query OK, 0 rows affected (0.28 sec)MySQL [(none)]> grant all on cctest2.* to "cc2""";Query OK, 0 rows affected (0.13 sec)

三、性能测试

1.未进行资源控制

首先关闭资源控制

SET GLOBAL tidb_enable_resource_control = 'OFF';

生成 10 张 1kw表选择 oltp_read_write 测试场景

使用sysbench对库ctest1进行压测,模拟业务1

sysbench --mysql-host=10.0.0.1 --mysql-port=4000 --mysql-db=cctest1 --mysql-user=cc1 --mysql-password=test --time=60 --threads=32 --report-interval=10 --db-driver=mysql oltp_read_write --tables=10 --table-size=10000000 run

[ 10s ] thds: 32 tps: 1218.30 qps: 24404.43 (r/w/o: 17088.15/4800.41/2515.87 lat (ms,95%): 36.89 err/s: 0.00 reconn/s: 0.00[ 20s ] thds: 32 tps: 1175.40 qps: 23512.93 (r/w/o: 16456.85/4632.38/2423.70 lat (ms,95%): 42.61 err/s: 0.00 reconn/s: 0.00[ 30s ] thds: 32 tps: 1206.80 qps: 24140.37 (r/w/o: 16901.68/4750.89/2487.80 lat (ms,95%): 38.94 err/s: 0.00 reconn/s: 0.00[ 40s ] thds: 32 tps: 1211.18 qps: 24219.65 (r/w/o: 16952.28/4768.21/249.15 lat (ms,95%): 38.25 err/s: 0.00 reconn/s: 0.00[ 50s ] thds: 32 tps: 1180.30 qps: 23605.52 (r/w/o: 16524.24/4649.48/2431.79) lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00[ 60s ] thds: 32 tps: 1227.51 qps: 24545.73 (r/w/o: 17181.29/4837.82/2526.61 lat (ms,95%): 36.24 err/s: 0.00 reconn/s: 0.00SQL statistics:    queries performed:        read:                            1011220        write:                           284490        other:                           148890        total:                           1444600    transactions:                        72230  (1203.28 per sec.)    queries:                             1444600 (24065.53 per sec.)    ignored errors:                      0      (0.00 per sec.)    reconnects:                          0      (0.00 per sec.)General statistics:    total time:                          60.0264s    total number of events:              7230Latency (ms):         min:                                   14.43         avg:                                   26.59         max:                                   90.19         95th percentile:                       38.94         sum:                              1920326.91threads fairness:    events (avg/stddev):           2257.1875/10.55    execution time (avg/stddev):   60.0102/0.01

平均 QPS 24065.53

使用 sysbench 对库 cctest1 及 cctest2 压力测量,模拟业务 1 和业务 2 同时进行

sysbench --mysql-host=10.0.0.1 --mysql-port=4000 --mysql-db=cctest1 --mysql-user=cc1 --mysql-password=test --time=60 --threads=32 --report-interval=10 --db-driver=mysql oltp_read_write --tables=10 --table-size=10000000 run

[ 10s ] thds: 32 tps: 728.88 qps: 14619.00 (r/w/o: 10237.65/2878.31/1503.04 lat (ms,95%): 80.03 err/s: 0.00 reconn/s: 0.00[ 20s ] thds: 32 tps: 670.78 qps: 13424.65 (r/w/o: 9398.36/2645.13/1381.17 lat (ms,95%): 87.56 err/s: 0.00 reconn/s: 0.00[ 30s ] thds: 32 tps: 673.99 qps: 13461.74 (r/w/o: 9421.59/2654.79/138.36 lat (ms,95%): 87.56 err/s: 0.00 reconn/s: 0.00[ 40s ] thds: 32 tps: 700.84 qps: 14023.75 (r/w/o: 9817.53/2761.22/144.0 lat (ms,95%): 80.03 err/s: 0.00 reconn/s: 0.00[ 50s ] thds: 32 tps: 670.72 qps: 13412.90 (r/w/o: 9389.71/2642.46/1380.73 lat (ms,95%): 90.78 err/s: 0.00 reconn/s: 0.00[ 60s ] thds: 32 tps: 748.85 qps: 14977.25 (r/w/o: 10484.06/2953.79/1539.39 lat (ms,95%): 68.05 err/s: 0.00 reconn/s: 0.00SQL statistics:    queries performed:        read:                            587636        write:                           165458        other:                           86386        total:                           839480    transactions:                        41974  (699.05 per sec.)    queries:                             839480 (13981.06 per sec.)    ignored errors:                      0      (0.00 per sec.)    reconnects:                          0      (0.00 per sec.)General statistics:    total time:                          60.0427s    total number of events:              41974Latency (ms):         min:                                   14.99         avg:                                   45.75         max:                                  214.63         95th percentile:                       82.96         sum:                              1920514.27threads fairness:    events (avg/stddev):           1311.6875/11.52    execution time (avg/stddev):   60.0161/0.01

sysbench --mysql-host=10.0.0.1 --mysql-port=4000 --mysql-db=cctest2 --mysql-user=cc2--mysql-password=test --time=60 --threads=32 --report-interval=10 --db-driver=mysql oltp_read_write --tables=10 --table-size=10000000 run

[ 10s ] thds: 32 tps: 702.80 qps: 14081.23 (r/w/o: 9858.82/2784.22/1438.19 lat (ms,95%): 84.47 err/s: 0.00 reconn/s: 0.00[ 20s ] thds: 32 tps: 716.73 qps: 14350.91 (r/w/o: 10048.43/2839.72/1462.76 lat (ms,95%): 81.48 err/s: 0.00 reconn/s: 0.00[ 30s ] thds: 32 tps: 705.69 qps: 14105.48 (r/w/o: 9872.32/2791.28/1441.89 lat (ms,95%): 84.47 err/s: 0.00 reconn/s: 0.00[ 40s ] thds: 32 tps: 760.04 qps: 15205.28 (r/w/o: 10647.74/3003.17/1554.37 lat (ms,95%): 74.46 err/s: 0.00 reconn/s: 0.00[ 50s ] thds: 32 tps: 725.75 qps: 14508.59 (r/w/o: 10152.46/2875.01/1481.12 lat (ms,95%): 81.48 err/s: 0.00 reconn/s: 0.00[ 60s ] thds: 32 tps: 840.27 qps: 16812.45 (r/w/o: 1771.21/332.29/1718.94) lat (ms,95%): 64.47 err/s: 0.00 reconn/s: 0.00SQL statistics:    queries performed:        read:                            623644        write:                           176266        other:                           91010        total:                           890920    transactions:                        44546  (741.82 per sec.)    queries:                             890920 (14836.39 per sec.)    ignored errors:                      0      (0.00 per sec.)    reconnects:                          0      (0.00 per sec.)General statistics:    total time:                          60.0483s    total number of events:              44546Latency (ms):         min:                                   14.30         avg:                                   43.12         max:                                  294.21         95th percentile:                       78.60         sum:                              192091.21threads fairness:    events (avg/stddev):           1392.0625/14.39    execution time (avg/stddev):   60.0310/0.01

业务可见 2 QPS 为 14836.39 ,抢占了业务 1 资源,业务 1 的 QPS 也下降为 13981.06

2.利用资源进行控制

打开资源控制

SET GLOBAL tidb_enable_resource_control = 'ON';

使用 sysbench 对库 cctest1 及 cctest2 压力测量,模拟业务 1 和业务 2 同时进行

sysbench --mysql-host=10.0.0.1 --mysql-port=4000 --mysql-db=cctest1 --mysql-user=cc1 --mysql-password=test --time=60 --threads=32 --report-interval=10 --db-driver=mysql oltp_read_write --tables=10 --table-size=10000000 run

[ 10s ] thds: 32 tps: 1149.18 qps: 23022.02 (r/w/o: 16120.63/4540.04/2361.35 lat (ms,95%): 41.10 err/s: 0.00 reconn/s: 0.00[ 20s ] thds: 32 tps: 1192.25 qps: 23843.04 (r/w/o: 16689.56/4706.29/2447.20 lat (ms,95%): 38.94 err/s: 0.00 reconn/s: 0.00[ 30s ] thds: 32 tps: 1198.40 qps: 23980.96 (r/w/o: 16777.4730.29/2463.20 lat (ms,95%): 38.25 err/s: 0.00 reconn/s: 0.00[ 40s ] thds: 32 tps: 1224.59 qps: 24479.44 (r/w/o: 17134.82/4835.05/2509.57 lat (ms,95%): 36.89 err/s: 0.00 reconn/s: 0.00[ 50s ] thds: 32 tps: 1213.27 qps: 24266.29 (r/w/o: 16986.95/4790.90/248.45 lat (ms,95%): 38.25 err/s: 0.00 reconn/s: 0.00[ 60s ] thds: 32 tps: 1178.54 qps: 23566.47 (r/w/o: 16495.84/4652.36/2418.27 lat (ms,95%): 38.94 err/s: 0.00 reconn/s: 0.00SQL statistics:    queries performed:        read:                            1002344        write:                           282657        other:                           146919        total:                           1431920    transactions:                        71596  (1192.73 per sec.)    queries:                             1431920 (23854.51 per sec.)    ignored errors:                      0      (0.00 per sec.)    reconnects:                          0      (0.00 per sec.)General statistics:    total time:                          60.0257s    total number of events:              71596Latency (ms):         min:                                   15.95         avg:                                   26.82         max:                                  230.65         95th percentile:                       38.94         sum:                              1920204.43Threads fairness:    events (avg/stddev):           2237.3750/10.36    execution time (avg/stddev):   60.0064/0.01

sysbench --mysql-host=10.0.0.1 --mysql-port=4000 --mysql-db=cctest2 --mysql-user=cc2--mysql-password=test --time=60 --threads=32 --report-interval=10 --db-driver=mysql oltp_read_write --tables=10 --table-size=10000000 run

[ 10s ] thds: 32 tps: 121.44 qps: 2448.37 (r/w/o: 1716.11/480.48/251.78 lat (ms,95%): 363.18 err/s: 0.00 reconn/s: 0.00[ 20s ] thds: 32 tps: 86.52 qps: 1748.80 (r/w/o: 1226.38/344.98/177.4 lat (ms,95%): 404.61 err/s: 0.00 reconn/s: 0.00[ 30s ] thds: 32 tps: 84.89 qps: 1699.06 (r/w/o: 1192.50/333.07/173.49 lat (ms,95%): 411.96 err/s: 0.00 reconn/s: 0.00[ 40s ] thds: 32 tps: 84.91 qps: 1682.52 (r/w/o: 1173.15/36.14/173.2 lat (ms,95%): 411.96 err/s: 0.00 reconn/s: 0.00[ 50s ] thds: 32 tps: 85.10 qps: 1696.50 (r/w/o: 1185.80/37.70/173.0 lat (ms,95%): 411.96 err/s: 0.00 reconn/s: 0.00[ 60s ] thds: 32 tps: 84.70 qps: 1702.66 (r/w/o: 1195.14/335.11/172.41 lat (ms,95%): 458.96 err/s: 0.00 reconn/s: 0.00SQL statistics:    queries performed:        read:                            77112        write:                           21802        other:                           11246        total:                           110160    transactions:                        5508   (91.27 per sec.)    queries:                             110160 (1825.40 per sec.)    ignored errors:                      0      (0.00 per sec.)    reconnects:                          0      (0.00 per sec.)General statistics:    total time:                          60.3470s    total number of events:              5508Latency (ms):         min:                                   23.99         avg:                                  349.70         max:                                  640.48         95th percentile:                      411.96         sum:                              Threads1926121 fairness:    events (avg/stddev):           172.1250/1.27    execution time (avg/stddev):   60.1913/0.08

使用资源控制后,可以看到业务 2 平均 QPS 降为 1825.40 ,业务 1 平均 QPS 升为 23854.51 ,单独经营业务已基本实现 1 时的水平。

资源控制也可以在grafana和dashboard中监控

TiDB 7.1 资源管控特性试用_SQL_02

TiDB 7.1 资源管控特性试用_SQL_03

使用 sysbench 对库 cctest1 及 cctest2 进行压力测量,改进对库 cctest1 压测线程数,模拟业务 1 负载突然升高

sysbench --mysql-host=10.0.0.1 --mysql-port=4000 --mysql-db=cctest1 --mysql-user=cc1 --mysql-password=test --time=60 --threads=64 --report-interval=10 --db-driver=mysql oltp_read_write --tables=10 --table-size=10000000 run

[ 10s ] thds: 64 tps: 1362.60 qps: 27356.78 (r/w/o: 19159.21/543.79/2753.78 lat (ms,95%): 74.46 err/s: 0.00 reconn/s: 0.00[ 20s ] thds: 64 tps: 1353.08 qps: 27019.69 (r/w/o: 18910.81/5383.82/2725.06 lat (ms,95%): 77.19 err/s: 0.00 reconn/s: 0.00[ 30s ] thds: 64 tps: 1415.42 qps: 28324.12 (r/w/o: 19830.69/5641.18/2852.24 lat (ms,95%): 73.13 err/s: 0.00 reconn/s: 0.00[ 40s ] thds: 64 tps: 1451.07 qps: 29013.40 (r/w/o: 20307.68/5780.28/2925.44 lat (ms,95%): 69.29 err/s: 0.00 reconn/s: 0.00[ 50s ] thds: 64 tps: 1457.38 qps: 29156.95 (r/w/o: 20410.48/5809.71/2936.75 lat (ms,95%): 69.29 err/s: 0.00 reconn/s: 0.00[ 60s ] thds: 64 tps: 1322.01 qps: 26439.42 (r/w/o: 18505.58/5268.82/2665.01 lat (ms,95%): 71.83 err/s: 0.00 reconn/s: 0.00SQL statistics:    queries performed:        read:                            1171590        write:                           333452        other:                           168658        total:                           1673700    transactions:                        83685  (1393.68 per sec.)    queries:                             1673700 (27873.65 per sec.)    ignored errors:                      0      (0.00 per sec.)    reconnects:                          0      (0.00 per sec.)General statistics:    total time:                          60.0446s    total number of events:              8365Latency (ms):         min:                                   17.65         avg:                                   45.90         max:                                  714.10         95th percentile:                       73.13         sum:                              3840924.58 fairness:    events (avg/stddev):           1307.5781/11.15    execution time (avg/stddev):   60.0144/0.01

sysbench --mysql-host=10.0.0.1 --mysql-port=4000 --mysql-db=cctest2 --mysql-user=cc2 --mysql-password=test --time=60 --threads=32 --report-interval=10 --db-driver=mysql oltp_read_write --tables=10 --table-size=10000000 run

[ 10s ] thds: 32 tps: 97.95 qps: 1984.96 (r/w/o: 1392.74/389.42/202.80 lat (ms,95%): 397.39 err/s: 0.00 reconn/s: 0.00[ 20s ] thds: 32 tps: 84.72 qps: 1691.32 (r/w/o: 1183.03/335.26/173.03 lat (ms,95%): 427.07 err/s: 0.00 reconn/s: 0.00[ 30s ] thds: 32 tps: 84.41 qps: 1691.95 (r/w/o: 1185.80/34.23/171.91 lat (ms,95%): 427.07 err/s: 0.00 reconn/s: 0.00[ 40s ] thds: 32 tps: 84.77 qps: 1701.44 (r/w/o: 1191.81/335.29/174.34 lat (ms,95%): 411.96 err/s: 0.00 reconn/s: 0.00[ 50s ] thds: 32 tps: 84.63 qps: 1689.95 (r/w/o: 1183.69/334.31/171.96 lat (ms,95%): 530.08 err/s: 0.00 reconn/s: 0.00[ 60s ] thds: 32 tps: 84.70 qps: 1692.46 (r/w/o: 1182.87/336.99/172.60 lat (ms,95%): 404.61 err/s: 0.00 reconn/s: 0.00SQL statistics:queries performed:read:                            73402write:                           20760other:                           10698total:                           104860transaction:                        5243   (86.92 per sec.)queries:                             104860 (1738.40 per sec.)ignored errors:                      0      (0.00 per sec.)reconnects:                          0      (0.00 per sec.)General statistics:total time:                          60.3183 number of events:              5243Latency (ms):min:                                   27.81avg:                                  367.24max:                                  996.8795th percentile:                      419.45sum:                              1925420.24Threads fairness:events (avg/stddev):           163.8438/1.25execution time (avg/stddev):   60.1694/0.10

您可以看到增加业务 1 负载后,业务 1 平均 QPS 升为 27873.65,业务 2 平均 QPS 为 1738.40,业务 2 QPS 变化不大。

TiDB 7.1 资源管控特性试用_mysql_04

通过监控发现业务 2 使用的 RU 仍然为 2000,业务 1 超过设定的读写配额达到 实现资源利用最大化的3万元。

遇到突发事件的模拟集群 SQL 性能问题

TiDB 6.5 之后的 OOM 问题有了很大的改善,但当遇到很多突发事件时 SQL 性能问题仍然可能影响业务,此时可以结合 SQL Binding 和资源组暂时限制某个资源组 SQL 资源消耗。

一个简单的笛卡尔积 SQL 为例

1.直接执行 SQL

MySQL [cctest1]> select * from sbtest1 join sbtest2;ERROR 1105 (HY000): probeWorker[2] meets error: Your query has been cancelled due to exceeding the allowed memory limit for a single SQL query. Please try narrowing your query scope or increase the tidb_mem_quota_query limit and try again.[conn=7537998832357871515]

等一段时间后 SQL 超出了 tidb_mem_quota_query 终止了限制

2.创建一个小的资源组分 RU

CREATE RESOURCE GROUP IF NOT EXISTS cc3 RU_PER_SEC = 1;

3.创建 SQL Binding

CREATE GLOBAL BINDING FOR SELECT * FROM sbtest1,sbtest2 USING SELECT /*+ RESOURCE_GROUP(cc3) */ * FROM sbtest1,sbtest2;

4.新建 SESSION 执行 SQL

MySQL [cctest1]> select * from sbtest1,sbtest2;ERROR 8252 (HY000): Exceeded resource group quota limitation

SQL 由于对资源组的限制被终止,因此对某个资源组进行了临时限制 SQL 资源消耗的目的

四.总结

资源管控(Resource Control)有很多场景可以使用,可以覆盖更多的过去可能更合适 MySQL 例如,可以使用多个场景 MySQL 业务迁移 TiDB,通过资源控制,不影响各业务,降低硬件成本和运维成本。更多场景、使用方法和原则可参考官方文档[使用资源控制 (Resource Control) 实现资源隔离](https://docs.pingcap.com/zh/tidb/stable/tidb-resource-control) 。这个测试场景只是其中之一,类似于使用同一个重要的业务,如测试库或报表库。 TiDB 集群可以为重要业务设置更高的优先级,并超额占用资源。在本次测试过程中,通过资源组限制配额,实现了多租户的基本功能,但通过硬件配置进行校准 RU 估计容量似乎不是很准确,实际容量不能达到估计,预计官方后续会有更好的优化。