MDEV-32014 Rename binlog cache temporary file to binlog file for large transaction

Description
===========
When a transaction commits, it copies its binlog events from the
binlog cache to the binlog file. Very large transactions
(e.g. gigabytes) can stall other transactions for a long time
because the data is copied while holding LOCK_log, which blocks
other commits from binlogging.

The solution in this patch is to rename the binlog cache file to
a binlog file instead of copying it, if the committing transaction
has a large binlog cache. Rename is a very fast operation, so it
doesn't block other transactions for a long time.

Design
======
* binlog_large_commit_threshold
  type: ulonglong
  scope: global
  dynamic: yes
  default: 128MB

  Only binlog cache temporary files larger than the threshold
  (128MB by default) are renamed to a binlog file.

* #binlog_cache_files directory
  To support rename, all binlog cache temporary files are now managed
  as normal files. The `#binlog_cache_files` directory is in the same
  directory as the binlog files. It is created at server startup if it
  doesn't exist. Otherwise, all binlog cache files in the directory are
  deleted at startup (any other file found there is left in place and
  reported with a warning in the error log).

  The temporary files are named with an ML_ prefix followed by the
  memory address of the binlog_cache_data object, which guarantees
  that the name is unique.
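
The naming rule above can be illustrated with a minimal sketch. It is not
the server's code: `CacheObject` and `cache_file_name()` are hypothetical
names, and printing the address in decimal is an assumption (the test's
`ML_[0-9]+` pattern only suggests it).

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the server's binlog_cache_data object.
struct CacheObject
{
  char placeholder;
};

// Build "<binlog dir>/#binlog_cache_files/ML_<address>".
// Printing the address in decimal is an assumption made for this sketch.
std::string cache_file_name(const std::string &binlog_dir,
                            const CacheObject *cache)
{
  const auto address= reinterpret_cast<std::uintptr_t>(cache);
  return binlog_dir + "/#binlog_cache_files/ML_" +
         std::to_string(static_cast<unsigned long long>(address));
}
```

Two objects that are alive at the same time never share an address, so
their file names cannot collide for as long as both objects exist.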

* Reserve space
  To support the rename feature, enough space must be reserved at the
  beginning of the binlog cache file. The space is required for the
  Format description, Gtid list, checkpoint and Gtid events when
  renaming it to a binlog file.

  binlog_cache_data's cache_log is accessed directly by binlog log,
  online alter and wsrep, and it is not easy to update all of that
  code. Thus the binlog cache will not reserve space if it is not a
  session binlog cache or if wsrep session is enabled.

  - m_file_reserved_bytes
    Stores the number of bytes reserved at the beginning of the cache
    file. It is initialized in write_prepare() and cleared by reset().

    The reserved file header is hidden from callers, so nothing
    changes for them. E.g.
    - get_byte_position() still returns the length of the binlog data
      written to the cache, not the file length.
    - truncate(0) truncates the file to m_file_reserved_bytes, not
      to 0.

  - write_prepare()
    write_prepare() is called every time anything is written into
    the cache. It calls init_file_reserved_bytes() to create the
    cache file (if it doesn't exist) and reserve suitable space if
    the data written exceeds the buffer's size.
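
How the reserved header stays invisible to callers can be shown with a toy
in-memory model. This is only a sketch under stated assumptions: the class
`ReservedCache` and its `file_length()` helper are hypothetical, the "file"
is a std::string, and the header is reserved on the first write rather
than when the buffer overflows.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model of a cache file with a hidden reserved header.
// It is not the server's binlog_cache_data class.
class ReservedCache
{
public:
  explicit ReservedCache(std::size_t reserved_bytes)
    : m_file_reserved_bytes(reserved_bytes) {}

  // Plays the role of write_prepare() plus the write itself: the header
  // is reserved once, before the first data byte is stored.
  void write(const std::string &data)
  {
    if (m_file.empty())
      m_file.assign(m_file_reserved_bytes, '\0');
    m_file+= data;
  }

  // Length of the binlog data only; the reserved header is not counted.
  std::size_t get_byte_position() const
  {
    return m_file.empty() ? 0 : m_file.size() - m_file_reserved_bytes;
  }

  // pos is relative to the data, so truncate(0) keeps the header.
  void truncate(std::size_t pos)
  {
    if (pos < get_byte_position())
      m_file.resize(m_file_reserved_bytes + pos);
  }

  // Real length of the underlying "file", header included.
  std::size_t file_length() const { return m_file.size(); }

private:
  std::size_t m_file_reserved_bytes;
  std::string m_file;
};
```

With 4096 reserved bytes, writing 6 bytes gives a data position of 6 and a
file length of 4102; after truncate(0) the position is 0 again while the
file still holds the 4096-byte header.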

* Binlog_commit_by_rotate
  It encapsulates the code for renaming a binlog cache temporary
  file to a binlog file.
  - should_commit_by_rotate()
    It is called by write_transaction_to_binlog_events() to check
    whether a binlog cache should be renamed to a binlog file.
  - commit()
    This is the entry point to rename a binlog cache and commit the
    transaction. Both rename and commit are protected by LOCK_log,
    thus no other transaction can write anything into the renamed
    binlog before it.

    Rename happens in a rotation. After the new binlog file is generated,
    replace_binlog_file() is called to:
    - copy data from the new binlog file to its binlog cache file.
    - write gtid event.
    - rename the binlog cache file to binlog file.

    After that, the rotation continues to completion. Then the
    transaction is committed in a separate group of its own. Its cache
    file is detached and the cache log is reset before calling
    trx_group_commit_with_engines(), thus only the Xid event is written.
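
The eligibility check can be sketched as a plain predicate. The conditions
are taken from this commit's description and its replication test (cache
larger than the threshold, only one of the statement and transaction
caches to flush, matching checksum setting, binlog_legacy_event_pos off).
The struct, its field names and the function name are hypothetical, and
the real should_commit_by_rotate() may check more than this.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical summary of the state inspected at commit time.
struct CommitCandidate
{
  std::uint64_t cache_bytes;     // size of the binlog cache
  bool both_caches_non_empty;    // stmt and trx cache both need flushing
  bool checksum_matches_binlog;  // cache checksum == binlog_checksum
  bool legacy_event_pos;         // binlog_legacy_event_pos is ON
};

// Returns true when the cache may be renamed instead of copied.
bool should_commit_by_rotate_sketch(const CommitCandidate &c,
                                    std::uint64_t threshold)
{
  return c.cache_bytes > threshold &&
         !c.both_caches_non_empty &&
         c.checksum_matches_binlog &&
         !c.legacy_event_pos;
}
```

With the default threshold of 134217728 bytes, a cache of exactly that size
is still copied; only a strictly larger one qualifies, and any single
failing condition falls back to the normal copy path.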
Libing Song 2024-09-05 00:16:35 +08:00 committed by Brandon Nesterenko
parent 35cebfdc51
commit 72cc58bb71
23 changed files with 1612 additions and 131 deletions

@ -67,7 +67,7 @@ SET(SQL_EMBEDDED_SOURCES emb_qcache.cc libmysqld.c lib_sql.cc
../sql/item_subselect.cc ../sql/item_sum.cc ../sql/item_timefunc.cc
../sql/item_xmlfunc.cc ../sql/item_jsonfunc.cc
../sql/json_schema.cc ../sql/json_schema_helper.cc
../sql/key.cc ../sql/lock.cc ../sql/log.cc
../sql/key.cc ../sql/lock.cc ../sql/log.cc ../sql/log_cache.cc
../sql/log_event.cc ../sql/log_event_server.cc
../sql/mf_iocache.cc ../sql/my_decimal.cc
../sql/net_serv.cc ../sql/opt_range.cc

@ -48,7 +48,6 @@
--thread-pool-oversubscribe=#
How many additional active worker threads in a group are
allowed
@@ -1572,8 +1585,8 @@
automatically convert it to an on-disk MyISAM or Aria
table
-t, --tmpdir=name Path for temporary files. Several paths may be specified,
@ -56,6 +55,13 @@
- round-robin fashion
+ separated by a semicolon (;), in this case they are used
+ in a round-robin fashion
background for binlogging by user threads are placed in a
separate location (see `binlog_large_commit_threshold`
option). Several paths may be specified, separated by a
- colon (:), in this case they are used in a round-robin
- fashion
+ semicolon (;), in this case they are used in a
+ round-robin fashion
--transaction-alloc-block-size=#
Allocation block size for transactions to be stored in
binary log

@ -109,6 +109,16 @@ The following specify which files/extra groups are read (specified before remain
--binlog-ignore-db=name
Tells the master that updates to the given database
should not be logged to the binary log
--binlog-large-commit-threshold=#
Increases transaction concurrency for large transactions
(i.e. those with sizes larger than this value) by using
the large transaction's cache file as a new binary log,
and rotating the active binary log to the large
transaction's cache file at commit time. This avoids the
default commit logic that copies the transaction cache
data to the end of the active binary log file while
holding a lock that prevents other transactions from
binlogging
--binlog-legacy-event-pos
Fill in the end_log_pos field of _all_ events in the
binlog, even when doing so costs performance. Can be used
@ -614,7 +624,9 @@ The following specify which files/extra groups are read (specified before remain
--max-binlog-cache-size=#
Sets the total size of the transactional cache
--max-binlog-size=# Binary log will be rotated automatically when the size
exceeds this value
exceeds this value, unless
`binlog_large_commit_threshold` causes rotation
prematurely
--max-binlog-stmt-cache-size=#
Sets the total size of the statement cache
--max-binlog-total-size=#
@ -1579,9 +1591,12 @@ The following specify which files/extra groups are read (specified before remain
temporary table exceeds this size, MariaDB will
automatically convert it to an on-disk MyISAM or Aria
table
-t, --tmpdir=name Path for temporary files. Several paths may be specified,
separated by a colon (:), in this case they are used in a
round-robin fashion
-t, --tmpdir=name Path for temporary files. Files that are created in
background for binlogging by user threads are placed in a
separate location (see `binlog_large_commit_threshold`
option). Several paths may be specified, separated by a
colon (:), in this case they are used in a round-robin
fashion
--transaction-alloc-block-size=#
Allocation block size for transactions to be stored in
binary log
@ -1640,6 +1655,7 @@ binlog-format MIXED
binlog-gtid-index TRUE
binlog-gtid-index-page-size 4096
binlog-gtid-index-span-min 65536
binlog-large-commit-threshold 134217728
binlog-legacy-event-pos FALSE
binlog-optimize-thread-scheduling TRUE
binlog-row-event-max-size 8192

@ -160,16 +160,17 @@ ERROR HY000: Global temporary space limit reached
#
set @save_max_tmp_total_space_usage=@@global.max_tmp_total_space_usage;
set @@global.max_tmp_total_space_usage=64*1024*1024;
set @@max_tmp_session_space_usage=1179648;
set @@max_tmp_session_space_usage=1179648+65536;
select @@max_tmp_session_space_usage;
@@max_tmp_session_space_usage
1179648
1245184
set @save_aria_repair_threads=@@aria_repair_threads;
set @@aria_repair_threads=2;
set @save_max_heap_table_size=@@max_heap_table_size;
set @@max_heap_table_size=16777216;
CREATE TABLE t1 (a CHAR(255),b INT,INDEX (b));
INSERT INTO t1 SELECT SEQ,SEQ FROM seq_1_to_100000;
set @@max_tmp_session_space_usage=1179648;
SELECT * FROM t1 UNION SELECT * FROM t1;
ERROR HY000: Local temporary space limit reached
DROP TABLE t1;
@ -205,11 +206,13 @@ ERROR HY000: Local temporary space limit reached
#
connect c1, localhost, root,,;
set @@binlog_format=row;
CREATE OR REPLACE TABLE t1 (a DATETIME) ENGINE=MyISAM;
CREATE OR REPLACE TABLE t1 (a DATETIME) ENGINE=InnoDB;
BEGIN;
INSERT INTO t1 SELECT NOW() FROM seq_1_to_6000;
SET max_tmp_session_space_usage = 64*1024;
SELECT * FROM information_schema.ALL_PLUGINS LIMIT 2;
ERROR HY000: Local temporary space limit reached
ROLLBACK;
drop table t1;
connection default;
disconnect c1;

@ -215,7 +215,8 @@ select count(distinct concat(seq,repeat('x',1000))) from seq_1_to_1000;
set @save_max_tmp_total_space_usage=@@global.max_tmp_total_space_usage;
set @@global.max_tmp_total_space_usage=64*1024*1024;
set @@max_tmp_session_space_usage=1179648;
# The binlog cache reserves 4096 bytes at the beginning of the temporary file.
set @@max_tmp_session_space_usage=1179648+65536;
select @@max_tmp_session_space_usage;
set @save_aria_repair_threads=@@aria_repair_threads;
set @@aria_repair_threads=2;
@ -224,6 +225,7 @@ set @@max_heap_table_size=16777216;
CREATE TABLE t1 (a CHAR(255),b INT,INDEX (b));
INSERT INTO t1 SELECT SEQ,SEQ FROM seq_1_to_100000;
set @@max_tmp_session_space_usage=1179648;
--error 200
SELECT * FROM t1 UNION SELECT * FROM t1;
DROP TABLE t1;
@ -266,11 +268,16 @@ SELECT MIN(VARIABLE_VALUE) OVER (), NTILE(1) OVER (), MAX(VARIABLE_NAME) OVER ()
connect(c1, localhost, root,,);
set @@binlog_format=row;
CREATE OR REPLACE TABLE t1 (a DATETIME) ENGINE=MyISAM;
CREATE OR REPLACE TABLE t1 (a DATETIME) ENGINE=InnoDB;
# The binlog cache file is truncated at commit, so keep the transaction
# open to keep the binlog cache temporary file large enough
BEGIN;
INSERT INTO t1 SELECT NOW() FROM seq_1_to_6000;
SET max_tmp_session_space_usage = 64*1024;
--error 200
SELECT * FROM information_schema.ALL_PLUGINS LIMIT 2;
ROLLBACK;
drop table t1;
connection default;
disconnect c1;

@ -0,0 +1,100 @@
RESET MASTER;
#
# binlog cache file is created in #binlog_cache_files directory
# and it is deleted at disconnect
#
connect con1,localhost,root,,;
CREATE TABLE t1 (c1 LONGTEXT) ENGINE = InnoDB;
# list binlog_cache_files/
INSERT INTO t1 values(repeat("1", 5242880));
INSERT INTO t1 values(repeat("1", 5242880));
FLUSH BINARY LOGS;
# list #binlog_cache_files/
ML_BINLOG_CACHE_FILE
SET debug_sync = "thread_end SIGNAL signal.thread_end";
disconnect con1;
connection default;
SET debug_sync = "now WAIT_FOR signal.thread_end";
# binlog cache file is deleted at disconnection
# list #binlog_cache_files/
#
# Reserved space is not big enough, rename will not happen. But rotate
# will succeed.
#
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
SET debug = 'd,simulate_required_size_too_big';
UPDATE t1 SET c1 = repeat('2', 5242880);
include/assert.inc [Binlog is rotated, but rename is not executed.]
#
# Error happens when renaming binlog cache to binlog file, rename will
# not happen. Since the original binlog is delete, the rotate will failed
# too. binlog will be closed.
#
SET debug = 'd,simulate_rename_binlog_cache_to_binlog_error';
UPDATE t1 SET c1 = repeat('3', 5242880);
ERROR HY000: Can't open file: './master-bin.000004' (errno: 1 "Operation not permitted")
SELECT count(*) FROM t1 WHERE c1 like "3%";
count(*)
0
# Binlog is closed
show master status;
File Position Binlog_Do_DB Binlog_Ignore_DB
# restart
show master status;
File Position Binlog_Do_DB Binlog_Ignore_DB
master-bin.000004 # <Binlog_Do_DB> <Binlog_Ignore_DB>
#
# Crash happens before rename the file
#
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
SET debug = 'd,binlog_commit_by_rotate_crash_before_rename';
UPDATE t1 SET c1 = repeat('4', 5242880);
ERROR HY000: Lost connection to server during query
# One cache file left afte crash
# list #binlog_cache_files/
ML_BINLOG_CACHE_FILE
non_binlog_cache
# restart
# The cache file is deleted at startup.
# list #binlog_cache_files/
non_binlog_cache
include/assert_grep.inc [warning: non_binlog_cache file is in #binlog_cache_files/]
include/show_binlog_events.inc
Log_name Pos Event_type Server_id End_log_pos Info
master-bin.000005 # Format_desc # # SERVER_VERSION, BINLOG_VERSION
master-bin.000005 # Gtid_list # # [#-#-#]
#
# Crash happens just after rotation is finished, binlog commit is not
# started yet, so there is no Xid_log_event in the log, no garbage at
# the end of the file.
#
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
BEGIN;
UPDATE t1 SET c1 = repeat('5', 5242880);
SAVEPOINT s1;
UPDATE t1 SET c1 = repeat('6', 5242880);
UPDATE t1 SET c1 = repeat('7', 5242880);
ROLLBACK TO SAVEPOINT s1;
INSERT INTO t1 VALUES('a');
SET debug = 'd,binlog_commit_by_rotate_crash_after_rotate';
COMMIT;
ERROR HY000: Lost connection to server during query
# No cache file left afte crash
# list #binlog_cache_files/
# restart
include/show_binlog_events.inc
Log_name Pos Event_type Server_id End_log_pos Info
master-bin.000006 # Format_desc # # SERVER_VERSION, BINLOG_VERSION
master-bin.000006 # Gtid_list # # [#-#-#]
master-bin.000006 # Gtid # # BEGIN GTID #-#-#
master-bin.000006 # Annotate_rows # # UPDATE t1 SET c1 = repeat('5', 5242880)
master-bin.000006 # Table_map # # table_id: # (test.t1)
master-bin.000006 # Update_rows_v1 # # table_id: #
master-bin.000006 # Update_rows_v1 # # table_id: # flags: STMT_END_F
master-bin.000006 # Query # # SAVEPOINT `s1`
master-bin.000006 # Annotate_rows # # INSERT INTO t1 VALUES('a')
master-bin.000006 # Table_map # # table_id: # (test.t1)
master-bin.000006 # Write_rows_v1 # # table_id: # flags: STMT_END_F
call mtr.add_suppression(".*Turning logging off for the whole duration.*");
call mtr.add_suppression(".*non_binlog_cache is in #binlog_cache_files/.*");
DROP TABLE t1;

@ -0,0 +1,143 @@
################################################################################
# MDEV-32014 Rename binlog cache to binlog file
#
# It verifies that the rename logic is handled correctly if an error happens.
################################################################################
--source include/have_binlog_format_row.inc
--source include/have_innodb.inc
--source include/have_debug.inc
--source include/have_debug_sync.inc
RESET MASTER;
--echo #
--echo # binlog cache file is created in #binlog_cache_files directory
--echo # and it is deleted at disconnect
--echo #
--connect(con1,localhost,root,,)
CREATE TABLE t1 (c1 LONGTEXT) ENGINE = InnoDB;
--echo # list binlog_cache_files/
--let $datadir= `SELECT @@datadir`
--list_files $datadir/#binlog_cache_files
INSERT INTO t1 values(repeat("1", 5242880));
INSERT INTO t1 values(repeat("1", 5242880));
FLUSH BINARY LOGS;
--echo # list #binlog_cache_files/
--replace_regex /ML_[0-9]+/ML_BINLOG_CACHE_FILE/
--list_files $datadir/#binlog_cache_files
SET debug_sync = "thread_end SIGNAL signal.thread_end";
--disconnect con1
--connection default
# Wait until the connection is closed completely.
SET debug_sync = "now WAIT_FOR signal.thread_end";
--echo # binlog cache file is deleted at disconnection
--echo # list #binlog_cache_files/
--list_files $datadir/#binlog_cache_files
--echo #
--echo # Reserved space is not big enough, rename will not happen. But rotate
--echo # will succeed.
--echo #
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
SET debug = 'd,simulate_required_size_too_big';
UPDATE t1 SET c1 = repeat('2', 5242880);
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'master-bin.000002' LIMIT 4, End_log_pos, 4)
--let $assert_cond= $gtid_end_pos < 4096
--let $assert_text= Binlog is rotated, but rename is not executed.
--source include/assert.inc
--echo #
--echo # Error happens when renaming binlog cache to binlog file, rename will
--echo # not happen. Since the original binlog is delete, the rotate will failed
--echo # too. binlog will be closed.
--echo #
SET debug = 'd,simulate_rename_binlog_cache_to_binlog_error';
--error ER_CANT_OPEN_FILE
UPDATE t1 SET c1 = repeat('3', 5242880);
SELECT count(*) FROM t1 WHERE c1 like "3%";
--echo # Binlog is closed
--source include/show_master_status.inc
--source include/restart_mysqld.inc
--source include/show_master_status.inc
--echo #
--echo # Crash happens before rename the file
--echo #
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
SET debug = 'd,binlog_commit_by_rotate_crash_before_rename';
--source include/expect_crash.inc
--error 2013
UPDATE t1 SET c1 = repeat('4', 5242880);
--write_file $datadir/#binlog_cache_files/non_binlog_cache
It is not a binlog cache file
EOF
--echo # One cache file left afte crash
--echo # list #binlog_cache_files/
--replace_regex /ML_[0-9]+/ML_BINLOG_CACHE_FILE/
--list_files $datadir/#binlog_cache_files
--source include/start_mysqld.inc
--echo # The cache file is deleted at startup.
--echo # list #binlog_cache_files/
--list_files $datadir/#binlog_cache_files
--let $assert_text= warning: non_binlog_cache file is in #binlog_cache_files/
--let $assert_file= $MYSQLTEST_VARDIR/log/mysqld.1.err
--let $assert_select= non_binlog_cache.*#binlog_cache_files/
--let $assert_count= 1
--let $assert_only_after= CURRENT_TEST: binlog.binlog_commit_by_rotate_atomic
--source include/assert_grep.inc
--remove_file $datadir/#binlog_cache_files/non_binlog_cache
--let $binlog_file= LAST
--let $binlog_start= 4
--let $skip_checkpoint_events= 1
--source include/show_binlog_events.inc
--echo #
--echo # Crash happens just after rotation is finished, binlog commit is not
--echo # started yet, so there is no Xid_log_event in the log, no garbage at
--echo # the end of the file.
--echo #
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
BEGIN;
UPDATE t1 SET c1 = repeat('5', 5242880);
SAVEPOINT s1;
UPDATE t1 SET c1 = repeat('6', 5242880);
UPDATE t1 SET c1 = repeat('7', 5242880);
ROLLBACK TO SAVEPOINT s1;
INSERT INTO t1 VALUES('a');
SET debug = 'd,binlog_commit_by_rotate_crash_after_rotate';
--source include/expect_crash.inc
--error 2013
COMMIT;
--echo # No cache file left afte crash
--echo # list #binlog_cache_files/
--replace_regex /ML_[0-9]+/ML_BINLOG_CACHE_FILE/
--list_files $datadir/#binlog_cache_files
--source include/start_mysqld.inc
--let $binlog_file= master-bin.000006
--let $binlog_start= 4
--let $skip_checkpoint_events= 1
--source include/show_binlog_events.inc
call mtr.add_suppression(".*Turning logging off for the whole duration.*");
call mtr.add_suppression(".*non_binlog_cache is in #binlog_cache_files/.*");
DROP TABLE t1;

@ -0,0 +1,18 @@
RESET MASTER;
CREATE TABLE t1 (c1 LONGTEXT) ENGINE = InnoDB;
INSERT INTO t1 values(repeat("1", 5242880));
INSERT INTO t1 values(repeat("1", 5242880));
FLUSH BINARY LOGS;
SET @saved_threshold= @@GLOBAL.binlog_large_commit_threshold;
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
UPDATE t1 SET c1 = repeat('2', 5242880);
include/show_binlog_events.inc
Log_name Pos Event_type Server_id End_log_pos Info
master-bin.000002 # Gtid # # BEGIN GTID #-#-#
master-bin.000002 # Annotate_rows # # UPDATE t1 SET c1 = repeat('2', 5242880)
master-bin.000002 # Table_map # # table_id: # (test.t1)
master-bin.000002 # Update_rows_v1 # # table_id: #
master-bin.000002 # Update_rows_v1 # # table_id: # flags: STMT_END_F
master-bin.000002 # Xid # # COMMIT /* XID */
SET GLOBAL binlog_large_commit_threshold = @saved_threshold;
DROP TABLE t1;

@ -0,0 +1 @@
--encrypt-tmp-files=on

@ -0,0 +1,19 @@
--source include/have_file_key_management_plugin.inc
--source include/have_binlog_format_row.inc
--source include/have_innodb.inc
RESET MASTER;
CREATE TABLE t1 (c1 LONGTEXT) ENGINE = InnoDB;
INSERT INTO t1 values(repeat("1", 5242880));
INSERT INTO t1 values(repeat("1", 5242880));
FLUSH BINARY LOGS;
SET @saved_threshold= @@GLOBAL.binlog_large_commit_threshold;
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
UPDATE t1 SET c1 = repeat('2', 5242880);
--let $binlog_file= LAST
--let $skip_checkpoint_events=1
--source include/show_binlog_events.inc
SET GLOBAL binlog_large_commit_threshold = @saved_threshold;
DROP TABLE t1;

@ -0,0 +1,173 @@
include/master-slave.inc
[connection master]
# Prepare
SET @saved_binlog_large_commit_threshold= @@GLOBAL.binlog_large_commit_threshold;
SET @saved_binlog_checksum= @@GLOBAL.binlog_checksum;
SET GLOBAL binlog_checksum = "NONE";
CREATE TABLE t1 (c1 LONGTEXT) ENGINE = InnoDB;
CREATE TABLE t2 (c1 LONGTEXT) ENGINE = MyISAM;
INSERT INTO t1 values(repeat("1", 5242880));
INSERT INTO t1 values(repeat("1", 5242880));
INSERT INTO t2 values(repeat("1", 5242880));
INSERT INTO t2 values(repeat("1", 5242880));
FLUSH BINARY LOGS;
# Not renamed to binlog, since the binlog cache is not larger than the
# threshold. And it should works well after ROLLBACK TO SAVEPOINT
BEGIN;
SAVEPOINT s1;
UPDATE t1 SET c1 = repeat('1', 5242880);
ROLLBACK TO SAVEPOINT s1;
UPDATE t1 SET c1 = repeat('2', 5242880);
SAVEPOINT s2;
UPDATE t1 SET c1 = repeat('3', 5242880);
UPDATE t1 SET c1 = repeat('4', 5242880);
ROLLBACK TO SAVEPOINT s2;
COMMIT;
include/assert.inc [Binlog is not rotated]
#
# Test binlog cache rename to binlog file with checksum off
#
include/sync_slave_sql_with_master.inc
include/stop_slave.inc
SET @saved_binlog_large_commit_threshold = @@GLOBAL.binlog_large_commit_threshold;
SET @saved_slave_parallel_workers = @@GLOBAL.slave_parallel_workers;
SET @saved_slave_parallel_mode = @@GLOBAL.slave_parallel_mode;
SET @saved_slave_parallel_max_queued = @@GLOBAL.slave_parallel_max_queued;
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
SET GLOBAL slave_parallel_max_queued = 100 * 1024 * 1024;
SET GLOBAL slave_parallel_workers = 4;
SET GLOBAL slave_parallel_mode = "aggressive";
include/start_slave.inc
BEGIN;
DELETE FROM t1;
connection master;
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
# Transaction cache can be renamed and works well with ROLLBACK TO SAVEPOINT
BEGIN;
SAVEPOINT s1;
UPDATE t1 SET c1 = repeat('2', 5242880);
ROLLBACK TO s1;
UPDATE t1 SET c1 = repeat('3', 5242880);
SAVEPOINT s2;
UPDATE t1 SET c1 = repeat('4', 5242880);
UPDATE t1 SET c1 = repeat('5', 5242880);
UPDATE t1 SET c1 = repeat('6', 5242880);
ROLLBACK TO SAVEPOINT s2;
COMMIT;
INSERT INTO t1 VALUES("after_update_t1");
include/assert.inc [Rename is executed.]
# statement cache can be renamed
connection master;
BEGIN;
UPDATE t2 SET c1 = repeat('4', 5242880);
INSERT INTO t1 VALUES("after_update_t2");
COMMIT;
include/assert.inc [Rename is executed.]
connection slave;
ROLLBACK;
connection master;
include/sync_slave_sql_with_master.inc
include/assert.inc [Rename is executed.]
include/assert.inc [Rename is executed.]
include/show_binlog_events.inc
Log_name Pos Event_type Server_id End_log_pos Info
slave-bin.000002 # Gtid # # BEGIN GTID #-#-#
slave-bin.000002 # Annotate_rows # # UPDATE t1 SET c1 = repeat('3', 5242880)
slave-bin.000002 # Table_map # # table_id: # (test.t1)
slave-bin.000002 # Update_rows_v1 # # table_id: #
slave-bin.000002 # Update_rows_v1 # # table_id: # flags: STMT_END_F
slave-bin.000002 # Query # # SAVEPOINT `s2`
slave-bin.000002 # Xid # # COMMIT /* XID */
slave-bin.000002 # Gtid # # BEGIN GTID #-#-#
slave-bin.000002 # Annotate_rows # # INSERT INTO t1 VALUES("after_update_t1")
slave-bin.000002 # Table_map # # table_id: # (test.t1)
slave-bin.000002 # Write_rows_v1 # # table_id: # flags: STMT_END_F
slave-bin.000002 # Xid # # COMMIT /* XID */
slave-bin.000002 # Rotate # # slave-bin.000003;pos=POS
include/show_binlog_events.inc
Log_name Pos Event_type Server_id End_log_pos Info
slave-bin.000003 # Gtid # # BEGIN GTID #-#-#
slave-bin.000003 # Annotate_rows # # UPDATE t2 SET c1 = repeat('4', 5242880)
slave-bin.000003 # Table_map # # table_id: # (test.t2)
slave-bin.000003 # Update_rows_v1 # # table_id: #
slave-bin.000003 # Update_rows_v1 # # table_id: # flags: STMT_END_F
slave-bin.000003 # Query # # COMMIT
slave-bin.000003 # Gtid # # BEGIN GTID #-#-#
slave-bin.000003 # Annotate_rows # # INSERT INTO t1 VALUES("after_update_t2")
slave-bin.000003 # Table_map # # table_id: # (test.t1)
slave-bin.000003 # Write_rows_v1 # # table_id: # flags: STMT_END_F
slave-bin.000003 # Xid # # COMMIT /* XID */
include/stop_slave.inc
SET GLOBAL binlog_large_commit_threshold = @saved_binlog_large_commit_threshold;
SET GLOBAL slave_parallel_workers = @saved_slave_parallel_workers;
SET GLOBAL slave_parallel_max_queued = @saved_slave_parallel_max_queued;
SET GLOBAL slave_parallel_mode = @saved_slave_parallel_mode;
include/start_slave.inc
# CREATE SELECT works well
connection master;
CREATE TABLE t3 SELECT * FROM t1;
include/assert.inc [Rename is executed.]
CREATE TABLE t4 SELECT * FROM t2;
include/assert.inc [Rename is executed.]
# XA statement works well
XA START "test-a-long-xid========================================";
UPDATE t1 SET c1 = repeat('1', 5242880);
XA END "test-a-long-xid========================================";
XA PREPARE "test-a-long-xid========================================";
XA COMMIT "test-a-long-xid========================================";
include/assert.inc [Rename is executed.]
XA START "test-xid";
UPDATE t1 SET c1 = repeat('2', 5242880);
XA END "test-xid";
XA COMMIT "test-xid" ONE PHASE;
include/assert.inc [Rename is executed.]
#
# It works well in the situation that binlog header is larger than
# IO_SIZE and binlog file's buffer.
#
FLUSH BINARY LOGS;
SET SESSION server_id = 1;
UPDATE t1 SET c1 = repeat('3', 5242880);
include/assert.inc [Rename is executed.]
#
# RESET MASTER should work well. It also verifies binlog checksum mechanism.
#
include/rpl_reset.inc
#
# Test binlog cache rename to binlog file with checksum on
#
SET GLOBAL binlog_checksum = "CRC32";
# It will not rename the cache to file, since the cache's checksum was
# initialized when reset the cache at the end of previous transaction.
UPDATE t1 SET c1 = repeat('5', 5242880);
include/assert.inc [Binlog is not rotated]
#
# Not rename to binlog file If the cache's checksum is not same
# to binlog_checksum
#
BEGIN;
UPDATE t1 SET c1 = repeat('6', 5242880);
SET GLOBAL binlog_checksum = "NONE";
COMMIT;
include/assert.inc [Binlog is not rotated]
BEGIN;
UPDATE t1 SET c1 = repeat('7', 5242880);
SET GLOBAL binlog_checksum = "CRC32";
COMMIT;
include/assert.inc [Binlog is not rotated]
#
# Not rename to binlog file If both stmt and trx cache are not empty
#
UPDATE t1, t2 SET t1.c1 = repeat('8', 5242880), t2.c1 = repeat('7', 5242880);
include/assert.inc [Binlog is not rotated]
#
# Not rename to binlog file If binlog_legacy_event_pos is on
#
SET GLOBAL binlog_legacy_event_pos = ON;
UPDATE t1 SET c1 = repeat('9', 5242880);
SET GLOBAL binlog_legacy_event_pos = OFF;
include/assert.inc [Binlog is not rotated]
DROP TABLE t1, t2, t3, t4;
SET GLOBAL binlog_large_commit_threshold = @saved_binlog_large_commit_threshold;
SET GLOBAL binlog_checksum = @saved_binlog_checksum;
include/rpl_end.inc

@ -0,0 +1,271 @@
################################################################################
# MDEV-32014 Rename binlog cache to binlog file
#
# It verifies that binlog caches which are larger than
# binlog_large_commit_threshold can be moved to a binlog file
# successfully. With a successful rename,
# - it rotates the binlog and the cache is renamed to the new binlog file
# - an ignorable event is generated just after the Gtid_log_event of the
# transaction to fill the reserved space which is unused.
#
# It also verifies that rename is not supported in the cases below,
# even though the cache is larger than the threshold
# - both statement and transaction cache should be flushed.
# - the cache's checksum option is not the same as binlog_checksum
# - binlog_legacy_event_pos is enabled.
################################################################################
--source include/have_binlog_format_row.inc
--source include/have_innodb.inc
--source include/master-slave.inc
--echo # Prepare
SET @saved_binlog_large_commit_threshold= @@GLOBAL.binlog_large_commit_threshold;
SET @saved_binlog_checksum= @@GLOBAL.binlog_checksum;
SET GLOBAL binlog_checksum = "NONE";
CREATE TABLE t1 (c1 LONGTEXT) ENGINE = InnoDB;
CREATE TABLE t2 (c1 LONGTEXT) ENGINE = MyISAM;
INSERT INTO t1 values(repeat("1", 5242880));
INSERT INTO t1 values(repeat("1", 5242880));
INSERT INTO t2 values(repeat("1", 5242880));
INSERT INTO t2 values(repeat("1", 5242880));
FLUSH BINARY LOGS;
--echo # Not renamed to binlog, since the binlog cache is not larger than the
--echo # threshold. And it should works well after ROLLBACK TO SAVEPOINT
BEGIN;
SAVEPOINT s1;
UPDATE t1 SET c1 = repeat('1', 5242880);
ROLLBACK TO SAVEPOINT s1;
UPDATE t1 SET c1 = repeat('2', 5242880);
SAVEPOINT s2;
UPDATE t1 SET c1 = repeat('3', 5242880);
UPDATE t1 SET c1 = repeat('4', 5242880);
ROLLBACK TO SAVEPOINT s2;
COMMIT;
--let $binlog_file= query_get_value(SHOW MASTER STATUS, File, 1)
--let $assert_cond= "$binlog_file" = "master-bin.000003"
--let $assert_text= Binlog is not rotated
--source include/assert.inc
--echo #
--echo # Test binlog cache rename to binlog file with checksum off
--echo #
--source include/sync_slave_sql_with_master.inc
--source include/stop_slave.inc
SET @saved_binlog_large_commit_threshold = @@GLOBAL.binlog_large_commit_threshold;
SET @saved_slave_parallel_workers = @@GLOBAL.slave_parallel_workers;
SET @saved_slave_parallel_mode = @@GLOBAL.slave_parallel_mode;
SET @saved_slave_parallel_max_queued = @@GLOBAL.slave_parallel_max_queued;
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
SET GLOBAL slave_parallel_max_queued = 100 * 1024 * 1024;
SET GLOBAL slave_parallel_workers = 4;
SET GLOBAL slave_parallel_mode = "aggressive";
--source include/start_slave.inc
# Block all DML on slave
BEGIN;
DELETE FROM t1;
--connection master
SET GLOBAL binlog_large_commit_threshold = 10 * 1024 * 1024;
--echo # Transaction cache can be renamed and works well with ROLLBACK TO SAVEPOINT
BEGIN;
SAVEPOINT s1;
UPDATE t1 SET c1 = repeat('2', 5242880);
ROLLBACK TO s1;
UPDATE t1 SET c1 = repeat('3', 5242880);
SAVEPOINT s2;
UPDATE t1 SET c1 = repeat('4', 5242880);
UPDATE t1 SET c1 = repeat('5', 5242880);
UPDATE t1 SET c1 = repeat('6', 5242880);
ROLLBACK TO SAVEPOINT s2;
COMMIT;
INSERT INTO t1 VALUES("after_update_t1");
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'master-bin.000004' LIMIT 4, End_log_pos, 4)
--let $assert_cond= $gtid_end_pos = 4096
--let $assert_text= Rename is executed.
--source include/assert.inc
--echo # statement cache can be renamed
--connection master
BEGIN;
UPDATE t2 SET c1 = repeat('4', 5242880);
INSERT INTO t1 VALUES("after_update_t2");
COMMIT;
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'master-bin.000005' LIMIT 4, End_log_pos, 4)
--let $assert_cond= $gtid_end_pos = 4096
--let $assert_text= Rename is executed.
--source include/assert.inc
--connection slave
# UPDATE t2 should be waiting for prior transactions to commit.
let $wait_condition=
SELECT count(*) = 1 FROM information_schema.processlist
WHERE State = "Waiting for prior transaction to commit";
--source include/wait_condition.inc
ROLLBACK;
--connection master
--source include/sync_slave_sql_with_master.inc
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'slave-bin.000002' LIMIT 4, End_log_pos, 4)
--let $assert_cond= $gtid_end_pos = 4096
--let $assert_text= Rename is executed.
--source include/assert.inc
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'slave-bin.000003' LIMIT 4, End_log_pos, 4)
--let $assert_cond= $gtid_end_pos = 4096
--let $assert_text= Rename is executed.
--source include/assert.inc
--let $binlog_file= slave-bin.000002
--let $skip_checkpoint_events= 1
--source include/show_binlog_events.inc
--let $binlog_file= slave-bin.000003
--source include/show_binlog_events.inc
--source include/stop_slave.inc
SET GLOBAL binlog_large_commit_threshold = @saved_binlog_large_commit_threshold;
SET GLOBAL slave_parallel_workers = @saved_slave_parallel_workers;
SET GLOBAL slave_parallel_max_queued = @saved_slave_parallel_max_queued;
SET GLOBAL slave_parallel_mode = @saved_slave_parallel_mode;
--source include/start_slave.inc
--echo # CREATE SELECT works well
--connection master
CREATE TABLE t3 SELECT * FROM t1;
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'master-bin.000006' LIMIT 4, End_log_pos, 4)
--let $assert_cond= $gtid_end_pos = 4096
--let $assert_text= Rename is executed.
--source include/assert.inc
CREATE TABLE t4 SELECT * FROM t2;
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'master-bin.000007' LIMIT 4, End_log_pos, 4)
--let $assert_cond= $gtid_end_pos = 4096
--let $assert_text= Rename is executed.
--source include/assert.inc
--echo # XA statement works well
XA START "test-a-long-xid========================================";
UPDATE t1 SET c1 = repeat('1', 5242880);
XA END "test-a-long-xid========================================";
XA PREPARE "test-a-long-xid========================================";
XA COMMIT "test-a-long-xid========================================";
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'master-bin.000008' LIMIT 4, End_log_pos, 4)
--let $assert_cond= $gtid_end_pos = 4096
--let $assert_text= Rename is executed.
--source include/assert.inc
XA START "test-xid";
UPDATE t1 SET c1 = repeat('2', 5242880);
XA END "test-xid";
XA COMMIT "test-xid" ONE PHASE;
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'master-bin.000009' LIMIT 4, End_log_pos, 4)
--let $assert_cond= $gtid_end_pos = 4096
--let $assert_text= Rename is executed.
--source include/assert.inc
--echo #
--echo # It works well in the situation that binlog header is larger than
--echo # IO_SIZE and binlog file's buffer.
--echo #
--disable_query_log
# Make Gtid_list_event larger than 64K (the binlog file's buffer size)
--let $server_id= 100000
while ($server_id < 104096)
{
eval SET SESSION server_id = $server_id;
eval UPDATE t1 SET c1 = "$server_id" LIMIT 1;
--inc $server_id
}
--enable_query_log
# After flush, reserved space should be updated.
FLUSH BINARY LOGS;
SET SESSION server_id = 1;
UPDATE t1 SET c1 = repeat('3', 5242880);
--let $gtid_end_pos= query_get_value(SHOW BINLOG EVENTS IN 'master-bin.000011' LIMIT 4, End_log_pos, 4)
# 69632 is 68K (17 * 4096), which is larger than the binlog's 64K buffer
--let $assert_cond= $gtid_end_pos = 69632
--let $assert_text= Rename is executed.
--source include/assert.inc
--echo #
--echo # RESET MASTER should work well. It also verifies binlog checksum mechanism.
--echo #
--source include/rpl_reset.inc
--echo #
--echo # Test binlog cache rename to binlog file with checksum on
--echo #
SET GLOBAL binlog_checksum = "CRC32";
--echo # It will not rename the cache to file, since the cache's checksum was
--echo # initialized when reset the cache at the end of previous transaction.
UPDATE t1 SET c1 = repeat('5', 5242880);
--let $binlog_file= query_get_value(SHOW MASTER STATUS, File, 1)
--let $assert_cond= "$binlog_file" = "master-bin.000002"
--let $assert_text= Binlog is not rotated
--source include/assert.inc
--echo #
--echo # Not rename to binlog file If the cache's checksum is not same
--echo # to binlog_checksum
--echo #
BEGIN;
UPDATE t1 SET c1 = repeat('6', 5242880);
SET GLOBAL binlog_checksum = "NONE";
COMMIT;
--let $binlog_file= query_get_value(SHOW MASTER STATUS, File, 1)
--let $assert_cond= "$binlog_file" = "master-bin.000003"
--let $assert_text= Binlog is not rotated
--source include/assert.inc
BEGIN;
UPDATE t1 SET c1 = repeat('7', 5242880);
SET GLOBAL binlog_checksum = "CRC32";
COMMIT;
--let $binlog_file= query_get_value(SHOW MASTER STATUS, File, 1)
--let $assert_cond= "$binlog_file" = "master-bin.000004"
--let $assert_text= Binlog is not rotated
--source include/assert.inc
--echo #
--echo # Not rename to binlog file If both stmt and trx cache are not empty
--echo #
UPDATE t1, t2 SET t1.c1 = repeat('8', 5242880), t2.c1 = repeat('7', 5242880);
--let $binlog_file= query_get_value(SHOW MASTER STATUS, File, 1)
--let $assert_cond= "$binlog_file" = "master-bin.000004"
--let $assert_text= Binlog is not rotated
--source include/assert.inc
--echo #
--echo # Not rename to binlog file If binlog_legacy_event_pos is on
--echo #
SET GLOBAL binlog_legacy_event_pos = ON;
UPDATE t1 SET c1 = repeat('9', 5242880);
SET GLOBAL binlog_legacy_event_pos = OFF;
--let $binlog_file= query_get_value(SHOW MASTER STATUS, File, 1)
--let $assert_cond= "$binlog_file" = "master-bin.000004"
--let $assert_text= Binlog is not rotated
--source include/assert.inc
# cleanup
DROP TABLE t1, t2, t3, t4;
SET GLOBAL binlog_large_commit_threshold = @saved_binlog_large_commit_threshold;
SET GLOBAL binlog_checksum = @saved_binlog_checksum;
--let $binlog_file=
--let $skip_checkpoint_events=0
--source include/rpl_end.inc

@ -462,6 +462,16 @@ NUMERIC_BLOCK_SIZE 1
ENUM_VALUE_LIST NULL
READ_ONLY NO
COMMAND_LINE_ARGUMENT REQUIRED
VARIABLE_NAME BINLOG_LARGE_COMMIT_THRESHOLD
VARIABLE_SCOPE GLOBAL
VARIABLE_TYPE BIGINT UNSIGNED
VARIABLE_COMMENT Increases transaction concurrency for large transactions (i.e. those with sizes larger than this value) by using the large transaction's cache file as a new binary log, and rotating the active binary log to the large transaction's cache file at commit time. This avoids the default commit logic that copies the transaction cache data to the end of the active binary log file while holding a lock that prevents other transactions from binlogging
NUMERIC_MIN_VALUE 10485760
NUMERIC_MAX_VALUE 18446744073709551615
NUMERIC_BLOCK_SIZE 1
ENUM_VALUE_LIST NULL
READ_ONLY NO
COMMAND_LINE_ARGUMENT REQUIRED
VARIABLE_NAME BINLOG_OPTIMIZE_THREAD_SCHEDULING
VARIABLE_SCOPE GLOBAL
VARIABLE_TYPE BOOLEAN
@ -1895,7 +1905,7 @@ COMMAND_LINE_ARGUMENT REQUIRED
VARIABLE_NAME MAX_BINLOG_SIZE
VARIABLE_SCOPE GLOBAL
VARIABLE_TYPE BIGINT UNSIGNED
VARIABLE_COMMENT Binary log will be rotated automatically when the size exceeds this value
VARIABLE_COMMENT Binary log will be rotated automatically when the size exceeds this value, unless `binlog_large_commit_threshold` causes rotation prematurely
NUMERIC_MIN_VALUE 4096
NUMERIC_MAX_VALUE 1073741824
NUMERIC_BLOCK_SIZE 4096
@ -3955,7 +3965,7 @@ COMMAND_LINE_ARGUMENT REQUIRED
VARIABLE_NAME TMPDIR
VARIABLE_SCOPE GLOBAL
VARIABLE_TYPE VARCHAR
VARIABLE_COMMENT Path for temporary files. Several paths may be specified, separated by a colon (:), in this case they are used in a round-robin fashion
VARIABLE_COMMENT Path for temporary files. Files that are created in background for binlogging by user threads are placed in a separate location (see `binlog_large_commit_threshold` option). Several paths may be specified, separated by a colon (:), in this case they are used in a round-robin fashion
NUMERIC_MIN_VALUE NULL
NUMERIC_MAX_VALUE NULL
NUMERIC_BLOCK_SIZE NULL

@ -492,6 +492,16 @@ NUMERIC_BLOCK_SIZE NULL
ENUM_VALUE_LIST NULL
READ_ONLY YES
COMMAND_LINE_ARGUMENT NULL
VARIABLE_NAME BINLOG_LARGE_COMMIT_THRESHOLD
VARIABLE_SCOPE GLOBAL
VARIABLE_TYPE BIGINT UNSIGNED
VARIABLE_COMMENT Increases transaction concurrency for large transactions (i.e. those with sizes larger than this value) by using the large transaction's cache file as a new binary log, and rotating the active binary log to the large transaction's cache file at commit time. This avoids the default commit logic that copies the transaction cache data to the end of the active binary log file while holding a lock that prevents other transactions from binlogging
NUMERIC_MIN_VALUE 10485760
NUMERIC_MAX_VALUE 18446744073709551615
NUMERIC_BLOCK_SIZE 1
ENUM_VALUE_LIST NULL
READ_ONLY NO
COMMAND_LINE_ARGUMENT REQUIRED
VARIABLE_NAME BINLOG_LEGACY_EVENT_POS
VARIABLE_SCOPE GLOBAL
VARIABLE_TYPE BOOLEAN
@ -2095,7 +2105,7 @@ COMMAND_LINE_ARGUMENT REQUIRED
VARIABLE_NAME MAX_BINLOG_SIZE
VARIABLE_SCOPE GLOBAL
VARIABLE_TYPE BIGINT UNSIGNED
VARIABLE_COMMENT Binary log will be rotated automatically when the size exceeds this value
VARIABLE_COMMENT Binary log will be rotated automatically when the size exceeds this value, unless `binlog_large_commit_threshold` causes rotation prematurely
NUMERIC_MIN_VALUE 4096
NUMERIC_MAX_VALUE 1073741824
NUMERIC_BLOCK_SIZE 4096
@ -4825,7 +4835,7 @@ COMMAND_LINE_ARGUMENT REQUIRED
VARIABLE_NAME TMPDIR
VARIABLE_SCOPE GLOBAL
VARIABLE_TYPE VARCHAR
VARIABLE_COMMENT Path for temporary files. Several paths may be specified, separated by a colon (:), in this case they are used in a round-robin fashion
VARIABLE_COMMENT Path for temporary files. Files that are created in background for binlogging by user threads are placed in a separate location (see `binlog_large_commit_threshold` option). Several paths may be specified, separated by a colon (:), in this case they are used in a round-robin fashion
NUMERIC_MIN_VALUE NULL
NUMERIC_MAX_VALUE NULL
NUMERIC_BLOCK_SIZE NULL

@ -107,7 +107,7 @@ SET (SQL_SOURCE
hostname.cc init.cc item.cc item_buff.cc item_cmpfunc.cc
item_create.cc item_func.cc item_geofunc.cc item_row.cc
item_strfunc.cc item_subselect.cc item_sum.cc item_timefunc.cc
key.cc log.cc lock.cc
key.cc log.cc log_cache.cc lock.cc
log_event.cc log_event_server.cc
rpl_record.cc rpl_reporting.cc
mf_iocache.cc my_decimal.cc

@ -163,6 +163,128 @@ static SHOW_VAR binlog_status_vars_detail[]=
{NullS, NullS, SHOW_LONG}
};
/**
This class implements the feature to rename a binlog cache temporary file to
a binlog file. It is used to avoid holding LOCK_log for a long time when
writing a huge binlog cache to the binlog file.
With this feature, temporary files of binlog caches are created in
BINLOG_CACHE_DIR, which is created in the same directory as the binlog files
at server startup.
*/
class Binlog_commit_by_rotate
{
public:
Binlog_commit_by_rotate() {}
/**
Check whether the transaction's binlog cache should be renamed to a
binlog file.
@param entry group_commit_entry object of the current transaction
@retval true it should do rename
@retval false it should do normal commit.
*/
bool should_commit_by_rotate(
const MYSQL_BIN_LOG::group_commit_entry *entry) const;
/**
This function is the entry point for renaming a binlog cache to a binlog
file. It first rotates the binlog, then renames the temporary file of the
binlog cache to the new binlog file, and after that commits the transaction.
If the cache cannot be renamed, it falls back to the normal group commit.
@param entry group_commit_entry object of the current transaction.
@retval false The transaction was committed
@retval true An error happened
*/
bool commit(MYSQL_BIN_LOG::group_commit_entry *entry);
/**
During binlog rotate, after creating the new binlog file and writing the
events that describe its state (e.g. Format description event), copy
them into the binlog cache, delete the binlog file and then rename
the binlog cache to the new binlog file.
@retval false The binlog file was replaced, or the replacement was
skipped (m_replaced stays false).
@retval true An error happened after the new binlog file was
deleted. In this situation the rotate process will fail.
*/
bool replace_binlog_file();
/**
The reserved space that is left is larger than a GTID event requires, so
the extra space is padded into the GTID event as zeros. This function
calculates the real GTID event size including the pad.
*/
size_t get_gtid_event_pad_data_size();
/**
The space that session binlog caches have to reserve. It is calculated
from the header length of the current binlog file when that file is
created, and aligned to IO_SIZE.
@param header_len header length of current binlog file.
*/
void set_reserved_bytes(uint32 header_len)
{
// Add reserved space for gtid event
header_len+= LOG_EVENT_HEADER_LEN + Gtid_log_event::max_data_length +
BINLOG_CHECKSUM_LEN;
// reserved size is aligned to IO_SIZE.
header_len= (header_len + (IO_SIZE - 1)) & ~(IO_SIZE - 1);
if (header_len != m_reserved_bytes)
m_reserved_bytes= header_len;
}
/**
Return the reserved space required for a binlog cache. m_reserved_bytes is
NOT defined as an atomic variable, although it is read and written in
parallel. Synchronizing between set and get is not really necessary, since
m_reserved_bytes is not updated often. A reader may see an old value, but
that only affects the current transaction; the next transaction will get
the fresh value. Reserving space is a transaction-level action, so there
are always some transactions that reserve space with the old value.
*/
uint32 get_reserved_size()
{
return m_reserved_bytes;
}
private:
/* Singleton object, disable the constructors to prevent copying by mistake */
Binlog_commit_by_rotate &operator=(const Binlog_commit_by_rotate &);
Binlog_commit_by_rotate(const Binlog_commit_by_rotate &);
/**
The commit entry of the current transaction, which is committed by
renaming its binlog cache to a binlog file.
*/
MYSQL_BIN_LOG::group_commit_entry *m_entry{nullptr};
/**
The cache_data which will be renamed to binlog, it is used with
LOCK_log acquired.
*/
binlog_cache_data *m_cache_data{nullptr};
/**
It will be set to true if rename operation succeeds,
it is used with LOCK_log acquired.
*/
bool m_replaced{false};
uint32 m_reserved_bytes {IO_SIZE};
};
static Binlog_commit_by_rotate binlog_commit_by_rotate;
ulonglong opt_binlog_commit_by_rotate_threshold= 128 * 1024 * 1024;
uint32 binlog_cache_reserved_size()
{
return binlog_commit_by_rotate.get_reserved_size();
}
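The rounding done in set_reserved_bytes() can be tried in isolation. The following is a minimal sketch, not the server code: `kIoSize` assumes IO_SIZE is 4096 (consistent with the End_log_pos values asserted in the test), and `kGtidReserve` is a hypothetical stand-in for LOG_EVENT_HEADER_LEN + Gtid_log_event::max_data_length + BINLOG_CHECKSUM_LEN, whose real value is not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: IO_SIZE is 4096 in the server build.
constexpr uint32_t kIoSize= 4096;
// Hypothetical placeholder for the space reserved for the GTID event.
constexpr uint32_t kGtidReserve= 64;

// Round len up to the next multiple of kIoSize (kIoSize is a power of two).
constexpr uint32_t align_to_io_size(uint32_t len)
{
  return (len + (kIoSize - 1)) & ~(kIoSize - 1);
}

// Same shape as set_reserved_bytes(): binlog header length plus the GTID
// reserve, aligned to kIoSize.
constexpr uint32_t reserved_bytes(uint32_t header_len)
{
  return align_to_io_size(header_len + kGtidReserve);
}
```

With a small binlog header the reserved area is a single 4096-byte block. Once the header grows past 64K (for example through a large Gtid_list event), the reserved area grows to the next 4096-byte boundary.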
/*
Variables for the binlog background thread.
Protected by the MYSQL_BIN_LOG::LOCK_binlog_background_thread mutex.
@ -3761,7 +3883,8 @@ bool MYSQL_BIN_LOG::open(const char *log_name,
enum cache_type io_cache_type_arg,
ulong max_size_arg,
bool null_created_arg,
bool need_mutex)
bool need_mutex,
bool commit_by_rotate)
{
xid_count_per_binlog *new_xid_list_entry= NULL, *b;
DBUG_ENTER("MYSQL_BIN_LOG::open");
@ -4028,14 +4151,23 @@ bool MYSQL_BIN_LOG::open(const char *log_name,
goto err;
bytes_written+= description_event_for_queue->data_written;
}
/*
Offset must be saved before replace_binlog_file(), it will update the
file position
*/
my_off_t offset= my_b_tell(&log_file);
if (commit_by_rotate && binlog_commit_by_rotate.replace_binlog_file())
goto err;
if (flush_io_cache(&log_file) ||
mysql_file_sync(log_file.file, MYF(MY_WME)))
goto err;
my_off_t offset= my_b_tell(&log_file);
if (!is_relay_log)
{
binlog_commit_by_rotate.set_reserved_bytes((uint32)offset);
/* update binlog_end_pos so that it can be read by after sync hook */
reset_binlog_end_pos(log_file_name, offset);
@ -4127,8 +4259,7 @@ bool MYSQL_BIN_LOG::open(const char *log_name,
/* Notify the io thread that binlog is rotated to a new file */
if (is_relay_log)
signal_relay_log_update();
else
update_binlog_end_pos();
DBUG_RETURN(0);
err:
@ -5712,7 +5843,7 @@ int MYSQL_BIN_LOG::new_file()
{
int res;
mysql_mutex_lock(&LOCK_log);
res= new_file_impl();
res= new_file_impl(false);
mysql_mutex_unlock(&LOCK_log);
return res;
}
@ -5721,9 +5852,9 @@ int MYSQL_BIN_LOG::new_file()
@retval
nonzero - error
*/
int MYSQL_BIN_LOG::new_file_without_locking()
int MYSQL_BIN_LOG::new_file_without_locking(bool commit_by_rotate)
{
return new_file_impl();
return new_file_impl(commit_by_rotate);
}
@ -5738,7 +5869,7 @@ int MYSQL_BIN_LOG::new_file_without_locking()
binlog_space_total will be updated if binlog_space_limit is set
*/
int MYSQL_BIN_LOG::new_file_impl()
int MYSQL_BIN_LOG::new_file_impl(bool commit_by_rotate)
{
int error= 0, close_on_error= FALSE;
char new_name[FN_REFLEN], *new_name_ptr, *old_name, *file_to_open;
@ -5860,7 +5991,8 @@ int MYSQL_BIN_LOG::new_file_impl()
{
/* reopen the binary log file. */
file_to_open= new_name_ptr;
error= open(old_name, new_name_ptr, 0, io_cache_type, max_size, 1, FALSE);
error= open(old_name, new_name_ptr, 0, io_cache_type, max_size, 1, FALSE,
commit_by_rotate);
}
/* handle reopening errors */
@ -5963,7 +6095,7 @@ bool MYSQL_BIN_LOG::append_no_lock(Log_event* ev,
if (flush_and_sync(0))
goto err;
if (my_b_append_tell(&log_file) > max_size)
error= new_file_without_locking();
error= new_file_without_locking(false);
err:
update_binlog_end_pos();
DBUG_RETURN(error);
@ -6022,7 +6154,7 @@ bool MYSQL_BIN_LOG::write_event_buffer(uchar* buf, uint len)
if (flush_and_sync(0))
goto err;
if (my_b_append_tell(&log_file) > max_size)
error= new_file_without_locking();
error= new_file_without_locking(false);
err:
my_safe_afree(ebuf, len);
if (likely(!error))
@ -6211,11 +6343,11 @@ static binlog_cache_mngr *binlog_setup_cache_mngr(THD *thd)
sizeof(binlog_cache_mngr),
MYF(MY_ZEROFILL));
if (!cache_mngr ||
open_cached_file(&cache_mngr->stmt_cache.cache_log, mysql_tmpdir,
LOG_PREFIX, (size_t)binlog_stmt_cache_size,
open_cached_file(&cache_mngr->stmt_cache.cache_log, binlog_cache_dir,
LOG_PREFIX, (size_t) binlog_stmt_cache_size,
MYF(MY_WME | MY_TRACK_WITH_LIMIT)) ||
open_cached_file(&cache_mngr->trx_cache.cache_log, mysql_tmpdir,
LOG_PREFIX, (size_t)binlog_cache_size,
open_cached_file(&cache_mngr->trx_cache.cache_log, binlog_cache_dir,
LOG_PREFIX, (size_t) binlog_cache_size,
MYF(MY_WME | MY_TRACK_WITH_LIMIT)))
{
my_free(cache_mngr);
@ -6870,6 +7002,7 @@ Event_log::prepare_pending_rows_event(THD *thd, TABLE* table,
bool
MYSQL_BIN_LOG::write_gtid_event(THD *thd, bool standalone,
bool is_transactional, uint64 commit_id,
bool commit_by_rotate,
bool has_xid, bool is_ro_1pc)
{
rpl_gtid gtid;
@ -6938,6 +7071,9 @@ MYSQL_BIN_LOG::write_gtid_event(THD *thd, bool standalone,
}
#endif
if (unlikely(commit_by_rotate))
gtid_event.pad_to_size= binlog_commit_by_rotate.get_gtid_event_pad_data_size();
if (write_event(&gtid_event))
DBUG_RETURN(true);
status_var_add(thd->status_var.binlog_bytes_written, gtid_event.data_written);
@ -7266,7 +7402,7 @@ bool MYSQL_BIN_LOG::write(Log_event *event_info, my_bool *with_annotate)
commit_name.length);
commit_id= entry->val_int(&null_value);
});
res= write_gtid_event(thd, true, using_trans, commit_id);
res= write_gtid_event(thd, true, using_trans, commit_id, false);
if (mdl_request.ticket)
thd->mdl_context.release_lock(mdl_request.ticket);
thd->backup_commit_lock= 0;
@ -7627,7 +7763,8 @@ MYSQL_BIN_LOG::do_checkpoint_request(ulong binlog_id)
@retval
nonzero - error in rotating routine.
*/
int MYSQL_BIN_LOG::rotate(bool force_rotate, bool* check_purge)
int MYSQL_BIN_LOG::rotate(bool force_rotate, bool *check_purge,
bool commit_by_rotate)
{
int error= 0;
ulonglong binlog_pos;
@ -7668,7 +7805,7 @@ int MYSQL_BIN_LOG::rotate(bool force_rotate, bool* check_purge)
*/
mark_xids_active(binlog_id, 1);
if (unlikely((error= new_file_without_locking())))
if (unlikely((error= new_file_without_locking(commit_by_rotate))))
{
/**
Be conservative... There are possible lost events (eg,
@ -7969,12 +8106,14 @@ int Event_log::write_cache_raw(THD *thd, IO_CACHE *cache)
int Event_log::write_cache(THD *thd, binlog_cache_data *cache_data)
{
int res;
IO_CACHE *cache= &cache_data->cache_log;
DBUG_ENTER("Event_log::write_cache");
mysql_mutex_assert_owner(&LOCK_log);
if (cache_data->init_for_read())
DBUG_RETURN(ER_ERROR_ON_WRITE);
/*
If possible, just copy the cache over byte-by-byte with pre-computed
checksums.
@ -7983,14 +8122,13 @@ int Event_log::write_cache(THD *thd, binlog_cache_data *cache_data)
likely(!crypto.scheme) &&
likely(!opt_binlog_legacy_event_pos))
{
int res= my_b_copy_all_to_cache(cache, &log_file);
status_var_add(thd->status_var.binlog_bytes_written, my_b_tell(cache));
int res=
my_b_copy_to_cache(cache, &log_file, cache_data->length_for_read());
status_var_add(thd->status_var.binlog_bytes_written,
cache_data->length_for_read());
DBUG_RETURN(res ? ER_ERROR_ON_WRITE : 0);
}
if ((res= reinit_io_cache(cache, READ_CACHE, 0, 0, 0)))
DBUG_RETURN(ER_ERROR_ON_WRITE);
/* Amount of remaining bytes in the IO_CACHE read buffer. */
size_t log_file_pos;
uchar header_buf[LOG_EVENT_HEADER_LEN];
@ -8704,10 +8842,29 @@ end:
DBUG_RETURN(result);
}
bool
inline bool
MYSQL_BIN_LOG::write_transaction_to_binlog_events(group_commit_entry *entry)
{
if (unlikely(binlog_commit_by_rotate.should_commit_by_rotate(entry)))
{
if (binlog_commit_by_rotate.commit(entry))
return true;
}
else if (write_transaction_with_group_commit(entry))
return true;
if (likely(!entry->error))
return false;
write_transaction_handle_error(entry);
return true;
}
bool
MYSQL_BIN_LOG::write_transaction_with_group_commit(group_commit_entry *entry)
{
int is_leader= queue_for_group_commit(entry);
#ifdef WITH_WSREP
/* commit order was released in queue_for_group_commit() call,
here we check if wsrep_commit_ordered() failed or if we are leader */
@ -8812,7 +8969,13 @@ MYSQL_BIN_LOG::write_transaction_to_binlog_events(group_commit_entry *entry)
if (likely(!entry->error))
return entry->thd->wait_for_prior_commit();
else
write_transaction_handle_error(entry);
return true;
}
void MYSQL_BIN_LOG::write_transaction_handle_error(group_commit_entry *entry)
{
switch (entry->error)
{
case ER_ERROR_ON_WRITE:
@ -8841,8 +9004,6 @@ MYSQL_BIN_LOG::write_transaction_to_binlog_events(group_commit_entry *entry)
if (entry->cache_mngr->using_xa && entry->cache_mngr->xa_xid &&
entry->cache_mngr->need_unlog)
mark_xid_done(entry->cache_mngr->binlog_id, true);
return 1;
}
/*
@ -8858,65 +9019,74 @@ MYSQL_BIN_LOG::write_transaction_to_binlog_events(group_commit_entry *entry)
void
MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader)
{
uint xid_count= 0;
my_off_t UNINIT_VAR(commit_offset);
group_commit_entry *current, *last_in_queue;
group_commit_entry *queue= NULL;
bool check_purge= false;
ulong UNINIT_VAR(binlog_id);
uint64 commit_id;
DBUG_ENTER("MYSQL_BIN_LOG::trx_group_commit_leader");
{
#ifdef ENABLED_DEBUG_SYNC
DBUG_EXECUTE_IF("inject_binlog_commit_before_get_LOCK_log",
DBUG_ASSERT(!debug_sync_set_action(leader->thd, STRING_WITH_LEN
("commit_before_get_LOCK_log SIGNAL waiting WAIT_FOR cont TIMEOUT 1")));
);
DBUG_EXECUTE_IF("inject_binlog_commit_before_get_LOCK_log",
DBUG_ASSERT(!debug_sync_set_action(leader->thd, STRING_WITH_LEN
("commit_before_get_LOCK_log SIGNAL waiting WAIT_FOR cont TIMEOUT 1")));
);
#endif
/*
Lock the LOCK_log(), and once we get it, collect any additional writes
that queued up while we were waiting.
*/
DEBUG_SYNC(leader->thd, "commit_before_get_LOCK_log");
mysql_mutex_lock(&LOCK_log);
DEBUG_SYNC(leader->thd, "commit_after_get_LOCK_log");
/*
Lock the LOCK_log(), and once we get it, collect any additional writes
that queued up while we were waiting.
*/
DEBUG_SYNC(leader->thd, "commit_before_get_LOCK_log");
mysql_mutex_lock(&LOCK_log);
DEBUG_SYNC(leader->thd, "commit_after_get_LOCK_log");
mysql_mutex_lock(&LOCK_prepare_ordered);
if (opt_binlog_commit_wait_count)
wait_for_sufficient_commits();
/*
Note that wait_for_sufficient_commits() may have released and
re-acquired the LOCK_log and LOCK_prepare_ordered if it needed to wait.
*/
current= group_commit_queue;
group_commit_queue= NULL;
mysql_mutex_unlock(&LOCK_prepare_ordered);
binlog_id= current_binlog_id;
mysql_mutex_lock(&LOCK_prepare_ordered);
if (opt_binlog_commit_wait_count)
wait_for_sufficient_commits();
/*
Note that wait_for_sufficient_commits() may have released and
re-acquired the LOCK_log and LOCK_prepare_ordered if it needed to wait.
*/
current= group_commit_queue;
group_commit_queue= NULL;
mysql_mutex_unlock(&LOCK_prepare_ordered);
/* As the queue is in reverse order of entering, reverse it. */
last_in_queue= current;
while (current)
{
group_commit_entry *next= current->next;
/*
Now that group commit is started, we can clear the flag; there is no
longer any use in waiters on this commit trying to trigger it early.
*/
current->thd->waiting_on_group_commit= false;
current->next= queue;
queue= current;
current= next;
}
DBUG_ASSERT(leader == queue /* the leader should be first in queue */);
/* Now we have in queue the list of transactions to be committed in order. */
}
DBUG_ASSERT(is_open());
if (likely(is_open())) // Should always be true
/* As the queue is in reverse order of entering, reverse it. */
last_in_queue= current;
while (current)
{
commit_id= (last_in_queue == leader ? 0 : (uint64)leader->thd->query_id);
group_commit_entry *next= current->next;
/*
Now that group commit is started, we can clear the flag; there is no
longer any use in waiters on this commit trying to trigger it early.
*/
current->thd->waiting_on_group_commit= false;
current->next= queue;
queue= current;
current= next;
}
DBUG_ASSERT(leader == queue /* the leader should be first in queue */);
/* Now we have in queue the list of transactions to be committed in order. */
trx_group_commit_with_engines(leader, last_in_queue, false);
DBUG_VOID_RETURN;
}
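The queue reversal in trx_group_commit_leader() is a plain in-place reversal of a singly linked list. A minimal sketch follows; `Entry` is a hypothetical stand-in for group_commit_entry that keeps only the `next` link, and `reverse_queue` is an illustrative name, not a function of the server.

```cpp
#include <cassert>

// Hypothetical stand-in for group_commit_entry: only the link matters here.
struct Entry
{
  int id;
  Entry *next;
};

// Reverse a singly linked list in place and return the new head.
// The old head (the entry queued last) becomes the tail.
Entry *reverse_queue(Entry *current)
{
  Entry *queue= nullptr;
  while (current)
  {
    Entry *next= current->next;
    current->next= queue;   // link the entry in front of the reversed part
    queue= current;
    current= next;
  }
  return queue;
}
```

Because entries are pushed at the head when they queue up, the entry that queued first (the leader) ends up at the head after the reversal, and the original head is the tail that is passed on as `tail`.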
void MYSQL_BIN_LOG::trx_group_commit_with_engines(group_commit_entry *leader,
group_commit_entry *tail,
bool commit_by_rotate)
{
uint xid_count= 0;
bool check_purge= false;
ulong UNINIT_VAR(binlog_id);
my_off_t UNINIT_VAR(commit_offset);
group_commit_entry *current;
DBUG_ENTER("MYSQL_BIN_LOG::trx_group_commit_with_engines");
mysql_mutex_assert_owner(&LOCK_log);
binlog_id= current_binlog_id;
if (likely(is_open())) // Binlog could be closed if rotation fails
{
uint64 commit_id= (leader == tail ? 0 : (uint64) leader->thd->query_id);
DBUG_EXECUTE_IF("binlog_force_commit_id",
{
const LEX_CSTRING commit_name= { STRING_WITH_LEN("commit_id") };
@ -8937,7 +9107,7 @@ MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader)
current->error and let the thread do the error reporting itself once
we wake it up.
*/
for (current= queue; current != NULL; current= current->next)
for (current= leader; current != NULL; current= current->next)
{
set_current_thd(current->thd);
binlog_cache_mngr *cache_mngr= current->cache_mngr;
@ -8948,10 +9118,11 @@ MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader)
*/
DBUG_ASSERT(!cache_mngr->stmt_cache.empty() ||
!cache_mngr->trx_cache.empty() ||
current->thd->transaction->xid_state.is_explicit_XA());
current->thd->transaction->xid_state.is_explicit_XA() ||
commit_by_rotate);
if (unlikely((current->error= write_transaction_or_stmt(current,
commit_id))))
if (unlikely((current->error= write_transaction_or_stmt(
current, commit_id, commit_by_rotate))))
current->commit_errno= errno;
strmake_buf(cache_mngr->last_commit_pos_file, log_file_name);
@ -8984,7 +9155,7 @@ MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader)
bool synced= 0;
if (unlikely(flush_and_sync(&synced)))
{
for (current= queue; current != NULL; current= current->next)
for (current= leader; current != NULL; current= current->next)
{
if (!current->error)
{
@ -9004,7 +9175,7 @@ MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader)
mysql_mutex_assert_not_owner(&LOCK_after_binlog_sync);
mysql_mutex_assert_not_owner(&LOCK_commit_ordered);
for (current= queue; current != NULL; current= current->next)
for (current= leader; current != NULL; current= current->next)
{
#ifdef HAVE_REPLICATION
/*
@ -9102,7 +9273,7 @@ MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader)
bool first __attribute__((unused))= true;
bool last __attribute__((unused));
for (current= queue; current != NULL; current= current->next)
for (current= leader; current != NULL; current= current->next)
{
last= current->next == NULL;
#ifdef HAVE_REPLICATION
@ -9155,8 +9326,8 @@ MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader)
in this function, so parent does not need to and we need not set these
values).
*/
last_in_queue->check_purge= check_purge;
last_in_queue->binlog_id= binlog_id;
tail->check_purge= check_purge;
tail->binlog_id= binlog_id;
/* Note that we return with LOCK_commit_ordered locked! */
DBUG_VOID_RETURN;
@ -9166,7 +9337,7 @@ MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader)
Wakeup each participant waiting for our group commit, first calling the
commit_ordered() methods for any transactions doing 2-phase commit.
*/
current= queue;
current= leader;
while (current != NULL)
{
group_commit_entry *next;
@ -9204,10 +9375,9 @@ MYSQL_BIN_LOG::trx_group_commit_leader(group_commit_entry *leader)
DBUG_VOID_RETURN;
}
int
MYSQL_BIN_LOG::write_transaction_or_stmt(group_commit_entry *entry,
uint64 commit_id)
int MYSQL_BIN_LOG::write_transaction_or_stmt(group_commit_entry *entry,
uint64 commit_id,
bool commit_by_rotate)
{
binlog_cache_mngr *mngr= entry->cache_mngr;
bool has_xid= entry->end_event->get_type_code() == XID_EVENT;
@ -9228,10 +9398,17 @@ MYSQL_BIN_LOG::write_transaction_or_stmt(group_commit_entry *entry,
DBUG_ASSERT(!(entry->using_stmt_cache && !mngr->stmt_cache.empty() &&
mngr->get_binlog_cache_log(FALSE)->error));
if (write_gtid_event(entry->thd, is_prepared_xa(entry->thd),
entry->using_trx_cache, commit_id,
has_xid, entry->ro_1pc))
DBUG_RETURN(ER_ERROR_ON_WRITE);
/*
gtid will be written when renaming the binlog cache to binlog file,
if commit_by_rotate is true. Thus skip write_gtid_event here.
*/
if (likely(!commit_by_rotate))
{
if (write_gtid_event(entry->thd, is_prepared_xa(entry->thd),
entry->using_trx_cache, commit_id,
false /* commit_by_rotate */, has_xid, entry->ro_1pc))
DBUG_RETURN(ER_ERROR_ON_WRITE);
}
if (entry->using_stmt_cache && !mngr->stmt_cache.empty() &&
write_cache(entry->thd, mngr->get_binlog_cache_data(FALSE)))
@ -11140,7 +11317,7 @@ int TC_LOG_BINLOG::unlog(ulong cookie, my_xid xid)
if (!BINLOG_COOKIE_IS_DUMMY(cookie))
mark_xid_done(BINLOG_COOKIE_GET_ID(cookie), true);
/*
See comment in trx_group_commit_leader() - if rotate() gave a failure,
See comment in trx_group_commit_with_engines() - if rotate() gave a failure,
we delay the return of error code to here.
*/
DBUG_RETURN(BINLOG_COOKIE_GET_ERROR_FLAG(cookie));
@ -12938,3 +13115,243 @@ void wsrep_register_binlog_handler(THD *thd, bool trx)
}
#endif /* WITH_WSREP */
inline bool Binlog_commit_by_rotate::should_commit_by_rotate(
const MYSQL_BIN_LOG::group_commit_entry *entry) const
{
binlog_cache_data *trx_cache= entry->cache_mngr->get_binlog_cache_data(true);
binlog_cache_data *stmt_cache=
entry->cache_mngr->get_binlog_cache_data(false);
if (likely(trx_cache->get_byte_position() <=
opt_binlog_commit_by_rotate_threshold &&
stmt_cache->get_byte_position() <=
opt_binlog_commit_by_rotate_threshold))
return false;
binlog_cache_data *cache_data= trx_cache;
if (unlikely(entry->using_stmt_cache && !stmt_cache->empty()))
cache_data= stmt_cache;
/*
Don't do rename if no space is reserved or nothing has been written to
the tmp file. This happens when the binlog cache buffer is larger than
the threshold.
*/
if (cache_data->file_reserved_bytes() == 0 ||
cache_data->cache_log.disk_writes == 0)
return false;
/*
- The binlog cache file is not encrypted in the same way as the binlog, so
it cannot be renamed to a binlog file.
- Renaming both the statement cache and the transaction cache to binlog
files at the same time is not supported.
- opt_optimize_thread_scheduling is just for testing purposes and is
usually enabled; skip the disabled case to keep the code simple.
*/
if (encrypt_binlog || !opt_optimize_thread_scheduling ||
(entry->using_stmt_cache && entry->using_trx_cache &&
!stmt_cache->empty() && !trx_cache->empty()))
return false;
return true;
}
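The checks in should_commit_by_rotate() can be condensed into a standalone predicate. This is a simplified sketch under stated assumptions: `CacheInfo`, `CommitInfo` and `should_rename` are hypothetical names that model only the fields the function reads, and the later checks done in commit() (legacy event positions, checksum algorithm) are not covered.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of one binlog cache.
struct CacheInfo
{
  uint64_t bytes;           // bytes of events held by the cache
  uint32_t reserved_bytes;  // header space reserved in the temporary file
  uint64_t disk_writes;     // 0 means everything is still in memory
  bool empty() const { return bytes == 0; }
};

// Hypothetical, simplified view of a group commit entry.
struct CommitInfo
{
  CacheInfo trx, stmt;
  bool using_trx_cache, using_stmt_cache;
};

bool should_rename(const CommitInfo &c, uint64_t threshold,
                   bool encrypt_binlog, bool optimize_thread_scheduling)
{
  // Common case: neither cache exceeds the threshold.
  if (c.trx.bytes <= threshold && c.stmt.bytes <= threshold)
    return false;
  const CacheInfo &cache=
      (c.using_stmt_cache && !c.stmt.empty()) ? c.stmt : c.trx;
  // No reserved header space, or nothing was spilled to the temporary file.
  if (cache.reserved_bytes == 0 || cache.disk_writes == 0)
    return false;
  if (encrypt_binlog || !optimize_thread_scheduling)
    return false;
  // Both caches hold data: only one file can be renamed.
  if (c.using_stmt_cache && c.using_trx_cache &&
      !c.stmt.empty() && !c.trx.empty())
    return false;
  return true;
}
```

Putting the threshold comparison first keeps the common path for small transactions to a pair of integer comparisons.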
bool Binlog_commit_by_rotate::commit(MYSQL_BIN_LOG::group_commit_entry *entry)
{
bool check_purge= false;
THD *thd= entry->thd;
binlog_cache_mngr *cache_mngr= entry->cache_mngr;
binlog_cache_data *cache_data= cache_mngr->get_binlog_cache_data(true);
if (unlikely(!entry->using_trx_cache || cache_data->empty()))
cache_data= cache_mngr->get_binlog_cache_data(false);
/* Do this before acquiring LOCK_log to avoid holding the lock for long */
if (cache_data->sync_temp_file())
return true;
/*
If a rollback_to_savepoint happened before, the real length of the
tmp file can be greater than the file_end_pos. Truncate the cache tmp
file to the file_end_pos of this cache.
*/
my_chsize(cache_data->cache_log.file, cache_data->temp_file_length(), 0,
MYF(0));
if (thd->wait_for_prior_commit())
return true;
// It will be released by trx_group_commit_with_engines
mysql_mutex_lock(&mysql_bin_log.LOCK_log);
enum enum_binlog_checksum_alg expected_alg=
mysql_bin_log.checksum_alg_reset != BINLOG_CHECKSUM_ALG_UNDEF
? mysql_bin_log.checksum_alg_reset
: (enum_binlog_checksum_alg) binlog_checksum_options;
/*
In legacy mode, every event must have a valid position. This is done by
updating the log_pos field when writing events from the binlog cache to
the binlog file. Thus renaming the binlog cache to a binlog file is not
supported in legacy mode.
If the cache's checksum algorithm differs from the binlog's checksum
algorithm, the checksums would need to be recalculated. Thus renaming the
binlog cache to a binlog file is not supported in that case either.
*/
if (!mysql_bin_log.is_open() || opt_binlog_legacy_event_pos ||
(expected_alg != cache_data->checksum_opt))
{
mysql_mutex_unlock(&mysql_bin_log.LOCK_log);
// Rename is not possible, so fall back to group commit
return mysql_bin_log.write_transaction_with_group_commit(entry);
}
m_entry= entry;
m_replaced= false;
m_cache_data= cache_data;
ulong prev_binlog_id= mysql_bin_log.current_binlog_id;
/*
Rotate will call replace_binlog_file() to rename the transaction's binlog
cache to the new binlog file.
*/
if (mysql_bin_log.rotate(true, &check_purge, true /* commit_by_rotate */))
{
DBUG_ASSERT(!m_replaced);
DBUG_ASSERT(!mysql_bin_log.is_open());
mysql_mutex_unlock(&mysql_bin_log.LOCK_log);
return true;
}
if (!m_replaced)
{
mysql_mutex_unlock(&mysql_bin_log.LOCK_log);
if (check_purge)
mysql_bin_log.checkpoint_and_purge(prev_binlog_id);
// Rename is not possible, so fall back to group commit
return mysql_bin_log.write_transaction_with_group_commit(entry);
}
DBUG_EXECUTE_IF("binlog_commit_by_rotate_crash_after_rotate",
DBUG_SUICIDE(););
/* Seek binlog file to the end */
reinit_io_cache(&mysql_bin_log.log_file, WRITE_CACHE,
cache_data->temp_file_length(), false, true);
status_var_add(m_entry->thd->status_var.binlog_bytes_written,
cache_data->get_byte_position());
cache_data->detach_temp_file();
entry->next= nullptr;
mysql_bin_log.trx_group_commit_with_engines(entry, entry, true);
mysql_mutex_assert_not_owner(&mysql_bin_log.LOCK_log);
if (check_purge)
mysql_bin_log.checkpoint_and_purge(prev_binlog_id);
return false;
}
bool Binlog_commit_by_rotate::replace_binlog_file()
{
size_t binlog_size= my_b_tell(&mysql_bin_log.log_file);
size_t required_size= binlog_size;
// space for Gtid_log_event
required_size+= LOG_EVENT_HEADER_LEN + Gtid_log_event::max_data_length +
BINLOG_CHECKSUM_LEN;
DBUG_EXECUTE_IF("simulate_required_size_too_big", required_size= 10000;);
if (required_size > m_cache_data->file_reserved_bytes())
{
sql_print_information("Could not rename binlog cache to binlog (as "
"requested by --binlog-large-commit-threshold). "
"Required %llu bytes but only %llu bytes reserved.",
(ulonglong) required_size,
(ulonglong) m_cache_data->file_reserved_bytes());
return false;
}
File new_log_fd= -1;
bool ret= false;
/* Create fd for the cache file as a new binlog file fd */
new_log_fd= mysql_file_open(key_file_binlog, m_cache_data->temp_file_name(),
O_BINARY | O_CLOEXEC | O_WRONLY, MYF(MY_WME));
if (new_log_fd == -1)
return false;
/* Copy the part which has been flushed to binlog file to binlog cache */
if (mysql_bin_log.log_file.pos_in_file > 0)
{
size_t copy_len= 0;
uchar buf[IO_SIZE];
int read_fd=
mysql_file_open(key_file_binlog, mysql_bin_log.get_log_fname(),
O_RDONLY | O_BINARY | O_SHARE, MYF(MY_WME));
if (read_fd == -1)
goto err;
while (copy_len < mysql_bin_log.log_file.pos_in_file)
{
int read_len= (int) mysql_file_read(read_fd, buf, IO_SIZE, MYF(MY_WME));
/* A zero-length read would never advance copy_len, so treat it as an error */
if (read_len <= 0 ||
mysql_file_write(new_log_fd, buf, read_len,
MYF(MY_WME | MY_NABP | MY_WAIT_IF_FULL)))
{
mysql_file_close(read_fd, MYF(0));
goto err;
}
copy_len+= read_len;
}
mysql_file_close(read_fd, MYF(0));
}
// Set the cache file as binlog file.
mysql_file_close(mysql_bin_log.log_file.file, MYF(0));
mysql_bin_log.log_file.file= new_log_fd;
new_log_fd= -1;
my_delete(mysql_bin_log.get_log_fname(), MYF(0));
/* Any error that happens after the file is deleted must return true. */
ret= true;
if (mysql_bin_log.write_gtid_event(
m_entry->thd, is_prepared_xa(m_entry->thd), m_entry->using_trx_cache,
0 /* commit_id */, true /* commit_by_rotate */,
m_entry->end_event->get_type_code() == XID_EVENT, m_entry->ro_1pc))
goto err;
DBUG_EXECUTE_IF("binlog_commit_by_rotate_crash_before_rename",
DBUG_SUICIDE(););
if (DBUG_IF("simulate_rename_binlog_cache_to_binlog_error") ||
my_rename(m_cache_data->temp_file_name(), mysql_bin_log.get_log_fname(),
MYF(MY_WME)))
goto err;
sql_print_information("Renamed binlog cache to binlog %s",
mysql_bin_log.get_log_fname());
m_replaced= true;
return false;
err:
if (new_log_fd != -1)
mysql_file_close(new_log_fd, MYF(0));
return ret;
}
size_t Binlog_commit_by_rotate::get_gtid_event_pad_data_size()
{
size_t begin_pos= my_b_tell(&mysql_bin_log.log_file);
// Gtid_event's data size doesn't include event header and checksum
size_t pad_data_to_size=
m_cache_data->file_reserved_bytes() - begin_pos - LOG_EVENT_HEADER_LEN;
if (binlog_checksum_options != BINLOG_CHECKSUM_ALG_OFF)
pad_data_to_size-= BINLOG_CHECKSUM_LEN;
return pad_data_to_size;
}
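
The padding arithmetic in get_gtid_event_pad_data_size() can be checked in isolation. The following is a minimal sketch, not the server code: the constant values (19 bytes for the event header, 4 bytes for the checksum) and the function name gtid_pad_data_size are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumed values for illustration; the server takes these from its own headers.
const std::size_t kEventHeaderLen= 19;  // stands in for LOG_EVENT_HEADER_LEN
const std::size_t kChecksumLen= 4;      // stands in for BINLOG_CHECKSUM_LEN

// Data size the Gtid event must be padded to so that the event (header, data
// and optional checksum) ends exactly at the end of the reserved region.
// begin_pos is the file offset at which the Gtid event header starts.
inline std::size_t gtid_pad_data_size(std::size_t reserved_bytes,
                                      std::size_t begin_pos,
                                      bool checksum_enabled)
{
  std::size_t size= reserved_bytes - begin_pos - kEventHeaderLen;
  if (checksum_enabled)
    size-= kChecksumLen;
  return size;
}
```

With 4096 reserved bytes, an event starting at offset 256 and checksums enabled, the data is padded to 3817 bytes, so that 256 + 19 + 3817 + 4 lands exactly on 4096, where the transaction's own events begin.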

@ -600,9 +600,12 @@ class binlog_cache_mngr;
class binlog_cache_data;
struct rpl_gtid;
struct wait_for_commit;
class Binlog_commit_by_rotate;
class MYSQL_BIN_LOG: public TC_LOG, private Event_log
{
friend Binlog_commit_by_rotate;
#ifdef HAVE_PSI_INTERFACE
/** The instrumentation key to use for @ LOCK_index. */
PSI_mutex_key m_key_LOCK_index;
@ -756,18 +759,24 @@ class MYSQL_BIN_LOG: public TC_LOG, private Event_log
new_file() is locking. new_file_without_locking() does not acquire
LOCK_log.
*/
int new_file_impl();
int new_file_impl(bool commit_by_rotate);
void do_checkpoint_request(ulong binlog_id);
int write_transaction_or_stmt(group_commit_entry *entry, uint64 commit_id);
int write_transaction_or_stmt(group_commit_entry *entry, uint64 commit_id,
bool commit_by_rotate);
int queue_for_group_commit(group_commit_entry *entry);
bool write_transaction_to_binlog_events(group_commit_entry *entry);
bool write_transaction_with_group_commit(group_commit_entry *entry);
void write_transaction_handle_error(group_commit_entry *entry);
void trx_group_commit_leader(group_commit_entry *leader);
void trx_group_commit_with_engines(group_commit_entry *leader,
group_commit_entry *tail,
bool commit_by_rotate);
bool is_xidlist_idle_nolock();
void update_gtid_index(uint32 offset, rpl_gtid gtid);
public:
void purge(bool all);
int new_file_without_locking();
int new_file_without_locking(bool commit_by_rotate);
/*
A list of struct xid_count_per_binlog is used to keep track of how many
XIDs are in prepared, but not committed, state in each binlog. And how
@ -997,7 +1006,8 @@ public:
enum cache_type io_cache_type_arg,
ulong max_size,
bool null_created,
bool need_mutex);
bool need_mutex,
bool commit_by_rotate = false);
bool open_index_file(const char *index_file_name_arg,
const char *log_name, bool need_mutex);
/* Use this to start writing a new log file */
@ -1037,7 +1047,8 @@ public:
bool is_active(const char* log_file_name);
bool can_purge_log(const char *log_file_name, bool interactive);
int update_log_index(LOG_INFO* linfo, bool need_update_threads);
int rotate(bool force_rotate, bool* check_purge);
int rotate(bool force_rotate, bool *check_purge,
bool commit_by_rotate= false);
void checkpoint_and_purge(ulong binlog_id);
int rotate_and_purge(bool force_rotate, DYNAMIC_ARRAY* drop_gtid_domain= NULL);
/**
@ -1117,6 +1128,7 @@ public:
bool is_xidlist_idle();
bool write_gtid_event(THD *thd, bool standalone, bool is_transactional,
uint64 commit_id,
bool commit_by_rotate,
bool has_xid= false, bool ro_1pc= false);
int read_state_from_file();
int write_state_to_file();

sql/log_cache.cc Normal file

@ -0,0 +1,115 @@
#include "my_global.h"
#include "log_cache.h"
#include "handler.h"
#include "my_sys.h"
#include "mysql/psi/mysql_file.h"
#include "mysql/service_wsrep.h"
const char *BINLOG_CACHE_DIR= "#binlog_cache_files";
char binlog_cache_dir[FN_REFLEN];
extern uint32 binlog_cache_reserved_size();
bool binlog_cache_data::init_file_reserved_bytes()
{
// The session's cache file has not been created yet, so create it here.
if (cache_log.file == -1)
{
char name[FN_REFLEN];
/* Cache file is named with PREFIX + binlog_cache_data object's address */
snprintf(name, FN_REFLEN, "%s/%s_%llu", cache_log.dir, cache_log.prefix,
(ulonglong) this);
if ((cache_log.file=
mysql_file_open(0, name, O_CREAT | O_RDWR, MYF(MY_WME))) < 0)
{
sql_print_error("Failed to open binlog cache temporary file %s", name);
cache_log.error= -1;
return true;
}
}
#ifdef WITH_WSREP
/*
WSREP code accesses cache_log directly, so don't reserve space if WSREP is
on.
*/
if (unlikely(wsrep_on(current_thd)))
return false;
#endif
m_file_reserved_bytes= binlog_cache_reserved_size();
cache_log.pos_in_file= m_file_reserved_bytes;
cache_log.seek_not_done= 1;
return false;
}
void binlog_cache_data::detach_temp_file()
{
mysql_file_close(cache_log.file, MYF(0));
cache_log.file= -1;
reset();
}
extern void ignore_db_dirs_append(const char *dirname_arg);
bool init_binlog_cache_dir()
{
size_t length;
uint max_tmp_file_name_len=
1 /* path separator */ + 2 /* prefix */ + 1 /* underscore */ +
20 /* max decimal digits of the object address printed with %llu */;
ignore_db_dirs_append(BINLOG_CACHE_DIR);
dirname_part(binlog_cache_dir, log_bin_basename, &length);
/*
Must ensure the full name of the tmp file is shorter than FN_REFLEN, to
avoid overflowing the name buffer in write and commit.
*/
if (length + strlen(BINLOG_CACHE_DIR) + max_tmp_file_name_len >= FN_REFLEN)
{
sql_print_error("Could not create binlog cache dir %s%s. It is too long.",
binlog_cache_dir, BINLOG_CACHE_DIR);
return true;
}
memcpy(binlog_cache_dir + length, BINLOG_CACHE_DIR,
strlen(BINLOG_CACHE_DIR));
binlog_cache_dir[length + strlen(BINLOG_CACHE_DIR)]= 0;
MY_DIR *dir_info= my_dir(binlog_cache_dir, MYF(0));
if (!dir_info)
{
/* Create the directory for binlog cache temp files if it does not exist. */
if (my_mkdir(binlog_cache_dir, 0777, MYF(0)) < 0)
{
sql_print_error("Could not create binlog cache dir %s.",
binlog_cache_dir);
return true;
}
return false;
}
/* Try to delete all cache files in the directory. */
for (uint i= 0; i < dir_info->number_of_files; i++)
{
FILEINFO *file= dir_info->dir_entry + i;
if (strncmp(file->name, LOG_PREFIX, strlen(LOG_PREFIX)))
{
sql_print_warning("%s is in %s/, but it is not a binlog cache file",
file->name, BINLOG_CACHE_DIR);
continue;
}
char file_path[FN_REFLEN];
fn_format(file_path, file->name, binlog_cache_dir, "",
MYF(MY_REPLACE_DIR));
my_delete(file_path, MYF(0));
}
my_dirend(dir_info);
return false;
}

@ -22,6 +22,16 @@ static constexpr my_off_t MY_OFF_T_UNDEF= ~0ULL;
/** Truncate cache log files bigger than this */
static constexpr my_off_t CACHE_FILE_TRUNC_SIZE = 65536;
/**
Create the binlog cache directory if it doesn't exist, otherwise delete all
binlog cache files existing in the directory.
@retval false The directory was initialized successfully.
@retval true Failed to initialize the directory.
*/
bool init_binlog_cache_dir();
extern char binlog_cache_dir[FN_REFLEN];
/*
Helper classes to store non-transactional and transactional data
@ -35,7 +45,7 @@ public:
before_stmt_pos(MY_OFF_T_UNDEF), m_pending(0), status(0),
incident(FALSE), precompute_checksums(precompute_checksums),
saved_max_binlog_cache_size(0), ptr_binlog_cache_use(0),
ptr_binlog_cache_disk_use(0)
ptr_binlog_cache_disk_use(0), m_file_reserved_bytes(0)
{
/*
Read the current checksum setting. We will use this setting to decide
@ -50,6 +60,10 @@ public:
~binlog_cache_data()
{
DBUG_ASSERT(empty());
if (cache_log.file != -1 && !encrypt_tmp_files)
unlink(my_filename(cache_log.file));
close_cached_file(&cache_log);
}
@ -67,7 +81,7 @@ public:
bool empty() const
{
return (pending() == NULL &&
(my_b_write_tell(&cache_log) == 0 ||
(my_b_write_tell(&cache_log) - m_file_reserved_bytes == 0 ||
((status & (LOGGED_ROW_EVENT | LOGGED_CRITICAL)) == 0)));
}
@ -97,6 +111,8 @@ public:
bool truncate_file= (cache_log.file != -1 &&
my_b_write_tell(&cache_log) >
MY_MIN(CACHE_FILE_TRUNC_SIZE, binlog_stmt_cache_size));
// m_file_reserved_bytes must be reset to 0, before truncate.
m_file_reserved_bytes= 0;
truncate(0,1); // Forget what's in cache
checksum_opt= !precompute_checksums ? BINLOG_CHECKSUM_ALG_OFF :
(enum_binlog_checksum_alg)binlog_checksum_options;
@ -112,7 +128,8 @@ public:
my_off_t get_byte_position() const
{
return my_b_tell(&cache_log);
DBUG_ASSERT(cache_log.type == WRITE_CACHE);
return my_b_tell(&cache_log) - m_file_reserved_bytes;
}
my_off_t get_prev_position() const
@ -172,6 +189,81 @@ public:
status|= status_arg;
}
/**
This function is called every time anything is written into the
cache_log. To support renaming the binlog cache to a binlog file, the
cache_log must be initialized with reserved space.
*/
bool write_prepare(size_t write_length)
{
/* Data will exceed the buffer size in this write */
if (unlikely(cache_log.write_pos + write_length > cache_log.write_end &&
cache_log.pos_in_file == 0))
{
/* Only the session's binlog cache needs to reserve space. */
if (cache_log.dir == binlog_cache_dir && !encrypt_tmp_files)
return init_file_reserved_bytes();
}
return false;
}
/**
The session's binlog cache has to call this function to skip the reserved
space before reading the cache file.
*/
bool init_for_read()
{
return reinit_io_cache(&cache_log, READ_CACHE, m_file_reserved_bytes, 0, 0);
}
/**
The session's binlog cache has to call this function to get the actual
data length, which excludes the reserved space.
*/
my_off_t length_for_read() const
{
DBUG_ASSERT(cache_log.type == READ_CACHE);
return cache_log.end_of_file - m_file_reserved_bytes;
}
/**
This function returns the cache file's actual length, which includes the
reserved space.
*/
my_off_t temp_file_length()
{
return my_b_tell(&cache_log);
}
uint32 file_reserved_bytes() { return m_file_reserved_bytes; }
/**
Flush and sync the data of the file into storage.
@retval true An error occurred
@retval false Success
*/
bool sync_temp_file()
{
DBUG_ASSERT(cache_log.file != -1);
if (my_b_flush_io_cache(&cache_log, 1) ||
mysql_file_sync(cache_log.file, MYF(0)))
return true;
return false;
}
/**
Return the name of the cache file.
*/
const char *temp_file_name() { return my_filename(cache_log.file); }
/**
It is called after renaming the cache file to a binlog file. The file
now is a binlog file, so detach it from the binlog cache.
*/
void detach_temp_file();
/*
Cache to store data before copying it to the binary log.
*/
@ -253,6 +345,12 @@ private:
*/
ulong *ptr_binlog_cache_disk_use;
/*
Stores the bytes reserved at the beginning of the cache file. It is 0
in cases where reserved space is not supported, see write_prepare().
*/
uint32 m_file_reserved_bytes {0};
/*
It truncates the cache to a certain position. This includes deleting the
pending event.
@ -266,12 +364,18 @@ private:
delete pending();
set_pending(0);
}
my_bool res __attribute__((unused))=
reinit_io_cache(&cache_log, WRITE_CACHE, pos, 0, reset_cache);
my_bool res __attribute__((unused))= reinit_io_cache(
&cache_log, WRITE_CACHE, pos + m_file_reserved_bytes, 0, reset_cache);
DBUG_ASSERT(res == 0);
cache_log.end_of_file= saved_max_binlog_cache_size;
}
/**
Reserve the required space at the beginning of the temporary file. It
will create the temporary file if it doesn't exist.
*/
bool init_file_reserved_bytes();
binlog_cache_data& operator=(const binlog_cache_data& info);
binlog_cache_data(const binlog_cache_data& info);
};
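
The offset bookkeeping that keeps the reserved header hidden from callers can be modelled on its own. This is a minimal sketch under stated assumptions: CacheOffsets is a hypothetical stand-in for binlog_cache_data that tracks only positions (no IO_CACHE, no file), and its method names mirror the ones above for readability only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model: only the logical/physical offset arithmetic is kept.
class CacheOffsets
{
public:
  // Reserve the header once, when the data first spills to the file.
  // Data already written is shifted behind the reserved region.
  void reserve(std::uint32_t bytes)
  {
    reserved_= bytes;
    physical_pos_+= bytes;
  }
  void write(std::uint64_t len) { physical_pos_+= len; }
  // What callers see: bytes of event data, reserved space excluded.
  std::uint64_t byte_position() const { return physical_pos_ - reserved_; }
  // Real file length, reserved space included.
  std::uint64_t temp_file_length() const { return physical_pos_; }
  bool empty() const { return byte_position() == 0; }
  // Callers pass logical positions; the reserved offset is added internally.
  void truncate(std::uint64_t logical_pos)
  {
    physical_pos_= logical_pos + reserved_;
  }
  // The reservation is cleared before truncating, as in reset() above.
  void reset()
  {
    reserved_= 0;
    truncate(0);
  }
private:
  std::uint32_t reserved_= 0;
  std::uint64_t physical_pos_= 0;
};
```

Writing 100 bytes, reserving 4096 and writing 50 more gives a logical position of 150 and a file length of 4246. A rollback to logical position 100 truncates the file to 4196 bytes, not 100.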

@ -3340,6 +3340,14 @@ public:
uint64 sa_seq_no; // start alter identifier for CA/RA
#ifdef MYSQL_SERVER
event_xid_t xid;
/*
Pad the event data to this size if it is not zero. It is only used when
renaming a binlog cache to a binlog file. Space is reserved at the
beginning of the file for the gtid event and the events that precede it.
Some of that space is left over after those events are written, so the
remainder is filled by padding the gtid event with zeros.
*/
uint64 pad_to_size;
#else
event_mysql_xid_t xid;
#endif
@ -3404,6 +3412,11 @@ public:
static const uchar FL_EXTRA_THREAD_ID= 16; // thread_id like in BEGIN Query
#ifdef MYSQL_SERVER
static const uint max_data_length= GTID_HEADER_LEN + 2 + sizeof(XID)
+ 1 /* flags_extra: */
+ 4 /* Extra Engines */
+ 4 /* FL_EXTRA_THREAD_ID */;
Gtid_log_event(THD *thd_arg, uint64 seq_no, uint32 domain_id, bool standalone,
uint16 flags, bool is_transactional, uint64 commit_id,
bool has_xid= false, bool is_ro_1pc= false);

@ -29,6 +29,7 @@
#include "unireg.h"
#include "log_event.h"
#include "log_cache.h"
#include "sql_base.h" // close_thread_tables
#include "sql_cache.h" // QUERY_CACHE_FLAGS_SIZE
#include "sql_locale.h" // MY_LOCALE, my_locale_by_number, my_locale_en_US
@ -690,6 +691,9 @@ void Log_event::init_show_field_list(THD *thd, List<Item>* field_list)
int Log_event_writer::write_internal(const uchar *pos, size_t len)
{
DBUG_ASSERT(!ctx || encrypt_or_write == &Log_event_writer::encrypt_and_write);
if (cache_data && cache_data->write_prepare(len))
return 1;
if (my_b_safe_write(file, pos, len))
{
DBUG_PRINT("error", ("write to log failed: %d", my_errno));
@ -2839,7 +2843,7 @@ Gtid_log_event::Gtid_log_event(THD *thd_arg, uint64 seq_no_arg,
bool ro_1pc)
: Log_event(thd_arg, flags_arg, is_transactional),
seq_no(seq_no_arg), commit_id(commit_id_arg), domain_id(domain_id_arg),
flags2((standalone ? FL_STANDALONE : 0) |
pad_to_size(0), flags2((standalone ? FL_STANDALONE : 0) |
(commit_id_arg ? FL_GROUP_COMMIT_ID : 0)),
flags_extra(0), extra_engines(0),
thread_id(thd_arg->variables.pseudo_thread_id)
@ -2959,10 +2963,7 @@ Gtid_log_event::peek(const uchar *event_start, size_t event_len,
bool
Gtid_log_event::write(Log_event_writer *writer)
{
uchar buf[GTID_HEADER_LEN + 2 + sizeof(XID)
+ 1 /* flags_extra: */
+ 4 /* Extra Engines */
+ 4 /* FL_EXTRA_THREAD_ID */];
uchar buf[max_data_length];
size_t write_len= 13;
int8store(buf, seq_no);
@ -3042,6 +3043,27 @@ Gtid_log_event::write(Log_event_writer *writer)
bzero(buf+write_len, GTID_HEADER_LEN-write_len);
write_len= GTID_HEADER_LEN;
}
if (unlikely(pad_to_size > write_len))
{
if (write_header(writer, pad_to_size) ||
write_data(writer, buf, write_len))
return true;
pad_to_size-= write_len;
char pad_buf[IO_SIZE];
/* Zero the whole buffer: pad_to_size may be larger than IO_SIZE */
bzero(pad_buf, sizeof(pad_buf));
while (pad_to_size)
{
uint64 size= pad_to_size >= IO_SIZE ? IO_SIZE : pad_to_size;
if (write_data(writer, pad_buf, size))
return true;
pad_to_size-= size;
}
return write_footer(writer);
}
return write_header(writer, write_len) ||
write_data(writer, buf, write_len) ||
write_footer(writer);
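
The chunked zero padding used for the Gtid event can be sketched independently of Log_event_writer. This is only an illustration under stated assumptions: append_zero_padding is a hypothetical helper, a std::vector stands in for the writer, and kChunk assumes an IO_SIZE of 4096.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t kChunk= 4096;  // stands in for IO_SIZE (assumed value)

// Append `pad` zero bytes to `out` in chunks of at most kChunk bytes.
// Returns the number of chunk writes performed.
inline std::size_t append_zero_padding(std::vector<unsigned char> &out,
                                       std::size_t pad)
{
  // The whole buffer is zeroed, so its size never depends on `pad`.
  unsigned char zeros[kChunk]= {0};
  std::size_t writes= 0;
  while (pad > 0)
  {
    const std::size_t n= std::min(pad, kChunk);
    out.insert(out.end(), zeros, zeros + n);
    pad-= n;
    ++writes;
  }
  return writes;
}
```

Padding 10000 bytes takes three writes (4096, 4096 and 1808 bytes), and a pad of 0 performs no write at all.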

@ -120,7 +120,7 @@
#include "sp_cache.h"
#include "sql_reload.h" // reload_acl_and_cache
#include "sp_head.h" // init_sp_psi_keys
#include "log_cache.h"
#include <mysqld_default_groups.h>
#ifdef HAVE_POLL_H
@ -5615,6 +5615,8 @@ static int init_server_components()
mysql_mutex_unlock(log_lock);
if (unlikely(error))
unireg_abort(1);
if (unlikely(init_binlog_cache_dir()))
unireg_abort(1);
}
#ifdef HAVE_REPLICATION

@ -1795,7 +1795,8 @@ static Sys_var_on_access_global<Sys_var_ulong,
Sys_max_binlog_size(
"max_binlog_size",
"Binary log will be rotated automatically when the size exceeds this "
"value",
"value, unless `binlog_large_commit_threshold` causes rotation "
"prematurely",
GLOBAL_VAR(max_binlog_size), CMD_LINE(REQUIRED_ARG),
VALID_RANGE(IO_SIZE, 1024*1024L*1024L), DEFAULT(1024*1024L*1024L),
BLOCK_SIZE(IO_SIZE), NO_MUTEX_GUARD, NOT_IN_BINLOG, ON_CHECK(0),
@ -3267,7 +3268,10 @@ static Sys_var_ulonglong Sys_thread_stack(
BLOCK_SIZE(1024));
static Sys_var_charptr_fscs Sys_tmpdir(
"tmpdir", "Path for temporary files. Several paths may "
"tmpdir",
"Path for temporary files. Files that are created in background for "
"binlogging by user threads are placed in a separate location "
"(see `binlog_large_commit_threshold` option). Several paths may "
"be specified, separated by a "
#if defined(_WIN32)
"semicolon (;)"
@ -7408,3 +7412,18 @@ static Sys_var_enum Sys_block_encryption_mode(
"AES_ENCRYPT() and AES_DECRYPT() functions",
SESSION_VAR(block_encryption_mode), CMD_LINE(REQUIRED_ARG),
block_encryption_mode_values, DEFAULT(0));
extern ulonglong opt_binlog_commit_by_rotate_threshold;
static Sys_var_ulonglong Sys_binlog_large_commit_threshold(
"binlog_large_commit_threshold",
"Increases transaction concurrency for large transactions (i.e. "
"those with sizes larger than this value) by using the large "
"transaction's cache file as a new binary log, and rotating the "
"active binary log to the large transaction's cache file at commit "
"time. This avoids the default commit logic that copies the "
"transaction cache data to the end of the active binary log file "
"while holding a lock that prevents other transactions from "
"binlogging",
GLOBAL_VAR(opt_binlog_commit_by_rotate_threshold),
CMD_LINE(REQUIRED_ARG), VALID_RANGE(10 * 1024 * 1024, ULLONG_MAX),
DEFAULT(128 * 1024 * 1024), BLOCK_SIZE(1));