- WL#3239 "log CREATE TABLE in Maria"
- WL#3240 "log DROP TABLE in Maria"
- similarly, log RENAME TABLE, REPAIR/OPTIMIZE TABLE, and DELETE no_WHERE_clause (== the DELETE which just truncates the files)
- create_rename_lsn added to MARIA_SHARE's state
- all these operations (except DROP TABLE) also update the table's create_rename_lsn, which is needed for the correctness of Recovery (see function comment of _ma_repair_write_log_record() in ma_check.c)
- write a COMMIT record when a transaction commits.
- don't log REDOs/UNDOs if this is an internal temporary table like inside ALTER TABLE (I expect this to be a big win). There was already no logging for user-created "CREATE TEMPORARY" tables.
- don't fsync files/directories if the table is not transactional
- in translog_write_record(), autogenerate a 2-byte-id for the table and log the "id->name" pair (LOGREC_FILE_ID); log LOGREC_LONG_TRANSACTION_ID; automatically store the table's 2-byte-id in any log record.
- preparations for Checkpoint: translog_get_horizon(); pausing Checkpoint when some dirty pages are unknown; capturing trn->rec_lsn, trn->first_undo_lsn for Checkpoint and for computing the log's low-water mark.
- assertions, comments.

storage/maria/Makefile.am:
  more files to build
storage/maria/ha_maria.cc:
  - logging a REPAIR log record if REPAIR/OPTIMIZE was successful.
  - ha_maria::data_file_type does not have to be set in every info() call, just once in open().
  - if the caller said that transactionality can be disabled (as ALTER TABLE does), i.e. thd->transaction.on==FALSE, then we temporarily disable transactionality of the table in external_lock(); that ensures no REDOs/UNDOs are logged for this possibly massive write operation (they are not needed: if any write fails, the table will be dropped). We re-enable it in external_lock(F_UNLCK), which in ALTER TABLE happens before the tmp table replaces the original one (which is good, as the final table thus gets a REDO RENAME and a correct create_rename_lsn).
  - when we commit we also have to write a log record, so trnman_commit_trn() calls become ma_commit() calls
  - at the end of the engine's initialization we are potentially entering a dangerous multi-threaded world (clients are going to be accepted), so some mutex-ownership assertions become enforceable; to enable them we set maria_multi_threaded=TRUE (see ma_control_file.c)
storage/maria/ha_maria.h:
  new member ha_maria::save_transactional (see also ha_maria.cc)
storage/maria/ma_blockrec.c:
  - fixing comments according to discussion with Monty
  - if a table is transactional but temporarily non-transactional (like in ALTER TABLE), we need to give a sensible LSN to the pages (if we give 0, pagecache asserts).
  - translog_write_record() now takes care of storing the share's 2-byte-id in the log record
storage/maria/ma_blockrec.h:
  fixing comment according to discussion with Monty
storage/maria/ma_check.c:
  When REPAIR/OPTIMIZE modify the data/index file of a transactional table, they must sync it; if they remove or rename files, they must sync the directory, so that everything is durable. This just applies to REPAIR/OPTIMIZE the logic already implemented in CREATE/DROP/RENAME a few months ago.
  Adding a function to write a LOGREC_REPAIR_TABLE at the end of REPAIR/OPTIMIZE (called only by ha_maria, not by maria_chk) and to update the table's create_rename_lsn.
storage/maria/ma_close.c:
  fix for a future bug
storage/maria/ma_control_file.c:
  ensuring that, if Maria is running in multi-threaded mode, anybody wanting to write to the control file and update last_checkpoint_lsn/last_logno owns the log's lock.
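The Checkpoint preparations above capture each transaction's first_undo_lsn and, from the page cache, the minimum rec_lsn of dirty pages, both "needed for low-water mark computation". A minimal sketch of how those minima might combine, assuming LSNs are plain 64-bit integers where 0 means "not set", any flag bits are already masked off, and the current end of log (what translog_get_horizon() reports) is the upper bound. The names `lsn_t`, `min_nonzero_lsn()` and `log_low_water_mark()` are hypothetical, not Maria's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t lsn_t;            /* hypothetical stand-in for Maria's LSN */
#define LSN_NONE ((lsn_t) 0)       /* 0 = "no LSN recorded" */

/* Smaller of two LSNs, treating LSN_NONE as "no constraint". */
static lsn_t min_nonzero_lsn(lsn_t a, lsn_t b)
{
  if (a == LSN_NONE)
    return b;
  if (b == LSN_NONE)
    return a;
  return a < b ? a : b;
}

/*
  Oldest log position that must be kept (hypothetical helper): nothing
  before the earliest first_undo_lsn of an active transaction (needed
  for rollback) nor before the smallest rec_lsn of a dirty page (needed
  for REDO). With no constraint at all, everything before 'horizon'
  (the current end of log) could go.
*/
static lsn_t log_low_water_mark(const lsn_t *first_undo_lsns, size_t n_trns,
                                lsn_t min_dirty_page_rec_lsn, lsn_t horizon)
{
  lsn_t mark= horizon;
  size_t i;
  for (i= 0; i < n_trns; i++)
    mark= min_nonzero_lsn(mark, first_undo_lsns[i]);
  return min_nonzero_lsn(mark, min_dirty_page_rec_lsn);
}
```

A transaction that has not logged anything yet contributes LSN_NONE and therefore does not hold the mark back.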
storage/maria/ma_control_file.h:
  see ma_control_file.c
storage/maria/ma_create.c:
  when creating a table:
  - sync it and its directory only if this is a transactional table and there is a log (no point in syncing in maria_chk)
  - decouple the two uses of linkname/linkname_ptr (for index file and for data file) into more variables, as we need to know all links until the moment we write the LOGREC_CREATE_TABLE.
  - set share.data_file_type early so that _ma_initialize_data_file() knows it (Monty's bugfix so that a table always has at least a bitmap page when it is created; so the data file is not 0 bytes anymore).
  - log a LOGREC_CREATE_TABLE; it contains the bytes which we have just written to the index file's header. Update the table's create_rename_lsn.
  - syncing of kfile had been bugified in a previous merge, correcting
  - syncing of dfile is now needed as it's not empty anymore
  - in _ma_initialize_data_file(), use the share's block_size and not the global one. This is a gratuitous change, both variables are equal; I just find it more future-proof to use a share-bound variable rather than a global one.
storage/maria/ma_delete_all.c:
  log a LOGREC_DELETE_ALL record when doing ma_delete_all_rows(); update create_rename_lsn then.
storage/maria/ma_delete_table.c:
  - logging LOGREC_DROP_TABLE; knowing if this is needed requires knowing if the table is transactional, which requires opening the table.
  - we need to sync directories only if the table is transactional
storage/maria/ma_extra.c:
  questions
storage/maria/ma_init.c:
  when maria_end() is called, the engine is not multithreaded
storage/maria/ma_loghandler.c:
  - translog_inited has to be visible to ma_create() (see how it is used in ma_create())
  - the checkpoint record will be a single record, not three
  - no REDO for TRUNCATE (TRUNCATE calls ma_create() internally so will log a REDO_CREATE)
  - adding REDO for DELETE no_WHERE_clause (fast DELETE of all rows by truncating the files), REPAIR.
  - MY_WAIT_IF_FULL to wait&retry if a log write hits a full disk
  - in translog_write_record(), if MARIA_SHARE does not yet have a 2-byte-id, generate one for it and log LOGREC_FILE_ID; automatically store this short id into log records.
  - in translog_write_record(), if the transaction has not logged its long trid, log LOGREC_LONG_TRANSACTION_ID.
  - For Checkpoint, we need to know the current end-of-log: adding translog_get_horizon().
  - For Control File, adding an assertion that the thread owns the log's lock (the control file is protected by this lock)
storage/maria/ma_loghandler.h:
  Changes in log records (see ma_loghandler.c). New prototypes, new functions.
storage/maria/ma_loghandler_lsn.h:
  adding a type LSN_WITH_FLAGS especially for TRN::first_undo_lsn, where the most significant byte is used for flags.
storage/maria/ma_open.c:
  storing the create_rename_lsn in the index file's header (in the state, precisely) and retrieving it from there.
storage/maria/ma_pagecache.c:
  - my set_if_bigger was wrong, correcting it
  - if the first_in_switch list is not empty, it means that changed_blocks misses some dirty pages, so Checkpoint cannot run and needs to wait. A variable missing_blocks_in_changed_list is added to tell that (should it be named missing_blocks_in_changed_blocks?)
  - pagecache_collect_changed_blocks_with_lsn() now also reports the minimum rec_lsn (needed for low-water mark computation).
storage/maria/ma_pagecache.h:
  see ma_pagecache.c
storage/maria/ma_panic.c:
  comment
storage/maria/ma_range.c:
  comment
storage/maria/ma_rename.c:
  - logging LOGREC_RENAME_TABLE; knowing if this is needed requires knowing if the table is transactional, which requires opening the table.
  - update create_rename_lsn
  - we need to sync directories only if the table is transactional
storage/maria/ma_static.c:
  comment
storage/maria/ma_test_all.sh:
  - tip for Valgrind-ing ma_test_all
  - do "export maria_path=somepath" before calling ma_test_all if you want to run ma_test_all outside storage/maria (useful for parallel runs, like one normal and one under Valgrind: they must not use the same tables, so they need to run in different directories)
storage/maria/maria_def.h:
  - state now contains, in memory and on disk, the create_rename_lsn
  - share now contains a 2-byte-id
storage/maria/trnman.c:
  preparations for Checkpoint: capture trn->rec_lsn, trn->first_undo_lsn; the minimum first_undo_lsn is needed to know the log's low-water mark
storage/maria/trnman.h:
  using the most significant byte of first_undo_lsn to hold miscellaneous flags, for now TRANSACTION_LOGGED_LONG_ID. dummy_transaction_object is already declared in ma_static.c.
storage/maria/trnman_public.h:
  dummy_transaction_object was declared in all files including trnman_public.h, while in fact it's a single object. New prototype.
storage/maria/unittest/ma_test_loghandler-t.c:
  update for new prototype
storage/maria/unittest/ma_test_loghandler_multigroup-t.c:
  update for new prototype
storage/maria/unittest/ma_test_loghandler_multithread-t.c:
  update for new prototype
storage/maria/unittest/ma_test_loghandler_pagecache-t.c:
  update for new prototype
storage/maria/ma_commit.c:
  function which wraps:
  - writing a LOGREC_COMMIT record (== commit on disk)
  - calling trnman_commit_trn() (== commit in memory)
storage/maria/ma_commit.h:
  new header file
.tree-is-private:
  this file is now needed to keep our tree private (don't push it to public trees). When 5.1 is merged into mysql-maria, we can abandon our maria-specific post-commit trigger; .tree-is-private will take care of keeping commit mails private. Don't push this file to public trees.
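The trnman.h and ma_loghandler_lsn.h entries describe packing flags into the most significant byte of first_undo_lsn (LSN_WITH_FLAGS), with TRANSACTION_LOGGED_LONG_ID as the first flag. A minimal sketch of that packing, assuming a 64-bit container whose low 56 bits hold the LSN; the type, macro names and the flag's bit value below are hypothetical, and Maria's actual helpers may differ.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t lsn_with_flags_t;          /* hypothetical container type */

#define LSN_FLAGS_SHIFT 56                  /* top byte holds the flags */
#define LSN_MASK        ((UINT64_C(1) << LSN_FLAGS_SHIFT) - 1)
/* Hypothetical bit value standing in for TRANSACTION_LOGGED_LONG_ID */
#define LOGGED_LONG_ID_FLAG UINT64_C(0x01)

static uint64_t lsn_part(lsn_with_flags_t v)
{
  return v & LSN_MASK;                      /* strip the flag byte */
}

static uint64_t flags_part(lsn_with_flags_t v)
{
  return v >> LSN_FLAGS_SHIFT;              /* keep only the flag byte */
}

/* Replace the LSN while keeping whatever flags are already set. */
static lsn_with_flags_t set_lsn_keep_flags(lsn_with_flags_t v, uint64_t lsn)
{
  assert((lsn & ~LSN_MASK) == 0);           /* LSN must fit in 56 bits */
  return (v & ~LSN_MASK) | lsn;
}

static lsn_with_flags_t set_flag(lsn_with_flags_t v, uint64_t flag)
{
  return v | (flag << LSN_FLAGS_SHIFT);
}
```

Any code comparing such values as LSNs (for example when computing the low-water mark) has to go through something like `lsn_part()` first, otherwise a set flag makes the LSN look enormous.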
parent fd9bd58029
commit 1a96259191

.tree-is-private (Normal file, 0 lines changed)

@@ -54,7 +54,8 @@ noinst_HEADERS = maria_def.h ma_rt_index.h ma_rt_key.h ma_rt_mbr.h \
 ma_sp_defs.h ma_fulltext.h ma_ftdefs.h ma_ft_test1.h \
 ma_ft_eval.h trnman.h lockman.h tablockman.h \
 ma_control_file.h ha_maria.h ma_blockrec.h \
-ma_loghandler.h ma_loghandler_lsn.h ma_pagecache.h
+ma_loghandler.h ma_loghandler_lsn.h ma_pagecache.h \
+ma_commit.h
 ma_test1_DEPENDENCIES= $(LIBRARIES)
 ma_test1_LDADD= @CLIENT_EXTRA_LDFLAGS@ libmaria.a \
 $(top_builddir)/storage/myisam/libmyisam.a \
@@ -112,7 +113,8 @@ libmaria_a_SOURCES = ma_init.c ma_open.c ma_extra.c ma_info.c ma_rkey.c \
 ha_maria.cc trnman.c lockman.c tablockman.c \
 ma_rt_index.c ma_rt_key.c ma_rt_mbr.c ma_rt_split.c \
 ma_sp_key.c ma_control_file.c ma_loghandler.c \
-ma_pagecache.c ma_pagecaches.c
+ma_pagecache.c ma_pagecaches.c \
+ma_commit.c
 CLEANFILES = test?.MA? FT?.MA? isam.log ma_test_all ma_rt_test.MA? sp_test.MA?
 
 SUFFIXES = .sh
@@ -30,6 +30,7 @@
 #include "maria_def.h"
 #include "ma_rt_index.h"
 #include "ma_blockrec.h"
+#include "ma_commit.h"
 
 #define MARIA_CANNOT_ROLLBACK HA_NO_TRANSACTIONS
 #ifdef MARIA_CANNOT_ROLLBACK
@@ -690,7 +691,8 @@ int ha_maria::open(const char *name, int mode, uint test_if_locked)
 info(HA_STATUS_NO_LOCK | HA_STATUS_VARIABLE | HA_STATUS_CONST);
 if (!(test_if_locked & HA_OPEN_WAIT_IF_LOCKED))
 VOID(maria_extra(file, HA_EXTRA_WAIT_LOCK, 0));
-if (file->s->data_file_type != STATIC_RECORD)
+save_transactional= file->s->base.transactional;
+if ((data_file_type= file->s->data_file_type) != STATIC_RECORD)
 int_table_flags |= HA_REC_NOT_IN_SEQ;
 if (file->s->options & (HA_OPTION_CHECKSUM | HA_OPTION_COMPRESS_RECORD))
 int_table_flags |= HA_HAS_CHECKSUM;
@@ -1178,6 +1180,8 @@ int ha_maria::repair(THD *thd, HA_CHECK &param, bool do_optimize)
 llstr(rows, llbuff),
 llstr(file->state->records, llbuff2));
 }
+if (!error)
+error= _ma_repair_write_log_record(&param, file);
 }
 else
 {
@@ -1806,7 +1810,6 @@ int ha_maria::info(uint flag)
 MY_APPEND_EXT | MY_UNPACK_FILENAME);
 if (strcmp(name_buff, maria_info.index_file_name))
 index_file_name=maria_info.index_file_name;
-data_file_type= maria_info.data_file_type;
 }
 if (flag & HA_STATUS_ERRKEY)
 {
@@ -1860,7 +1863,7 @@ int ha_maria::external_lock(THD *thd, int lock_type)
 {
 TRN *trn= THD_TRN;
 DBUG_ENTER("ha_maria::external_lock");
-if (!file->s->base.transactional)
+if (!save_transactional)
 goto skip_transaction;
 if (!trn && lock_type != F_UNLCK) /* no transaction yet - open it now */
 {
@@ -1884,6 +1887,19 @@ int ha_maria::external_lock(THD *thd, int lock_type)
 trans_register_ha(thd, FALSE, maria_hton);
 trnman_new_statement(trn);
 }
+if (!thd->transaction.on)
+{
+/*
+No need to log REDOs/UNDOs. If this is an internal temporary table
+which will be renamed to a permanent table (like in ALTER TABLE),
+the rename happens after unlocking so will be durable (and the table
+will get its create_rename_lsn).
+Note: if we wanted to enable users to have an old backup and apply
+tons of archived logs to roll-forward, we could then not disable
+REDOs/UNDOs in this case.
+*/
+file->s->base.transactional= FALSE;
+}
 }
 else
 {
@@ -1894,7 +1910,8 @@ int ha_maria::external_lock(THD *thd, int lock_type)
 {
 /* autocommit ? rollback a transaction */
 #ifdef MARIA_CANNOT_ROLLBACK
-trnman_commit_trn(trn);
+if (ma_commit(trn))
+DBUG_RETURN(1);
 THD_TRN= 0;
 #else
 if (!(thd->options & (OPTION_NOT_AUTOCOMMIT | OPTION_BEGIN)))
@@ -1906,6 +1923,7 @@ int ha_maria::external_lock(THD *thd, int lock_type)
 #endif
 }
 }
+file->s->base.transactional= save_transactional;
 }
 skip_transaction:
 DBUG_RETURN(maria_lock_database(file, !table->s->tmp_table ?
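The external_lock() hunks above follow a save/disable/restore pattern: open() remembers the table's original transactionality in save_transactional, locking switches logging off when thd->transaction.on is false, and F_UNLCK restores the saved value. A self-contained sketch of that pattern; `struct tbl_share`, `struct tbl_handler` and the three functions are hypothetical, heavily simplified stand-ins for MARIA_SHARE and ha_maria, not the real types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for MARIA_SHARE / ha_maria */
struct tbl_share   { bool transactional; };
struct tbl_handler { struct tbl_share *s; bool save_transactional; };

static void handler_open(struct tbl_handler *h, struct tbl_share *s)
{
  h->s= s;
  h->save_transactional= s->transactional;  /* remembered once, in open() */
}

static void handler_lock(struct tbl_handler *h, bool transaction_on)
{
  if (!h->save_transactional)
    return;                                 /* never logged: nothing to do */
  if (!transaction_on)
    h->s->transactional= false;             /* skip REDO/UNDO logging */
}

static void handler_unlock(struct tbl_handler *h)
{
  if (!h->save_transactional)
    return;
  h->s->transactional= h->save_transactional; /* re-enable before RENAME */
}
```

Testing save_transactional rather than the live flag is what keeps start_stmt() and the unlock path treating the table as transactional while the live flag is temporarily cleared.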
@@ -1916,7 +1934,7 @@ skip_transaction:
 int ha_maria::start_stmt(THD *thd, thr_lock_type lock_type)
 {
 TRN *trn= THD_TRN;
-if (file->s->base.transactional)
+if (save_transactional)
 {
 DBUG_ASSERT(trn); // this may be called only after external_lock()
 DBUG_ASSERT(trnman_has_locked_tables(trn));
@@ -2186,8 +2204,7 @@ static int maria_commit(handlerton *hton __attribute__ ((unused)),
 DBUG_RETURN(0); // end of statement
 DBUG_PRINT("info", ("THD_TRN set to 0x0"));
 THD_TRN= 0;
-DBUG_RETURN(trnman_commit_trn(trn) ?
-HA_ERR_OUT_OF_MEM : 0); // end of transaction
+DBUG_RETURN(ma_commit(trn)); // end of transaction
 }
 
 
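The hunk above makes maria_commit() return ma_commit(trn), which the commit message describes as wrapping two steps: writing a LOGREC_COMMIT record (commit on disk), then calling trnman_commit_trn() (commit in memory). A sketch of that ordering with stubs; `write_commit_record()`, `commit_in_memory()` and `sketch_ma_commit()` are hypothetical placeholders, and skipping the in-memory commit when the log write fails is our assumption, not something the source states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transaction object; real code uses TRN from trnman.h */
struct trn { bool logged_commit; bool committed_in_memory; };

static bool fail_log_write= false;   /* test hook to simulate a log error */

/* Stub standing in for writing LOGREC_COMMIT: 0 on success */
static int write_commit_record(struct trn *t)
{
  if (fail_log_write)
    return 1;
  t->logged_commit= true;
  return 0;
}

/* Stub standing in for trnman_commit_trn(): 0 on success */
static int commit_in_memory(struct trn *t)
{
  t->committed_in_memory= true;
  return 0;
}

/* Log the COMMIT record first, then update the in-memory state. */
static int sketch_ma_commit(struct trn *t)
{
  if (write_commit_record(t))
    return 1;                        /* assumed: no in-memory commit */
  return commit_in_memory(t);
}
```

Returning the combined status directly is what lets the caller drop the old `? HA_ERR_OUT_OF_MEM : 0` mapping.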
@@ -2212,6 +2229,7 @@ static int maria_rollback(handlerton *hton __attribute__ ((unused)),
 
 static int ha_maria_init(void *p)
 {
+int res;
 maria_hton= (handlerton *)p;
 maria_hton->state= SHOW_OPTION_YES;
 maria_hton->db_type= DB_TYPE_MARIA;
@@ -2223,14 +2241,16 @@ static int ha_maria_init(void *p)
 maria_hton->flags= HTON_CAN_RECREATE | HTON_SUPPORT_LOG_TABLES;
 bzero(maria_log_pagecache, sizeof(*maria_log_pagecache));
 maria_data_root= mysql_real_data_home;
-return (test(maria_init() || ma_control_file_create_or_open() ||
-(init_pagecache(maria_log_pagecache,
-TRANSLOG_PAGECACHE_SIZE, 0, 0,
-TRANSLOG_PAGE_SIZE) == 0) ||
-translog_init(maria_data_root, TRANSLOG_FILE_SIZE,
-MYSQL_VERSION_ID, server_id, maria_log_pagecache,
-TRANSLOG_DEFAULT_FLAGS) ||
-trnman_init()));
+res= maria_init() || ma_control_file_create_or_open() ||
+(init_pagecache(maria_log_pagecache,
+TRANSLOG_PAGECACHE_SIZE, 0, 0,
+TRANSLOG_PAGE_SIZE) == 0) ||
+translog_init(maria_data_root, TRANSLOG_FILE_SIZE,
+MYSQL_VERSION_ID, server_id, maria_log_pagecache,
+TRANSLOG_DEFAULT_FLAGS) ||
+trnman_init();
+maria_multi_threaded= TRUE;
+return res;
 }
 
 
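The hunk above sets maria_multi_threaded= TRUE only once initialization is over, and the commit message says ma_control_file.c then asserts that control-file writers own the log's lock. A sketch of such a gated assertion, assuming a pthread mutex plus an explicit owner record (POSIX offers no portable "do I own this mutex?" query). `log_lock`, `lock_log()`, `unlock_log()` and `write_control_file()` are hypothetical names, not Maria's functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static bool multi_threaded= false;          /* mirrors maria_multi_threaded */
static pthread_mutex_t log_lock= PTHREAD_MUTEX_INITIALIZER;
static pthread_t log_lock_owner;            /* valid only while held */
static bool log_lock_held= false;

static void lock_log(void)
{
  pthread_mutex_lock(&log_lock);
  log_lock_owner= pthread_self();
  log_lock_held= true;
}

static void unlock_log(void)
{
  log_lock_held= false;
  pthread_mutex_unlock(&log_lock);
}

static bool log_lock_owned_by_me(void)
{
  return log_lock_held && pthread_equal(log_lock_owner, pthread_self());
}

/* Hypothetical writer: the real one would write last_checkpoint_lsn etc. */
static int write_control_file(unsigned long long checkpoint_lsn)
{
  /* Single-threaded startup/shutdown may write without taking the lock. */
  assert(!multi_threaded || log_lock_owned_by_me());
  (void) checkpoint_lsn;
  return 0;
}
```

Gating on the flag is what allows maria_end(), which per the commit message runs when the engine is no longer multithreaded, and init-time code to write the control file without the lock.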
@@ -39,6 +39,11 @@ class ha_maria :public handler
 char *data_file_name, *index_file_name;
 enum data_file_type data_file_type;
 bool can_enable_indexes;
+/**
+@brief for temporarily disabling table's transactionality
+(if THD::transaction::on is false), remember the original value here
+*/
+bool save_transactional;
 int repair(THD * thd, HA_CHECK &param, bool optimize);
 
 public:
@@ -171,11 +171,14 @@
 started and we can then delete TRANSID and VER_PTR from the row to
 gain more space.
 
-If a row is deleted in Maria, we change TRANSID to current transid and
-change VER_PTR to point to the undo record for the delete. The undo
-record must contain the original TRANSID, so that another transaction
-can use this to check if they should use the found row or go to the
-previous row pointed to by the VER_PTR in the undo row.
+If a row is deleted in Maria, we change TRANSID to the deleting
+transaction's id, change VER_PTR to point to the undo record for the delete,
+and add DELETE_TRANSID (the id of the transaction which last
+inserted/updated the row before its deletion). DELETE_TRANSID allows an old
+transaction to avoid reading the log to know if it can see the last version
+before delete (in other words it reduces the probability of having to follow
+VER_PTR). TODO: depending on a compilation option, evaluate the performance
+impact of not storing DELETE_TRANSID (which would make the row smaller).
 
 Description of the different parts:
 
@@ -391,7 +394,12 @@ my_bool _ma_once_end_block_record(MARIA_SHARE *share)
 share->temporary ? FLUSH_IGNORE_CHANGED :
 FLUSH_RELEASE))
 res= 1;
-if (my_close(share->bitmap.file.file, MYF(MY_WME)))
+/*
+File must be synced as it is going out of the maria_open_list and so
+becoming unknown to Checkpoint.
+*/
+if (my_sync(share->bitmap.file.file, MYF(MY_WME)) ||
+my_close(share->bitmap.file.file, MYF(MY_WME)))
 res= 1;
 /*
 Trivial assignment to guard against multiple invocations
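The hunk above syncs the bitmap file before closing it, because once the share leaves maria_open_list Checkpoint no longer knows about it. The same sync-then-close idea in plain POSIX, as a sketch: fsync() and close() stand in for Maria's my_sync()/my_close() wrappers, and `sync_and_close()` and `write_file_durably()` are hypothetical helpers. One deliberate difference: in the hunk, `my_sync(...) || my_close(...)` short-circuits, so the file is not closed when the sync fails; the sketch closes it regardless.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Flush to stable storage, then close; close even if fsync() fails. */
static int sync_and_close(int fd)
{
  int res= 0;
  if (fsync(fd) != 0)
    res= 1;
  if (close(fd) != 0)               /* avoids leaking the descriptor */
    res= 1;
  return res;
}

/* Hypothetical usage: write a buffer and make it durable before closing. */
static int write_file_durably(const char *path, const char *data)
{
  size_t len= strlen(data);
  int fd= open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
  if (fd < 0)
    return 1;
  if (write(fd, data, len) != (ssize_t) len)
  {
    close(fd);
    return 1;
  }
  return sync_and_close(fd);
}
```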
@@ -400,6 +408,8 @@ my_bool _ma_once_end_block_record(MARIA_SHARE *share)
 */
 share->bitmap.file.file= -1;
 }
+if (share->id != 0)
+translog_deassign_id_from_share(share);
 return res;
 }
 
@@ -573,7 +583,14 @@ void _ma_unpin_all_pages(MARIA_HA *info, LSN undo_lsn)
 DBUG_ASSERT(undo_lsn != 0 || !info->s->base.transactional);
 
 if (!info->s->base.transactional)
-undo_lsn= 0; /* Avoid assert in key cache */
+{
+/*
+If this is a transactional table but with transactionality temporarily
+disabled (like in ALTER TABLE) we need to give a sensible LSN to pages
+and not 0. If this is not a transactional table it will reduce to 0.
+*/
+undo_lsn= info->s->state.create_rename_lsn;
+}
 
 while (pinned_page-- != page_link)
 pagecache_unlock_by_link(info->s->pagecache, pinned_page->link,
@@ -1133,7 +1150,6 @@ static my_bool write_tail(MARIA_HA *info,
 LSN lsn;
 
 /* Log REDO changes of tail page */
-fileid_store(log_data, info->dfile.file);
 page_store(log_data+ FILEID_STORE_SIZE, block->page);
 dirpos_store(log_data+ FILEID_STORE_SIZE + PAGE_STORE_SIZE,
 row_pos.rownr);
@@ -1143,7 +1159,8 @@ static my_bool write_tail(MARIA_HA *info,
 log_array[TRANSLOG_INTERNAL_PARTS + 1].length= length;
 if (translog_write_record(&lsn, LOGREC_REDO_INSERT_ROW_TAIL,
 info->trn, share, sizeof(log_data) + length,
-TRANSLOG_INTERNAL_PARTS + 2, log_array))
+TRANSLOG_INTERNAL_PARTS + 2, log_array,
+log_data))
 DBUG_RETURN(1);
 }
 
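In the write_tail() hunks above, callers stop calling fileid_store() but still reserve FILEID_STORE_SIZE bytes at the start of log_data and pass that buffer as a new last argument; per the commit message, translog_write_record() now gives the share a 2-byte id on first use (logging LOGREC_FILE_ID) and stores it there. A sketch of that division of labour, assuming a simple counter as the id allocator and little-endian storage of the id; `struct share_sketch`, `assign_short_id()` and `store_share_id()` are hypothetical, and neither the real allocator nor the LOGREC_FILE_ID record is shown.

```c
#include <assert.h>
#include <stdint.h>

#define FILEID_STORE_SIZE 2   /* assumed: a 2-byte id, per the commit text */

struct share_sketch { uint16_t id; };       /* 0 = no id assigned yet */

static uint16_t next_id= 1;                 /* hypothetical allocator */

static uint16_t assign_short_id(struct share_sketch *share)
{
  if (share->id == 0)
    share->id= next_id++;   /* real code would also log LOGREC_FILE_ID */
  return share->id;
}

/*
  Done by the log writer, not by the REDO/UNDO producers: fills the
  FILEID_STORE_SIZE bytes the caller reserved at 'where'.
*/
static void store_share_id(struct share_sketch *share, unsigned char *where)
{
  uint16_t id= assign_short_id(share);
  where[0]= (unsigned char) (id & 0xFF);
  where[1]= (unsigned char) (id >> 8);
}
```

The later hunks follow the same convention: UNDO records pass `log_data + LSN_STORE_SIZE` because the id comes after the previous-UNDO LSN, and LOGREC_UNDO_ROW_PURGE passes NULL for both the share and the buffer.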
@@ -1388,7 +1405,6 @@ static my_bool free_full_pages(MARIA_HA *info, MARIA_ROW *row)
 size_t extents_length= row->extents_count * ROW_EXTENT_SIZE;
 DBUG_ENTER("free_full_pages");
 
-fileid_store(log_data, info->dfile.file);
 pagerange_store(log_data + FILEID_STORE_SIZE,
 row->extents_count);
 log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
@@ -1397,7 +1413,8 @@ static my_bool free_full_pages(MARIA_HA *info, MARIA_ROW *row)
 log_array[TRANSLOG_INTERNAL_PARTS + 1].length= extents_length;
 if (translog_write_record(&lsn, LOGREC_REDO_PURGE_BLOCKS, info->trn,
 info->s, sizeof(log_data) + extents_length,
-TRANSLOG_INTERNAL_PARTS + 2, log_array))
+TRANSLOG_INTERNAL_PARTS + 2, log_array,
+log_data))
 DBUG_RETURN(1);
 
 DBUG_RETURN (_ma_bitmap_free_full_pages(info, row->extents,
@@ -1431,7 +1448,6 @@ static my_bool free_full_page_range(MARIA_HA *info, ulonglong page, uint count)
 {
 LSN lsn;
 DBUG_ASSERT(info->trn->rec_lsn);
-fileid_store(log_data, info->dfile.file);
 pagerange_store(log_data + FILEID_STORE_SIZE, 1);
 int5store(log_data + FILEID_STORE_SIZE + PAGERANGE_STORE_SIZE,
 page);
@@ -1442,7 +1458,8 @@ static my_bool free_full_page_range(MARIA_HA *info, ulonglong page, uint count)
 
 if (translog_write_record(&lsn, LOGREC_REDO_PURGE_BLOCKS,
 info->trn, info->s, sizeof(log_data),
-TRANSLOG_INTERNAL_PARTS + 1, log_array))
+TRANSLOG_INTERNAL_PARTS + 1, log_array,
+log_data))
 res= 1;
 
 }
@@ -1455,24 +1472,25 @@ static my_bool free_full_page_range(MARIA_HA *info, ulonglong page, uint count)
 }
 
 
-/*
-Write a record to a (set of) pages
+/**
+@brief Write a record to a (set of) pages
 
-SYNOPSIS
-write_block_record()
-info Maria handler
-old_record Orignal record in case of update; NULL in case of insert
-record Record we should write
-row Statistics about record (calculated by calc_record_size())
-map_blocks On which pages the record should be stored
-row_pos Position on head page where to put head part of record
+@param info Maria handler
+@param old_record Original record in case of update; NULL in case of
+insert
+@param record Record we should write
+@param row Statistics about record (calculated by
+calc_record_size())
+@param map_blocks On which pages the record should be stored
+@param row_pos Position on head page where to put head part of
+record
 
-NOTES
-On return all pinned pages are released.
+@note
+On return all pinned pages are released.
 
-RETURN
-0 ok
-1 error
+@return Operation status
+@retval 0 OK
+@retval 1 Error
 */
 
 static my_bool write_block_record(MARIA_HA *info,
@@ -1940,7 +1958,6 @@ static my_bool write_block_record(MARIA_HA *info,
 size_t data_length= (size_t) (data - row_pos->data);
 
 /* Log REDO changes of head page */
-fileid_store(log_data, info->dfile.file);
 page_store(log_data+ FILEID_STORE_SIZE, head_block->page);
 dirpos_store(log_data+ FILEID_STORE_SIZE + PAGE_STORE_SIZE,
 row_pos->rownr);
@@ -1950,7 +1967,8 @@ static my_bool write_block_record(MARIA_HA *info,
 log_array[TRANSLOG_INTERNAL_PARTS + 1].length= data_length;
 if (translog_write_record(&lsn, LOGREC_REDO_INSERT_ROW_HEAD, info->trn,
 share, sizeof(log_data) + data_length,
-TRANSLOG_INTERNAL_PARTS + 2, log_array))
+TRANSLOG_INTERNAL_PARTS + 2, log_array,
+log_data))
 goto disk_err;
 }
 
@@ -2010,7 +2028,6 @@ static my_bool write_block_record(MARIA_HA *info,
 NullS))
 goto disk_err;
 }
-fileid_store(log_data, info->dfile.file);
 log_pos= log_data + FILEID_STORE_SIZE;
 log_array_pos= log_array+ TRANSLOG_INTERNAL_PARTS+1;
 
@@ -2068,7 +2085,7 @@ static my_bool write_block_record(MARIA_HA *info,
 error= translog_write_record(&lsn, LOGREC_REDO_INSERT_ROW_BLOBS,
 info->trn, share, log_entry_length,
 (uint) (log_array_pos - log_array),
-log_array);
+log_array, log_data);
 if (log_array != tmp_log_array)
 my_free((gptr) log_array, MYF(0));
 if (error)
@@ -2084,7 +2101,6 @@ static my_bool write_block_record(MARIA_HA *info,
 
 /* LOGREC_UNDO_ROW_INSERT & LOGREC_UNDO_ROW_INSERT share same header */
 lsn_store(log_data, info->trn->undo_lsn);
-fileid_store(log_data + LSN_STORE_SIZE, info->dfile.file);
 page_store(log_data+ LSN_STORE_SIZE + FILEID_STORE_SIZE,
 head_block->page);
 dirpos_store(log_data+ LSN_STORE_SIZE + FILEID_STORE_SIZE +
@@ -2099,7 +2115,8 @@ static my_bool write_block_record(MARIA_HA *info,
 /* Write UNDO log record for the INSERT */
 if (translog_write_record(&lsn, LOGREC_UNDO_ROW_INSERT,
 info->trn, share, sizeof(log_data),
-TRANSLOG_INTERNAL_PARTS + 1, log_array))
+TRANSLOG_INTERNAL_PARTS + 1, log_array,
+log_data + LSN_STORE_SIZE))
 goto disk_err;
 }
 else
@@ -2114,7 +2131,7 @@ static my_bool write_block_record(MARIA_HA *info,
 if (translog_write_record(&lsn, LOGREC_UNDO_ROW_UPDATE, info->trn,
 share, sizeof(log_data) + row_length,
 TRANSLOG_INTERNAL_PARTS + 1 + row_parts_count,
-log_array))
+log_array, log_data + LSN_STORE_SIZE))
 goto disk_err;
 }
 }
@@ -2164,6 +2181,15 @@ crashed:
 my_errno= HA_ERR_WRONG_IN_RECORD;
 
 disk_err:
+/**
+@todo RECOVERY we are going to let dirty pages go to disk while we have
+logged UNDO, this violates WAL. If we have not written any full pages,
+all dirty pages are pinned so we could just delete them from the
+pagecache. Moreover, we have written some REDOs without a closing UNDO,
+it's possible that a next operation by this transaction succeeds and then
+Recovery would glue the "orphan REDOs" to the succeeded operation and
+execute the failed REDOs.
+*/
 /* Unpin all pinned pages to not cause problems for disk cache */
 _ma_unpin_all_pages(info, 0);
 
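The @todo added at disk_err above says that letting these dirty pages reach disk "violates WAL". The write-ahead-logging rule it refers to is that a page may be written to the data file only after the log is durable up to the record describing that page's latest change. A minimal sketch of that check, assuming LSNs are ordered 64-bit integers and that the log is known to be durable up to and including `log_flushed_upto`; `page_may_be_flushed()` is hypothetical and is not Maria's page-cache code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t lsn_t;   /* hypothetical LSN type */

/*
  WAL rule: a dirty page whose latest change is described by the log
  record at 'page_lsn' may be written to the data file only if the log
  is already durable up to and including that record.
*/
static bool page_may_be_flushed(lsn_t page_lsn, lsn_t log_flushed_upto)
{
  return page_lsn <= log_flushed_upto;
}
```

A page cache enforcing this would flush the log up to `page_lsn` before writing any page that fails the check.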
@@ -2229,20 +2255,18 @@ my_bool _ma_write_block_record(MARIA_HA *info __attribute__ ((unused)),
 }
 
 
-/*
-Remove row written by _ma_write_block_record
+/**
+@brief Remove row written by _ma_write_block_record()
 
-SYNOPSIS
-_ma_abort_write_block_record()
-info Maria handler
+@param info Maria handler
 
-INFORMATION
-This is called in case we got a duplicate unique key while
-writing keys.
+@note
+This is called in case we got a duplicate unique key while
+writing keys.
 
-RETURN
-0 ok
-1 error
+@return Operation status
+@retval 0 OK
+@retval 1 Error
 */
 
 my_bool _ma_write_abort_block_record(MARIA_HA *info)
@@ -2288,16 +2312,19 @@ my_bool _ma_write_abort_block_record(MARIA_HA *info)
 really undo a failed insert. Note that this UNDO will cause recover
 to ignore the LOGREC_UNDO_ROW_INSERT that is the previous entry
 in the UNDO chain.
-We will soon change that: we will here execute the UNDO records
-generated while we were trying to write the row; this will log some CLRs
-which will replace this LOGREC_UNDO_PURGE. RECOVERY TODO BUG.
 */
+/**
+@todo RECOVERY BUG
+We will soon change that: we will here execute the UNDO records
+generated while we were trying to write the row; this will log some
+CLRs which will replace this LOGREC_UNDO_PURGE.
+*/
 lsn_store(log_data, info->trn->undo_lsn);
 log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
 log_array[TRANSLOG_INTERNAL_PARTS + 0].length= sizeof(log_data);
 if (translog_write_record(&lsn, LOGREC_UNDO_ROW_PURGE,
-info->trn, info->s, sizeof(log_data),
-TRANSLOG_INTERNAL_PARTS + 1, log_array))
+info->trn, NULL, sizeof(log_data),
+TRANSLOG_INTERNAL_PARTS + 1, log_array, NULL))
 res= 1;
 }
 _ma_unpin_all_pages(info, info->trn->undo_lsn);
@@ -2514,7 +2541,6 @@ static my_bool delete_head_or_tail(MARIA_HA *info,
 DBUG_ASSERT(share->pagecache->block_size == block_size);
 
 /* Log REDO data */
-fileid_store(log_data, info->dfile.file);
 page_store(log_data+ FILEID_STORE_SIZE, page);
 dirpos_store(log_data+ FILEID_STORE_SIZE + PAGE_STORE_SIZE,
 record_number);
@@ -2524,7 +2550,8 @@ static my_bool delete_head_or_tail(MARIA_HA *info,
 if (translog_write_record(&lsn, (head ? LOGREC_REDO_PURGE_ROW_HEAD :
 LOGREC_REDO_PURGE_ROW_TAIL),
 info->trn, share, sizeof(log_data),
-TRANSLOG_INTERNAL_PARTS + 1, log_array))
+TRANSLOG_INTERNAL_PARTS + 1, log_array,
+log_data))
 DBUG_RETURN(1);
 if (pagecache_write(share->pagecache,
 &info->dfile, page, 0,
@@ -2545,7 +2572,6 @@ static my_bool delete_head_or_tail(MARIA_HA *info,
 PAGE_STORE_SIZE + PAGERANGE_STORE_SIZE];
 LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 1];
 
-fileid_store(log_data, info->dfile.file);
 pagerange_store(log_data + FILEID_STORE_SIZE, 1);
 page_store(log_data+ FILEID_STORE_SIZE + PAGERANGE_STORE_SIZE, page);
 pagerange_store(log_data + FILEID_STORE_SIZE + PAGERANGE_STORE_SIZE +
@@ -2554,7 +2580,8 @@ static my_bool delete_head_or_tail(MARIA_HA *info,
 log_array[TRANSLOG_INTERNAL_PARTS + 0].length= sizeof(log_data);
 if (translog_write_record(&lsn, LOGREC_REDO_PURGE_BLOCKS,
 info->trn, share, sizeof(log_data),
-TRANSLOG_INTERNAL_PARTS + 1, log_array))
+TRANSLOG_INTERNAL_PARTS + 1, log_array,
+log_data))
 DBUG_RETURN(1);
 DBUG_ASSERT(empty_space >= info->s->bitmap.sizes[0]);
 }
@@ -2631,7 +2658,6 @@ my_bool _ma_delete_block_record(MARIA_HA *info, const byte *record)
 
 /* Write UNDO record */
 lsn_store(log_data, info->trn->undo_lsn);
-fileid_store(log_data+ LSN_STORE_SIZE, info->dfile.file);
 page_store(log_data+ LSN_STORE_SIZE + FILEID_STORE_SIZE, page);
 dirpos_store(log_data+ LSN_STORE_SIZE + FILEID_STORE_SIZE +
 PAGE_STORE_SIZE, record_number);
@@ -2645,7 +2671,7 @@ my_bool _ma_delete_block_record(MARIA_HA *info, const byte *record)
 if (translog_write_record(&lsn, LOGREC_UNDO_ROW_DELETE, info->trn,
 info->s, sizeof(log_data) + row_length,
 TRANSLOG_INTERNAL_PARTS + 1 + row_parts_count,
-info->log_row_parts))
+info->log_row_parts, log_data + LSN_STORE_SIZE))
 goto err;
 
 }
@@ -96,7 +96,7 @@ enum en_page_type { UNALLOCATED_PAGE, HEAD_PAGE, TAIL_PAGE, BLOB_PAGE, MAX_PAGE_
 /******* defines that affects allocation (density) of data *******/
 
 /*
-If the tail part (from the main block or a blob) uses more than 75 % of
+If the tail part (from the main block or a blob) would use more than 75 % of
 the size of page, store the tail on a full page instead of a shared
 tail page.
 */
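The ma_blockrec.h comment above states a placement rule: if a tail would use more than 75 % of a page, store it on a full page rather than on a shared tail page. A sketch of that decision, assuming the threshold is taken against the usable page size and computed in integers; `tail_needs_full_page()` and its parameters are hypothetical and not the define Maria actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if a tail of 'tail_length' bytes should get a page of its own. */
static bool tail_needs_full_page(size_t tail_length, size_t usable_page_size)
{
  /* "more than 75 %": compare 4*len > 3*size to avoid rounding */
  return tail_length * 4 > usable_page_size * 3;
}
```

Multiplying both sides instead of computing `usable_page_size * 3 / 4` keeps the boundary exact: a tail of exactly 75 % still goes to a shared tail page.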
@@ -53,6 +53,7 @@
 #endif
 #include "ma_rt_index.h"
 #include "ma_blockrec.h"
+#include "trnman_public.h"
 
 /* Functions defined in this file */
 
@@ -2132,11 +2133,15 @@ err:
 /* Replace the actual file with the temporary file */
 if (new_file >= 0)
 {
+myf sync_dir= (share->base.transactional && !share->temporary) ?
+MY_SYNC_DIR : 0;
 my_close(new_file,MYF(0));
 info->dfile.file= new_file= -1;
 if (maria_change_to_newfile(share->data_file_name,MARIA_NAME_DEXT,
-DATA_TMP_EXT, (param->testflag & T_BACKUP_DATA ?
-MYF(MY_REDEL_MAKE_BACKUP): MYF(0))) ||
+DATA_TMP_EXT,
+MYF((param->testflag & T_BACKUP_DATA ?
+MY_REDEL_MAKE_BACKUP : 0) |
+sync_dir)) ||
 _ma_open_datafile(info,share,-1))
 got_error=1;
 }
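The hunk above builds the maria_change_to_newfile() flags by OR-ing an optional backup flag with sync_dir, which is MY_SYNC_DIR only for transactional, non-temporary tables. A sketch of that composition with made-up bit values; `SKETCH_MY_SYNC_DIR`, `SKETCH_MY_REDEL_MAKE_BACKUP`, `myf_sketch` and `newfile_flags()` are hypothetical stand-ins, and the real MY_SYNC_DIR / MY_REDEL_MAKE_BACKUP values from the MySQL headers are not reproduced.

```c
#include <assert.h>
#include <stdbool.h>

typedef int myf_sketch;                      /* stand-in for myf */

/* Hypothetical bit values, for illustration only */
#define SKETCH_MY_SYNC_DIR           0x1
#define SKETCH_MY_REDEL_MAKE_BACKUP  0x2

static myf_sketch newfile_flags(bool transactional, bool temporary,
                                bool backup_data)
{
  /* Only durable (transactional, non-temporary) tables pay for a dir sync */
  myf_sketch sync_dir= (transactional && !temporary) ? SKETCH_MY_SYNC_DIR : 0;
  return (backup_data ? SKETCH_MY_REDEL_MAKE_BACKUP : 0) | sync_dir;
}
```

This is why the patch moves MYF() outward: the old code wrapped each alternative of the `?:` in its own MYF(), leaving no single expression into which sync_dir could be OR-ed.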
@@ -2328,6 +2333,8 @@ int maria_sort_index(HA_CHECK *param, register MARIA_HA *info, my_string name)
 int old_lock;
 MARIA_SHARE *share=info->s;
 MARIA_STATE_INFO old_state;
+myf sync_dir= (share->base.transactional && !share->temporary) ?
+MY_SYNC_DIR : 0;
 DBUG_ENTER("maria_sort_index");
 
 /* cannot sort index files with R-tree indexes */
@@ -2388,7 +2395,7 @@ int maria_sort_index(HA_CHECK *param, register MARIA_HA *info, my_string name)
share->kfile.file = -1;
VOID(my_close(new_file,MYF(MY_WME)));
if (maria_change_to_newfile(share->index_file_name, MARIA_NAME_IEXT,
INDEX_TMP_EXT, MYF(0)) ||
INDEX_TMP_EXT, sync_dir) ||
_ma_open_keyfile(share))
goto err2;
info->lock_type= F_UNLCK; /* Force maria_readinfo to lock */
@@ -2604,6 +2611,8 @@ int maria_repair_by_sort(HA_CHECK *param, register MARIA_HA *info,
char llbuff[22];
MARIA_SORT_INFO sort_info;
ulonglong key_map=share->state.key_map;
myf sync_dir= (share->base.transactional && !share->temporary) ?
MY_SYNC_DIR : 0;
DBUG_ENTER("maria_repair_by_sort");

start_records=info->state->records;
@@ -2922,8 +2931,9 @@ err:
info->dfile.file= new_file= -1;
if (maria_change_to_newfile(share->data_file_name,MARIA_NAME_DEXT,
DATA_TMP_EXT,
(param->testflag & T_BACKUP_DATA ?
MYF(MY_REDEL_MAKE_BACKUP): MYF(0))) ||
MYF((param->testflag & T_BACKUP_DATA ?
MY_REDEL_MAKE_BACKUP : 0) |
sync_dir)) ||
_ma_open_datafile(info,share,-1))
got_error=1;
}
@@ -3022,6 +3032,8 @@ int maria_repair_parallel(HA_CHECK *param, register MARIA_HA *info,
MARIA_SORT_INFO sort_info;
ulonglong key_map=share->state.key_map;
pthread_attr_t thr_attr;
myf sync_dir= (share->base.transactional && !share->temporary) ?
MY_SYNC_DIR : 0;
DBUG_ENTER("maria_repair_parallel");

start_records=info->state->records;
@@ -3445,8 +3457,9 @@ err:
info->dfile.file= new_file= -1;
if (maria_change_to_newfile(share->data_file_name,MARIA_NAME_DEXT,
DATA_TMP_EXT,
(param->testflag & T_BACKUP_DATA ?
MYF(MY_REDEL_MAKE_BACKUP): MYF(0))) ||
MYF((param->testflag & T_BACKUP_DATA ?
MY_REDEL_MAKE_BACKUP : 0) |
sync_dir)) ||
_ma_open_datafile(info,share,-1))
got_error=1;
}
@@ -5135,3 +5148,64 @@ static void restore_data_file_type(MARIA_SHARE *share)
share->data_file_type= share->state.header.data_file_type=
share->pack.header_length= 0;
}


/**
@brief Writes a LOGREC_REDO_REPAIR_TABLE record and updates create_rename_lsn

REPAIR/OPTIMIZE have replaced the data/index file with a new file
and so, in this scenario:
@verbatim
CHECKPOINT - REDO_INSERT - COMMIT - ... - REPAIR - ... - crash
@endverbatim
we do not want Recovery to apply the REDO_INSERT to the table, as it would
then possibly wrongly extend the table. By updating create_rename_lsn at
the end of REPAIR, we ensure that REDO_INSERT will be skipped.

@param param description of the REPAIR operation
@param info table

@return Operation status
@retval 0 ok
@retval 1 error (disk problem)
*/

int _ma_repair_write_log_record(const HA_CHECK *param, MARIA_HA *info)
{
MARIA_SHARE *share= info->s;
/* Only called from ha_maria.cc, not maria_chk, so translog is inited */
if (share->base.transactional && !share->temporary)
{
/* For now this record is only informative */
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 1];
uchar log_data[LSN_STORE_SIZE];
compile_time_assert(LSN_STORE_SIZE >= (FILEID_STORE_SIZE + 4));
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= FILEID_STORE_SIZE + 4;
/*
testflag gives an idea of what REPAIR did (in particular T_QUICK
or not: did it touch the data file or not?).
*/
int4store(log_data + FILEID_STORE_SIZE, param->testflag);
if (unlikely(translog_write_record(&share->state.create_rename_lsn,
LOGREC_REDO_REPAIR_TABLE,
&dummy_transaction_object, share,
log_array[TRANSLOG_INTERNAL_PARTS +
0].length,
sizeof(log_array)/sizeof(log_array[0]),
log_array, log_data)))
return 1;
/*
But this part is really needed: it makes the new table's content durable
and prevents old REDOs from being applied to the new table. The table's
existence was made durable earlier (MY_SYNC_DIR passed to
maria_change_to_newfile()).
*/
lsn_store(log_data, share->state.create_rename_lsn);
DBUG_ASSERT(info->dfile.file >= 0);
DBUG_ASSERT(share->kfile.file >= 0);
return (my_pwrite(share->kfile.file, log_data, sizeof(log_data),
sizeof(share->state.header) + 2, MYF(MY_NABP)) ||
_ma_sync_table_files(info));
}
return 0;
}
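The function comment above relies on a simple ordering rule: a REDO record is only worth replaying if it was written after the table's `create_rename_lsn`. In this diff the LSN is only written into the index file header; the check itself would belong to Recovery. The sketch models LSNs as plain increasing integers (an assumption; real LSNs are presumably richer) and replays the CHECKPOINT, REDO_INSERT, COMMIT, REPAIR scenario from the comment. `redo_applies()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LSNs modelled as plain increasing integers (assumption of this sketch). */
typedef uint64_t sketch_lsn;

/*
  A REDO is replayed only if it was logged after the table was last created,
  renamed, repaired or emptied; older REDOs describe file contents that have
  since been replaced.
*/
static bool redo_applies(sketch_lsn redo_lsn, sketch_lsn create_rename_lsn)
{
  return redo_lsn > create_rename_lsn;
}
```

With CHECKPOINT at 10, REDO_INSERT at 20, COMMIT at 30 and REPAIR setting `create_rename_lsn` to 40, the old REDO_INSERT is skipped while an insert logged at 50 is replayed.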
@@ -57,14 +57,6 @@ int maria_close(register MARIA_HA *info)
info->opt_flag&= ~(READ_CACHE_USED | WRITE_CACHE_USED);
}
flag= !--share->reopen;
/*
RECOVERY TODO:
If "flag" is TRUE, in the line below we are going to make the table
unknown to future checkpoints, so it needs to have fsync'ed itself
entirely (bitmap, pages, etc) at this point.
The flushing is currently done a few lines further (which is ok, as we
still hold THR_LOCK_maria), but syncing is missing.
*/
maria_open_list=list_delete(maria_open_list,&info->open_list);
pthread_mutex_unlock(&share->intern_lock);

@@ -82,7 +74,12 @@ int maria_close(register MARIA_HA *info)
FLUSH_IGNORE_CHANGED :
FLUSH_RELEASE)))
error= my_errno;

/*
File must be synced as it is going out of the maria_open_list and so
becoming unknown to Checkpoint.
*/
if (my_sync(share->kfile.file, MYF(MY_WME)))
error= my_errno;
/*
If we are crashed, we can safely flush the current state as it will
not change the crashed state.
71
storage/maria/ma_commit.c
Normal file
@@ -0,0 +1,71 @@
/* Copyright (C) 2007 MySQL AB

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */

#include "maria_def.h"
#include "trnman.h"

/**
@brief writes a COMMIT record to log and commits transaction in memory

@param trn transaction

@return Operation status
@retval 0 ok
@retval 1 error (disk error or out of memory)
*/

int ma_commit(TRN *trn)
{
if (trn->undo_lsn == 0) /* no work done, rollback (cheaper than commit) */
return trnman_rollback_trn(trn);
/*
- if COMMIT record is written before trnman_commit_trn():
if Checkpoint comes in the middle it will see trn is not committed,
then, if a crash happens, Recovery might roll back trn (if min(rec_lsn)
is after COMMIT record) and this is not an issue as
* transaction's updates were not made visible to other transactions
* "commit ok" was not sent to client
Alternatively, Recovery might commit trn (if min(rec_lsn) is before COMMIT
record), which is ok too. All in all it means that "trn committed" is not
100% equal to "COMMIT record written".
- if COMMIT record is written after trnman_commit_trn():
if a crash happens between the two, trn will be rolled back, which is an
issue (transaction's updates were made visible to other transactions).
So we need to use the first order.
*/
/**
@todo RECOVERY share's state is written to disk only in
maria_lock_database(), so COMMIT record is not the last record of the
transaction! It is probably an issue. Recovery of the state is a problem
not yet solved.
*/
LSN commit_lsn;
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS];
/*
We do not store "thd->transaction.xid_state.xid" for now, it will be
needed only when we support XA.
*/
return
translog_write_record(&commit_lsn, LOGREC_COMMIT,
trn, NULL, 0,
sizeof(log_array)/sizeof(log_array[0]),
log_array, NULL) ||
translog_flush(commit_lsn) || trnman_commit_trn(trn);
/*
Note: if trnman_commit_trn() fails above, we have already
written the COMMIT record, so Checkpoint and Recovery will see the
transaction as committed.
*/
}
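`ma_commit()` relies on `||` short-circuiting: the in-memory commit runs only after the COMMIT record has been written and the log flushed up to it, and a failure at any step skips the later ones. The toy model below records the order of the steps; the single-letter step names, `step()` and `toy_commit()` are made up for illustration and do not call into Maria.

```c
#include <assert.h>
#include <string.h>

/* Trace of executed steps, e.g. "WFC". */
static char trace[8];

/*
  Append 'name' to the trace; return 1 (failure) if it is the step chosen
  to fail, else 0, following the 0 = ok convention used above.
*/
static int step(char name, char fail_at)
{
  size_t n = strlen(trace);
  trace[n] = name;
  trace[n + 1] = '\0';
  return name == fail_at;
}

/*
  W = write COMMIT record, F = flush log, C = commit in memory,
  R = roll back (the transaction logged no undo, nothing to make durable).
  'fail_at' names the step to make fail, 0 for none.
*/
static int toy_commit(int did_work, char fail_at)
{
  trace[0] = '\0';
  if (!did_work)
    return step('R', fail_at);
  return step('W', fail_at) || step('F', fail_at) || step('C', fail_at);
}
```

If the flush fails, the trace stops at "WF": the transaction is never made visible in memory, which matches the ordering argument in the comment.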
18
storage/maria/ma_commit.h
Normal file
@@ -0,0 +1,18 @@
/* Copyright (C) 2007 MySQL AB

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */

C_MODE_START
int ma_commit(TRN *trn);
C_MODE_END
@@ -50,6 +50,13 @@
LSN last_checkpoint_lsn;
uint32 last_logno;

/**
@brief Whether the log's lock should be asserted when writing to the control file.

Can be re-used by any function which needs to be thread-safe except when
it is called at startup.
*/
my_bool maria_multi_threaded= FALSE;

/*
Control file is less than 512 bytes (a disk sector),
@@ -203,6 +210,8 @@ err:
the last_checkpoint_lsn and last_logno global variables.
Called when we have created a new log (after syncing this log's creation)
and when we have written a checkpoint (after syncing this log record).
Variables last_checkpoint_lsn and last_logno must be protected by the caller
using the log's lock, unless this function is called at startup.

SYNOPSIS
ma_control_file_write_and_force()
@@ -233,12 +242,14 @@ int ma_control_file_write_and_force(const LSN checkpoint_lsn, uint32 logno,
DBUG_ENTER("ma_control_file_write_and_force");

DBUG_ASSERT(control_file_fd >= 0); /* must be open */
#ifndef DBUG_OFF
if (maria_multi_threaded)
translog_lock_assert_owner();
#endif

memcpy(buffer + CONTROL_FILE_MAGIC_STRING_OFFSET,
CONTROL_FILE_MAGIC_STRING, CONTROL_FILE_MAGIC_STRING_SIZE);

/* TODO: you need some protection to be able to read last_* global vars */

if (objs_to_write == CONTROL_FILE_UPDATE_ONLY_LSN)
update_checkpoint_lsn= TRUE;
else if (objs_to_write == CONTROL_FILE_UPDATE_ONLY_LOGNO)
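`maria_multi_threaded` switches the lock-ownership assertion on only after startup, when other threads may also touch `last_checkpoint_lsn` and `last_logno`. A minimal sketch of that pattern with POSIX threads, assuming a mutex that remembers its owner; `owned_lock`, `control_file_write_allowed()` and the other names are hypothetical, and, like any owner check of this kind, it is meant as a debug assertion rather than as synchronization.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* A mutex that remembers which thread holds it (hypothetical helper). */
typedef struct {
  pthread_mutex_t mutex;
  pthread_t owner;
  bool owned;
} owned_lock;

static owned_lock log_lock = { .mutex = PTHREAD_MUTEX_INITIALIZER };
static bool multi_threaded = false;  /* plays the role of maria_multi_threaded */

static void log_lock_acquire(void)
{
  pthread_mutex_lock(&log_lock.mutex);
  log_lock.owner = pthread_self();
  log_lock.owned = true;
}

static void log_lock_release(void)
{
  log_lock.owned = false;
  pthread_mutex_unlock(&log_lock.mutex);
}

/*
  Debug check before writing the control file: at startup (single thread)
  no lock is required; afterwards the calling thread must hold the lock.
*/
static bool control_file_write_allowed(void)
{
  if (!multi_threaded)
    return true;
  return log_lock.owned && pthread_equal(log_lock.owner, pthread_self());
}
```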
@@ -270,7 +281,6 @@ int ma_control_file_write_and_force(const LSN checkpoint_lsn, uint32 logno,
my_sync(control_file_fd, MYF(MY_WME)))
DBUG_RETURN(1);

/* TODO: you need some protection to be able to write last_* global vars */
if (update_checkpoint_lsn)
last_checkpoint_lsn= checkpoint_lsn;
if (update_logno)
@@ -43,6 +43,8 @@ extern LSN last_checkpoint_lsn;
*/
extern uint32 last_logno;

extern my_bool maria_multi_threaded;

typedef enum enum_control_file_error {
CONTROL_FILE_OK= 0,
CONTROL_FILE_TOO_SMALL,
@@ -19,6 +19,7 @@
#include "ma_sp_defs.h"
#include <my_bit.h>
#include "ma_blockrec.h"
#include "trnman_public.h"

#if defined(MSDOS) || defined(__WIN__)
#ifdef __WIN__
@@ -51,7 +52,8 @@ int maria_create(const char *name, enum data_file_type datafile_type,
unique_key_parts,fulltext_keys,offset, not_block_record_extra_length;
uint max_field_lengths, extra_header_size;
ulong reclength, real_reclength,min_pack_length;
char filename[FN_REFLEN],linkname[FN_REFLEN], *linkname_ptr;
char filename[FN_REFLEN], dlinkname[FN_REFLEN], *dlinkname_ptr= NULL,
klinkname[FN_REFLEN], *klinkname_ptr= NULL;
ulong pack_reclength;
ulonglong tot_length,max_rows, tmp;
enum en_fieldtype type;
@@ -62,11 +64,12 @@ int maria_create(const char *name, enum data_file_type datafile_type,
HA_KEYSEG *keyseg,tmp_keyseg;
MARIA_COLUMNDEF *column, *end_column;
ulong *rec_per_key_part;
my_off_t key_root[HA_MAX_POSSIBLE_KEY];
my_off_t key_root[HA_MAX_POSSIBLE_KEY], kfile_size_before_extension;
MARIA_CREATE_INFO tmp_create_info;
my_bool tmp_table= FALSE; /* cache for presence of HA_OPTION_TMP_TABLE */
my_bool forced_packed;
myf sync_dir= MY_SYNC_DIR;
myf sync_dir= 0;
uchar *log_data= NULL;
DBUG_ENTER("maria_create");
DBUG_PRINT("enter", ("keys: %u columns: %u uniques: %u flags: %u",
keys, columns, uniques, flags));
@@ -250,8 +253,9 @@ int maria_create(const char *name, enum data_file_type datafile_type,
if (flags & HA_CREATE_TMP_TABLE)
{
options|= HA_OPTION_TMP_TABLE;
tmp_table= TRUE;
create_mode|= O_EXCL | O_NOFOLLOW;
/* temp tables are not crash-safe (dropped at restart) */
/* "CREATE TEMPORARY" tables are not crash-safe (dropped at restart) */
ci->transactional= FALSE;
}
share.base.null_bytes= ci->null_bytes;
@@ -624,6 +628,7 @@ int maria_create(const char *name, enum data_file_type datafile_type,

share.state.dellink = HA_OFFSET_ERROR;
share.state.first_bitmap_with_space= 0;
share.state.create_rename_lsn= 0;
share.state.process= (ulong) getpid();
share.state.unique= (ulong) 0;
share.state.update_count=(ulong) 0;
@@ -671,11 +676,15 @@ int maria_create(const char *name, enum data_file_type datafile_type,
#endif

/* max_data_file_length and max_key_file_length are recalculated on open */
if (options & HA_OPTION_TMP_TABLE)
{
tmp_table= TRUE;
sync_dir= 0;
if (tmp_table)
share.base.max_data_file_length= (my_off_t) ci->data_file_length;
else if (ci->transactional && translog_inited)
{
/*
we have checked translog_inited above, because maria_chk may call us
(via maria_recreate_table()) and it does not have a log.
*/
sync_dir= MY_SYNC_DIR;
}

if (datafile_type == BLOCK_RECORD)
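The `sync_dir` decision in `maria_create()` reads as a small truth table: temporary tables (`HA_OPTION_TMP_TABLE`) never sync, and any other table syncs only if it is transactional and the transaction log is initialized, which is not the case when `maria_chk` calls in through `maria_recreate_table()`. The sketch restates that rule; `SKETCH_SYNC_DIR` is a stand-in for the real `MY_SYNC_DIR` bit and `create_sync_dir()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_SYNC_DIR 1  /* stand-in for mysys's MY_SYNC_DIR bit */

/*
  Temporary tables never sync their directory; other tables do so only when
  transactional and when the transaction log is initialized.
*/
static int create_sync_dir(bool tmp_table, bool transactional,
                           bool translog_inited)
{
  if (tmp_table)
    return 0;
  return (transactional && translog_inited) ? SKETCH_SYNC_DIR : 0;
}
```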
@@ -712,9 +721,9 @@ int maria_create(const char *name, enum data_file_type datafile_type,
MY_UNPACK_FILENAME | (have_iext ? MY_REPLACE_EXT :
MY_APPEND_EXT));
}
fn_format(linkname, name, "", MARIA_NAME_IEXT,
fn_format(klinkname, name, "", MARIA_NAME_IEXT,
MY_UNPACK_FILENAME|MY_APPEND_EXT);
linkname_ptr=linkname;
klinkname_ptr= klinkname;
/*
Don't create the table if the link or file exists to ensure that one
doesn't accidentally destroy another table.
@@ -730,7 +739,6 @@ int maria_create(const char *name, enum data_file_type datafile_type,
(MY_UNPACK_FILENAME |
(flags & HA_DONT_TOUCH_DATA) ? MY_RETURN_REAL_PATH : 0) |
MY_APPEND_EXT);
linkname_ptr=0;
/*
Replace the current file.
Don't sync dir now if the data file has the same path.
@@ -753,7 +761,7 @@ int maria_create(const char *name, enum data_file_type datafile_type,
goto err;
}

if ((file= my_create_with_symlink(linkname_ptr, filename, 0, create_mode,
if ((file= my_create_with_symlink(klinkname_ptr, filename, 0, create_mode,
MYF(MY_WME|create_flag))) < 0)
goto err;
errpos=1;
@@ -780,24 +788,24 @@ int maria_create(const char *name, enum data_file_type datafile_type,
MY_UNPACK_FILENAME |
(have_dext ? MY_REPLACE_EXT : MY_APPEND_EXT));
}
fn_format(linkname, name, "",MARIA_NAME_DEXT,
fn_format(dlinkname, name, "",MARIA_NAME_DEXT,
MY_UNPACK_FILENAME | MY_APPEND_EXT);
linkname_ptr=linkname;
dlinkname_ptr= dlinkname;
create_flag=0;
}
else
{
fn_format(filename,name,"", MARIA_NAME_DEXT,
MY_UNPACK_FILENAME | MY_APPEND_EXT);
linkname_ptr=0;
create_flag=MY_DELETE_OLD;
}
if ((dfile=
my_create_with_symlink(linkname_ptr, filename, 0, create_mode,
my_create_with_symlink(dlinkname_ptr, filename, 0, create_mode,
MYF(MY_WME | create_flag | sync_dir))) < 0)
goto err;
errpos=3;

share.data_file_type= datafile_type;
if (_ma_initialize_data_file(dfile, &share))
goto err;
}
@@ -925,53 +933,108 @@ int maria_create(const char *name, enum data_file_type datafile_type,
goto err;
}

if ((kfile_size_before_extension= my_tell(file,MYF(0))) == MY_FILEPOS_ERROR)
goto err;
#ifndef DBUG_OFF
if ((uint) my_tell(file,MYF(0)) != info_length)
{
uint pos= (uint) my_tell(file,MYF(0));
DBUG_PRINT("warning",("info_length: %d != used_length: %d",
info_length, pos));
}
if (kfile_size_before_extension != info_length)
DBUG_PRINT("warning",("info_length: %u != used_length: %u",
info_length, (uint)kfile_size_before_extension));
#endif

if (sync_dir)
{
/*
we log the first bytes and then the size to which we extend; this is
to avoid logging 1 KB of mostly zeroes if this is a small table.
*/
char empty_string[]= "";
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 3];
uint total_rec_length= 0;
uint i;
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= 1 + 2 +
kfile_size_before_extension;
/* we may need up to 64 KB, so don't use the stack */
log_data= my_malloc(log_array[TRANSLOG_INTERNAL_PARTS + 0].length, MYF(0));
if ((log_data == NULL) ||
my_pread(file, 1 + 2 + log_data, kfile_size_before_extension,
0, MYF(MY_NABP)))
goto err_no_lock;
/*
remember whether the data file was created, so that in the future
Recovery knows whether it may create it too
*/
log_data[0]= test(flags & HA_DONT_TOUCH_DATA);
int2store(log_data + 1, kfile_size_before_extension);
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= log_data;
/* symlink description is also needed for re-creation by Recovery: */
log_array[TRANSLOG_INTERNAL_PARTS + 1].str=
dlinkname_ptr ? dlinkname : empty_string;
log_array[TRANSLOG_INTERNAL_PARTS + 1].length=
strlen(log_array[TRANSLOG_INTERNAL_PARTS + 1].str);
log_array[TRANSLOG_INTERNAL_PARTS + 2].str=
klinkname_ptr ? klinkname : empty_string;
log_array[TRANSLOG_INTERNAL_PARTS + 2].length=
strlen(log_array[TRANSLOG_INTERNAL_PARTS + 2].str);
for (i= TRANSLOG_INTERNAL_PARTS;
i < (sizeof(log_array)/sizeof(log_array[0])); i++)
total_rec_length+= log_array[i].length;
/*
For this record to be of any use for Recovery, we need the upper
MySQL layer to be crash-safe, which it is not now (that would require
work using the ddl_log of sql/sql_table.cc); when it is, we should
reconsider the moment of writing this log record (before or after op,
under THR_LOCK_maria or not...), how to use it in Recovery, and force
the log. For now this record is just informative.
Note that in case of TRUNCATE TABLE we also come here.
When in CREATE/TRUNCATE (or DROP or RENAME or REPAIR) we have not called
external_lock(), so have no TRN. It does not matter, as all these
operations are non-transactional and sync their files.
*/
if (unlikely(translog_write_record(&share.state.create_rename_lsn,
LOGREC_REDO_CREATE_TABLE,
&dummy_transaction_object, NULL,
total_rec_length,
sizeof(log_array)/sizeof(log_array[0]),
log_array, NULL)))
goto err_no_lock;
/*
store LSN into file, needed for Recovery to not be confused if a
DROP+CREATE happened (applying REDOs to the wrong table).
If such direct my_pwrite() to a fixed offset is too "hackish", I can
call ma_state_info_write() again but it will be less efficient.
*/
lsn_store(log_data, share.state.create_rename_lsn);
if (my_pwrite(file, log_data, LSN_STORE_SIZE,
sizeof(share.state.header) + 2, MYF(MY_NABP)))
goto err_no_lock;
my_free(log_data, MYF(0));
log_data= NULL; /* the error path frees log_data again */
}

/* Enlarge files */
DBUG_PRINT("info", ("enlarge to keystart: %lu",
(ulong) share.base.keystart));
if (my_chsize(file,(ulong) share.base.keystart,0,MYF(0)))
goto err;

if (sync_dir && my_sync(file, MYF(0)))
goto err;

if (! (flags & HA_DONT_TOUCH_DATA))
{
#ifdef USE_RELOC
if (my_chsize(dfile,share.base.min_pack_length*ci->reloc_rows,0,MYF(0)))
goto err;
if (!tmp_table && my_sync(file, MYF(0)))
goto err;
#endif
/* if !USE_RELOC, there was no write to the file, no need to sync it */
errpos=2;
if (my_close(dfile,MYF(0)))
if ((sync_dir && my_sync(dfile, MYF(0))) || my_close(dfile,MYF(0)))
goto err;
}
errpos=0;
pthread_mutex_unlock(&THR_LOCK_maria);
res= 0;
my_free((char*) rec_per_key_part,MYF(0));
errpos=0;
if (my_close(file,MYF(0)))
res= my_errno;
/*
RECOVERY TODO
Write a log record describing the CREATE operation (just the file
names, link names, and the full header's content).
For this record to be of any use for Recovery, we need the upper
MySQL layer to be crash-safe, which it is not now (that would require work
using the ddl_log of sql/sql_table.cc); when it is, we should reconsider
the moment of writing this log record (before or after op, under
THR_LOCK_maria or not...), how to use it in Recovery, and force the log.
For now this record is just informative.
If operation failed earlier, we clean up in "err:" and the MySQL layer
will clean up the frm, so we needn't write anything to the log.
*/
my_free((char*) rec_per_key_part,MYF(0));
DBUG_RETURN(res);

err:
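The CREATE TABLE record assembled above starts with a variable part: one byte telling whether the data file was left untouched (`HA_DONT_TOUCH_DATA`), the key file header length stored with `int2store()` (2 bytes, low byte first), then the header bytes read back with `my_pread()`. The data and index link names follow as separate parts and are empty when there is no symlink. A standard-library sketch of that layout; `build_create_prefix()` and `create_record_length()` are hypothetical helpers, and the rejection of headers longer than 65535 bytes is a check added for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
  Build the first part of the record: flag byte, 2-byte header length
  (low byte first, like int2store()), then the header bytes. Returns a
  malloc'ed buffer the caller frees, or NULL on error.
*/
static unsigned char *build_create_prefix(int dont_touch_data,
                                          const unsigned char *header,
                                          size_t header_len, size_t *out_len)
{
  if (header_len > 0xFFFF)            /* length must fit in 2 bytes */
    return NULL;
  unsigned char *buf = malloc(1 + 2 + header_len);
  if (buf == NULL)
    return NULL;
  buf[0] = (unsigned char) (dont_touch_data != 0);
  buf[1] = (unsigned char) (header_len & 0xFF);
  buf[2] = (unsigned char) (header_len >> 8);
  if (header_len > 0)
    memcpy(buf + 3, header, header_len);
  *out_len = 1 + 2 + header_len;
  return buf;
}

/* Total record length: prefix plus both link names ("" when absent). */
static size_t create_record_length(size_t prefix_len, const char *dlink,
                                   const char *klink)
{
  return prefix_len + strlen(dlink ? dlink : "") + strlen(klink ? klink : "");
}
```

A 300-byte header therefore yields a 303-byte prefix whose length field reads 0x2C 0x01.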
@@ -996,6 +1059,7 @@ err_no_lock:
MY_UNPACK_FILENAME | MY_APPEND_EXT),
sync_dir);
}
my_free(log_data, MYF(MY_ALLOW_ZERO_PTR));
my_free((char*) rec_per_key_part, MYF(0));
DBUG_RETURN(my_errno=save_errno); /* return the fatal errno */
}
@@ -1086,9 +1150,9 @@ int _ma_initialize_data_file(File dfile, MARIA_SHARE *share)
{
if (share->data_file_type == BLOCK_RECORD)
{
if (my_chsize(dfile, maria_block_size, 0, MYF(MY_WME)))
if (my_chsize(dfile, share->base.block_size, 0, MYF(MY_WME)))
return 1;
share->state.state.data_file_length= maria_block_size;
share->state.state.data_file_length= share->base.block_size;
_ma_bitmap_delete_all(share);
}
return 0;
@@ -17,21 +17,38 @@
/* This clears the status information and truncates files */

#include "maria_def.h"
#include "trnman_public.h"

/**
@brief deletes all rows from a table

@param info Maria handler

@return Operation status
@retval 0 ok
@retval 1 error
*/

int maria_delete_all_rows(MARIA_HA *info)
{
uint i;
MARIA_SHARE *share=info->s;
MARIA_STATE_INFO *state=&share->state;
my_bool log_record;
DBUG_ENTER("maria_delete_all_rows");

if (share->options & HA_OPTION_READ_ONLY_DATA)
{
DBUG_RETURN(my_errno=EACCES);
}
/* LOCK TODO take X-lock on table here */
/**
@todo LOCK take X-lock on table here.
When we have versioning, if some other thread is looking at this table,
we cannot shrink the file like this.
*/
if (_ma_readinfo(info,F_WRLCK,1))
DBUG_RETURN(my_errno);
log_record= share->base.transactional && !share->temporary;
if (_ma_mark_file_changed(info))
goto err;

@@ -54,27 +71,13 @@ int maria_delete_all_rows(MARIA_HA *info)
*/
flush_pagecache_blocks(share->pagecache, &share->kfile,
FLUSH_IGNORE_CHANGED);
/*
RECOVERY TODO Log the two chsize and header modifications and force the
log, so that if a crash happens between the two chsize, we finish the work
at Recovery. For this scenario:
"TRUNCATE TABLE t1; DROP TABLE t1; RENAME TABLE t2 to t1; crash;"
Recovery mustn't truncate the new t1, so the log records of TRUNCATE
should be applied only if t1 exists and its ZeroDirtyPagesLSN is smaller
than the records'. See more comments below.
*/
if (my_chsize(info->dfile.file, 0, 0, MYF(MY_WME)) ||
my_chsize(share->kfile.file, share->base.keystart, 0, MYF(MY_WME)) )
goto err;

if (_ma_initialize_data_file(info->dfile.file, info->s))
if (_ma_initialize_data_file(info->dfile.file, share))
goto err;

/*
RECOVERY TODO Consider updating ZeroDirtyPagesLSN here. It is
not a necessity (it is one only in RENAME commands) but an optional
optimization which will allow some REDO skipping at Recovery.
*/
VOID(_ma_writeinfo(info,WRITEINFO_UPDATE_KEYFILE));
#ifdef HAVE_MMAP
/* Resize mmaped area */
@@ -82,24 +85,48 @@ int maria_delete_all_rows(MARIA_HA *info)
_ma_remap_file(info, (my_off_t)0);
rw_unlock(&info->s->mmap_lock);
#endif
/*
RECOVERY TODO Until we have the TRUNCATE log record and take it into
account for log-low-water-mark calculation and use it in Recovery, we need
to sync.
*/
if (_ma_sync_table_files(info))
goto err;
if (log_record)
{
/* For now this record is only informative */
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 1];
uchar log_data[LSN_STORE_SIZE];
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= FILEID_STORE_SIZE;
if (unlikely(translog_write_record(&share->state.create_rename_lsn,
LOGREC_REDO_DELETE_ALL,
info->trn, share, 0,
sizeof(log_array)/sizeof(log_array[0]),
log_array, log_data)))
goto err;
/*
store LSN into file. It is an optimization so that all old REDOs for
this table are ignored (scenario: checkpoint, INSERT1s, DELETE ALL;
INSERT2s, crash: then Recovery can skip INSERT1s). It also allows us to
ignore the present record at Recovery.
Note that storing the LSN could not be done by _ma_writeinfo() above as
the table is locked at this moment, so we need to do it ourselves.
*/
lsn_store(log_data, share->state.create_rename_lsn);
if (my_pwrite(share->kfile.file, log_data, sizeof(log_data),
sizeof(share->state.header) + 2, MYF(MY_NABP)) ||
_ma_sync_table_files(info))
goto err;
/**
@todo RECOVERY Until we take into account the log record above
for log-low-water-mark calculation and use it in Recovery, we need
to sync above.
*/
}
allow_break(); /* Allow SIGHUP & SIGINT */
DBUG_RETURN(0);

err:
{
int save_errno=my_errno;
/* RECOVERY TODO log the header modifications */
VOID(_ma_writeinfo(info,WRITEINFO_UPDATE_KEYFILE));
info->update|=HA_STATE_WRITTEN; /* Buffer changed */
/* RECOVERY TODO until we log above we have to sync */
if (_ma_sync_table_files(info) && !save_errno)
/** @todo RECOVERY until we use the log record above we have to sync */
if (log_record && _ma_sync_table_files(info) && !save_errno)
save_errno= my_errno;
allow_break(); /* Allow SIGHUP & SIGINT */
DBUG_RETURN(my_errno=save_errno);
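At the file level, `maria_delete_all_rows()` cuts the data file to zero, cuts the index file back to `keystart`, lets `_ma_initialize_data_file()` give a block-record data file one block again (now sized by the table's own block size), and then syncs before anything is logged. The sketch below shows those steps with POSIX `ftruncate()` and `fsync()` on already-open descriptors. `truncate_table_files()` is a hypothetical name, and the sketch leaves out the bitmap reset, the header update and the log record.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 on success, -1 on error (errno is set). */
static int truncate_table_files(int dfd, int kfd, off_t keystart,
                                off_t block_size, int do_sync)
{
  /* Drop all rows: empty data file, index file back to its header. */
  if (ftruncate(dfd, 0) != 0 || ftruncate(kfd, keystart) != 0)
    return -1;
  /* Re-create the first data block, as _ma_initialize_data_file() does. */
  if (ftruncate(dfd, block_size) != 0)
    return -1;
  /* Make the new sizes durable before anything is logged. */
  if (do_sync && (fsync(dfd) != 0 || fsync(kfd) != 0))
    return -1;
  return 0;
}
```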
@@ -13,11 +13,18 @@
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */

/*
deletes a table
*/

#include "ma_fulltext.h"
#include "trnman_public.h"

/**
@brief drops (deletes) a table

@param name table's name

@return Operation status
@retval 0 ok
@retval 1 error
*/

int maria_delete_table(const char *name)
{
@@ -25,56 +32,78 @@ int maria_delete_table(const char *name)
#ifdef USE_RAID
uint raid_type=0,raid_chunks=0;
#endif
MARIA_HA *info;
myf sync_dir;
DBUG_ENTER("maria_delete_table");

#ifdef EXTRA_DEBUG
_ma_check_table_is_closed(name,"delete");
#endif
/* LOCK TODO take X-lock on table here */
#ifdef USE_RAID
/** @todo LOCK take X-lock on table */
/*
We need to know if this table is transactional.
When built with RAID support, we also need to determine if this table
makes use of the raid feature. If yes, we need to remove all raid
chunks. This is done with my_raid_delete(). Unfortunately it is
necessary to open the table just to check this. We use
'open_for_repair' to be able to open even a crashed table. If even
this open fails, we assume no raid configuration for this table
and try to remove the normal data file only. This may however
leave the raid chunks behind.
*/
if (!(info= maria_open(name, O_RDONLY, HA_OPEN_FOR_REPAIR)))
{
MARIA_HA *info;
/*
When built with RAID support, we need to determine if this table
makes use of the raid feature. If yes, we need to remove all raid
chunks. This is done with my_raid_delete(). Unfortunately it is
necessary to open the table just to check this. We use
'open_for_repair' to be able to open even a crashed table. If even
this open fails, we assume no raid configuration for this table
and try to remove the normal data file only. This may however
leave the raid chunks behind.
*/
if (!(info= maria_open(name, O_RDONLY, HA_OPEN_FOR_REPAIR)))
raid_type= 0;
else
{
raid_type= info->s->base.raid_type;
raid_chunks= info->s->base.raid_chunks;
maria_close(info);
}
#ifdef USE_RAID
raid_type= 0;
#endif
sync_dir= 0;
}
else
{
#ifdef USE_RAID
raid_type= info->s->base.raid_type;
raid_chunks= info->s->base.raid_chunks;
#endif
sync_dir= (info->s->base.transactional && !info->s->temporary) ?
MY_SYNC_DIR : 0;
maria_close(info);
}
#ifdef USE_RAID
#ifdef EXTRA_DEBUG
_ma_check_table_is_closed(name,"delete");
#endif
#endif /* USE_RAID */

if (sync_dir)
{
/*
For this log record to be of any use for Recovery, we need the upper
MySQL layer to be crash-safe in DDLs; when it is we should reconsider
the moment of writing this log record, how to use it in Recovery, and
force the log. For now this record is only informative.
*/
LSN lsn;
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 1];
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char *)name;
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= strlen(name);
if (unlikely(translog_write_record(&lsn, LOGREC_REDO_DROP_TABLE,
&dummy_transaction_object, NULL,
log_array[TRANSLOG_INTERNAL_PARTS +
0].length,
sizeof(log_array)/sizeof(log_array[0]),
log_array, NULL)))
DBUG_RETURN(1);
}

fn_format(from,name,"",MARIA_NAME_IEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
/*
RECOVERY TODO log the two deletes below.
Then do the file deletions.
For this log record to be of any use for Recovery, we need the upper MySQL
layer to be crash-safe in DDLs; when it is we should reconsider the moment
of writing this log record, how to use it in Recovery, and force the log.
For now this record is only informative.
*/
if (my_delete_with_symlink(from, MYF(MY_WME | MY_SYNC_DIR)))
if (my_delete_with_symlink(from, MYF(MY_WME | sync_dir)))
DBUG_RETURN(my_errno);
fn_format(from,name,"",MARIA_NAME_DEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
#ifdef USE_RAID
if (raid_type)
DBUG_RETURN(my_raid_delete(from, raid_chunks, MYF(MY_WME | MY_SYNC_DIR)) ?
DBUG_RETURN(my_raid_delete(from, raid_chunks, MYF(MY_WME | sync_dir)) ?
|
||||
my_errno : 0);
|
||||
#endif
|
||||
DBUG_RETURN(my_delete_with_symlink(from, MYF(MY_WME | MY_SYNC_DIR)) ?
|
||||
DBUG_RETURN(my_delete_with_symlink(from, MYF(MY_WME | sync_dir)) ?
|
||||
my_errno : 0);
|
||||
}
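maria_delete_table() asks for directory syncs (MY_SYNC_DIR) only when the table is transactional and not temporary. If even the HA_OPEN_FOR_REPAIR open fails, it does not sync at all. The sketch below isolates that decision. It is minimal and makes assumptions: the struct and the flag values are hypothetical stand-ins, not the mysys definitions.

```c
#include <stdbool.h>

/* Hypothetical flag values chosen for this sketch; the real MY_WME and
   MY_SYNC_DIR come from my_sys.h and may differ. */
typedef unsigned int sketch_myf;
#define SKETCH_MY_WME      0x0010u
#define SKETCH_MY_SYNC_DIR 0x8000u

/* Hypothetical subset of what maria_delete_table() learns from maria_open(). */
struct sketch_table_info
{
  bool opened;         /* false if even HA_OPEN_FOR_REPAIR failed */
  bool transactional;  /* info->s->base.transactional */
  bool temporary;      /* info->s->temporary */
};

/* Extra flag to OR into the my_delete_with_symlink() calls. */
static sketch_myf delete_sync_flag(const struct sketch_table_info *t)
{
  if (!t->opened)
    return 0;  /* table could not be inspected: do not sync */
  return (t->transactional && !t->temporary) ? SKETCH_MY_SYNC_DIR : 0;
}
```

The caller then ORs the result into its other flags, as in `MYF(MY_WME | sync_dir)` above. Non-transactional and temporary tables never pay for a directory fsync, because Recovery will not rely on their files.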
|
||||
|
@ -21,21 +21,20 @@
|
||||
static void maria_extra_keyflag(MARIA_HA *info,
|
||||
enum ha_extra_function function);
|
||||
|
||||
/**
|
||||
@brief Set options and buffers to optimize table handling
|
||||
|
||||
/*
|
||||
Set options and buffers to optimize table handling
|
||||
|
||||
@param info open table
|
||||
@param function operation
|
||||
@param extra_arg Pointer to extra argument (normally pointer to
|
||||
ulong); used when function is one of:
|
||||
HA_EXTRA_WRITE_CACHE
|
||||
HA_EXTRA_CACHE
|
||||
|
||||
SYNOPSIS
|
||||
maria_extra()
|
||||
info open table
|
||||
function operation
|
||||
extra_arg Pointer to extra argument (normally pointer to ulong)
|
||||
Used when function is one of:
|
||||
HA_EXTRA_WRITE_CACHE
|
||||
HA_EXTRA_CACHE
|
||||
RETURN VALUES
|
||||
0 ok
|
||||
# error
|
||||
@return Operation status
|
||||
@retval 0 ok
|
||||
@retval !=0 error
|
||||
*/
|
||||
|
||||
int maria_extra(MARIA_HA *info, enum ha_extra_function function,
|
||||
@ -265,14 +264,24 @@ int maria_extra(MARIA_HA *info, enum ha_extra_function function,
|
||||
pthread_mutex_unlock(&THR_LOCK_maria);
|
||||
break;
|
||||
case HA_EXTRA_PREPARE_FOR_DELETE:
|
||||
/* QQ: suggest renaming it to "PREPARE_FOR_DROP" */
|
||||
pthread_mutex_lock(&THR_LOCK_maria);
|
||||
share->last_version= 0L; /* Impossible version */
|
||||
#ifdef __WIN__
|
||||
/* Close the isam and data files as Win32 can't drop an open table */
|
||||
pthread_mutex_lock(&share->intern_lock);
|
||||
/*
|
||||
If this is Windows we remove blocks from pagecache. If not Windows we
|
||||
don't do it, so these pages stay in the pagecache? So they may later be
|
||||
flushed to a wrong file?
|
||||
Or is it that this flush_pagecache_blocks() never finds any blocks? Then
|
||||
why do we do it on Windows?
|
||||
Don't we wait for all instances to be closed before dropping the table?
|
||||
Do we ever do something useful here?
|
||||
BUG?
|
||||
*/
|
||||
if (flush_pagecache_blocks(share->pagecache, &share->kfile,
|
||||
(function == HA_EXTRA_FORCE_REOPEN ?
|
||||
FLUSH_RELEASE : FLUSH_IGNORE_CHANGED)))
|
||||
FLUSH_IGNORE_CHANGED))
|
||||
{
|
||||
error=my_errno;
|
||||
share->changed=1;
|
||||
@ -292,9 +301,11 @@ int maria_extra(MARIA_HA *info, enum ha_extra_function function,
|
||||
info->lock_type = F_UNLCK;
|
||||
}
|
||||
if (share->kfile.file >= 0)
|
||||
{
|
||||
_ma_decrement_open_count(info);
|
||||
if (share->kfile.file >= 0 && my_close(share->kfile,MYF(0)))
|
||||
error=my_errno;
|
||||
if (my_close(share->kfile,MYF(0)))
|
||||
error=my_errno;
|
||||
}
|
||||
{
|
||||
LIST *list_element ;
|
||||
for (list_element=maria_open_list ;
|
||||
@ -304,6 +315,9 @@ int maria_extra(MARIA_HA *info, enum ha_extra_function function,
|
||||
MARIA_HA *tmpinfo=(MARIA_HA*) list_element->data;
|
||||
if (tmpinfo->s == info->s)
|
||||
{
|
||||
/**
|
||||
@todo RECOVERY BUG: flush of bitmap and sync of dfile are missing
|
||||
*/
|
||||
if (tmpinfo->dfile.file >= 0 &&
|
||||
my_close(tmpinfo->dfile.file, MYF(0)))
|
||||
error = my_errno;
|
||||
|
@ -53,7 +53,7 @@ void maria_end(void)
|
||||
{
|
||||
if (maria_inited)
|
||||
{
|
||||
maria_inited= FALSE;
|
||||
maria_inited= maria_multi_threaded= FALSE;
|
||||
ft_free_stopwords();
|
||||
trnman_destroy();
|
||||
translog_destroy();
|
||||
|
@ -17,6 +17,14 @@
|
||||
#include "ma_blockrec.h"
|
||||
#include "trnman.h"
|
||||
|
||||
/**
|
||||
@file
|
||||
@brief Module which writes to and reads from a transaction log
|
||||
|
||||
@todo LOG: in functions where the log's lock is required, a
|
||||
translog_assert_owner() could be added.
|
||||
*/
|
||||
|
||||
/* number of opened log files in the pagecache (should be at least 2) */
|
||||
#define OPENED_FILES_NUM 3
|
||||
|
||||
@ -166,7 +174,7 @@ static struct st_translog_descriptor log_descriptor;
|
||||
/* Marker for end of log */
|
||||
static byte end_of_log= 0;
|
||||
|
||||
static my_bool translog_inited;
|
||||
my_bool translog_inited= 0;
|
||||
|
||||
/* record classes */
|
||||
enum record_class
|
||||
@ -218,7 +226,7 @@ struct st_log_record_type_descriptor
|
||||
uint16 read_header_len;
|
||||
/* HOOK for writing the record called before lock */
|
||||
prewrite_rec_hook prewrite_hook;
|
||||
/* HOOK for writing the record called when LSN is known */
|
||||
/* HOOK for writing the record called when LSN is known, inside lock */
|
||||
inwrite_rec_hook inwrite_hook;
|
||||
/* HOOK for reading headers */
|
||||
read_rec_hook read_hook;
|
||||
@ -230,6 +238,13 @@ struct st_log_record_type_descriptor
|
||||
};
|
||||
|
||||
|
||||
#include <my_atomic.h>
|
||||
/* an array that maps a MARIA_SHARE's 2-byte id to that MARIA_SHARE */
|
||||
static MARIA_SHARE **id_to_share= NULL;
|
||||
#define SHARE_ID_MAX 65535 /* array's size */
|
||||
/* lock for id_to_share */
|
||||
static my_atomic_rwlock_t LOCK_id_to_share;
|
||||
|
||||
static my_bool write_hook_for_redo(enum translog_record_type type,
|
||||
TRN *trn, LSN *lsn,
|
||||
struct st_translog_parts *parts);
|
||||
@ -291,7 +306,9 @@ static LOG_DESC INIT_LOGREC_REDO_INSERT_ROW_HEAD=
|
||||
write_hook_for_redo, NULL, 0};
|
||||
|
||||
static LOG_DESC INIT_LOGREC_REDO_INSERT_ROW_TAIL=
|
||||
{LOGRECTYPE_VARIABLE_LENGTH, 0, 8, NULL, NULL, NULL, 0};
|
||||
{LOGRECTYPE_VARIABLE_LENGTH, 0,
|
||||
FILEID_STORE_SIZE + PAGE_STORE_SIZE + DIRPOS_STORE_SIZE, NULL,
|
||||
write_hook_for_redo, NULL, 0};
|
||||
|
||||
static LOG_DESC INIT_LOGREC_REDO_INSERT_ROW_BLOB=
|
||||
{LOGRECTYPE_VARIABLE_LENGTH, 0, 8, NULL, write_hook_for_redo, NULL, 0};
|
||||
@ -376,15 +393,9 @@ static LOG_DESC INIT_LOGREC_COMMIT=
|
||||
static LOG_DESC INIT_LOGREC_COMMIT_WITH_UNDO_PURGE=
|
||||
{LOGRECTYPE_PSEUDOFIXEDLENGTH, 5, 5, NULL, NULL, NULL, 1};
|
||||
|
||||
static LOG_DESC INIT_LOGREC_CHECKPOINT_PAGE=
|
||||
{LOGRECTYPE_VARIABLE_LENGTH, 0, 6, NULL, NULL, NULL, 0};
|
||||
|
||||
static LOG_DESC INIT_LOGREC_CHECKPOINT_TRAN=
|
||||
static LOG_DESC INIT_LOGREC_CHECKPOINT=
|
||||
{LOGRECTYPE_VARIABLE_LENGTH, 0, 0, NULL, NULL, NULL, 0};
|
||||
|
||||
static LOG_DESC INIT_LOGREC_CHECKPOINT_TABL=
|
||||
{LOGRECTYPE_VARIABLE_LENGTH, 0, 8, NULL, NULL, NULL, 0};
|
||||
|
||||
static LOG_DESC INIT_LOGREC_REDO_CREATE_TABLE=
|
||||
{LOGRECTYPE_VARIABLE_LENGTH, 0, 0, NULL, NULL, NULL, 0};
|
||||
|
||||
@ -394,8 +405,13 @@ static LOG_DESC INIT_LOGREC_REDO_RENAME_TABLE=
|
||||
static LOG_DESC INIT_LOGREC_REDO_DROP_TABLE=
|
||||
{LOGRECTYPE_VARIABLE_LENGTH, 0, 0, NULL, NULL, NULL, 0};
|
||||
|
||||
static LOG_DESC INIT_LOGREC_REDO_TRUNCATE_TABLE=
|
||||
{LOGRECTYPE_VARIABLE_LENGTH, 0, 0, NULL, NULL, NULL, 0};
|
||||
static LOG_DESC INIT_LOGREC_REDO_DELETE_ALL=
|
||||
{LOGRECTYPE_FIXEDLENGTH, FILEID_STORE_SIZE, FILEID_STORE_SIZE,
|
||||
NULL, NULL, NULL, 0};
|
||||
|
||||
static LOG_DESC INIT_LOGREC_REDO_REPAIR_TABLE=
|
||||
{LOGRECTYPE_FIXEDLENGTH, FILEID_STORE_SIZE + 4, FILEID_STORE_SIZE + 4,
|
||||
NULL, NULL, NULL, 0};
|
||||
|
||||
static LOG_DESC INIT_LOGREC_FILE_ID=
|
||||
{LOGRECTYPE_VARIABLE_LENGTH, 0, 4, NULL, NULL, NULL, 0};
|
||||
@ -403,6 +419,7 @@ static LOG_DESC INIT_LOGREC_FILE_ID=
|
||||
static LOG_DESC INIT_LOGREC_LONG_TRANSACTION_ID=
|
||||
{LOGRECTYPE_FIXEDLENGTH, 6, 6, NULL, NULL, NULL, 0};
|
||||
|
||||
const myf log_write_flags= MY_WME | MY_NABP | MY_WAIT_IF_FULL;
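Some of the descriptors above declare fixed-length payloads that begin with FILEID_STORE_SIZE bytes for the table's short id. LOGREC_REDO_DELETE_ALL and LOGREC_REDO_REPAIR_TABLE are two examples. The sketch below builds such a payload and rests on two assumptions: FILEID_STORE_SIZE is 2, and the id is stored low byte first, as MySQL's int2store() does. This excerpt does not say what the REPAIR record's 4 extra bytes hold, so the sketch treats them as an opaque 32-bit value.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FILEID_STORE_SIZE 2  /* assumed value of FILEID_STORE_SIZE */

/* Stores a 2-byte short id, low byte first (assumed int2store() layout). */
static void sketch_fileid_store(unsigned char *dst, uint16_t id)
{
  dst[0]= (unsigned char) (id & 0xFF);
  dst[1]= (unsigned char) (id >> 8);
}

static uint16_t sketch_fileid_korr(const unsigned char *src)
{
  return (uint16_t) (src[0] | (src[1] << 8));
}

/* Builds a payload of FILEID_STORE_SIZE + 4 bytes, the declared length of
   LOGREC_REDO_REPAIR_TABLE; 'extra' stands for the unspecified 4 bytes. */
static size_t sketch_build_repair_payload(unsigned char *buf, uint16_t id,
                                          uint32_t extra)
{
  sketch_fileid_store(buf, id);
  for (int i= 0; i < 4; i++)
    buf[SKETCH_FILEID_STORE_SIZE + i]= (unsigned char) (extra >> (8 * i));
  return SKETCH_FILEID_STORE_SIZE + 4;
}
```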
|
||||
|
||||
static void loghandler_init()
|
||||
{
|
||||
@ -454,20 +471,18 @@ static void loghandler_init()
|
||||
INIT_LOGREC_COMMIT;
|
||||
log_record_type_descriptor[LOGREC_COMMIT_WITH_UNDO_PURGE]=
|
||||
INIT_LOGREC_COMMIT_WITH_UNDO_PURGE;
|
||||
log_record_type_descriptor[LOGREC_CHECKPOINT_PAGE]=
|
||||
INIT_LOGREC_CHECKPOINT_PAGE;
|
||||
log_record_type_descriptor[LOGREC_CHECKPOINT_TRAN]=
|
||||
INIT_LOGREC_CHECKPOINT_TRAN;
|
||||
log_record_type_descriptor[LOGREC_CHECKPOINT_TABL]=
|
||||
INIT_LOGREC_CHECKPOINT_TABL;
|
||||
log_record_type_descriptor[LOGREC_CHECKPOINT]=
|
||||
INIT_LOGREC_CHECKPOINT;
|
||||
log_record_type_descriptor[LOGREC_REDO_CREATE_TABLE]=
|
||||
INIT_LOGREC_REDO_CREATE_TABLE;
|
||||
log_record_type_descriptor[LOGREC_REDO_RENAME_TABLE]=
|
||||
INIT_LOGREC_REDO_RENAME_TABLE;
|
||||
log_record_type_descriptor[LOGREC_REDO_DROP_TABLE]=
|
||||
INIT_LOGREC_REDO_DROP_TABLE;
|
||||
log_record_type_descriptor[LOGREC_REDO_TRUNCATE_TABLE]=
|
||||
INIT_LOGREC_REDO_TRUNCATE_TABLE;
|
||||
log_record_type_descriptor[LOGREC_REDO_DELETE_ALL]=
|
||||
INIT_LOGREC_REDO_DELETE_ALL;
|
||||
log_record_type_descriptor[LOGREC_REDO_REPAIR_TABLE]=
|
||||
INIT_LOGREC_REDO_REPAIR_TABLE;
|
||||
log_record_type_descriptor[LOGREC_FILE_ID]=
|
||||
INIT_LOGREC_FILE_ID;
|
||||
log_record_type_descriptor[LOGREC_LONG_TRANSACTION_ID]=
|
||||
@ -554,6 +569,7 @@ static File open_logfile_by_number_no_cache(uint32 file_no)
|
||||
DBUG_ENTER("open_logfile_by_number_no_cache");
|
||||
|
||||
/* TODO: add O_DIRECT to open flags (when buffer is aligned) */
|
||||
/* TODO: use my_create() */
|
||||
if ((file= my_open(translog_filename_by_fileno(file_no, path),
|
||||
O_CREAT | O_BINARY | O_RDWR,
|
||||
MYF(MY_WME))) < 0)
|
||||
@ -615,7 +631,7 @@ static my_bool translog_write_file_header()
|
||||
bzero(page, sizeof(page_buff) - (page- page_buff));
|
||||
|
||||
DBUG_RETURN(my_pwrite(log_descriptor.log_file_num[0], page_buff,
|
||||
sizeof(page_buff), 0, MYF(MY_WME | MY_NABP)) != 0);
|
||||
sizeof(page_buff), 0, log_write_flags) != 0);
|
||||
}
|
||||
|
||||
|
||||
@ -1222,7 +1238,7 @@ static my_bool translog_buffer_next(TRANSLOG_ADDRESS *horizon,
|
||||
|
||||
|
||||
/*
|
||||
Set max LSN send to file
|
||||
Set max LSN sent to file
|
||||
|
||||
SYNOPSIS
|
||||
translog_set_sent_to_file()
|
||||
@ -1512,7 +1528,7 @@ static my_bool translog_buffer_flush(struct st_translog_buffer *buffer)
|
||||
}
|
||||
if (my_pwrite(buffer->file, (char*) buffer->buffer,
|
||||
buffer->size, LSN_OFFSET(buffer->offset),
|
||||
MYF(MY_WME | MY_NABP)))
|
||||
log_write_flags))
|
||||
{
|
||||
UNRECOVERABLE_ERROR(("Can't write buffer (%lu,0x%lx) size %lu "
|
||||
"to the disk (%d)",
|
||||
@ -2230,7 +2246,16 @@ my_bool translog_init(const char *directory,
|
||||
*/
|
||||
log_descriptor.flushed--; /* offset decreased */
|
||||
log_descriptor.sent_to_file--; /* offset decreased */
|
||||
|
||||
/*
|
||||
Log records will refer to a MARIA_SHARE by a unique 2-byte id; set up
the structures used to generate these ids.
|
||||
*/
|
||||
my_atomic_rwlock_init(&LOCK_id_to_share);
|
||||
id_to_share= (MARIA_SHARE **) my_malloc(SHARE_ID_MAX*sizeof(MARIA_SHARE*),
|
||||
MYF(MY_WME|MY_ZEROFILL));
|
||||
if (unlikely(!id_to_share))
|
||||
DBUG_RETURN(1);
|
||||
id_to_share--; /* min id is 1 */
|
||||
translog_inited= 1;
|
||||
DBUG_RETURN(0);
|
||||
}
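translog_init() allocates SHARE_ID_MAX slots and then decrements the base pointer, so that id 1 maps to the first slot. translog_destroy() then frees `id_to_share + 1`. Strictly speaking, ISO C only defines pointer arithmetic within an array object (and one past its end), so the decremented pointer is out of range, although it works on common platforms. The sketch below shows one alternative: it allocates one extra slot and leaves slot 0 unused, which still keeps ids in [1..SHARE_ID_MAX]. The names are hypothetical.

```c
#include <stdlib.h>

#define SKETCH_SHARE_ID_MAX 65535  /* mirrors SHARE_ID_MAX */

struct sketch_share;  /* opaque stand-in for MARIA_SHARE */

/* One extra slot so that index == id; slot 0 is never used (id 0 means
   "no id yet").  calloc() zero-fills, like MY_ZEROFILL; like the original,
   this relies on all-bits-zero being a null pointer. */
static struct sketch_share **id_table_create(void)
{
  return calloc((size_t) SKETCH_SHARE_ID_MAX + 1,
                sizeof(struct sketch_share *));
}

/* The pointer from id_table_create() is freed as is, with no "+ 1"
   adjustment.  free(NULL) is a no-op, like MY_ALLOW_ZERO_PTR. */
static void id_table_destroy(struct sketch_share **table)
{
  free(table);
}
```

The cost is one unused pointer-sized slot. In exchange, allocation and free use the same pointer.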
|
||||
@ -2303,6 +2328,8 @@ void translog_destroy()
|
||||
}
|
||||
pthread_mutex_destroy(&log_descriptor.sent_to_file_lock);
|
||||
my_close(log_descriptor.directory_fd, MYF(MY_WME));
|
||||
my_atomic_rwlock_destroy(&LOCK_id_to_share);
|
||||
my_free((gptr)(id_to_share + 1), MYF(MY_ALLOW_ZERO_PTR));
|
||||
translog_inited= 0;
|
||||
}
|
||||
DBUG_VOID_RETURN;
|
||||
@ -2362,6 +2389,14 @@ static inline my_bool translog_unlock()
|
||||
}
|
||||
|
||||
|
||||
#define translog_buffer_lock_assert_owner(B) \
safe_mutex_assert_owner(&(B)->mutex)
|
||||
void translog_lock_assert_owner()
|
||||
{
|
||||
translog_buffer_lock_assert_owner(log_descriptor.bc.buffer);
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
Start new page
|
||||
|
||||
@ -4154,26 +4189,30 @@ err:
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
Write the log record
|
||||
/**
|
||||
@brief Writes the log record
|
||||
|
||||
SYNOPSIS
|
||||
translog_write_record()
|
||||
lsn LSN of the record will be written here
|
||||
type the log record type
|
||||
trn Transaction structure pointer for hooks by
|
||||
record log type, for short_id
|
||||
share MARIA_SHARE of table or NULL
|
||||
rec_len record length or 0 (count it)
|
||||
part_no number of parts or 0 (count it)
|
||||
parts_data zero ended (in case of number of parts is 0)
|
||||
array of LEX_STRINGs (parts), first
|
||||
TRANSLOG_INTERNAL_PARTS positions in the log
|
||||
should be unused (need for loghandler)
|
||||
If the share has no 2-byte id yet, gives it one and logs LOGREC_FILE_ID.
If the transaction has not logged LOGREC_LONG_TRANSACTION_ID yet, logs it.
|
||||
|
||||
RETURN
|
||||
0 OK
|
||||
1 Error
|
||||
@param lsn the LSN of the written record is stored here
@param type the log record type
@param trn transaction structure, used by the record type's hooks
and for the short transaction id
@param share MARIA_SHARE of the table, or NULL
@param rec_len record length, or 0 (then it is computed)
@param part_no number of parts, or 0 (then it is computed)
@param parts_data array of LEX_STRINGs (parts), zero-terminated if
part_no is 0; the first TRANSLOG_INTERNAL_PARTS positions
must be left unused (they are reserved for the log handler)
@param store_share_id if share != NULL, the share's id is automatically
stored in the first two bytes pointed to by this argument
(which must then be non-NULL)
|
||||
@return Operation status
|
||||
@retval 0 OK
|
||||
@retval 1 Error
|
||||
*/
|
||||
|
||||
my_bool translog_write_record(LSN *lsn,
|
||||
@ -4181,7 +4220,8 @@ my_bool translog_write_record(LSN *lsn,
|
||||
TRN *trn, struct st_maria_share *share,
|
||||
translog_size_t rec_len,
|
||||
uint part_no,
|
||||
LEX_STRING *parts_data)
|
||||
LEX_STRING *parts_data,
|
||||
uchar *store_share_id)
|
||||
{
|
||||
struct st_translog_parts parts;
|
||||
LEX_STRING *part;
|
||||
@ -4191,10 +4231,41 @@ my_bool translog_write_record(LSN *lsn,
|
||||
DBUG_PRINT("enter", ("type: %u ShortTrID: %u",
|
||||
(uint) type, (uint)short_trid));
|
||||
|
||||
if (share && !share->base.transactional)
|
||||
if (share)
|
||||
{
|
||||
DBUG_PRINT("info", ("It is not transactional table"));
|
||||
DBUG_RETURN(0);
|
||||
if (!share->base.transactional)
|
||||
{
|
||||
DBUG_PRINT("info", ("It is not a transactional table"));
|
||||
DBUG_RETURN(0);
|
||||
}
|
||||
if (unlikely(share->id == 0))
|
||||
{
|
||||
/*
|
||||
First log write for this MARIA_SHARE; give it a short id.
|
||||
When the lock manager is enabled and needs a short id, it should be
assigned in the lock manager (because row locks will be taken before
log records are written; for example SELECT FOR UPDATE takes locks but
writes no log record).
|
||||
*/
|
||||
if (unlikely(translog_assign_id_to_share(share, trn)))
|
||||
DBUG_RETURN(1);
|
||||
}
|
||||
fileid_store(store_share_id, share->id);
|
||||
}
|
||||
if (unlikely(!(trn->first_undo_lsn & TRANSACTION_LOGGED_LONG_ID)))
|
||||
{
|
||||
LSN long_id_lsn;
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 1];
uchar log_data[6];
int6store(log_data, trn->trid);
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= sizeof(log_data);
trn->first_undo_lsn|= TRANSACTION_LOGGED_LONG_ID; /* no recursion */
if (unlikely(translog_write_record(&long_id_lsn, LOGREC_LONG_TRANSACTION_ID,
|
||||
trn, NULL, sizeof(log_data),
|
||||
sizeof(log_array)/sizeof(log_array[0]),
|
||||
log_array, NULL)))
|
||||
DBUG_RETURN(1);
|
||||
}
|
||||
|
||||
parts.parts= parts_data;
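The first time a transaction logs anything, translog_write_record() writes LOGREC_LONG_TRANSACTION_ID by calling itself. It sets the TRANSACTION_LOGGED_LONG_ID bit before the nested call, so the nested call does not try to log that record again. The sketch below shows this pattern as a toy, single-threaded example with an in-memory log. The flag value, the record types and the array are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit kept in the top byte of first_undo_lsn. */
#define SKETCH_LOGGED_LONG_ID UINT64_C(0x8000000000000000)

enum sketch_rec_type { SKETCH_REC_LONG_TRANSACTION_ID, SKETCH_REC_REDO };

struct sketch_trn { uint64_t first_undo_lsn; };

/* Toy log: record types are appended to this array. */
static enum sketch_rec_type sketch_log[16];
static size_t sketch_log_len;

static int sketch_write_record(enum sketch_rec_type type,
                               struct sketch_trn *trn)
{
  if (!(trn->first_undo_lsn & SKETCH_LOGGED_LONG_ID))
  {
    /* Set the bit first: the nested call then skips this block. */
    trn->first_undo_lsn|= SKETCH_LOGGED_LONG_ID;
    if (sketch_write_record(SKETCH_REC_LONG_TRANSACTION_ID, trn))
      return 1;
  }
  if (sketch_log_len == sizeof(sketch_log) / sizeof(sketch_log[0]))
    return 1;  /* toy log is full */
  sketch_log[sketch_log_len++]= type;
  return 0;
}
```

LOGREC_FILE_ID follows the same idea, with `share->id != 0` acting as the "already logged" flag.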
|
||||
@ -4375,20 +4446,19 @@ void translog_free_record_header(TRANSLOG_HEADER_BUFFER *buff)
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
Set current horizon in the scanner data structure
|
||||
/**
|
||||
@brief Returns the current horizon at the end of the current log
|
||||
|
||||
SYNOPSIS
|
||||
translog_scanner_set_horizon()
|
||||
scanner Information about current chunk during scanning
|
||||
@return Horizon
|
||||
*/
|
||||
|
||||
static void translog_scanner_set_horizon(struct st_translog_scanner_data
|
||||
*scanner)
|
||||
TRANSLOG_ADDRESS translog_get_horizon()
|
||||
{
|
||||
TRANSLOG_ADDRESS res;
|
||||
translog_lock();
|
||||
scanner->horizon= log_descriptor.horizon;
|
||||
res= log_descriptor.horizon;
|
||||
translog_unlock();
|
||||
return res;
|
||||
}
|
||||
|
||||
|
||||
@ -4446,7 +4516,7 @@ my_bool translog_init_scanner(LSN lsn,
|
||||
|
||||
scanner->fixed_horizon= fixed_horizon;
|
||||
|
||||
translog_scanner_set_horizon(scanner);
|
||||
scanner->horizon= translog_get_horizon();
|
||||
DBUG_PRINT("info", ("horizon: (0x%lu,0x%lx)",
|
||||
(ulong) LSN_FILE_NO(scanner->horizon),
|
||||
(ulong) LSN_OFFSET(scanner->horizon)));
|
||||
@ -4499,7 +4569,7 @@ static my_bool translog_scanner_eol(TRANSLOG_SCANNER_DATA *scanner)
|
||||
DBUG_PRINT("info", ("Horizon is fixed and reached"));
|
||||
DBUG_RETURN(1);
|
||||
}
|
||||
translog_scanner_set_horizon(scanner);
|
||||
scanner->horizon= translog_get_horizon();
|
||||
DBUG_PRINT("info",
|
||||
("Horizon is re-read, EOL: %d",
|
||||
scanner->horizon <= (scanner->page_addr +
|
||||
@ -5368,17 +5438,31 @@ static void translog_force_current_buffer_to_finish()
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
Flush the log up to given LSN (included)
|
||||
/**
|
||||
@brief Flush the log up to given LSN (included)
|
||||
|
||||
SYNOPSIS
|
||||
translog_flush()
|
||||
lsn log record serial number up to which (inclusive)
|
||||
the log have to be flushed
|
||||
@param lsn log record serial number up to which (inclusive)
|
||||
the log has to be flushed
|
||||
|
||||
RETURN
|
||||
0 OK
|
||||
1 Error
|
||||
@return Operation status
|
||||
@retval 0 OK
|
||||
@retval 1 Error
|
||||
|
||||
@todo LOG: when a log write fails, we should not write to this log anymore
|
||||
(if we add more log records to this log they will be unreadable: we will hit
|
||||
the broken log record): all calls to translog_flush() should be made to fail
(because translog_flush() is called when a transaction wants something to be
durable, and nothing can be made durable once the log is corrupted). For that,
a "my_bool st_translog_descriptor::write_error" could be set to 1 when a
translog_write_record() or translog_flush() fails, and translog_flush() would
test this variable (translog_write_record() could also test it, though that is
not strictly needed).
|
||||
Then, either shut Maria down immediately, or switch to a new log (but if we
|
||||
get write error after write error, that would create too many logs).
|
||||
A popular open-source transactional engine intentionally crashes as soon as
|
||||
a log flush fails (we, however, don't want to crash the entire mysqld, but
stopping all of the engine's operations immediately would make sense).
|
||||
Same applies to translog_write_record().
|
||||
*/
|
||||
|
||||
my_bool translog_flush(LSN lsn)
|
||||
@ -5469,24 +5553,55 @@ my_bool translog_flush(LSN lsn)
|
||||
/* We sync file when we are closing it => do nothing if file closed */
|
||||
}
|
||||
log_descriptor.flushed= sent_to_file;
|
||||
/** @todo LOG decide if syncing of directory is needed */
|
||||
rc|= my_sync(log_descriptor.directory_fd, MYF(MY_WME));
|
||||
translog_unlock();
|
||||
DBUG_RETURN(rc);
|
||||
}
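The @todo above proposes a sticky `write_error` flag in st_translog_descriptor. After one failed write or flush, every later translog_flush() would fail instead of pretending to make data durable. The sketch below illustrates that proposal under stated assumptions: the struct, the function names and the flush callback are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the proposed st_translog_descriptor field. */
struct sketch_log { bool write_error; };

typedef int (*sketch_flush_fn)(void);  /* returns 0 on success */

/* Fails fast once any earlier write or flush has failed. */
static int sketch_translog_flush(struct sketch_log *log, sketch_flush_fn flush)
{
  if (log->write_error)
    return 1;
  if (flush() != 0)
  {
    log->write_error= true;  /* sticky: the log tail may be unreadable */
    return 1;
  }
  return 0;
}

static int sketch_flush_ok(void) { return 0; }
static int sketch_flush_fails(void) { return 1; }
```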
|
||||
|
||||
|
||||
/**
|
||||
@brief Sets transaction's rec_lsn if needed
|
||||
|
||||
A transaction sometimes writes a REDO even before the page is in the
|
||||
pagecache (example: brand new head or tail pages; full pages). So, if
|
||||
Checkpoint happens just after the REDO write, it needs to know that the
|
||||
REDO phase must start before this REDO. Scanning the pagecache cannot
|
||||
tell that as the page is not in the cache. So, transaction sets its rec_lsn
|
||||
to the REDO's LSN or somewhere before, and Checkpoint reads the
|
||||
transaction's rec_lsn.
|
||||
|
||||
@todo move it to a separate file
|
||||
|
||||
@return Operation status, always 0 (success)
|
||||
*/
|
||||
|
||||
static my_bool write_hook_for_redo(enum translog_record_type type
|
||||
__attribute__ ((unused)),
|
||||
TRN *trn, LSN *lsn,
|
||||
struct st_translog_parts *parts
|
||||
__attribute__ ((unused)))
|
||||
{
|
||||
/*
|
||||
If the hook stays so simple, it would be faster to pass
|
||||
!trn->rec_lsn ? trn->rec_lsn : some_dummy_lsn
|
||||
to translog_write_record(), like Monty did in his original code, and not
|
||||
have a hook. For now we keep it like this.
|
||||
*/
|
||||
if (trn->rec_lsn == 0)
|
||||
trn->rec_lsn= *lsn;
|
||||
return 0;
|
||||
}
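write_hook_for_redo() records only the LSN of the transaction's first REDO. Later REDOs have larger LSNs and would move the REDO start point past records that still need replaying. The sketch below shows the hook's rule, plus a hypothetical Checkpoint-side reduction over the transactions' rec_lsn values. The types and names are illustrative. A real Checkpoint would also have to combine this value with the rec_lsn of the dirty pages in the pagecache.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sketch_lsn;  /* stand-in for LSN; 0 means "not set" */

struct sketch_trn { sketch_lsn rec_lsn; };

/* Same rule as write_hook_for_redo(): only the first REDO sets rec_lsn. */
static void sketch_note_redo(struct sketch_trn *trn, sketch_lsn redo_lsn)
{
  if (trn->rec_lsn == 0)
    trn->rec_lsn= redo_lsn;
}

/* Hypothetical Checkpoint-side use: the smallest non-zero rec_lsn among
   transactions is a point from which REDO replay could start (0 = none). */
static sketch_lsn sketch_min_rec_lsn(const struct sketch_trn *trns, size_t n)
{
  sketch_lsn min= 0;
  for (size_t i= 0; i < n; i++)
    if (trns[i].rec_lsn != 0 && (min == 0 || trns[i].rec_lsn < min))
      min= trns[i].rec_lsn;
  return min;
}
```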
|
||||
|
||||
|
||||
/**
|
||||
@brief Sets transaction's undo_lsn, first_undo_lsn if needed
|
||||
|
||||
@todo move it to a separate file
|
||||
|
||||
@return Operation status, always 0 (success)
|
||||
*/
|
||||
|
||||
static my_bool write_hook_for_undo(enum translog_record_type type
|
||||
__attribute__ ((unused)),
|
||||
TRN *trn, LSN *lsn,
|
||||
@ -5494,11 +5609,109 @@ static my_bool write_hook_for_undo(enum translog_record_type type
|
||||
__attribute__ ((unused)))
|
||||
{
|
||||
trn->undo_lsn= *lsn;
|
||||
if (trn->first_undo_lsn == 0)
|
||||
trn->first_undo_lsn= *lsn;
|
||||
if (unlikely(LSN_WITH_FLAGS_TO_LSN(trn->first_undo_lsn) == 0))
|
||||
trn->first_undo_lsn=
|
||||
trn->undo_lsn | LSN_WITH_FLAGS_TO_FLAGS(trn->first_undo_lsn);
|
||||
return 0;
|
||||
/*
|
||||
when we implement purging, we will specialize this hook: UNDO_PURGE
|
||||
records will additionally set trn->undo_purge_lsn
|
||||
*/
|
||||
}
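write_hook_for_undo() now tests only the LSN part of first_undo_lsn, because the top byte may already carry a flag: TRANSACTION_LOGGED_LONG_ID can be set before any UNDO is written. Once that bit is set, a plain `first_undo_lsn == 0` test would never fire. The sketch below shows the flag-preserving update. The masks match the LSN_WITH_FLAGS macros, but the flag value and the names are assumptions.

```c
#include <stdint.h>

/* Same masks as LSN_WITH_FLAGS_TO_LSN() / LSN_WITH_FLAGS_TO_FLAGS(). */
#define SKETCH_LSN_MASK   UINT64_C(0x00FFFFFFFFFFFFFF)
#define SKETCH_FLAGS_MASK UINT64_C(0xFF00000000000000)
/* Hypothetical flag bit, standing in for TRANSACTION_LOGGED_LONG_ID. */
#define SKETCH_LOGGED_LONG_ID UINT64_C(0x8000000000000000)

struct sketch_trn { uint64_t undo_lsn; uint64_t first_undo_lsn; };

static void sketch_note_undo(struct sketch_trn *trn, uint64_t lsn)
{
  trn->undo_lsn= lsn;  /* always the latest UNDO */
  /* Fill in the LSN part only once, keeping whatever flags are set. */
  if ((trn->first_undo_lsn & SKETCH_LSN_MASK) == 0)
    trn->first_undo_lsn= trn->undo_lsn |
                         (trn->first_undo_lsn & SKETCH_FLAGS_MASK);
}
```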
|
||||
|
||||
|
||||
/**
|
||||
@brief Gives a 2-byte-id to MARIA_SHARE and logs this fact
|
||||
|
||||
If a MARIA_SHARE does not yet have a 2-byte-id (unique over all currently
|
||||
open MARIA_SHAREs), give it one and record this assignment in the log
|
||||
(LOGREC_FILE_ID log record).
|
||||
|
||||
@param share table
|
||||
@param trn calling transaction
|
||||
|
||||
@return Operation status
|
||||
@retval 0 OK
|
||||
@retval 1 Error
|
||||
|
||||
@note Can be called even if the share already has an id (in which case it does nothing)
|
||||
*/
|
||||
|
||||
int translog_assign_id_to_share(MARIA_SHARE *share, TRN *trn)
|
||||
{
|
||||
/*
|
||||
If you give an id to a non-BLOCK_RECORD table, you also need to release
|
||||
this id somewhere. Then you can change the assertion.
|
||||
*/
|
||||
DBUG_ASSERT(share->data_file_type == BLOCK_RECORD);
|
||||
/* re-check under mutex to avoid having 2 ids for the same share */
|
||||
pthread_mutex_lock(&share->intern_lock);
|
||||
if (likely(share->id == 0))
|
||||
{
|
||||
/* Inspired by set_short_trid() of trnman.c */
|
||||
int i= share->kfile.file % SHARE_ID_MAX + 1;
|
||||
my_atomic_rwlock_wrlock(&LOCK_id_to_share);
|
||||
/**
|
||||
@todo RECOVERY BUG: if all slots are used, and we're using rwlocks
|
||||
above, we will never exit the loop. To be discussed with Serg.
|
||||
*/
|
||||
for ( ; ; i= i % SHARE_ID_MAX + 1) /* the range is [1..SHARE_ID_MAX] */
|
||||
{
|
||||
void *tmp= NULL;
|
||||
if (id_to_share[i] == NULL &&
|
||||
my_atomic_casptr((void **)&id_to_share[i], &tmp, share))
|
||||
break;
|
||||
}
|
||||
my_atomic_rwlock_wrunlock(&LOCK_id_to_share);
|
||||
share->id= (uint16)i;
|
||||
DBUG_PRINT("info", ("id_to_share: 0x%lx -> %u", (ulong)share, i));
|
||||
LSN lsn;
|
||||
LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 2];
|
||||
uchar log_data[FILEID_STORE_SIZE];
|
||||
log_array[TRANSLOG_INTERNAL_PARTS + 0].str= (char*) log_data;
|
||||
log_array[TRANSLOG_INTERNAL_PARTS + 0].length= sizeof(log_data);
|
||||
/*
|
||||
open_file_name is an unresolved name (symlinks are not resolved, datadir
|
||||
is not realpath-ed, etc) which is good: the log can be moved to another
|
||||
directory and continue working.
|
||||
*/
|
||||
log_array[TRANSLOG_INTERNAL_PARTS + 1].str= share->open_file_name;
|
||||
/**
|
||||
@todo if we had the name's length in MARIA_SHARE we could avoid this
|
||||
strlen()
|
||||
*/
|
||||
log_array[TRANSLOG_INTERNAL_PARTS + 1].length=
|
||||
strlen(share->open_file_name);
|
||||
if (unlikely(translog_write_record(&lsn, LOGREC_FILE_ID, trn, share,
|
||||
sizeof(log_data) +
|
||||
log_array[TRANSLOG_INTERNAL_PARTS +
|
||||
1].length,
|
||||
sizeof(log_array)/sizeof(log_array[0]),
|
||||
log_array, log_data)))
|
||||
{
pthread_mutex_unlock(&share->intern_lock);
return 1;
}
|
||||
}
|
||||
pthread_mutex_unlock(&share->intern_lock);
|
||||
return 0;
|
||||
}
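translog_assign_id_to_share() scans id_to_share[] starting from a hint derived from the index file descriptor, wrapping within [1..SHARE_ID_MAX]. It claims the first free slot with a compare-and-swap. As its @todo notes, the loop never ends if every slot is taken. The sketch below does the same scan with C11 atomics instead of my_atomic, bounded to one full pass so that it returns 0 ("no id") when the table is full. The small table size and the names are for illustration only.

```c
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_ID_MAX 8  /* small for illustration; SHARE_ID_MAX is 65535 */

/* Slot 0 is unused; valid ids are 1..SKETCH_ID_MAX.  Static storage is
   zero-initialized, so every slot starts out NULL (free). */
static _Atomic(void *) sketch_slots[SKETCH_ID_MAX + 1];

/* Claims a free slot for 'owner', scanning from 'hint'; returns the id,
   or 0 after one full pass finds no free slot. */
static unsigned sketch_claim_id(void *owner, unsigned hint)
{
  unsigned i= hint % SKETCH_ID_MAX + 1;
  for (unsigned tries= 0; tries < SKETCH_ID_MAX;
       tries++, i= i % SKETCH_ID_MAX + 1)
  {
    void *expected= NULL;
    if (atomic_load(&sketch_slots[i]) == NULL &&
        atomic_compare_exchange_strong(&sketch_slots[i], &expected, owner))
      return i;
  }
  return 0;  /* every slot is taken */
}

/* Counterpart of translog_deassign_id_from_share(). */
static void sketch_release_id(unsigned id)
{
  atomic_store(&sketch_slots[id], NULL);
}
```

The caller still needs a policy for the 0 return value, for example failing the write with an error instead of spinning.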
|
||||
|
||||
|
||||
/**
|
||||
@brief Recycles a MARIA_SHARE's short id.
|
||||
|
||||
@param share table
|
||||
|
||||
@note Must be called only if share has an id (i.e. id != 0)
|
||||
*/
|
||||
|
||||
void translog_deassign_id_from_share(MARIA_SHARE *share)
|
||||
{
|
||||
DBUG_PRINT("info", ("id_to_share: 0x%lx id %u -> 0",
|
||||
(ulong)share, share->id));
|
||||
/*
|
||||
We don't need any mutex as we are called only when closing the last
|
||||
instance of the table: no writes can be happening.
|
||||
*/
|
||||
my_atomic_rwlock_rdlock(&LOCK_id_to_share);
|
||||
my_atomic_storeptr((void **)&id_to_share[share->id], 0);
|
||||
my_atomic_rwlock_rdunlock(&LOCK_id_to_share);
|
||||
}
|
||||
|
@ -86,13 +86,12 @@ enum translog_record_type
|
||||
LOGREC_PREPARE_WITH_UNDO_PURGE,
|
||||
LOGREC_COMMIT,
|
||||
LOGREC_COMMIT_WITH_UNDO_PURGE,
|
||||
LOGREC_CHECKPOINT_PAGE,
|
||||
LOGREC_CHECKPOINT_TRAN,
|
||||
LOGREC_CHECKPOINT_TABL,
|
||||
LOGREC_CHECKPOINT,
|
||||
LOGREC_REDO_CREATE_TABLE,
|
||||
LOGREC_REDO_RENAME_TABLE,
|
||||
LOGREC_REDO_DROP_TABLE,
|
||||
LOGREC_REDO_TRUNCATE_TABLE,
|
||||
LOGREC_REDO_DELETE_ALL,
|
||||
LOGREC_REDO_REPAIR_TABLE,
|
||||
LOGREC_FILE_ID,
|
||||
LOGREC_LONG_TRANSACTION_ID,
|
||||
LOGREC_RESERVED_FUTURE_EXTENSION= 63
|
||||
@ -181,9 +180,7 @@ struct st_translog_reader_data
|
||||
};
|
||||
|
||||
struct st_transaction;
|
||||
#ifdef __cplusplus
|
||||
extern "C" {
|
||||
#endif
|
||||
C_MODE_START
|
||||
|
||||
/* Records types for unittests */
|
||||
#define LOGREC_FIXED_RECORD_0LSN_EXAMPLE 1
|
||||
@ -199,13 +196,12 @@ extern my_bool translog_init(const char *directory, uint32 log_file_max_size,
|
||||
uint32 server_version, uint32 server_id,
|
||||
PAGECACHE *pagecache, uint flags);
|
||||
|
||||
extern my_bool translog_write_record(LSN *lsn,
|
||||
enum translog_record_type type,
|
||||
struct st_transaction *trn,
|
||||
struct st_maria_share *share,
|
||||
translog_size_t rec_len,
|
||||
uint part_no,
|
||||
LEX_STRING *parts_data);
|
||||
extern my_bool
|
||||
translog_write_record(LSN *lsn, enum translog_record_type type,
|
||||
struct st_transaction *trn,
|
||||
struct st_maria_share *share,
|
||||
translog_size_t rec_len, uint part_no,
|
||||
LEX_STRING *parts_data, uchar *store_share_id);
|
||||
|
||||
extern void translog_destroy();
|
||||
|
||||
@ -232,7 +228,10 @@ extern translog_size_t translog_read_next_record_header(TRANSLOG_SCANNER_DATA
|
||||
*scanner,
|
||||
TRANSLOG_HEADER_BUFFER
|
||||
*buff);
|
||||
#ifdef __cplusplus
|
||||
}
|
||||
#endif
|
||||
|
||||
extern void translog_lock_assert_owner();
|
||||
extern TRANSLOG_ADDRESS translog_get_horizon();
|
||||
extern int translog_assign_id_to_share(struct st_maria_share *share,
|
||||
struct st_transaction *trn);
|
||||
extern void translog_deassign_id_from_share(struct st_maria_share *share);
|
||||
extern my_bool translog_inited;
|
||||
C_MODE_END
|
||||
|
@ -35,7 +35,7 @@ typedef TRANSLOG_ADDRESS LSN;
|
||||
/* checks LSN */
|
||||
#define LSN_VALID(L) DBUG_ASSERT((L) >= 0 && (L) < (uint64)0xFFFFFFFFFFFFFFLL)
|
||||
|
||||
/* size of stored LSN on a disk */
|
||||
/* size of stored LSN on a disk, don't change it! */
|
||||
#define LSN_STORE_SIZE 7
|
||||
|
||||
/* Puts LSN into buffer (dst) */
|
||||
@ -53,4 +53,12 @@ typedef TRANSLOG_ADDRESS LSN;
|
||||
|
||||
#define LSN_REPLACE_OFFSET(L, S) (LSN_FILE_NO_PART(L) | (S))
|
||||
|
||||
/*
|
||||
an 8-byte type whose most significant byte is used for "flags"; the 7
other bytes are an LSN.
|
||||
*/
|
||||
typedef LSN LSN_WITH_FLAGS;
|
||||
#define LSN_WITH_FLAGS_TO_LSN(x) ((x) & ULL(0x00FFFFFFFFFFFFFF))
#define LSN_WITH_FLAGS_TO_FLAGS(x) ((x) & ULL(0xFF00000000000000))
|
||||
|
||||
#endif
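LSN_WITH_FLAGS packs flags into the most significant byte and a 7-byte LSN into the remaining bytes. The sketch below packs and unpacks such a value. The macro names are hypothetical. Each macro parenthesizes its argument, so an expression argument such as `flags | lsn` is masked as a whole. The last macro shows what goes wrong without the parentheses.

```c
#include <stdint.h>

/* Argument fully parenthesized: SKETCH_TO_LSN(a | b) masks (a | b). */
#define SKETCH_TO_LSN(x)   ((x) & UINT64_C(0x00FFFFFFFFFFFFFF))
#define SKETCH_TO_FLAGS(x) ((x) & UINT64_C(0xFF00000000000000))
/* Unparenthesized variant: with x = "a | b" it expands to
   "a | b & mask", i.e. a | (b & mask), since & binds tighter than |. */
#define SKETCH_TO_LSN_UNSAFE(x) (x & UINT64_C(0x00FFFFFFFFFFFFFF))

/* Combines a flags byte and an LSN into one LSN_WITH_FLAGS-style value. */
static uint64_t sketch_make(uint64_t flags, uint64_t lsn)
{
  return SKETCH_TO_FLAGS(flags) | SKETCH_TO_LSN(lsn);
}
```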
|
||||
|
@ -919,12 +919,23 @@ static void setup_key_functions(register MARIA_KEYDEF *keyinfo)
|
||||
}
|
||||
|
||||
|
||||
/*
|
||||
Function to save and store the header in the index file (.MYI)
|
||||
/**
|
||||
@brief Function to save and store the header in the index file (.MYI)
|
||||
|
||||
@param file descriptor of the index file to write
|
||||
@param state state information to write to the file
|
||||
@param pWrite bitmap (determines how much information to write,
and whether my_write() or my_pwrite() is used)
|
||||
|
||||
@return Operation status
|
||||
@retval 0 OK
|
||||
@retval 1 Error
|
||||
*/
|
||||
|
||||
uint _ma_state_info_write(File file, MARIA_STATE_INFO *state, uint pWrite)
|
||||
{
|
||||
/** @todo RECOVERY write it only at checkpoint time */
|
||||
uchar buff[MARIA_STATE_INFO_SIZE + MARIA_STATE_EXTRA_SIZE];
|
||||
uchar *ptr=buff;
|
||||
uint i, keys= (uint) state->header.keys;
|
||||
@ -935,6 +946,11 @@ uint _ma_state_info_write(File file, MARIA_STATE_INFO *state, uint pWrite)
|
||||
|
||||
/* open_count must be first because of _ma_mark_file_changed ! */
|
||||
mi_int2store(ptr,state->open_count); ptr+= 2;
|
||||
/*
|
||||
if you change the offset of this LSN inside the file, fix
|
||||
ma_create + ma_rename + ma_delete_all + backward-compatibility.
|
||||
*/
|
||||
lsn_store(ptr, state->create_rename_lsn); ptr+= LSN_STORE_SIZE;
|
||||
*ptr++= (uchar)state->changed;
|
||||
*ptr++= state->sortkey;
|
||||
mi_rowstore(ptr,state->state.records); ptr+= 8;
|
||||
@ -959,6 +975,7 @@ uint _ma_state_info_write(File file, MARIA_STATE_INFO *state, uint pWrite)
|
||||
{
|
||||
mi_sizestore(ptr,state->key_root[i]); ptr+= 8;
|
||||
}
|
||||
/** @todo RECOVERY key_del is a problem for recovery */
|
||||
mi_sizestore(ptr,state->key_del); ptr+= 8;
|
||||
if (pWrite & 2) /* From maria_chk */
|
||||
{
|
||||
@@ -994,6 +1011,7 @@ byte *_ma_state_info_read(byte *ptr, MARIA_STATE_INFO *state)
 key_parts= mi_uint2korr(state->header.key_parts);
 
 state->open_count = mi_uint2korr(ptr); ptr+= 2;
+state->create_rename_lsn= lsn_korr(ptr); ptr+= LSN_STORE_SIZE;
 state->changed= (my_bool) *ptr++;
 state->sortkey= (uint) *ptr++;
 state->state.records= mi_rowkorr(ptr); ptr+= 8;
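The state header now stores `create_rename_lsn` immediately after the 2-byte `open_count`. `maria_rename()` later patches it in place at offset `sizeof(share->state.header) + 2`, which is why the new comment asks for `ma_create`, `ma_rename` and `ma_delete_all` to be fixed if the offset ever moves. The sketch below models only that prefix. It assumes a 7-byte LSN encoding (3-byte log file number followed by a 4-byte offset, low byte first), which is our reading of `lsn_store()`/`lsn_korr()` and `LSN_STORE_SIZE`. The `sketch_*` helpers are hypothetical and are not part of libmaria.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: mirrors LSN_STORE_SIZE = 3-byte file number + 4-byte offset. */
#define SKETCH_LSN_STORE_SIZE 7

/* Assumption: file number in the high 32 bits, offset in the low 32 bits. */
typedef uint64_t sketch_lsn;

/* Store the n low bytes of v, least significant byte first. */
static void store_le(unsigned char *p, uint64_t v, int n)
{
  for (int i= 0; i < n; i++)
    p[i]= (unsigned char) (v >> (8 * i));
}

/* Read back n bytes written by store_le(). */
static uint64_t load_le(const unsigned char *p, int n)
{
  uint64_t v= 0;
  for (int i= 0; i < n; i++)
    v|= (uint64_t) p[i] << (8 * i);
  return v;
}

/* Hypothetical stand-in for lsn_store(): file number first, then offset. */
static void sketch_lsn_store(unsigned char *dst, sketch_lsn lsn)
{
  store_le(dst, lsn >> 32, 3);
  store_le(dst + 3, lsn & 0xFFFFFFFFu, 4);
}

/* Hypothetical stand-in for lsn_korr(). */
static sketch_lsn sketch_lsn_korr(const unsigned char *src)
{
  return (load_le(src, 3) << 32) | load_le(src + 3, 4);
}

/*
  Write open_count (2 bytes, high byte first like mi_int2store()) and then
  create_rename_lsn, in the order used by _ma_state_info_write().
  Returns the number of bytes written.
*/
static size_t sketch_write_state_prefix(unsigned char *buf,
                                        unsigned open_count, sketch_lsn lsn)
{
  assert(open_count <= 0xFFFF);          /* must fit in 2 bytes */
  buf[0]= (unsigned char) (open_count >> 8);
  buf[1]= (unsigned char) open_count;
  sketch_lsn_store(buf + 2, lsn);
  return 2 + SKETCH_LSN_STORE_SIZE;
}
```

Reading the LSN back at offset 2 of the prefix returns the value that was written. That round trip is what lets Recovery recover the LSN that `maria_rename()` stored, so it does not apply REDOs to the wrong table.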
@@ -114,6 +114,11 @@
 
 /* TODO: put it to my_static.c */
 my_bool my_disable_flush_pagecache_blocks= 0;
+/**
+when flushing pages of a file, it can happen that we take some dirty blocks
+out of changed_blocks[]; Checkpoint must not run at this moment.
+*/
+uint changed_blocks_is_incomplete= 0;
 
 #define STRUCT_PTR(TYPE, MEMBER, a) \
 (TYPE *) ((char *) (a) - offsetof(TYPE, MEMBER))
@@ -308,7 +313,7 @@ struct st_pagecache_block_link
 enum pagecache_page_type type; /* type of the block */
 uint hits_left; /* number of hits left until promotion */
 ulonglong last_hit_time; /* timestamp of the last hit */
-LSN rec_lsn; /* LSN when first became dirty */
+LSN rec_lsn; /**< LSN when first became dirty */
 KEYCACHE_CONDVAR *condvar; /* condition variable for 'no readers' event */
 };
 
@@ -2523,7 +2528,8 @@ void pagecache_unlock(PAGECACHE *pagecache,
 {
 DBUG_ASSERT(lock == PAGECACHE_LOCK_WRITE_UNLOCK);
 DBUG_ASSERT(pin == PAGECACHE_UNPIN);
-set_if_bigger(block->rec_lsn, first_REDO_LSN_for_page);
+if (block->rec_lsn == 0)
+block->rec_lsn= first_REDO_LSN_for_page;
 }
 if (lsn != 0)
 {
@@ -2685,7 +2691,8 @@ void pagecache_unlock_by_link(PAGECACHE *pagecache,
 DBUG_ASSERT(lock == PAGECACHE_LOCK_WRITE_UNLOCK ||
 lock == PAGECACHE_LOCK_READ_UNLOCK);
 DBUG_ASSERT(pin == PAGECACHE_UNPIN);
-set_if_bigger(block->rec_lsn, first_REDO_LSN_for_page);
+if (block->rec_lsn == 0)
+block->rec_lsn= first_REDO_LSN_for_page;
 }
 if (lsn != 0)
 {
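Both unlock paths now keep the first REDO LSN that dirtied the page, not the largest one seen. This matches the `rec_lsn` field comment ("LSN when first became dirty"). The comment removed from `pagecache_collect_changed_blocks_with_lsn()` explains why the difference matters: if the recorded LSN is too recent, Recovery can wrongly skip an earlier REDO. The comparison below is a minimal sketch that uses a plain `uint64_t` as a hypothetical stand-in for `LSN`, with 0 meaning "not dirty yet".

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t lsn_t;   /* hypothetical stand-in for LSN; 0 = unset */

/* Old rule (set_if_bigger): remember the most recent REDO LSN. */
static void rec_lsn_update_old(lsn_t *rec_lsn, lsn_t first_redo_lsn)
{
  if (first_redo_lsn > *rec_lsn)
    *rec_lsn= first_redo_lsn;
}

/* New rule: remember only the REDO LSN that first dirtied the page. */
static void rec_lsn_update_new(lsn_t *rec_lsn, lsn_t first_redo_lsn)
{
  if (*rec_lsn == 0)
    *rec_lsn= first_redo_lsn;
}
```

Take two modifications at LSN 100 and 250. The old rule reports 250, and the new rule keeps 100. A checkpoint built from the new value therefore points Recovery at the earlier REDO. This sketch assumes that `rec_lsn` goes back to 0 once the page is flushed. That reset is not part of this diff.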
@@ -3279,8 +3286,8 @@ restart:
 if (need_lock_change)
 {
 /*
-RECOVERY TODO BUG We are doing an unlock here, so need to give the
-page its rec_lsn
+We don't set rec_lsn of the block; this is ok as for the
+Maria-block-record's pages, we always keep pages pinned here.
 */
 if (make_lock_and_pin(pagecache, block,
 write_lock_change_table[lock].unlock_lock,
@@ -3500,22 +3507,21 @@ static int flush_cached_blocks(PAGECACHE *pagecache,
 }
 
 
-/*
-flush all key blocks for a file to disk, but don't do any mutex locks
+/**
+@brief flush all key blocks for a file to disk but don't do any mutex locks
 
-flush_pagecache_blocks_int()
-pagecache pointer to a key cache data structure
-file handler for the file to flush to
-flush_type type of the flush
+@param pagecache pointer to a pagecache data structure
+@param file handler for the file to flush to
+@param flush_type type of the flush
 
-NOTES
-This function doesn't do any mutex locks because it needs to be called
-both from flush_pagecache_blocks and flush_all_key_blocks (the later one
-does the mutex lock in the resize_pagecache() function).
+@note
+This function doesn't do any mutex locks because it needs to be called
+both from flush_pagecache_blocks and flush_all_key_blocks (the later one
+does the mutex lock in the resize_pagecache() function).
 
-RETURN
-0 ok
-1 error
+@return Operation status
+@retval 0 OK
+@retval 1 Error
 */
 
 static int flush_pagecache_blocks_int(PAGECACHE *pagecache,
@@ -3547,6 +3553,7 @@ static int flush_pagecache_blocks_int(PAGECACHE *pagecache,
 #if defined(PAGECACHE_DEBUG)
 uint cnt= 0;
 #endif
+uint8 changed_blocks_is_incomplete_incremented= 0;
 
 if (type != FLUSH_IGNORE_CHANGED)
 {
@@ -3636,16 +3643,23 @@ restart:
 else
 {
 /* Link the block into a list of blocks 'in switch' */
-/*
-RECOVERY TODO BUG this unlink_changed() is a serious problem for
-Maria's Checkpoint: it removes a page from the list of dirty
-pages, while it's still dirty. A solution is to abandon
-first_in_switch, just wait for this page to be
-flushed by somebody else, and loop. TODO: check all places
-where we remove a page from the list of dirty pages
-*/
 unlink_changed(block);
 link_changed(block, &first_in_switch);
+/*
+We have just removed a page from the list of dirty pages
+("changed_blocks") though it's still dirty (the flush by another
+thread has not yet happened). Checkpoint will miss the page and so
+must be blocked until that flush has happened.
+*/
+/**
+@todo RECOVERY: check all places where we remove a page from the
+list of dirty pages
+*/
+if (unlikely(!changed_blocks_is_incomplete_incremented))
+{
+changed_blocks_is_incomplete_incremented= 1;
+changed_blocks_is_incomplete++;
+}
 }
 }
 }
@@ -3683,6 +3697,8 @@ restart:
 KEYCACHE_DBUG_ASSERT(cnt <= pagecache->blocks_used);
 #endif
 }
+changed_blocks_is_incomplete-=
+changed_blocks_is_incomplete_incremented;
 /* The following happens very seldom */
 if (! (type == FLUSH_KEEP || type == FLUSH_FORCE_WRITE))
 {
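The new `changed_blocks_is_incomplete` counter acts as a small handshake between three parties:

- A flush that moves a still-dirty block off `changed_blocks` increments the counter at most once per call. The local `changed_blocks_is_incomplete_incremented` flag guards this.
- The same flush subtracts its contribution when it ends.
- Checkpoint polls until the counter is zero before scanning.

The sketch below reproduces that pattern outside the page cache. The names `cache_lock`, `note_block_hidden()`, `flush_done()` and `checkpoint_lock_when_complete()` are hypothetical. As far as this diff shows, the counter is only touched while the cache lock is held, because the callers of `flush_pagecache_blocks_int()` take `cache_lock`. The sketch follows the same rule.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Hypothetical stand-ins for pagecache->cache_lock and the global counter. */
static pthread_mutex_t cache_lock= PTHREAD_MUTEX_INITIALIZER;
static unsigned changed_blocks_is_incomplete= 0;

/* Per-flush flag, like changed_blocks_is_incomplete_incremented. */
struct flush_ctx { unsigned char incremented; };

/* Caller holds cache_lock: a still-dirty block was unlinked from the list. */
static void note_block_hidden(struct flush_ctx *ctx)
{
  if (!ctx->incremented)          /* count each flush call at most once */
  {
    ctx->incremented= 1;
    changed_blocks_is_incomplete++;
  }
}

/* Caller holds cache_lock: the flush is over, undo its contribution. */
static void flush_done(struct flush_ctx *ctx)
{
  changed_blocks_is_incomplete-= ctx->incremented;
  ctx->incremented= 0;
}

/*
  Checkpoint side: returns with cache_lock held once no flush is hiding
  dirty blocks; the caller scans the dirty lists and then unlocks.
*/
static void checkpoint_lock_when_complete(void)
{
  pthread_mutex_lock(&cache_lock);
  while (changed_blocks_is_incomplete > 0)
  {
    pthread_mutex_unlock(&cache_lock);  /* let the flushing thread finish */
    sleep(1);
    pthread_mutex_lock(&cache_lock);
  }
}
```

Polling with `sleep(1)` mirrors the patch. A condition variable signalled from `flush_done()` would avoid the delay of up to one second, at the cost of one more synchronization object to maintain.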
@@ -3789,51 +3805,56 @@ int reset_pagecache_counters(const char *name, PAGECACHE *pagecache)
 }
 
 
-/*
-Allocates a buffer and stores in it some information about all dirty pages
-of type PAGECACHE_LSN_PAGE.
+/**
+@brief Allocates a buffer and stores in it some info about all dirty pages
 
-SYNOPSIS
-pagecache_collect_changed_blocks_with_lsn()
-pagecache pointer to the page cache
-str (OUT) pointer to a LEX_STRING where the allocated buffer, and
-its size, will be put
-max_lsn (OUT) pointer to a LSN where the maximum rec_lsn of all
-relevant dirty pages will be put
+Does the allocation because the caller cannot know the size itself.
+Memory freeing is to be done by the caller (if the "str" member of the
+LEX_STRING is not NULL).
+Ignores all pages of another type than PAGECACHE_LSN_PAGE, because they
+are not interesting for a checkpoint record.
+The caller has the intention of doing checkpoints.
 
-DESCRIPTION
-Does the allocation because the caller cannot know the size itself.
-Memory freeing is to be done by the caller (if the "str" member of the
-LEX_STRING is not NULL).
-Ignores all pages of another type than PAGECACHE_LSN_PAGE, because they
-are not interesting for a checkpoint record.
-The caller has the intention of doing checkpoints.
-
-RETURN
-0 on success
-1 on error
+@param pagecache pointer to the page cache
+@param[out] str pointer to where the allocated buffer, and
+its size, will be put
+@param[out] min_rec_lsn pointer to where the minimum rec_lsn of all
+relevant dirty pages will be put
+@param[out] max_rec_lsn pointer to where the maximum rec_lsn of all
+relevant dirty pages will be put
+@return Operation status
+@retval 0 OK
+@retval 1 Error
 */
 
 my_bool pagecache_collect_changed_blocks_with_lsn(PAGECACHE *pagecache,
 LEX_STRING *str,
-LSN *max_lsn)
+LSN *min_rec_lsn,
+LSN *max_rec_lsn)
 {
 my_bool error= 0;
 ulong stored_list_size= 0;
 uint file_hash;
 char *ptr;
+LSN minimum_rec_lsn= ULONGLONG_MAX, maximum_rec_lsn= 0;
 DBUG_ENTER("pagecache_collect_changed_blocks_with_LSN");
 
-*max_lsn= 0;
 DBUG_ASSERT(NULL == str->str);
 /*
 We lock the entire cache but will be quick, just reading/writing a few MBs
 of memory at most.
-When we enter here, we must be sure that no "first_in_switch" situation
-is happening or will happen (either we have to get rid of
-first_in_switch in the code or, first_in_switch has to increment a
-"danger" counter for this function to know it has to wait). TODO.
 */
 pagecache_pthread_mutex_lock(&pagecache->cache_lock);
+while (changed_blocks_is_incomplete > 0)
+{
+/*
+Some pages are more recent in memory than on disk (=dirty) and are not
+in "changed_blocks" so we cannot know them. Wait.
+*/
+pagecache_pthread_mutex_unlock(&pagecache->cache_lock);
+sleep(1);
+pagecache_pthread_mutex_lock(&pagecache->cache_lock);
+}
 
 /* Count how many dirty pages are interesting */
 for (file_hash= 0; file_hash < PAGECACHE_CHANGED_BLOCKS_HASH; file_hash++)
@@ -3851,35 +3872,15 @@ my_bool pagecache_collect_changed_blocks_with_lsn(PAGECACHE *pagecache,
 DBUG_ASSERT(block->status & PCBLOCK_CHANGED);
 if (block->type != PAGECACHE_LSN_PAGE)
 continue; /* no need to store it */
-/*
-In the current pagecache, rec_lsn is not set correctly:
-1) it is set on pagecache_unlock(), too late (a page is dirty
-(PCBLOCK_CHANGED) since the first pagecache_write()). So in this
-scenario:
-thread1:                        thread2:
-write_REDO
-pagecache_write()               checkpoint : reclsn not known
-pagecache_unlock(sets rec_lsn)
-commit
-crash,
-at recovery we will wrongly skip the REDO. It also affects the
-low-water mark's computation.
-2) sometimes the unlocking can be an implicit action of
-pagecache_write(), without any call to pagecache_unlock(), then
-rec_lsn is not set.
-1) and 2) are critical problems.
-TODO: fix this when Monty has explained how he writes BLOB pages.
-*/
-if (block->rec_lsn == 0)
-{
-DBUG_ASSERT(0);
-goto err;
-}
 stored_list_size++;
 }
 }
 
-str->length= 8+(4+4+8)*stored_list_size;
+str->length= 8 + /* number of dirty pages */
+(4 + /* file */
+4 + /* pageno */
+LSN_STORE_SIZE /* rec_lsn */
+) * stored_list_size;
 if (NULL == (str->str= my_malloc(str->length, MYF(MY_WME))))
 goto err;
 ptr= str->str;
@@ -3896,19 +3897,27 @@ my_bool pagecache_collect_changed_blocks_with_lsn(PAGECACHE *pagecache,
 {
 if (block->type != PAGECACHE_LSN_PAGE)
 continue; /* no need to store it in the checkpoint record */
-DBUG_ASSERT((4 == sizeof(block->hash_link->file.file)));
-DBUG_ASSERT((4 == sizeof(block->hash_link->pageno)));
+compile_time_assert((4 == sizeof(block->hash_link->file.file)));
+compile_time_assert((4 == sizeof(block->hash_link->pageno)));
 int4store(ptr, block->hash_link->file.file);
 ptr+= 4;
 int4store(ptr, block->hash_link->pageno);
 ptr+= 4;
-int8store(ptr, (ulonglong) block->rec_lsn);
-ptr+= 8;
-set_if_bigger(*max_lsn, block->rec_lsn);
+lsn_store(ptr, block->rec_lsn);
+ptr+= LSN_STORE_SIZE;
+if (block->rec_lsn != 0)
+{
+if (cmp_translog_addr(block->rec_lsn, minimum_rec_lsn) < 0)
+minimum_rec_lsn= block->rec_lsn;
+if (cmp_translog_addr(block->rec_lsn, maximum_rec_lsn) > 0)
+maximum_rec_lsn= block->rec_lsn;
+} /* otherwise, some trn->rec_lsn should hold the info */
 }
 }
 end:
 pagecache_pthread_mutex_unlock(&pagecache->cache_lock);
+*min_rec_lsn= minimum_rec_lsn;
+*max_rec_lsn= maximum_rec_lsn;
 DBUG_RETURN(error);
 
 err:
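After this change, each entry in the buffer built by `pagecache_collect_changed_blocks_with_lsn()` takes `4 + 4 + LSN_STORE_SIZE` bytes (file, page number, `rec_lsn`), and the entries follow an 8-byte page count. The minimum and maximum `rec_lsn` are computed only over pages whose `rec_lsn` is non-zero. The standalone sketch below serializes an array of hypothetical `struct dirty_page` entries in that layout. Several details are assumptions rather than facts from the diff:

- The count is written as 8 little-endian bytes. The diff only reserves the 8 bytes and does not show the store.
- The LSN uses a 7-byte encoding: a 3-byte file number, then a 4-byte offset.
- Plain integer comparison stands in for `cmp_translog_addr()`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: LSN_STORE_SIZE = 3-byte file number + 4-byte offset. */
#define SK_LSN_STORE_SIZE 7

/* Hypothetical flattened view of one dirty PAGECACHE_LSN_PAGE block. */
struct dirty_page { uint32_t file; uint32_t pageno; uint64_t rec_lsn; };

/* Store the n low bytes of v, least significant byte first. */
static void put_le(unsigned char *p, uint64_t v, int n)
{
  for (int i= 0; i < n; i++)
    p[i]= (unsigned char) (v >> (8 * i));
}

/*
  Serialize n pages into a malloc'ed buffer that the caller must free.
  Returns 0 on success, 1 if allocation fails. Pages with rec_lsn == 0
  are stored but ignored when computing the min/max rec_lsn.
*/
static int collect_dirty_pages(const struct dirty_page *pages, size_t n,
                               unsigned char **out, size_t *out_len,
                               uint64_t *min_rec_lsn, uint64_t *max_rec_lsn)
{
  uint64_t min_lsn= ULLONG_MAX, max_lsn= 0;
  size_t len= 8 + (4 + 4 + SK_LSN_STORE_SIZE) * n;
  unsigned char *buf= malloc(len), *ptr= buf;

  if (buf == NULL)
    return 1;
  put_le(ptr, n, 8);                                   /* number of pages */
  ptr+= 8;
  for (size_t i= 0; i < n; i++)
  {
    put_le(ptr, pages[i].file, 4);
    ptr+= 4;
    put_le(ptr, pages[i].pageno, 4);
    ptr+= 4;
    put_le(ptr, pages[i].rec_lsn >> 32, 3);             /* log file number */
    put_le(ptr + 3, pages[i].rec_lsn & 0xFFFFFFFFu, 4); /* offset in file */
    ptr+= SK_LSN_STORE_SIZE;
    if (pages[i].rec_lsn != 0)
    {
      if (pages[i].rec_lsn < min_lsn)
        min_lsn= pages[i].rec_lsn;
      if (pages[i].rec_lsn > max_lsn)
        max_lsn= pages[i].rec_lsn;
    }
  }
  assert((size_t) (ptr - buf) == len);                 /* filled exactly */
  *out= buf;
  *out_len= len;
  *min_rec_lsn= min_lsn;
  *max_rec_lsn= max_lsn;
  return 0;
}
```

As in the patch, pages with `rec_lsn == 0` are still written to the buffer but do not affect the min/max. The patch comment expects some transaction's `rec_lsn` to cover them.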
@@ -239,6 +239,7 @@ extern my_bool pagecache_delete_pages(PAGECACHE *pagecache,
 extern void end_pagecache(PAGECACHE *keycache, my_bool cleanup);
 extern my_bool pagecache_collect_changed_blocks_with_lsn(PAGECACHE *pagecache,
 LEX_STRING *str,
+LSN *min_lsn,
 LSN *max_lsn);
 extern int reset_pagecache_counters(const char *name, PAGECACHE *pagecache);
 
@@ -52,7 +52,12 @@ int maria_panic(enum ha_panic_function flag)
 info=(MARIA_HA*) list_element->data;
 switch (flag) {
 case HA_PANIC_CLOSE:
-pthread_mutex_unlock(&THR_LOCK_maria); /* Not exactly right... */
+/*
+If bad luck (if some tables would be used now, which normally does not
+happen in MySQL), as we release the mutex, the list may change and so
+we may crash.
+*/
+pthread_mutex_unlock(&THR_LOCK_maria);
 if (maria_close(info))
 error=my_errno;
 pthread_mutex_lock(&THR_LOCK_maria);
@@ -29,25 +29,22 @@ static uint _ma_keynr(MARIA_HA *info, MARIA_KEYDEF *keyinfo, byte *page,
 byte *keypos, uint *ret_max_key);
 
 
-/*
-Estimate how many records there is in a given range
+/**
+@brief Estimate how many records there is in a given range
 
-SYNOPSIS
-maria_records_in_range()
-info MARIA handler
-inx Index to use
-min_key Min key. Is = 0 if no min range
-max_key Max key. Is = 0 if no max range
+@param info MARIA handler
+@param inx Index to use
+@param min_key Min key. Is = 0 if no min range
+@param max_key Max key. Is = 0 if no max range
 
-NOTES
-We should ONLY return 0 if there is no rows in range
+@note
+We should ONLY return 0 if there is no rows in range
 
-RETURN
-HA_POS_ERROR error (or we can't estimate number of rows)
-number Estimated number of rows
+@return Estimated number of rows or error
+@retval HA_POS_ERROR error (or we can't estimate number of rows)
+@retval number Estimated number of rows
 */
 
-
 ha_rows maria_records_in_range(MARIA_HA *info, int inx, key_range *min_key,
 key_range *max_key)
 {
@@ -115,6 +112,13 @@ ha_rows maria_records_in_range(MARIA_HA *info, int inx, key_range *min_key,
 rw_unlock(&info->s->key_root_lock[inx]);
 fast_ma_writeinfo(info);
 
+/**
+@todo LOCK
+If res==0 (no rows), if we need to guarantee repeatability of the search,
+we will need to set a next-key lock in this statement.
+Also SELECT COUNT(*)...
+*/
+
 DBUG_PRINT("info",("records: %ld",(ulong) (res)));
 DBUG_RETURN(res);
 }
@@ -18,6 +18,18 @@
 */
 
 #include "ma_fulltext.h"
+#include "trnman_public.h"
 
+/**
+@brief renames a table
+
+@param old_name current name of table
+@param new_name table should be renamed to this name
+
+@return Operation status
+@retval 0 OK
+@retval !=0 Error
+*/
+
 int maria_rename(const char *old_name, const char *new_name)
 {
@@ -26,22 +38,73 @@ int maria_rename(const char *old_name, const char *new_name)
 #ifdef USE_RAID
 uint raid_type=0,raid_chunks=0;
 #endif
+MARIA_HA *info;
+MARIA_SHARE *share;
+myf sync_dir;
 DBUG_ENTER("maria_rename");
 
 #ifdef EXTRA_DEBUG
 _ma_check_table_is_closed(old_name,"rename old_table");
 _ma_check_table_is_closed(new_name,"rename new table2");
 #endif
-/* LOCK TODO take X-lock on table here */
+/** @todo LOCK take X-lock on table */
+if (!(info= maria_open(old_name, O_RDWR, HA_OPEN_FOR_REPAIR)))
+DBUG_RETURN(my_errno);
+share= info->s;
 #ifdef USE_RAID
+raid_type = share->base.raid_type;
+raid_chunks = share->base.raid_chunks;
+#endif
+
+sync_dir= (share->base.transactional && !share->temporary) ?
+MY_SYNC_DIR : 0;
+if (sync_dir)
 {
-MARIA_HA *info;
-if (!(info=maria_open(old_name, O_RDONLY, 0)))
-DBUG_RETURN(my_errno);
-raid_type = info->s->base.raid_type;
-raid_chunks = info->s->base.raid_chunks;
-maria_close(info);
+uchar log_data[LSN_STORE_SIZE];
+LEX_STRING log_array[TRANSLOG_INTERNAL_PARTS + 3];
+uint old_name_len= strlen(old_name), new_name_len= strlen(new_name);
+int2store(log_data, old_name_len);
+int2store(log_data + 2, new_name_len);
+log_array[TRANSLOG_INTERNAL_PARTS + 0].str= log_data;
+log_array[TRANSLOG_INTERNAL_PARTS + 0].length= 2 + 2;
+log_array[TRANSLOG_INTERNAL_PARTS + 1].str= (char *)old_name;
+log_array[TRANSLOG_INTERNAL_PARTS + 1].length= old_name_len;
+log_array[TRANSLOG_INTERNAL_PARTS + 2].str= (char *)new_name;
+log_array[TRANSLOG_INTERNAL_PARTS + 2].length= new_name_len;
+/*
+For this record to be of any use for Recovery, we need the upper
+MySQL layer to be crash-safe, which it is not now (that would require
+work using the ddl_log of sql/sql_table.cc); when it is, we should
+reconsider the moment of writing this log record (before or after op,
+under THR_LOCK_maria or not...), how to use it in Recovery, and force
+the log. For now this record is just informative.
+*/
+if (unlikely(translog_write_record(&share->state.create_rename_lsn,
+LOGREC_REDO_RENAME_TABLE,
+&dummy_transaction_object, NULL,
+2 + 2 + old_name_len + new_name_len,
+sizeof(log_array)/sizeof(log_array[0]),
+log_array, NULL)))
+{
+maria_close(info);
+DBUG_RETURN(1);
+}
+/*
+store LSN into file, needed for Recovery to not be confused if a
+RENAME happened (applying REDOs to the wrong table).
+*/
+lsn_store(log_data, share->state.create_rename_lsn);
+if (my_pwrite(share->kfile.file, log_data, sizeof(log_data),
+sizeof(share->state.header) + 2, MYF(MY_NABP)) ||
+my_sync(share->kfile.file, MYF(MY_WME)))
+{
+maria_close(info);
+DBUG_RETURN(1);
+}
 }
+
+maria_close(info);
+#ifdef USE_RAID
 #ifdef EXTRA_DEBUG
 _ma_check_table_is_closed(old_name,"rename raidcheck");
 #endif
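For transactional, non-temporary tables, `maria_rename()` now logs `LOGREC_REDO_RENAME_TABLE` with the following payload:

- a 4-byte prefix: the 2-byte length of the old name, then the 2-byte length of the new name;
- the old name;
- the new name.

The three pieces are passed as separate `LEX_STRING` parts after the first `TRANSLOG_INTERNAL_PARTS` slots, which are presumably reserved for the log layer. The record length is therefore `2 + 2 + old_name_len + new_name_len`. The sketch below builds the same payload as one contiguous buffer, which can be easier to inspect in a test. `build_rename_payload()` is hypothetical, and the low-byte-first length encoding is an assumption about `int2store()`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
  Hypothetical helper: lays out the REDO_RENAME_TABLE payload contiguously.
  Lengths are 2 bytes each, low byte first (assumed int2store() order);
  names are copied without their NUL terminators. Caller frees the result.
*/
static unsigned char *build_rename_payload(const char *old_name,
                                           const char *new_name,
                                           size_t *len)
{
  size_t old_len= strlen(old_name), new_len= strlen(new_name);
  unsigned char *buf;

  assert(old_len <= 0xFFFF && new_len <= 0xFFFF);  /* 2-byte length fields */
  *len= 2 + 2 + old_len + new_len;
  if (!(buf= malloc(*len)))
    return NULL;
  buf[0]= (unsigned char) (old_len & 0xFF);
  buf[1]= (unsigned char) (old_len >> 8);
  buf[2]= (unsigned char) (new_len & 0xFF);
  buf[3]= (unsigned char) (new_len >> 8);
  memcpy(buf + 4, old_name, old_len);
  memcpy(buf + 4 + old_len, new_name, new_len);
  return buf;
}
```

The patch avoids this copy by handing the parts to `translog_write_record()` as a scatter list. The same `sync_dir` test (transactional and not temporary) also decides whether the index and data file renames pass `MY_SYNC_DIR`.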
@@ -49,29 +112,18 @@ int maria_rename(const char *old_name, const char *new_name)
 
 fn_format(from,old_name,"",MARIA_NAME_IEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
 fn_format(to,new_name,"",MARIA_NAME_IEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
-/*
-RECOVERY TODO log the two renames below. Update
-ZeroDirtyPagesLSN of the table on disk (=> sync the files), this is
-needed so that Recovery does not pick a wrong table.
-Then do the file renames.
-For this log record to be of any use for Recovery, we need the upper MySQL
-layer to be crash-safe in DDLs; when it is we should reconsider the moment
-of writing this log record, how to use it in Recovery, and force the log.
-For now this record is only informative. But ZeroDirtyPagesLSN is
-critically needed!
-*/
-if (my_rename_with_symlink(from, to, MYF(MY_WME | MY_SYNC_DIR)))
+if (my_rename_with_symlink(from, to, MYF(MY_WME | sync_dir)))
 DBUG_RETURN(my_errno);
 fn_format(from,old_name,"",MARIA_NAME_DEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
 fn_format(to,new_name,"",MARIA_NAME_DEXT,MY_UNPACK_FILENAME|MY_APPEND_EXT);
 #ifdef USE_RAID
 if (raid_type)
 data_file_rename_error= my_raid_rename(from, to, raid_chunks,
-MYF(MY_WME | MY_SYNC_DIR));
+MYF(MY_WME | sync_dir));
 else
 #endif
 data_file_rename_error=
-my_rename_with_symlink(from, to, MYF(MY_WME | MY_SYNC_DIR));
+my_rename_with_symlink(from, to, MYF(MY_WME | sync_dir));
 if (data_file_rename_error)
 {
 /*
@@ -81,7 +133,7 @@ int maria_rename(const char *old_name, const char *new_name)
 data_file_rename_error= my_errno;
 fn_format(from, old_name, "", MARIA_NAME_IEXT, MYF(MY_UNPACK_FILENAME|MY_APPEND_EXT));
 fn_format(to, new_name, "", MARIA_NAME_IEXT, MYF(MY_UNPACK_FILENAME|MY_APPEND_EXT));
-my_rename_with_symlink(to, from, MYF(MY_WME | MY_SYNC_DIR));
+my_rename_with_symlink(to, from, MYF(MY_WME | sync_dir));
 }
 DBUG_RETURN(data_file_rename_error);
 
@@ -47,7 +47,13 @@ PAGECACHE *maria_pagecache= &maria_pagecache_var;
 PAGECACHE maria_log_pagecache_var;
 PAGECACHE *maria_log_pagecache= &maria_log_pagecache_var;
 
-/* For using maria externally */
+/**
+@brief when transactionality does not matter we can use this transaction
+
+Used in external programs like ma_test*, and also internally inside
+libmaria when there is no transaction around and the operation isn't
+transactional (CREATE/DROP/RENAME/OPTIMIZE/REPAIR).
+*/
 TRN dummy_transaction_object;
 
 /* Enough for comparing if number is zero */
@@ -3,10 +3,16 @@
 # Execute some simple basic test on MyISAM libary to check if things
 # works at all.
 
+# If you want to run this in Valgrind, you should use --trace-children=yes,
+# so that it detects problems in ma_test* and not in the shell script
 valgrind="valgrind --alignment=8 --leak-check=yes"
 silent="-s"
 suffix=""
 #set -x -v -e
+if [ -z "$maria_path" ]
+then
+maria_path="."
+fi
 
 run_tests()
 {
@@ -14,139 +20,139 @@ run_tests()
 #
 # First some simple tests
 #
-./ma_test1$suffix $silent $row_type
-./maria_chk$suffix -se test1
-./ma_test1$suffix $silent -N $row_type
-./maria_chk$suffix -se test1
-./ma_test1$suffix $silent -P --checksum $row_type
-./maria_chk$suffix -se test1
-./ma_test1$suffix $silent -P -N $row_type
-./maria_chk$suffix -se test1
-./ma_test1$suffix $silent -B -N -R2 $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -a -k 480 --unique $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -a -N -R1 $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -p $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -p -N --unique $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -p -N --key_length=127 --checksum $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -p -N --key_length=128 $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -p --key_length=480 $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -a -B $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -a -B --key_length=64 --unique $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -a -B -k 480 --checksum $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -a -B -k 480 -N --unique --checksum $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -a -m $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -a -m -P --unique --checksum $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -a -m -P --key_length=480 --key_cache $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -m -p $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -w --unique $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -a -w --key_length=64 --checksum $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -a -w -N --key_length=480 $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -a -w --key_length=480 --checksum $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -a -b -N $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -a -b --key_length=480 $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent -p -B --key_length=480 $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent --checksum --unique $row_type
-./maria_chk$suffix -se test1
-./ma_test1$suffix $silent --unique $row_type
-./maria_chk$suffix -se test1
+$maria_path/ma_test1$suffix $silent $row_type
+$maria_path/maria_chk$suffix -se test1
+$maria_path/ma_test1$suffix $silent -N $row_type
+$maria_path/maria_chk$suffix -se test1
+$maria_path/ma_test1$suffix $silent -P --checksum $row_type
+$maria_path/maria_chk$suffix -se test1
+$maria_path/ma_test1$suffix $silent -P -N $row_type
+$maria_path/maria_chk$suffix -se test1
+$maria_path/ma_test1$suffix $silent -B -N -R2 $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -a -k 480 --unique $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -a -N -R1 $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -p $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -p -N --unique $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -p -N --key_length=127 --checksum $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -p -N --key_length=128 $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -p --key_length=480 $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -a -B $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -a -B --key_length=64 --unique $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -a -B -k 480 --checksum $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -a -B -k 480 -N --unique --checksum $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -a -m $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -a -m -P --unique --checksum $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -a -m -P --key_length=480 --key_cache $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -m -p $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -w --unique $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -a -w --key_length=64 --checksum $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -a -w -N --key_length=480 $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -a -w --key_length=480 --checksum $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -a -b -N $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -a -b --key_length=480 $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent -p -B --key_length=480 $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent --checksum --unique $row_type
+$maria_path/maria_chk$suffix -se test1
+$maria_path/ma_test1$suffix $silent --unique $row_type
+$maria_path/maria_chk$suffix -se test1
 
-./ma_test1$suffix $silent --key_multiple -N -S $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent --key_multiple -a -p --key_length=480 $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent --key_multiple -a -B --key_length=480 $row_type
-./maria_chk$suffix -sm test1
-./ma_test1$suffix $silent --key_multiple -P -S $row_type
-./maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent --key_multiple -N -S $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent --key_multiple -a -p --key_length=480 $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent --key_multiple -a -B --key_length=480 $row_type
+$maria_path/maria_chk$suffix -sm test1
+$maria_path/ma_test1$suffix $silent --key_multiple -P -S $row_type
+$maria_path/maria_chk$suffix -sm test1
 
-./maria_pack$suffix --force -s test1
-./maria_chk$suffix -ess test1
+$maria_path/maria_pack$suffix --force -s test1
+$maria_path/maria_chk$suffix -ess test1
 
-./ma_test2$suffix $silent -L -K -W -P $row_type
-./maria_chk$suffix -sm test2
-./ma_test2$suffix $silent -L -K -W -P -A $row_type
-./maria_chk$suffix -sm test2
-./ma_test2$suffix $silent -L -K -P -R3 -m50 -b1000000 $row_type
-./maria_chk$suffix -sm test2
-./ma_test2$suffix $silent -L -B $row_type
-./maria_chk$suffix -sm test2
-./ma_test2$suffix $silent -D -B -c $row_type
-./maria_chk$suffix -sm test2
-./ma_test2$suffix $silent -m10000 -e4096 -K $row_type
-./maria_chk$suffix -sm test2
-./ma_test2$suffix $silent -m10000 -e8192 -K $row_type
-./maria_chk$suffix -sm test2
-./ma_test2$suffix $silent -m10000 -e16384 -E16384 -K -L $row_type
-./maria_chk$suffix -sm test2
+$maria_path/ma_test2$suffix $silent -L -K -W -P $row_type
+$maria_path/maria_chk$suffix -sm test2
+$maria_path/ma_test2$suffix $silent -L -K -W -P -A $row_type
+$maria_path/maria_chk$suffix -sm test2
+$maria_path/ma_test2$suffix $silent -L -K -P -R3 -m50 -b1000000 $row_type
+$maria_path/maria_chk$suffix -sm test2
+$maria_path/ma_test2$suffix $silent -L -B $row_type
+$maria_path/maria_chk$suffix -sm test2
+$maria_path/ma_test2$suffix $silent -D -B -c $row_type
+$maria_path/maria_chk$suffix -sm test2
+$maria_path/ma_test2$suffix $silent -m10000 -e4096 -K $row_type
+$maria_path/maria_chk$suffix -sm test2
+$maria_path/ma_test2$suffix $silent -m10000 -e8192 -K $row_type
+$maria_path/maria_chk$suffix -sm test2
+$maria_path/ma_test2$suffix $silent -m10000 -e16384 -E16384 -K -L $row_type
+$maria_path/maria_chk$suffix -sm test2
 }
 
 run_repair_tests()
 {
 row_type=$1
-./ma_test1$suffix $silent --checksum $row_type
-./maria_chk$suffix -se test1
-./maria_chk$suffix -rs test1
-./maria_chk$suffix -se test1
-./maria_chk$suffix -rqs test1
-./maria_chk$suffix -se test1
-./maria_chk$suffix -rs --correct-checksum test1
-./maria_chk$suffix -se test1
-./maria_chk$suffix -rqs --correct-checksum test1
-./maria_chk$suffix -se test1
-./maria_chk$suffix -ros --correct-checksum test1
-./maria_chk$suffix -se test1
-./maria_chk$suffix -rqos --correct-checksum test1
-./maria_chk$suffix -se test1
+$maria_path/ma_test1$suffix $silent --checksum $row_type
+$maria_path/maria_chk$suffix -se test1
+$maria_path/maria_chk$suffix -rs test1
+$maria_path/maria_chk$suffix -se test1
+$maria_path/maria_chk$suffix -rqs test1
+$maria_path/maria_chk$suffix -se test1
+$maria_path/maria_chk$suffix -rs --correct-checksum test1
+$maria_path/maria_chk$suffix -se test1
+$maria_path/maria_chk$suffix -rqs --correct-checksum test1
+$maria_path/maria_chk$suffix -se test1
+$maria_path/maria_chk$suffix -ros --correct-checksum test1
+$maria_path/maria_chk$suffix -se test1
+$maria_path/maria_chk$suffix -rqos --correct-checksum test1
+$maria_path/maria_chk$suffix -se test1
 }
 
 run_pack_tests()
 {
 row_type=$1
 # check of maria_pack / maria_chk
-./ma_test1$suffix $silent --checksum $row_type
-./maria_pack$suffix --force -s test1
-./maria_chk$suffix -ess test1
-./maria_chk$suffix -rqs test1
-./maria_chk$suffix -es test1
-./maria_chk$suffix -rs test1
-./maria_chk$suffix -es test1
-./maria_chk$suffix -rus test1
-./maria_chk$suffix -es test1
+$maria_path/ma_test1$suffix $silent --checksum $row_type
+$maria_path/maria_pack$suffix --force -s test1
+$maria_path/maria_chk$suffix -ess test1
+$maria_path/maria_chk$suffix -rqs test1
+$maria_path/maria_chk$suffix -es test1
+$maria_path/maria_chk$suffix -rs test1
+$maria_path/maria_chk$suffix -es test1
+$maria_path/maria_chk$suffix -rus test1
+$maria_path/maria_chk$suffix -es test1
 
-./ma_test1$suffix $silent --checksum -S $row_type
-./maria_chk$suffix -se test1
-./maria_chk$suffix -ros test1
-./maria_chk$suffix -rqs test1
-./maria_chk$suffix -se test1
+$maria_path/ma_test1$suffix $silent --checksum -S $row_type
+$maria_path/maria_chk$suffix -se test1
+$maria_path/maria_chk$suffix -ros test1
+$maria_path/maria_chk$suffix -rqs test1
+$maria_path/maria_chk$suffix -se test1
 
-./maria_pack$suffix --force -s test1
-./maria_chk$suffix -rqs test1
-./maria_chk$suffix -es test1
-./maria_chk$suffix -rus test1
-./maria_chk$suffix -es test1
+$maria_path/maria_pack$suffix --force -s test1
+$maria_path/maria_chk$suffix -rqs test1
+$maria_path/maria_chk$suffix -es test1
+$maria_path/maria_chk$suffix -rus test1
+$maria_path/maria_chk$suffix -es test1
 }
 
 echo "Running tests with dynamic row format"
@@ -169,27 +175,27 @@ run_tests "-M -T"
# Tests that gives warnings
#

./ma_test2$suffix $silent -L -K -W -P -S -R1 -m500
./maria_chk$suffix -sm test2
$maria_path/ma_test2$suffix $silent -L -K -W -P -S -R1 -m500
$maria_path/maria_chk$suffix -sm test2
echo "ma_test2$suffix $silent -L -K -R1 -m2000 ; Should give error 135"
./ma_test2$suffix $silent -L -K -R1 -m2000
echo "./maria_chk$suffix -sm test2 will warn that 'Datafile is almost full'"
./maria_chk$suffix -sm test2
./maria_chk$suffix -ssm test2
$maria_path/ma_test2$suffix $silent -L -K -R1 -m2000
echo "$maria_path/maria_chk$suffix -sm test2 will warn that 'Datafile is almost full'"
$maria_path/maria_chk$suffix -sm test2
$maria_path/maria_chk$suffix -ssm test2

#
# Some timing tests
#
time ./ma_test2$suffix $silent
time ./ma_test2$suffix $silent -S
time ./ma_test2$suffix $silent -M
time ./ma_test2$suffix $silent -B
time ./ma_test2$suffix $silent -L
time ./ma_test2$suffix $silent -K
time ./ma_test2$suffix $silent -K -B
time ./ma_test2$suffix $silent -L -B
time ./ma_test2$suffix $silent -L -K -B
time ./ma_test2$suffix $silent -L -K -W -B
time ./ma_test2$suffix $silent -L -K -W -B -S
time ./ma_test2$suffix $silent -L -K -W -B -M
time ./ma_test2$suffix $silent -D -K -W -B -S
time $maria_path/ma_test2$suffix $silent
time $maria_path/ma_test2$suffix $silent -S
time $maria_path/ma_test2$suffix $silent -M
time $maria_path/ma_test2$suffix $silent -B
time $maria_path/ma_test2$suffix $silent -L
time $maria_path/ma_test2$suffix $silent -K
time $maria_path/ma_test2$suffix $silent -K -B
time $maria_path/ma_test2$suffix $silent -L -B
time $maria_path/ma_test2$suffix $silent -L -K -B
time $maria_path/ma_test2$suffix $silent -L -K -W -B
time $maria_path/ma_test2$suffix $silent -L -K -W -B -S
time $maria_path/ma_test2$suffix $silent -L -K -W -B -M
time $maria_path/ma_test2$suffix $silent -D -K -W -B -S
|
@@ -93,6 +93,7 @@ typedef struct st_maria_state_info
uint sortkey; /* sorted by this key (not used) */
uint open_count;
uint8 changed; /* Changed since mariachk */
LSN create_rename_lsn; /**< LSN when table was last created/renamed */

/* the following isn't saved on disk */
uint state_diff_length; /* Should be 0 */
@@ -101,7 +102,8 @@ typedef struct st_maria_state_info
} MARIA_STATE_INFO;


#define MARIA_STATE_INFO_SIZE (24 + 4 + 11*8 + 4*4 + 8 + 3*4 + 5*8)
#define MARIA_STATE_INFO_SIZE \
(24 + LSN_STORE_SIZE + 4 + 11*8 + 4*4 + 8 + 3*4 + 5*8)
#define MARIA_STATE_KEY_SIZE 8
#define MARIA_STATE_KEYBLOCK_SIZE 8
#define MARIA_STATE_KEYSEG_SIZE 4
@@ -229,6 +231,7 @@ typedef struct st_maria_share
PAGECACHE *pagecache; /* ref to the current key cache */
MARIA_DECODE_TREE *decode_trees;
uint16 *decode_tables;
uint16 id; /**< 2-byte id by which log records refer to the table */
/* Called the first time the table instance is opened */
my_bool (*once_init)(struct st_maria_share *, File);
/* Called when the last instance of the table is closed */
@@ -889,6 +892,7 @@ volatile int *_ma_killed_ptr(HA_CHECK *param);
void _ma_check_print_error _VARARGS((HA_CHECK *param, const char *fmt, ...));
void _ma_check_print_warning _VARARGS((HA_CHECK *param, const char *fmt, ...));
void _ma_check_print_info _VARARGS((HA_CHECK *param, const char *fmt, ...));
int _ma_repair_write_log_record(const HA_CHECK *param, MARIA_HA *info);
C_MODE_END

int _ma_flush_pending_blocks(MARIA_SORT_PARAM *param);
|
@@ -52,6 +52,7 @@ static my_atomic_rwlock_t LOCK_short_trid_to_trn, LOCK_pool;

/*
Simple interface functions
QQ: if they stay so simple, should we make them inline?
*/

uint trnman_increment_locked_tables(TRN *trn)
@@ -343,6 +344,9 @@ int trnman_end_trn(TRN *trn, my_bool commit)
LF_PINS *pins= trn->pins;
DBUG_ENTER("trnman_end_trn");

DBUG_ASSERT(trn->rec_lsn == 0);
/* if a rollback, all UNDO records should have been executed */
DBUG_ASSERT(commit || trn->undo_lsn == 0);
DBUG_PRINT("info", ("pthread_mutex_lock LOCK_trn_list"));
pthread_mutex_lock(&LOCK_trn_list);

@@ -379,8 +383,6 @@ int trnman_end_trn(TRN *trn, my_bool commit)
/*
if transaction is committed and it was not the only active transaction -
add it to the committed list (which is used for read-from relation)
TODO check in the condition below that a transaction have made some
changes, was not read-only. Something like '&& UndoLSN != 0'
*/
if (commit && active_list_min.next != &active_list_max)
{
@@ -390,6 +392,19 @@ int trnman_end_trn(TRN *trn, my_bool commit)
trnman_committed_transactions++;

res= lf_hash_insert(&trid_to_committed_trn, pins, &trn);
/*
By going on with life is res<0, we let other threads block on
our rows (because they will never see us committed in
trid_to_committed_trn) until they timeout. Though correct, this is not a
good situation:
- if connection reconnects and wants to check if its rows have been
committed, it will not be able to do that (it will just lock on them) so
connection stays permanently in doubt
- internal structures trid_to_committed_trn and committed_list are
desynchronized.
So we should take Maria down immediately, the two problems being
automatically solved at restart.
*/
DBUG_ASSERT(res <= 0);
}
if (res)
@@ -526,71 +541,133 @@ void trnman_rollback_statement(TRN *trn __attribute__ ((unused)))
}


/*
Allocates two buffers and stores in them some information about transactions
of the active list (into the first buffer) and of the committed list (into
the second buffer).
/**
@brief Allocates buffers and stores in them some info about transactions

SYNOPSIS
trnman_collect_transactions()
str_act (OUT) pointer to a LEX_STRING where the allocated buffer, and
its size, will be put
str_com (OUT) pointer to a LEX_STRING where the allocated buffer, and
its size, will be put
Does the allocation because the caller cannot know the size itself.
Memory freeing is to be done by the caller (if the "str" member of the
LEX_STRING is not NULL).
The caller has the intention of doing checkpoints.

@param[out] str_act pointer to where the allocated buffer,
and its size, will be put; buffer will be filled
with info about active transactions
@param[out] str_com pointer to where the allocated buffer,
and its size, will be put; buffer will be filled
with info about committed transactions
@param[out] min_first_undo_lsn pointer to where the minimum
first_undo_lsn of all transactions will be put

DESCRIPTION
Does the allocation because the caller cannot know the size itself.
Memory freeing is to be done by the caller (if the "str" member of the
LEX_STRING is not NULL).
The caller has the intention of doing checkpoints.

RETURN
0 on success
1 on error
@return Operation status
@retval 0 OK
@retval 1 Error
*/
my_bool trnman_collect_transactions(LEX_STRING *str_act, LEX_STRING *str_com)

my_bool trnman_collect_transactions(LEX_STRING *str_act, LEX_STRING *str_com,
LSN *min_rec_lsn, LSN *min_first_undo_lsn)
{
my_bool error;
TRN *trn;
char *ptr;
uint stored_transactions= 0;
LSN minimum_rec_lsn= ULONGLONG_MAX, minimum_first_undo_lsn= ULONGLONG_MAX;
DBUG_ENTER("trnman_collect_transactions");

DBUG_ASSERT((NULL == str_act->str) && (NULL == str_com->str));

/* validate the use of read_non_atomic() in general: */
compile_time_assert((sizeof(LSN) == 8) && (sizeof(LSN_WITH_FLAGS) == 8));

DBUG_PRINT("info", ("pthread_mutex_lock LOCK_trn_list"));
pthread_mutex_lock(&LOCK_trn_list);
str_act->length= 8+(6+2+7+7+7)*trnman_active_transactions;
str_com->length= 8+(6+7+7)*trnman_committed_transactions;
str_act->length= 2 + /* number of active transactions */
LSN_STORE_SIZE + /* minimum of their rec_lsn */
(6 + /* long id */
2 + /* short id */
LSN_STORE_SIZE + /* undo_lsn */
#ifdef MARIA_VERSIONING /* not enabled yet */
LSN_STORE_SIZE + /* undo_purge_lsn */
#endif
LSN_STORE_SIZE /* first_undo_lsn */
) * trnman_active_transactions;
str_com->length= 8 + /* number of committed transactions */
(6 + /* long id */
#ifdef MARIA_VERSIONING /* not enabled yet */
LSN_STORE_SIZE + /* undo_purge_lsn */
#endif
LSN_STORE_SIZE /* first_undo_lsn */
) * trnman_committed_transactions;
if ((NULL == (str_act->str= my_malloc(str_act->length, MYF(MY_WME)))) ||
(NULL == (str_com->str= my_malloc(str_com->length, MYF(MY_WME)))))
goto err;
/* First, the active transactions */
ptr= str_act->str;
int8store(ptr, (ulonglong)trnman_active_transactions);
ptr+= 8;
ptr= str_act->str + 2 + LSN_STORE_SIZE;
for (trn= active_list_min.next; trn != &active_list_max; trn= trn->next)
{
/*
trns with a short trid of 0 are not initialized; Recovery will recognize
this and ignore them.
State is not needed for now (only when we supported prepared trns).
For LSNs, Sanja will soon push lsn7store.
trns with a short trid of 0 are not even initialized, we can ignore
them. trns with undo_lsn==0 have done no writes, we can ignore them
too. XID not needed now.
*/
uint sid;
LSN rec_lsn, undo_lsn, first_undo_lsn;
if ((sid= trn->short_id) == 0)
{
/*
Not even inited, has done nothing. Or it is the
dummy_transaction_object, which does only non-transactional
immediate-sync operations (CREATE/DROP/RENAME/REPAIR TABLE), and so
can be forgotten for Checkpoint.
*/
continue;
}
#ifndef MARIA_CHECKPOINT
/*
in the checkpoint patch (not yet ready) we will have a real implementation
of lsn_read_non_atomic(); for now it's not needed
*/
#define lsn_read_non_atomic(A) (A)
#endif
/* needed for low-water mark calculation */
if (((rec_lsn= lsn_read_non_atomic(trn->rec_lsn)) > 0) &&
(cmp_translog_addr(rec_lsn, minimum_rec_lsn) < 0))
minimum_rec_lsn= rec_lsn;
/*
trn may have logged REDOs but not yet UNDO, that's why we read rec_lsn
before deciding to ignore if undo_lsn==0.
*/
if ((undo_lsn= trn->undo_lsn) == 0) /* trn can be forgotten */
continue;
stored_transactions++;
int6store(ptr, trn->trid);
ptr+= 6;
int2store(ptr, trn->short_id);
int2store(ptr, sid);
ptr+= 2;
/* needed for rollback */
/* lsn7store(ptr, trn->undo_lsn); */
ptr+= 7;
/* needed for purge */
/* lsn7store(ptr, trn->undo_purge_lsn); */
ptr+= 7;
lsn_store(ptr, undo_lsn); /* needed for rollback */
ptr+= LSN_STORE_SIZE;
#ifdef MARIA_VERSIONING /* not enabled yet */
/* to know where purging should start (last delete of this trn) */
lsn_store(ptr, trn->undo_purge_lsn);
ptr+= LSN_STORE_SIZE;
#endif
/* needed for low-water mark calculation */
/* lsn7store(ptr, read_non_atomic(&trn->first_undo_lsn)); */
ptr+= 7;
if (((first_undo_lsn= lsn_read_non_atomic(trn->first_undo_lsn)) > 0) &&
(cmp_translog_addr(first_undo_lsn, minimum_first_undo_lsn) < 0))
minimum_first_undo_lsn= first_undo_lsn;
lsn_store(ptr, first_undo_lsn);
ptr+= LSN_STORE_SIZE;
/**
@todo RECOVERY: add a comment explaining why we can dirtily read some
vars, inspired by the text of "assumption 8" in WL#3072
*/
}
str_act->length= ptr - str_act->str; /* as we maybe over-estimated */
ptr= str_act->str;
int2store(ptr, stored_transactions);
ptr+= 2;
/* this LSN influences how REDOs for any page can be ignored by Recovery */
lsn_store(ptr, minimum_rec_lsn);
/* one day there will also be a list of prepared transactions */
/* do the same for committed ones */
ptr= str_com->str;
int8store(ptr, (ulonglong)trnman_committed_transactions);
@@ -598,18 +675,26 @@ my_bool trnman_collect_transactions(LEX_STRING *str_act, LEX_STRING *str_com)
for (trn= committed_list_min.next; trn != &committed_list_max;
trn= trn->next)
{
LSN first_undo_lsn;
int6store(ptr, trn->trid);
ptr+= 6;
/* mi_int7store(ptr, trn->undo_purge_lsn); */
ptr+= 7;
/* mi_int7store(ptr, read_non_atomic(&trn->first_undo_lsn)); */
ptr+= 7;
#ifdef MARIA_VERSIONING /* not enabled yet */
lsn_store(ptr, trn->undo_purge_lsn);
ptr+= LSN_STORE_SIZE;
#endif
first_undo_lsn= LSN_WITH_FLAGS_TO_LSN(trn->first_undo_lsn);
if (cmp_translog_addr(first_undo_lsn, minimum_first_undo_lsn) < 0)
minimum_first_undo_lsn= first_undo_lsn;
lsn_store(ptr, first_undo_lsn);
ptr+= LSN_STORE_SIZE;
}
/*
TODO: if we see there exists no transaction (active and committed) we can
tell the lock-free structures to do some freeing (my_free()).
*/
error= 0;
*min_rec_lsn= minimum_rec_lsn;
*min_first_undo_lsn= minimum_first_undo_lsn;
goto end;
err:
error= 1;
|
@@ -45,12 +45,13 @@ struct st_transaction
LF_PINS *pins;
TrID trid, min_read_from, commit_trid;
TRN *next, *prev;
LSN rec_lsn, undo_lsn, first_undo_lsn;
LSN rec_lsn, undo_lsn;
LSN_WITH_FLAGS first_undo_lsn;
uint locked_tables;
/* Note! if locks.loid is 0, trn is NOT initialized */
};

TRN dummy_transaction_object;
#define TRANSACTION_LOGGED_LONG_ID ULL(0x8000000000000000)

C_MODE_END
|
@@ -20,6 +20,8 @@
to include my_atomic.h in C++ code.
*/

#include "ma_loghandler_lsn.h"

C_MODE_START
typedef uint64 TrID; /* our TrID is 6 bytes */
typedef struct st_transaction TRN;
@@ -27,6 +29,7 @@ typedef struct st_transaction TRN;
#define SHORT_TRID_MAX 65535

extern uint trnman_active_transactions, trnman_allocated_transactions;
extern TRN dummy_transaction_object;

int trnman_init(void);
void trnman_destroy(void);
@@ -39,7 +42,9 @@ void trnman_free_trn(TRN *trn);
int trnman_can_read_from(TRN *trn, TrID trid);
void trnman_new_statement(TRN *trn);
void trnman_rollback_statement(TRN *trn);
my_bool trnman_collect_transactions(LEX_STRING *str_act, LEX_STRING *str_com);
my_bool trnman_collect_transactions(LEX_STRING *str_act, LEX_STRING *str_com,
LSN *min_rec_lsn,
LSN *min_first_undo_lsn);

uint trnman_increment_locked_tables(TRN *trn);
uint trnman_decrement_locked_tables(TRN *trn);
|
@@ -196,7 +196,7 @@ int main(int argc __attribute__((unused)), char *argv[])
if (translog_write_record(&lsn,
LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
trn, NULL,
6, TRANSLOG_INTERNAL_PARTS + 1, parts))
6, TRANSLOG_INTERNAL_PARTS + 1, parts, NULL))
{
fprintf(stderr, "Can't write record #%lu\n", (ulong) 0);
translog_destroy();
@@ -218,7 +218,7 @@ int main(int argc __attribute__((unused)), char *argv[])
parts[TRANSLOG_INTERNAL_PARTS + 1].str= NULL;
parts[TRANSLOG_INTERNAL_PARTS + 1].length= 0;
if (translog_write_record(&lsn, LOGREC_FIXED_RECORD_1LSN_EXAMPLE,
trn, NULL, LSN_STORE_SIZE, 0, parts))
trn, NULL, LSN_STORE_SIZE, 0, parts, NULL))
{
fprintf(stderr, "1 Can't write reference defore record #%lu\n",
(ulong) i);
@@ -238,7 +238,7 @@ int main(int argc __attribute__((unused)), char *argv[])
if (translog_write_record(&lsn,
LOGREC_VARIABLE_RECORD_1LSN_EXAMPLE,
trn, NULL, 0, TRANSLOG_INTERNAL_PARTS + 2,
parts))
parts, NULL))
{
fprintf(stderr, "1 Can't write var reference defore record #%lu\n",
(ulong) i);
@@ -257,7 +257,7 @@ int main(int argc __attribute__((unused)), char *argv[])
if (translog_write_record(&lsn,
LOGREC_FIXED_RECORD_2LSN_EXAMPLE,
trn, NULL,
23, TRANSLOG_INTERNAL_PARTS + 1, parts))
23, TRANSLOG_INTERNAL_PARTS + 1, parts, NULL))
{
fprintf(stderr, "0 Can't write reference defore record #%lu\n",
(ulong) i);
@@ -277,7 +277,7 @@ int main(int argc __attribute__((unused)), char *argv[])
if (translog_write_record(&lsn,
LOGREC_VARIABLE_RECORD_2LSN_EXAMPLE,
trn, NULL, 14 + rec_len,
TRANSLOG_INTERNAL_PARTS + 2, parts))
TRANSLOG_INTERNAL_PARTS + 2, parts, NULL))
{
fprintf(stderr, "0 Can't write var reference defore record #%lu\n",
(ulong) i);
@@ -294,7 +294,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
trn, NULL, 6,
TRANSLOG_INTERNAL_PARTS + 1,
parts))
parts, NULL))
{
fprintf(stderr, "Can't write record #%lu\n", (ulong) i);
translog_destroy();
@@ -313,7 +313,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_VARIABLE_RECORD_0LSN_EXAMPLE,
trn, NULL, rec_len,
TRANSLOG_INTERNAL_PARTS + 1,
parts))
parts, NULL))
{
fprintf(stderr, "Can't write variable record #%lu\n", (ulong) i);
translog_destroy();
|
@@ -192,7 +192,7 @@ int main(int argc __attribute__((unused)), char *argv[])
trn->short_id= 0;
if (translog_write_record(&lsn, LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
trn, NULL,
6, TRANSLOG_INTERNAL_PARTS + 1, parts))
6, TRANSLOG_INTERNAL_PARTS + 1, parts, NULL))
{
fprintf(stderr, "Can't write record #%lu\n", (ulong) 0);
translog_destroy();
@@ -214,7 +214,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_FIXED_RECORD_1LSN_EXAMPLE,
trn, NULL,
LSN_STORE_SIZE,
TRANSLOG_INTERNAL_PARTS + 1, parts))
TRANSLOG_INTERNAL_PARTS + 1, parts, NULL))
{
fprintf(stderr, "1 Can't write reference before record #%lu\n",
(ulong) i);
@@ -234,7 +234,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_VARIABLE_RECORD_1LSN_EXAMPLE,
trn, NULL, LSN_STORE_SIZE + rec_len,
TRANSLOG_INTERNAL_PARTS + 2,
parts))
parts, NULL))
{
fprintf(stderr, "1 Can't write var reference before record #%lu\n",
(ulong) i);
@@ -255,7 +255,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_FIXED_RECORD_2LSN_EXAMPLE,
trn, NULL, 23,
TRANSLOG_INTERNAL_PARTS + 1,
parts))
parts, NULL))
{
fprintf(stderr, "0 Can't write reference before record #%lu\n",
(ulong) i);
@@ -276,7 +276,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_VARIABLE_RECORD_2LSN_EXAMPLE,
trn, NULL, LSN_STORE_SIZE * 2 + rec_len,
TRANSLOG_INTERNAL_PARTS + 2,
parts))
parts, NULL))
{
fprintf(stderr, "0 Can't write var reference before record #%lu\n",
(ulong) i);
@@ -293,7 +293,7 @@ int main(int argc __attribute__((unused)), char *argv[])
if (translog_write_record(&lsn,
LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
trn, NULL, 6,
TRANSLOG_INTERNAL_PARTS + 1, parts))
TRANSLOG_INTERNAL_PARTS + 1, parts, NULL))
{
fprintf(stderr, "Can't write record #%lu\n", (ulong) i);
translog_destroy();
@@ -311,7 +311,7 @@ int main(int argc __attribute__((unused)), char *argv[])
if (translog_write_record(&lsn,
LOGREC_VARIABLE_RECORD_0LSN_EXAMPLE,
trn, NULL, rec_len,
TRANSLOG_INTERNAL_PARTS + 1, parts))
TRANSLOG_INTERNAL_PARTS + 1, parts, NULL))
{
fprintf(stderr, "Can't write variable record #%lu\n", (ulong) i);
translog_destroy();
|
@@ -137,7 +137,7 @@ void writer(int num)
if (translog_write_record(&lsn,
LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
&trn, NULL, 6, TRANSLOG_INTERNAL_PARTS + 1,
parts))
parts, NULL))
{
fprintf(stderr, "Can't write LOGREC_FIXED_RECORD_0LSN_EXAMPLE record #%lu "
"thread %i\n", (ulong) i, num);
@@ -154,7 +154,7 @@ void writer(int num)
LOGREC_VARIABLE_RECORD_0LSN_EXAMPLE,
&trn, NULL,
len, TRANSLOG_INTERNAL_PARTS + 1,
parts))
parts, NULL))
{
fprintf(stderr, "Can't write variable record #%lu\n", (ulong) i);
translog_destroy();
@@ -303,7 +303,7 @@ int main(int argc __attribute__((unused)),
LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
&dummy_transaction_object, NULL, 6,
TRANSLOG_INTERNAL_PARTS + 1,
parts))
parts, NULL))
{
fprintf(stderr, "Can't write the first record\n");
translog_destroy();
|
@@ -94,7 +94,7 @@ int main(int argc __attribute__((unused)), char *argv[])
LOGREC_FIXED_RECORD_0LSN_EXAMPLE,
&dummy_transaction_object, NULL, 6,
TRANSLOG_INTERNAL_PARTS + 1,
parts))
parts, NULL))
{
fprintf(stderr, "Can't write record #%lu\n", (ulong) 0);
translog_destroy();
|