Applied InnoDB snapshot innodb-5.0-ss2095
Fixes the following bugs:

- Bug #29560: InnoDB >= 5.0.30 hangs on adaptive hash rw-lock 'waiting for an X-lock'

  Fixed a race condition in the rw_lock where an os_event_reset() can
  overwrite an earlier os_event_set(), triggering an indefinite wait.
  NOTE: This fix for Windows is different from that for other platforms.
  NOTE2: This bug was introduced in the scalability fix to the sync0arr
  which was applied to 5.0 only. Therefore, it need not be applied to the
  5.1 tree. If we decide to port the scalability fix to 5.1 then this fix
  should be ported as well.

- Bug #32125: Database crash due to ha_innodb.cc:3896: ulint convert_search_mode_to_innobase

  When an unknown find_flag is encountered in convert_search_mode_to_innobase(),
  do not call assert(0); instead queue a MySQL error using my_error() and
  return the error code PAGE_CUR_UNSUPP. Change the functions that call
  convert_search_mode_to_innobase() to handle that error code by "canceling"
  execution and returning an appropriate error code further upstream.

Per-file notes (the snapshot innodb-5.0-ss2095 was applied to every file listed):

innobase/include/os0sync.h, innobase/include/sync0rw.h,
innobase/include/sync0rw.ic, innobase/include/sync0sync.ic,
innobase/os/os0sync.c, innobase/srv/srv0srv.c, innobase/sync/sync0arr.c,
innobase/sync/sync0rw.c, innobase/sync/sync0sync.c:
  Revision r2082: branches/5.0: bug#29560 (race condition in the rw_lock;
  full log above).
  Reviewed by: Heikki

innobase/include/db0err.h, innobase/include/page0cur.h, sql/ha_innodb.cc:
  Revision r2091: branches/5.0: Merge r2088 from trunk: Fix Bug#32125
  (http://bugs.mysql.com/32125) "Database crash due to ha_innodb.cc:3896:
  ulint convert_search_mode_to_innobase" (full log above).
  Approved by: Heikki

sql/ha_innodb.cc:
  Revision r2095: branches/5.0: Merge r2093 from trunk:
  convert_search_mode_to_innobase(): Add the missing case label
  HA_READ_MBR_EQUAL that was forgotten in r2088.
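The race described above is easiest to see in a reduced model. Below is a minimal, self-contained sketch in plain C with pthreads (not InnoDB code) of the counted-event scheme this snapshot introduces: reset returns a signal count, set bumps it, and a waiter that passes its reset count back in cannot miss an intervening set even if a later reset clears is_set again. All toy_event_* names are hypothetical stand-ins; in the real patch the corresponding functions are os_event_reset(), os_event_set() and os_event_wait_low().

/* A sketch only, assuming simplified semantics; not the InnoDB implementation. */
#include <pthread.h>
#include <stdio.h>

typedef struct {
	pthread_mutex_t	mutex;
	pthread_cond_t	cond;
	int		is_set;		/* the plain "signaled" flag */
	long long	signal_count;	/* bumped by every toy_event_set() */
} toy_event;

static void toy_event_init(toy_event* ev)
{
	pthread_mutex_init(&ev->mutex, NULL);
	pthread_cond_init(&ev->cond, NULL);
	ev->is_set = 0;
	ev->signal_count = 1;	/* 0 is reserved for "no count passed" */
}

static void toy_event_set(toy_event* ev)
{
	pthread_mutex_lock(&ev->mutex);
	if (!ev->is_set) {
		ev->is_set = 1;
		ev->signal_count++;	/* monotonically increasing */
	}
	pthread_cond_broadcast(&ev->cond);
	pthread_mutex_unlock(&ev->mutex);
}

/* Returns the current signal count; pass it to toy_event_wait_low(). */
static long long toy_event_reset(toy_event* ev)
{
	long long	ret;

	pthread_mutex_lock(&ev->mutex);
	ev->is_set = 0;
	ret = ev->signal_count;
	pthread_mutex_unlock(&ev->mutex);
	return(ret);
}

/* Waits until the event is set, unless a set() already advanced the
signal count past the reset whose count is passed in. */
static void toy_event_wait_low(toy_event* ev, long long reset_sig_count)
{
	pthread_mutex_lock(&ev->mutex);
	if (reset_sig_count == 0) {
		reset_sig_count = ev->signal_count;
	}
	while (!ev->is_set && ev->signal_count == reset_sig_count) {
		pthread_cond_wait(&ev->cond, &ev->mutex);
	}
	pthread_mutex_unlock(&ev->mutex);
}

int main(void)
{
	toy_event	ev;
	long long	count_a;

	toy_event_init(&ev);

	/* The interleaving from the bug report, in one thread for clarity: */
	count_a = toy_event_reset(&ev);	/* thread A resets           */
	toy_event_set(&ev);		/* thread B sets             */
	(void) toy_event_reset(&ev);	/* thread C resets again     */

	/* With a plain is_set check, "thread A" would now block forever.
	Because the set() advanced signal_count past count_a, this call
	returns immediately instead. */
	toy_event_wait_low(&ev, count_a);

	printf("wait returned immediately: the set() was not lost\n");
	return(0);
}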
parent 49934f490a
commit a3dc40e24a
innobase/include/db0err.h
@@ -57,6 +57,18 @@ Created 5/24/1996 Heikki Tuuri
 buffer pool (for big transactions,
 InnoDB stores the lock structs in the
 buffer pool) */
+#define DB_FOREIGN_DUPLICATE_KEY 46 /* foreign key constraints
+activated by the operation would
+lead to a duplicate key in some
+table */
+#define DB_TOO_MANY_CONCURRENT_TRXS 47 /* when InnoDB runs out of the
+preconfigured undo slots, this can
+only happen when there are too many
+concurrent transactions */
+#define DB_UNSUPPORTED 48 /* when InnoDB sees any artefact or
+a feature that it can't recoginize or
+work with e.g., FT indexes created by
+a later version of the engine. */
 
 /* The following are partial failure codes */
 #define DB_FAIL 1000
innobase/include/os0sync.h
@@ -112,9 +112,13 @@ os_event_set(
 os_event_t event); /* in: event to set */
 /**************************************************************
 Resets an event semaphore to the nonsignaled state. Waiting threads will
-stop to wait for the event. */
+stop to wait for the event.
+The return value should be passed to os_even_wait_low() if it is desired
+that this thread should not wait in case of an intervening call to
+os_event_set() between this os_event_reset() and the
+os_event_wait_low() call. See comments for os_event_wait_low(). */
 
-void
+ib_longlong
 os_event_reset(
 /*===========*/
 os_event_t event); /* in: event to reset */
@@ -125,16 +129,38 @@ void
 os_event_free(
 /*==========*/
 os_event_t event); /* in: event to free */
 
 /**************************************************************
 Waits for an event object until it is in the signaled state. If
 srv_shutdown_state == SRV_SHUTDOWN_EXIT_THREADS this also exits the
 waiting thread when the event becomes signaled (or immediately if the
-event is already in the signaled state). */
+event is already in the signaled state).
+
+Typically, if the event has been signalled after the os_event_reset()
+we'll return immediately because event->is_set == TRUE.
+There are, however, situations (e.g.: sync_array code) where we may
+lose this information. For example:
+
+thread A calls os_event_reset()
+thread B calls os_event_set() [event->is_set == TRUE]
+thread C calls os_event_reset() [event->is_set == FALSE]
+thread A calls os_event_wait() [infinite wait!]
+thread C calls os_event_wait() [infinite wait!]
+
+Where such a scenario is possible, to avoid infinite wait, the
+value returned by os_event_reset() should be passed in as
+reset_sig_count. */
+
+#define os_event_wait(event) os_event_wait_low((event), 0)
 
 void
-os_event_wait(
-/*==========*/
-os_event_t event); /* in: event to wait */
+os_event_wait_low(
+/*==============*/
+os_event_t event, /* in: event to wait */
+ib_longlong reset_sig_count);/* in: zero or the value
+returned by previous call of
+os_event_reset(). */
 /**************************************************************
 Waits for an event object until it is in the signaled state or
 a timeout is exceeded. In Unix the timeout is always infinite. */
innobase/include/page0cur.h
@@ -22,6 +22,7 @@ Created 10/4/1994 Heikki Tuuri
 
 /* Page cursor search modes; the values must be in this order! */
 
+#define PAGE_CUR_UNSUPP 0
 #define PAGE_CUR_G 1
 #define PAGE_CUR_GE 2
 #define PAGE_CUR_L 3
innobase/include/sync0rw.h
@@ -418,6 +418,17 @@ field. Then no new readers are allowed in. */
 
 struct rw_lock_struct {
 os_event_t event; /* Used by sync0arr.c for thread queueing */
+
+#ifdef __WIN__
+os_event_t wait_ex_event; /* This windows specific event is
+used by the thread which has set the
+lock state to RW_LOCK_WAIT_EX. The
+rw_lock design guarantees that this
+thread will be the next one to proceed
+once the current the event gets
+signalled. See LEMMA 2 in sync0sync.c */
+#endif
+
 ulint reader_count; /* Number of readers who have locked this
 lock in the shared mode */
 ulint writer; /* This field is set to RW_LOCK_EX if there
innobase/include/sync0rw.ic
@@ -382,6 +382,9 @@ rw_lock_s_unlock_func(
 mutex_exit(mutex);
 
 if (UNIV_UNLIKELY(sg)) {
+#ifdef __WIN__
+os_event_set(lock->wait_ex_event);
+#endif
 os_event_set(lock->event);
 sync_array_object_signalled(sync_primary_wait_array);
 }
@@ -463,6 +466,9 @@ rw_lock_x_unlock_func(
 mutex_exit(&(lock->mutex));
 
 if (UNIV_UNLIKELY(sg)) {
+#ifdef __WIN__
+os_event_set(lock->wait_ex_event);
+#endif
 os_event_set(lock->event);
 sync_array_object_signalled(sync_primary_wait_array);
 }
innobase/include/sync0sync.ic
@@ -207,7 +207,7 @@ mutex_exit(
 perform the read first, which could leave a waiting
 thread hanging indefinitely.
 
-Our current solution call every 10 seconds
+Our current solution call every second
 sync_arr_wake_threads_if_sema_free()
 to wake up possible hanging threads if
 they are missed in mutex_signal_object. */
innobase/os/os0sync.c
@@ -151,7 +151,14 @@ os_event_create(
 ut_a(0 == pthread_cond_init(&(event->cond_var), NULL));
 #endif
 event->is_set = FALSE;
-event->signal_count = 0;
+
+/* We return this value in os_event_reset(), which can then be
+be used to pass to the os_event_wait_low(). The value of zero
+is reserved in os_event_wait_low() for the case when the
+caller does not want to pass any signal_count value. To
+distinguish between the two cases we initialize signal_count
+to 1 here. */
+event->signal_count = 1;
 #endif /* __WIN__ */
 
 /* The os_sync_mutex can be NULL because during startup an event
@@ -244,13 +251,20 @@ os_event_set(
 
 /**************************************************************
 Resets an event semaphore to the nonsignaled state. Waiting threads will
-stop to wait for the event. */
+stop to wait for the event.
+The return value should be passed to os_even_wait_low() if it is desired
+that this thread should not wait in case of an intervening call to
+os_event_set() between this os_event_reset() and the
+os_event_wait_low() call. See comments for os_event_wait_low(). */
 
-void
+ib_longlong
 os_event_reset(
 /*===========*/
+/* out: current signal_count. */
 os_event_t event) /* in: event to reset */
 {
+ib_longlong ret = 0;
+
 #ifdef __WIN__
 ut_a(event);
 
@@ -265,9 +279,11 @@ os_event_reset(
 } else {
 event->is_set = FALSE;
 }
+ret = event->signal_count;
 
 os_fast_mutex_unlock(&(event->os_mutex));
 #endif
+return(ret);
 }
 
 /**************************************************************
@@ -335,18 +351,38 @@ os_event_free(
 Waits for an event object until it is in the signaled state. If
 srv_shutdown_state == SRV_SHUTDOWN_EXIT_THREADS this also exits the
 waiting thread when the event becomes signaled (or immediately if the
-event is already in the signaled state). */
+event is already in the signaled state).
+
+Typically, if the event has been signalled after the os_event_reset()
+we'll return immediately because event->is_set == TRUE.
+There are, however, situations (e.g.: sync_array code) where we may
+lose this information. For example:
+
+thread A calls os_event_reset()
+thread B calls os_event_set() [event->is_set == TRUE]
+thread C calls os_event_reset() [event->is_set == FALSE]
+thread A calls os_event_wait() [infinite wait!]
+thread C calls os_event_wait() [infinite wait!]
+
+Where such a scenario is possible, to avoid infinite wait, the
+value returned by os_event_reset() should be passed in as
+reset_sig_count. */
 
 void
-os_event_wait(
-/*==========*/
-os_event_t event) /* in: event to wait */
+os_event_wait_low(
+/*==============*/
+os_event_t event, /* in: event to wait */
+ib_longlong reset_sig_count)/* in: zero or the value
+returned by previous call of
+os_event_reset(). */
 {
 #ifdef __WIN__
 DWORD err;
 
 ut_a(event);
 
+UT_NOT_USED(reset_sig_count);
+
 /* Specify an infinite time limit for waiting */
 err = WaitForSingleObject(event->handle, INFINITE);
 
@@ -360,7 +396,11 @@ os_event_wait(
 
 os_fast_mutex_lock(&(event->os_mutex));
 
-old_signal_count = event->signal_count;
+if (reset_sig_count) {
+old_signal_count = reset_sig_count;
+} else {
+old_signal_count = event->signal_count;
+}
 
 for (;;) {
 if (event->is_set == TRUE
innobase/srv/srv0srv.c
@@ -1881,12 +1881,6 @@ loop:
 
 os_thread_sleep(1000000);
 
-/* In case mutex_exit is not a memory barrier, it is
-theoretically possible some threads are left waiting though
-the semaphore is already released. Wake up those threads: */
-
-sync_arr_wake_threads_if_sema_free();
-
 current_time = time(NULL);
 
 time_elapsed = difftime(current_time, last_monitor_time);
@@ -2083,9 +2077,15 @@ loop:
 srv_refresh_innodb_monitor_stats();
 }
 
+/* In case mutex_exit is not a memory barrier, it is
+theoretically possible some threads are left waiting though
+the semaphore is already released. Wake up those threads: */
+
+sync_arr_wake_threads_if_sema_free();
+
 if (sync_array_print_long_waits()) {
 fatal_cnt++;
-if (fatal_cnt > 5) {
+if (fatal_cnt > 10) {
 
 fprintf(stderr,
 "InnoDB: Error: semaphore wait has lasted > %lu seconds\n"
@@ -2103,7 +2103,7 @@ loop:
 
 fflush(stderr);
 
-os_thread_sleep(2000000);
+os_thread_sleep(1000000);
 
 if (srv_shutdown_state < SRV_SHUTDOWN_CLEANUP) {
 
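For orientation, the shape of the monitor loop after this change can be sketched in a few lines. This is a hedged, standalone illustration, not the real srv_error_monitor_thread; wake_threads_if_sema_free() and print_long_waits() below are stand-ins for sync_arr_wake_threads_if_sema_free() and sync_array_print_long_waits().

/* A sketch only: periodic watchdog that wakes possibly-missed waiters. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void wake_threads_if_sema_free(void)
{
	/* In InnoDB this scans the wait array and signals any waiter
	whose semaphore has in fact been released already. */
}

static int print_long_waits(void)
{
	/* In InnoDB this returns nonzero if some thread has waited on a
	semaphore for longer than the fatal threshold. */
	return 0;
}

int main(void)
{
	int	fatal_cnt = 0;
	int	rounds = 5;	/* the real monitor thread loops forever */

	while (rounds-- > 0) {
		sleep(1);	/* was a 2-second sleep before the snapshot */

		/* In case mutex_exit is not a memory barrier, wake any
		threads left waiting on an already-released semaphore. */
		wake_threads_if_sema_free();

		if (print_long_waits()) {
			if (++fatal_cnt > 10) {	/* threshold raised from 5 */
				fputs("semaphore wait lasted too long\n", stderr);
				abort();
			}
		} else {
			fatal_cnt = 0;
		}
	}
	return 0;
}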
innobase/sync/sync0arr.c
@@ -40,7 +40,15 @@ because we can do with a very small number of OS events,
 say 200. In NT 3.51, allocating events seems to be a quadratic
 algorithm, because 10 000 events are created fast, but
 100 000 events takes a couple of minutes to create.
-*/
+
+As of 5.0.30 the above mentioned design is changed. Since now
+OS can handle millions of wait events efficiently, we no longer
+have this concept of each cell of wait array having one event.
+Instead, now the event that a thread wants to wait on is embedded
+in the wait object (mutex or rw_lock). We still keep the global
+wait array for the sake of diagnostics and also to avoid infinite
+wait The error_monitor thread scans the global wait array to signal
+any waiting threads who have missed the signal. */
 
 /* A cell where an individual thread may wait suspended
 until a resource is released. The suspending is implemented
@@ -62,6 +70,14 @@ struct sync_cell_struct {
 ibool waiting; /* TRUE if the thread has already
 called sync_array_event_wait
 on this cell */
+ib_longlong signal_count; /* We capture the signal_count
+of the wait_object when we
+reset the event. This value is
+then passed on to os_event_wait
+and we wait only if the event
+has not been signalled in the
+period between the reset and
+wait call. */
 time_t reservation_time;/* time when the thread reserved
 the wait cell */
 };
@@ -216,6 +232,7 @@ sync_array_create(
 cell = sync_array_get_nth_cell(arr, i);
 cell->wait_object = NULL;
 cell->waiting = FALSE;
+cell->signal_count = 0;
 }
 
 return(arr);
@@ -282,16 +299,23 @@ sync_array_validate(
 /***********************************************************************
 Puts the cell event in reset state. */
 static
-void
+ib_longlong
 sync_cell_event_reset(
 /*==================*/
+/* out: value of signal_count
+at the time of reset. */
 ulint type, /* in: lock type mutex/rw_lock */
 void* object) /* in: the rw_lock/mutex object */
 {
 if (type == SYNC_MUTEX) {
-os_event_reset(((mutex_t *) object)->event);
+return(os_event_reset(((mutex_t *) object)->event));
+#ifdef __WIN__
+} else if (type == RW_LOCK_WAIT_EX) {
+return(os_event_reset(
+((rw_lock_t *) object)->wait_ex_event));
+#endif
 } else {
-os_event_reset(((rw_lock_t *) object)->event);
+return(os_event_reset(((rw_lock_t *) object)->event));
 }
 }
 
@@ -345,8 +369,11 @@ sync_array_reserve_cell(
 
 sync_array_exit(arr);
 
-/* Make sure the event is reset */
-sync_cell_event_reset(type, object);
+/* Make sure the event is reset and also store
+the value of signal_count at which the event
+was reset. */
+cell->signal_count = sync_cell_event_reset(type,
+object);
 
 cell->reservation_time = time(NULL);
 
@@ -388,7 +415,14 @@ sync_array_wait_event(
 
 if (cell->request_type == SYNC_MUTEX) {
 event = ((mutex_t*) cell->wait_object)->event;
-} else {
+#ifdef __WIN__
+/* On windows if the thread about to wait is the one which
+has set the state of the rw_lock to RW_LOCK_WAIT_EX, then
+it waits on a special event i.e.: wait_ex_event. */
+} else if (cell->request_type == RW_LOCK_WAIT_EX) {
+event = ((rw_lock_t*) cell->wait_object)->wait_ex_event;
+#endif
+} else {
 event = ((rw_lock_t*) cell->wait_object)->event;
 }
 
@@ -413,7 +447,7 @@ sync_array_wait_event(
 #endif
 sync_array_exit(arr);
 
-os_event_wait(event);
+os_event_wait_low(event, cell->signal_count);
 
 sync_array_free_cell(arr, index);
 }
@@ -457,7 +491,11 @@ sync_array_cell_print(
 #endif /* UNIV_SYNC_DEBUG */
 (ulong) mutex->waiters);
 
-} else if (type == RW_LOCK_EX || type == RW_LOCK_SHARED) {
+} else if (type == RW_LOCK_EX
+#ifdef __WIN__
+|| type == RW_LOCK_WAIT_EX
+#endif
+|| type == RW_LOCK_SHARED) {
 
 fputs(type == RW_LOCK_EX ? "X-lock on" : "S-lock on", file);
 
@@ -638,7 +676,8 @@ sync_array_detect_deadlock(
 
 return(FALSE); /* No deadlock */
 
-} else if (cell->request_type == RW_LOCK_EX) {
+} else if (cell->request_type == RW_LOCK_EX
+|| cell->request_type == RW_LOCK_WAIT_EX) {
 
 lock = cell->wait_object;
 
@@ -734,7 +773,8 @@ sync_arr_cell_can_wake_up(
 return(TRUE);
 }
 
-} else if (cell->request_type == RW_LOCK_EX) {
+} else if (cell->request_type == RW_LOCK_EX
+|| cell->request_type == RW_LOCK_WAIT_EX) {
 
 lock = cell->wait_object;
 
@@ -783,6 +823,7 @@ sync_array_free_cell(
 
 cell->waiting = FALSE;
 cell->wait_object = NULL;
+cell->signal_count = 0;
 
 ut_a(arr->n_reserved > 0);
 arr->n_reserved--;
@@ -839,6 +880,14 @@ sync_arr_wake_threads_if_sema_free(void)
 
 mutex = cell->wait_object;
 os_event_set(mutex->event);
+#ifdef __WIN__
+} else if (cell->request_type
+== RW_LOCK_WAIT_EX) {
+rw_lock_t* lock;
+
+lock = cell->wait_object;
+os_event_set(lock->wait_ex_event);
+#endif
 } else {
 rw_lock_t* lock;
 
innobase/sync/sync0rw.c
@@ -132,6 +132,10 @@ rw_lock_create_func(
 lock->last_x_line = 0;
 lock->event = os_event_create(NULL);
 
+#ifdef __WIN__
+lock->wait_ex_event = os_event_create(NULL);
+#endif
+
 mutex_enter(&rw_lock_list_mutex);
 
 if (UT_LIST_GET_LEN(rw_lock_list) > 0) {
@@ -168,6 +172,10 @@ rw_lock_free(
 mutex_enter(&rw_lock_list_mutex);
 os_event_free(lock->event);
 
+#ifdef __WIN__
+os_event_free(lock->wait_ex_event);
+#endif
+
 if (UT_LIST_GET_PREV(list, lock)) {
 ut_a(UT_LIST_GET_PREV(list, lock)->magic_n == RW_LOCK_MAGIC_N);
 }
@@ -521,7 +529,15 @@ lock_loop:
 rw_x_system_call_count++;
 
 sync_array_reserve_cell(sync_primary_wait_array,
-lock, RW_LOCK_EX,
+lock,
+#ifdef __WIN__
+/* On windows RW_LOCK_WAIT_EX signifies
+that this thread should wait on the
+special wait_ex_event. */
+(state == RW_LOCK_WAIT_EX)
+? RW_LOCK_WAIT_EX :
+#endif
+RW_LOCK_EX,
 file_name, line,
 &index);
 
innobase/sync/sync0sync.c
@@ -95,17 +95,47 @@ have happened that the thread which was holding the mutex has just released
 it and did not see the waiters byte set to 1, a case which would lead the
 other thread to an infinite wait.
 
-LEMMA 1: After a thread resets the event of the cell it reserves for waiting
-========
-for a mutex, some thread will eventually call sync_array_signal_object with
-the mutex as an argument. Thus no infinite wait is possible.
+LEMMA 1: After a thread resets the event of a mutex (or rw_lock), some
+=======
+thread will eventually call os_event_set() on that particular event.
+Thus no infinite wait is possible in this case.
 
 Proof: After making the reservation the thread sets the waiters field in the
 mutex to 1. Then it checks that the mutex is still reserved by some thread,
 or it reserves the mutex for itself. In any case, some thread (which may be
 also some earlier thread, not necessarily the one currently holding the mutex)
 will set the waiters field to 0 in mutex_exit, and then call
-sync_array_signal_object with the mutex as an argument.
+os_event_set() with the mutex as an argument.
 Q.E.D.
+
+LEMMA 2: If an os_event_set() call is made after some thread has called
+=======
+the os_event_reset() and before it starts wait on that event, the call
+will not be lost to the second thread. This is true even if there is an
+intervening call to os_event_reset() by another thread.
+Thus no infinite wait is possible in this case.
+
+Proof (non-windows platforms): os_event_reset() returns a monotonically
+increasing value of signal_count. This value is increased at every
+call of os_event_set() If thread A has called os_event_reset() followed
+by thread B calling os_event_set() and then some other thread C calling
+os_event_reset(), the is_set flag of the event will be set to FALSE;
+but now if thread A calls os_event_wait_low() with the signal_count
+value returned from the earlier call of os_event_reset(), it will
+return immediately without waiting.
+Q.E.D.
+
+Proof (windows): If there is a writer thread which is forced to wait for
+the lock, it may be able to set the state of rw_lock to RW_LOCK_WAIT_EX
+The design of rw_lock ensures that there is one and only one thread
+that is able to change the state to RW_LOCK_WAIT_EX and this thread is
+guaranteed to acquire the lock after it is released by the current
+holders and before any other waiter gets the lock.
+On windows this thread waits on a separate event i.e.: wait_ex_event.
+Since only one thread can wait on this event there is no chance
+of this event getting reset before the writer starts wait on it.
+Therefore, this thread is guaranteed to catch the os_set_event()
+signalled unconditionally at the release of the lock.
+Q.E.D. */
 
 ulint sync_dummy = 0;
 
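The Windows argument above relies on exactly one thread (the one that moved the lock into RW_LOCK_WAIT_EX) ever resetting or waiting on wait_ex_event, so no third thread can clear a set() meant for it. Below is a hedged, self-contained sketch of that idea in plain C with pthreads; it is not InnoDB code, and dedicated_event, releaser, etc. are hypothetical names standing in for the real wait_ex_event machinery.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* A toy "wait-ex" event: only the single designated writer thread ever
resets it or waits on it; the releasing thread only ever sets it. */
typedef struct {
	pthread_mutex_t	mutex;
	pthread_cond_t	cond;
	int		signaled;
} dedicated_event;

static dedicated_event	wait_ex = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};

static void dedicated_event_set(dedicated_event* ev)
{
	pthread_mutex_lock(&ev->mutex);
	ev->signaled = 1;	/* cannot be lost: nobody else resets it */
	pthread_cond_signal(&ev->cond);
	pthread_mutex_unlock(&ev->mutex);
}

static void* releaser(void* arg)
{
	(void) arg;
	sleep(1);			/* pretend to finish using the lock  */
	dedicated_event_set(&wait_ex);	/* like the unconditional set at
					x-unlock in the patch              */
	return NULL;
}

int main(void)
{
	pthread_t	t;

	/* The designated writer "resets" its private event... */
	wait_ex.signaled = 0;

	pthread_create(&t, NULL, releaser, NULL);

	/* ...and waits on it. Since no other thread can reset this event,
	the set() from the releaser cannot be overwritten. */
	pthread_mutex_lock(&wait_ex.mutex);
	while (!wait_ex.signaled) {
		pthread_cond_wait(&wait_ex.cond, &wait_ex.mutex);
	}
	pthread_mutex_unlock(&wait_ex.mutex);

	pthread_join(t, NULL);
	printf("designated writer woke up as expected\n");
	return 0;
}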
sql/ha_innodb.cc
@@ -522,6 +522,9 @@ convert_error_code_to_mysql(
 mark_transaction_to_rollback(thd, TRUE);
 
 return(HA_ERR_LOCK_TABLE_FULL);
+} else if (error == DB_UNSUPPORTED) {
+
+return(HA_ERR_UNSUPPORTED);
 } else {
 return(-1); // Unknown error
 }
@@ -3713,11 +3716,22 @@ convert_search_mode_to_innobase(
 and comparison of non-latin1 char type fields in
 innobase_mysql_cmp() to get PAGE_CUR_LE_OR_EXTENDS to
 work correctly. */
 
-default: assert(0);
+case HA_READ_MBR_CONTAIN:
+case HA_READ_MBR_INTERSECT:
+case HA_READ_MBR_WITHIN:
+case HA_READ_MBR_DISJOINT:
+case HA_READ_MBR_EQUAL:
+my_error(ER_TABLE_CANT_HANDLE_SPKEYS, MYF(0));
+return(PAGE_CUR_UNSUPP);
+/* do not use "default:" in order to produce a gcc warning:
+enumeration value '...' not handled in switch
+(if -Wswitch or -Wall is used)
+*/
 }
 
-return(0);
+my_error(ER_CHECK_NOT_IMPLEMENTED, MYF(0), "this functionality");
+
+return(PAGE_CUR_UNSUPP);
 }
 
 /*
@@ -3855,11 +3869,18 @@ ha_innobase::index_read(
 
 last_match_mode = (uint) match_mode;
 
-innodb_srv_conc_enter_innodb(prebuilt->trx);
+if (mode != PAGE_CUR_UNSUPP) {
 
-ret = row_search_for_mysql((byte*) buf, mode, prebuilt, match_mode, 0);
+innodb_srv_conc_enter_innodb(prebuilt->trx);
 
-innodb_srv_conc_exit_innodb(prebuilt->trx);
+ret = row_search_for_mysql((byte*) buf, mode, prebuilt,
+match_mode, 0);
+
+innodb_srv_conc_exit_innodb(prebuilt->trx);
+} else {
+
+ret = DB_UNSUPPORTED;
+}
 
 if (ret == DB_SUCCESS) {
 error = 0;
@@ -5174,8 +5195,16 @@ ha_innobase::records_in_range(
 mode2 = convert_search_mode_to_innobase(max_key ? max_key->flag :
 HA_READ_KEY_EXACT);
 
-n_rows = btr_estimate_n_rows_in_range(index, range_start,
-mode1, range_end, mode2);
+if (mode1 != PAGE_CUR_UNSUPP && mode2 != PAGE_CUR_UNSUPP) {
+
+n_rows = btr_estimate_n_rows_in_range(index, range_start,
+mode1, range_end,
+mode2);
+} else {
+
+n_rows = 0;
+}
 
 dtuple_free_for_mysql(heap1);
 dtuple_free_for_mysql(heap2);
 
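The bug #32125 change follows a simple pattern that the last hunks show from the caller side: the conversion routine reports an unsupported search mode through a sentinel value instead of assert(0), and every caller checks the sentinel before descending into the storage layer. The following is a compact, self-contained C sketch of that pattern; the enum values, error codes and function names are illustrative stand-ins, not the real server symbols.

#include <stdio.h>

/* Stand-ins for the real page-cursor modes and handler error codes;
only the shape of the error propagation matters here. */
enum search_mode {
	SEARCH_UNSUPP = 0,	/* like PAGE_CUR_UNSUPP */
	SEARCH_GE,
	SEARCH_LE
};

enum find_flag {
	FIND_KEY_OR_NEXT,
	FIND_KEY_OR_PREV,
	FIND_MBR_CONTAIN	/* a spatial flag the engine cannot handle */
};

#define ERR_UNSUPPORTED	(-42)	/* like HA_ERR_UNSUPPORTED */

static enum search_mode convert_flag(enum find_flag flag)
{
	switch (flag) {
	case FIND_KEY_OR_NEXT:
		return SEARCH_GE;
	case FIND_KEY_OR_PREV:
		return SEARCH_LE;
	case FIND_MBR_CONTAIN:
		/* Instead of assert(0): report the problem and return a
		sentinel the caller can test for. */
		fprintf(stderr, "unsupported search flag\n");
		return SEARCH_UNSUPP;
	}
	return SEARCH_UNSUPP;
}

static int index_read(enum find_flag flag)
{
	enum search_mode mode = convert_flag(flag);

	if (mode == SEARCH_UNSUPP) {
		/* Cancel execution; do not call into the storage layer. */
		return ERR_UNSUPPORTED;
	}
	/* ...here the real handler would call the row search routine... */
	return 0;
}

int main(void)
{
	printf("supported flag   -> %d\n", index_read(FIND_KEY_OR_NEXT));
	printf("unsupported flag -> %d\n", index_read(FIND_MBR_CONTAIN));
	return 0;
}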