Merge branch 'merge/merge-pcre' into 10.0

2015-12-13 16:25:57 +01:00 · 2015-12-13 16:25:57 +01:00 · 095b7b92d1
commit 095b7b92d1
parent 359ae59ac0 e7591a1ba9
39 changed files with 2224 additions and 1263 deletions
--- a/pcre/ChangeLog
+++ b/pcre/ChangeLog
@ -1,6 +1,182 @@
 ChangeLog for PCRE
 ------------------
 Note that the PCRE 8.xx series (PCRE1) is now in a bugfix-only state. All
 development is happening in the PCRE2 10.xx series.
 Version 8.38 23-November-2015
 -----------------------------
 1.  If a group that contained a recursive back reference also contained a
    forward reference subroutine call followed by a non-forward-reference
     subroutine call, for example /.((?2)(?R)\1)()/, pcre_compile() failed to
    compile correct code, leading to undefined behaviour or an internally
    detected error. This bug was discovered by the LLVM fuzzer.
 2.  Quantification of certain items (e.g. atomic back references) could cause
    incorrect code to be compiled when recursive forward references were
    involved. For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/.
    This bug was discovered by the LLVM fuzzer.
 3.  A repeated conditional group whose condition was a reference by name caused
    a buffer overflow if there was more than one group with the given name.
    This bug was discovered by the LLVM fuzzer.
 4.  A recursive back reference by name within a group that had the same name as
    another group caused a buffer overflow. For example:
    /(?J)(?'d'(?'d'\g{d}))/. This bug was discovered by the LLVM fuzzer.
 5.  A forward reference by name to a group whose number is the same as the
    current group, for example in this pattern: /(?|(\k'Pm')|(?'Pm'))/, caused
    a buffer overflow at compile time. This bug was discovered by the LLVM
    fuzzer.
 6.  A lookbehind assertion within a set of mutually recursive subpatterns could
    provoke a buffer overflow. This bug was discovered by the LLVM fuzzer.
 7.  Another buffer overflow bug involved duplicate named groups with a
    reference between their definition, with a group that reset capture
    numbers, for example: /(?J:(?|(?'R')(\k'R')|((?'R'))))/. This has been
    fixed by always allowing for more memory, even if not needed. (A proper fix
    is implemented in PCRE2, but it involves more refactoring.)
 8.  There was no check for integer overflow in subroutine calls such as (?123).
 9.  The table entry for \l in EBCDIC environments was incorrect, leading to its
    being treated as a literal 'l' instead of causing an error.
 10. There was a buffer overflow if pcre_exec() was called with an ovector of
    size 1. This bug was found by american fuzzy lop.
 11. If a non-capturing group containing a conditional group that could match
    an empty string was repeated, it was not identified as matching an empty
    string itself. For example: /^(?:(?(1)x|)+)+$()/.
 12. In an EBCDIC environment, pcretest was mishandling the escape sequences
    \a and \e in test subject lines.
 13. In an EBCDIC environment, \a in a pattern was converted to the ASCII
    instead of the EBCDIC value.
 14. The handling of \c in an EBCDIC environment has been revised so that it is
    now compatible with the specification in Perl's perlebcdic page.
 15. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in
    ASCII/Unicode. This has now been added to the list of characters that are
    recognized as white space in EBCDIC.
 16. When PCRE was compiled without UCP support, the use of \p and \P gave an
    error (correctly) when used outside a class, but did not give an error
    within a class.
 17. \h within a class was incorrectly compiled in EBCDIC environments.
 18. A pattern with an unmatched closing parenthesis that contained a backward
    assertion which itself contained a forward reference caused buffer
     overflow. An example pattern is: /(?=di(?<=(?1))|(?=(.))))/.
 19. JIT should return with error when the compiled pattern requires more stack
    space than the maximum.
 20. A possessively repeated conditional group that could match an empty string,
    for example, /(?(R))*+/, was incorrectly compiled.
 21. Fix infinite recursion in the JIT compiler when certain patterns such as
    /(?:|a|){100}x/ are analysed.
 22. Some patterns with character classes involving [: and \\ were incorrectly
    compiled and could cause reading from uninitialized memory or an incorrect
    error diagnosis.
 23. Pathological patterns containing many nested occurrences of [: caused
    pcre_compile() to run for a very long time.
 24. A conditional group with only one branch has an implicit empty alternative
    branch and must therefore be treated as potentially matching an empty
    string.
 25. If (?R was followed by - or + incorrect behaviour happened instead of a
    diagnostic.
 26. Arrange to give up on finding the minimum matching length for overly
    complex patterns.
 27. Similar to (4) above: in a pattern with duplicated named groups and an
    occurrence of (?| it is possible for an apparently non-recursive back
    reference to become recursive if a later named group with the relevant
    number is encountered. This could lead to a buffer overflow. Wen Guanxing
    from Venustech ADLAB discovered this bug.
 28. If pcregrep was given the -q option with -c or -l, or when handling a
    binary file, it incorrectly wrote output to stdout.
 29. The JIT compiler did not restore the control verb head in case of *THEN
    control verbs. This issue was found by Karl Skomski with a custom LLVM
    fuzzer.
 30. Error messages for syntax errors following \g and \k were giving inaccurate
    offsets in the pattern.
 31. Added a check for integer overflow in conditions (?(<digits>) and
    (?(R<digits>). This omission was discovered by Karl Skomski with the LLVM
    fuzzer.
 32. Handling recursive references such as (?2) when the reference is to a group
    later in the pattern uses code that is very hacked about and error-prone.
    It has been re-written for PCRE2. Here in PCRE1, a check has been added to
    give an internal error if it is obvious that compiling has gone wrong.
 33. The JIT compiler should not check repeats after a {0,1} repeat byte code.
    This issue was found by Karl Skomski with a custom LLVM fuzzer.
 34. The JIT compiler should restore the control chain for empty possessive
    repeats. This issue was found by Karl Skomski with a custom LLVM fuzzer.
 35. Match limit check added to JIT recursion. This issue was found by Karl
    Skomski with a custom LLVM fuzzer.
 36. Yet another case similar to 27 above has been circumvented by an
    unconditional allocation of extra memory. This issue is fixed "properly" in
    PCRE2 by refactoring the way references are handled. Wen Guanxing
    from Venustech ADLAB discovered this bug.
 37. Fix two assertion fails in JIT. These issues were found by Karl Skomski
    with a custom LLVM fuzzer.
 38. Fixed a corner case of range optimization in JIT.
 39. An incorrect error "overran compiling workspace" was given if there were
    exactly enough group forward references such that the last one extended
    into the workspace safety margin. The next one would have expanded the
    workspace. The test for overflow was not including the safety margin.
 40. A match limit issue is fixed in JIT which was found by Karl Skomski
    with a custom LLVM fuzzer.
 41. Remove the use of /dev/null in testdata/testinput2, because it doesn't
    work under Windows. (Why has it taken so long for anyone to notice?)
 42. In a character class such as [\W\p{Any}] where both a negative-type escape
    ("not a word character") and a property escape were present, the property
    escape was being ignored.
 43. Fix crash caused by very long (*MARK) or (*THEN) names.
 44. A sequence such as [[:punct:]b] that is, a POSIX character class followed
    by a single ASCII character in a class item, was incorrectly compiled in
    UCP mode. The POSIX class got lost, but only if the single character
    followed it.
 45. [:punct:] in UCP mode was matching some characters in the range 128-255
    that should not have been matched.
 46. If [:^ascii:] or [:^xdigit:] or [:^cntrl:] are present in a non-negated
    class, all characters with code points greater than 255 are in the class.
    When a Unicode property was also in the class (if PCRE_UCP is set, escapes
    such as \w are turned into Unicode properties), wide characters were not
    correctly handled, and could fail to match.
 Version 8.37 28-April-2015
 --------------------------
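Items 8 and 31 in the ChangeLog above both concern a missing integer-overflow check while reading a decimal group number, as in (?123) or (?(<digits>). The following is a minimal sketch of such a check, not PCRE's actual code; the helper name `read_group_number` and its return convention are assumptions made for illustration.

```c
#include <limits.h>

/* Hypothetical helper, not taken from pcre_compile.c: reads decimal digits
   from s into *value. Returns the number of characters consumed, or -1 if
   the number would not fit in an int (*value is then left unchanged). */
static int read_group_number(const char *s, int *value)
{
    int n = 0;
    int used = 0;

    while (s[used] >= '0' && s[used] <= '9') {
        int digit = s[used] - '0';
        /* Test before multiplying, so n * 10 + digit can never overflow. */
        if (n > (INT_MAX - digit) / 10) return -1;
        n = n * 10 + digit;
        used++;
    }
    *value = n;
    return used;
}
```

A caller would treat -1 as a compile-time error rather than continuing with a wrapped group number.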
--- a/pcre/NEWS
+++ b/pcre/NEWS
@ -1,6 +1,14 @@
 News about PCRE releases
 ------------------------
 Release 8.38 23-November-2015
 -----------------------------
 This is a bug-fix release. Note that this library (now called PCRE1) is now being
 maintained for bug fixes only. New projects are advised to use the new PCRE2
 libraries.
 Release 8.37 28-April-2015
 --------------------------
--- a/pcre/NON-AUTOTOOLS-BUILD
+++ b/pcre/NON-AUTOTOOLS-BUILD
@ -764,9 +764,9 @@ required. For details, please see this web site:
  http://www.zaconsultants.net
-There is also a mirror here:
-
-  http://www.vsoft-software.com/downloads.html
+You may download PCRE from WWW.CBTTAPE.ORG, file 882.  Everything, source and
+executable, is in EBCDIC and native z/OS file formats and this is the
+recommended download site.
 ==========================
-Last Updated: 10 February 2015
+Last Updated: 25 June 2015
--- a/pcre/RunGrepTest
+++ b/pcre/RunGrepTest
@ -512,6 +512,14 @@ echo "aaaaa" >>testtemp1grep
 (cd $srcdir; $valgrind $pcregrep  --line-offsets '(?<=\Ka)' $builddir/testtemp1grep) >>testtrygrep 2>&1
 echo "RC=$?" >>testtrygrep
 echo "---------------------------- Test 108 ------------------------------" >>testtrygrep
 (cd $srcdir; $valgrind $pcregrep -lq PATTERN ./testdata/grepinput ./testdata/grepinputx) >>testtrygrep
 echo "RC=$?" >>testtrygrep
 echo "---------------------------- Test 109 -----------------------------" >>testtrygrep
 (cd $srcdir; $valgrind $pcregrep -cq lazy ./testdata/grepinput*) >>testtrygrep
 echo "RC=$?" >>testtrygrep
 # Now compare the results.
 $cf $srcdir/testdata/grepoutput testtrygrep
--- a/pcre/configure.ac
+++ b/pcre/configure.ac
@ -9,17 +9,17 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
 dnl be defined as -RC2, for example. For real releases, it should be empty.
 m4_define(pcre_major, [8])
-m4_define(pcre_minor, [37])
+m4_define(pcre_minor, [38])
 m4_define(pcre_prerelease, [])
-m4_define(pcre_date, [2015-04-28])
+m4_define(pcre_date, [2015-11-23])
 # NOTE: The CMakeLists.txt file searches for the above variables in the first
 # 50 lines of this file. Please update that if the variables above are moved.
 # Libtool shared library interface versions (current:revision:age)
-m4_define(libpcre_version, [3:5:2])
-m4_define(libpcre16_version, [2:5:2])
-m4_define(libpcre32_version, [0:5:0])
+m4_define(libpcre_version, [3:6:2])
+m4_define(libpcre16_version, [2:6:2])
+m4_define(libpcre32_version, [0:6:0])
 m4_define(libpcreposix_version, [0:3:0])
 m4_define(libpcrecpp_version, [0:1:0])
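The libtool triples in this file are current:revision:age, and this release bumps only the revision field. As a rough sketch of what that means for the installed file name, the helper below derives the version suffix from a triple. It assumes the usual Linux/ELF libtool scheme of `<current - age>.<age>.<revision>`; other platforms name libraries differently, and the function name is made up for this example.

```c
#include <stdio.h>

/* Hypothetical helper for illustration only. Assumes the Linux/ELF libtool
   naming scheme lib<name>.so.<current-age>.<age>.<revision>.
   Returns 0 on success, -1 if the triple is malformed. */
static int libtool_file_suffix(const char *version, char *out, size_t outlen)
{
    int current, revision, age;

    if (sscanf(version, "%d:%d:%d", &current, &revision, &age) != 3) return -1;
    if (age > current) return -1;   /* age may never exceed current */
    snprintf(out, outlen, "%d.%d.%d", current - age, age, revision);
    return 0;
}
```

Under that assumption, 3:5:2 and 3:6:2 share the same leading number (current minus age), which is why a revision-only bump does not change the soname.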
--- a/pcre/doc/html/NON-AUTOTOOLS-BUILD.txt
+++ b/pcre/doc/html/NON-AUTOTOOLS-BUILD.txt
@ -764,9 +764,9 @@ required. For details, please see this web site:
  http://www.zaconsultants.net
-There is also a mirror here:
-
-  http://www.vsoft-software.com/downloads.html
+You may download PCRE from WWW.CBTTAPE.ORG, file 882.  Everything, source and
+executable, is in EBCDIC and native z/OS file formats and this is the
+recommended download site.
 ==========================
-Last Updated: 10 February 2015
+Last Updated: 25 June 2015
--- a/pcre/doc/html/pcrepattern.html
+++ b/pcre/doc/html/pcrepattern.html
@ -329,7 +329,8 @@ A second use of backslash provides a way of encoding non-printing characters
 in patterns in a visible manner. There is no restriction on the appearance of
 non-printing characters, apart from the binary zero that terminates a pattern,
 but when a pattern is being prepared by text editing, it is often easier to use
-one of the following escape sequences than the binary character it represents:
+one of the following escape sequences than the binary character it represents.
 In an ASCII or Unicode environment, these escapes are as follows:
 <pre>
  \a        alarm, that is, the BEL character (hex 07)
  \cx       "control-x", where x is any ASCII character
@ -353,19 +354,33 @@ data item (byte or 16-bit value) following \c has a value greater than 127, a
 compile-time error occurs. This locks out non-ASCII characters in all modes.
 </P>
 <P>
-The \c facility was designed for use with ASCII characters, but with the
-extension to Unicode it is even less useful than it once was. It is, however,
-recognized when PCRE is compiled in EBCDIC mode, where data items are always
-bytes. In this mode, all values are valid after \c. If the next character is a
-lower case letter, it is converted to upper case. Then the 0xc0 bits of the
-byte are inverted. Thus \cA becomes hex 01, as in ASCII (A is C1), but because
-the EBCDIC letters are disjoint, \cZ becomes hex 29 (Z is E9), and other
-characters also generate different values.
+When PCRE is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t
+generate the appropriate EBCDIC code values. The \c escape is processed
+as specified for Perl in the <b>perlebcdic</b> document. The only characters
+that are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. Any
+other character provokes a compile-time error. The sequence \c@ encodes
+character code 0; the letters (in either case) encode characters 1-26 (hex 01
+to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
+\c? becomes either 255 (hex FF) or 95 (hex 5F).
 </P>
 <P>
 Thus, apart from \c?, these escapes generate the same character code values as
 they do in an ASCII environment, though the meanings of the values mostly
 differ. For example, \cG always generates code value 7, which is BEL in ASCII
 but DEL in EBCDIC.
 </P>
 <P>
 The sequence \c? generates DEL (127, hex 7F) in an ASCII environment, but
 because 127 is not a control character in EBCDIC, Perl makes it generate the
 APC character. Unfortunately, there are several variants of EBCDIC. In most of
 them the APC character has the value 255 (hex FF), but in the one Perl calls
 POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
 values, PCRE makes \c? generate 95; otherwise it generates 255.
 </P>
 <P>
 After \0 up to two further octal digits are read. If there are fewer than two
-digits, just those that are present are used. Thus the sequence \0\x\07
-specifies two binary zeros followed by a BEL character (code value 7). Make
+digits, just those that are present are used. Thus the sequence \0\x\015
+specifies two binary zeros followed by a CR character (code value 13). Make
 sure you supply two digits after the initial zero if the pattern character that
 follows is itself an octal digit.
 </P>
@ -3249,9 +3264,9 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 08 January 2014
+Last updated: 14 June 2015
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2015 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.
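The revised EBCDIC rule for \c described in the documentation hunk above amounts to a table lookup. Below is a minimal sketch of that lookup, written so it also runs on an ASCII host; it is not the code from pcre_compile.c. The function name and the `posix_bc` flag are assumptions for illustration (the real code detects the POSIX-BC variant from character values at compile time).

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: returns the code generated by \c followed by c under
   the EBCDIC rules in the text above, or -1 if c is not allowed after \c.
   posix_bc selects which value \c? yields (95 for POSIX-BC, else 255). */
static int control_escape_value(char c, int posix_bc)
{
    /* Position in this string is the resulting code, 0 through 31. */
    static const char table[] = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_";
    const char *hit;

    if (c == '?') return posix_bc ? 0x5f : 0xff;
    c = (char)toupper((unsigned char)c);   /* letters match in either case */
    if (c == '\0') return -1;              /* strchr would match the NUL */
    hit = strchr(table, c);
    return hit != NULL ? (int)(hit - table) : -1;
}
```

With this mapping \cG yields 7 and \cz yields 26, while a digit after \c is rejected.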
--- a/pcre/doc/pcre.txt
+++ b/pcre/doc/pcre.txt
--- a/pcre/doc/pcrepattern.3
+++ b/pcre/doc/pcrepattern.3
@ -1,4 +1,4 @@
-.TH PCREPATTERN 3 "08 January 2014" "PCRE 8.35"
+.TH PCREPATTERN 3 "14 June 2015" "PCRE 8.38"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE REGULAR EXPRESSION DETAILS"
@ -308,7 +308,8 @@ A second use of backslash provides a way of encoding non-printing characters
 in patterns in a visible manner. There is no restriction on the appearance of
 non-printing characters, apart from the binary zero that terminates a pattern,
 but when a pattern is being prepared by text editing, it is often easier to use
-one of the following escape sequences than the binary character it represents:
+one of the following escape sequences than the binary character it represents.
 In an ASCII or Unicode environment, these escapes are as follows:
 .sp
  \ea        alarm, that is, the BEL character (hex 07)
  \ecx       "control-x", where x is any ASCII character
@ -331,18 +332,30 @@ but \ec{ becomes hex 3B ({ is 7B), and \ec; becomes hex 7B (; is 3B). If the
 data item (byte or 16-bit value) following \ec has a value greater than 127, a
 compile-time error occurs. This locks out non-ASCII characters in all modes.
 .P
-The \ec facility was designed for use with ASCII characters, but with the
-extension to Unicode it is even less useful than it once was. It is, however,
-recognized when PCRE is compiled in EBCDIC mode, where data items are always
-bytes. In this mode, all values are valid after \ec. If the next character is a
-lower case letter, it is converted to upper case. Then the 0xc0 bits of the
-byte are inverted. Thus \ecA becomes hex 01, as in ASCII (A is C1), but because
-the EBCDIC letters are disjoint, \ecZ becomes hex 29 (Z is E9), and other
-characters also generate different values.
+When PCRE is compiled in EBCDIC mode, \ea, \ee, \ef, \en, \er, and \et
+generate the appropriate EBCDIC code values. The \ec escape is processed
+as specified for Perl in the \fBperlebcdic\fP document. The only characters
+that are allowed after \ec are A-Z, a-z, or one of @, [, \e, ], ^, _, or ?. Any
+other character provokes a compile-time error. The sequence \ec@ encodes
+character code 0; the letters (in either case) encode characters 1-26 (hex 01
+to hex 1A); [, \e, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
+\ec? becomes either 255 (hex FF) or 95 (hex 5F).
 .P
 Thus, apart from \ec?, these escapes generate the same character code values as
 they do in an ASCII environment, though the meanings of the values mostly
 differ. For example, \ecG always generates code value 7, which is BEL in ASCII
 but DEL in EBCDIC.
 .P
 The sequence \ec? generates DEL (127, hex 7F) in an ASCII environment, but
 because 127 is not a control character in EBCDIC, Perl makes it generate the
 APC character. Unfortunately, there are several variants of EBCDIC. In most of
 them the APC character has the value 255 (hex FF), but in the one Perl calls
 POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
 values, PCRE makes \ec? generate 95; otherwise it generates 255.
 .P
 After \e0 up to two further octal digits are read. If there are fewer than two
-digits, just those that are present are used. Thus the sequence \e0\ex\e07
-specifies two binary zeros followed by a BEL character (code value 7). Make
+digits, just those that are present are used. Thus the sequence \e0\ex\e015
+specifies two binary zeros followed by a CR character (code value 13). Make
 sure you supply two digits after the initial zero if the pattern character that
 follows is itself an octal digit.
 .P
@ -3283,6 +3296,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 08 January 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 14 June 2015
+Copyright (c) 1997-2015 University of Cambridge.
 .fi
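The rule quoted above, that after \0 up to two further octal digits are read, can be sketched as follows. This is an illustrative helper, not PCRE's escape-handling code; the name `read_octal_after_zero` and its interface are assumptions.

```c
/* Hypothetical helper: s points just past the "\0" of an escape. Reads up
   to two further octal digits, stores the resulting value in *value, and
   returns how many digits were consumed (0, 1 or 2). */
static int read_octal_after_zero(const char *s, unsigned int *value)
{
    unsigned int c = 0;
    int used = 0;

    while (used < 2 && s[used] >= '0' && s[used] <= '7') {
        c = c * 8 + (unsigned int)(s[used] - '0');
        used++;
    }
    *value = c;
    return used;
}
```

For the documentation's example, the text after the first \0 is a backslash, so no digits are consumed and the value is zero, whereas the digits 15 after the final \0 give 13 (CR). A third digit is never consumed, which is why the text advises always writing two digits when an octal digit follows.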
--- a/pcre/pcre_compile.c
+++ b/pcre/pcre_compile.c
@ -174,7 +174,7 @@ static const short int escapes[] = {
     -ESC_Z,                  CHAR_LEFT_SQUARE_BRACKET,
     CHAR_BACKSLASH,          CHAR_RIGHT_SQUARE_BRACKET,
     CHAR_CIRCUMFLEX_ACCENT,  CHAR_UNDERSCORE,
-     CHAR_GRAVE_ACCENT,       7,
+     CHAR_GRAVE_ACCENT,       ESC_a,
     -ESC_b,                  0,
     -ESC_d,                  ESC_e,
     ESC_f,                   0,
@ -202,9 +202,9 @@ static const short int escapes[] = {
 /*  68 */     0,     0,    '|',     ',',    '%',   '_',    '>',    '?',
 /*  70 */     0,     0,      0,       0,      0,     0,      0,      0,
 /*  78 */     0,   '`',    ':',     '#',    '@',  '\'',    '=',    '"',
-/*  80 */     0,     7, -ESC_b,       0, -ESC_d, ESC_e,  ESC_f,      0,
+/*  80 */     0, ESC_a, -ESC_b,       0, -ESC_d, ESC_e,  ESC_f,      0,
 /*  88 */-ESC_h,     0,      0,     '{',      0,     0,      0,      0,
-/*  90 */     0,     0, -ESC_k,     'l',      0, ESC_n,      0, -ESC_p,
+/*  90 */     0,     0, -ESC_k,       0,      0, ESC_n,      0, -ESC_p,
 /*  98 */     0, ESC_r,      0,     '}',      0,     0,      0,      0,
 /*  A0 */     0,   '~', -ESC_s, ESC_tee,      0,-ESC_v, -ESC_w,      0,
 /*  A8 */     0,-ESC_z,      0,       0,      0,   '[',      0,      0,
@ -219,6 +219,12 @@ static const short int escapes[] = {
 /*  F0 */     0,     0,      0,       0,      0,     0,      0,      0,
 /*  F8 */     0,     0,      0,       0,      0,     0,      0,      0
 };
 /* We also need a table of characters that may follow \c in an EBCDIC
 environment for characters 0-31. */
 static unsigned char ebcdic_escape_c[] = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_";
 #endif
@ -458,7 +464,7 @@ static const char error_texts[] =
  "range out of order in character class\0"
  "nothing to repeat\0"
  /* 10 */
-  "operand of unlimited repeat could match the empty string\0"  /** DEAD **/
+  "internal error: invalid forward reference offset\0"
  "internal error: unexpected repeat\0"
  "unrecognized character after (? or (?-\0"
  "POSIX named classes are supported only within a class\0"
@ -527,7 +533,11 @@ static const char error_texts[] =
  "different names for subpatterns of the same number are not allowed\0"
  "(*MARK) must have an argument\0"
  "this version of PCRE is not compiled with Unicode property support\0"
 #ifndef EBCDIC
  "\\c must be followed by an ASCII character\0"
 #else
  "\\c must be followed by a letter or one of [\\]^_?\0"
 #endif
  "\\k is not followed by a braced, angle-bracketed, or quoted name\0"
  /* 70 */
  "internal error: unknown opcode in find_fixedlength()\0"
@ -1425,7 +1435,16 @@ else
    c ^= 0x40;
 #else             /* EBCDIC coding */
    if (c >= CHAR_a && c <= CHAR_z) c += 64;
-    c ^= 0xC0;
+    if (c == CHAR_QUESTION_MARK)
+      c = ('\\' == 188 && '`' == 74)? 0x5f : 0xff;
+    else
+      {
+      for (i = 0; i < 32; i++)
+        {
+        if (c == ebcdic_escape_c[i]) break;
+        }
+      if (i < 32) c = i; else *errorcodeptr = ERR68;
+      }
 #endif
    break;
@ -1799,7 +1818,7 @@ for (;;)
    case OP_ASSERTBACK:
    case OP_ASSERTBACK_NOT:
    do cc += GET(cc, 1); while (*cc == OP_ALT);
-    cc += PRIV(OP_lengths)[*cc];
+    cc += 1 + LINK_SIZE;
    break;
    /* Skip over things that don't match chars */
@ -2487,7 +2506,7 @@ for (code = first_significant_code(code + PRIV(OP_lengths)[*code], TRUE);
  if (c == OP_BRA  || c == OP_BRAPOS ||
      c == OP_CBRA || c == OP_CBRAPOS ||
      c == OP_ONCE || c == OP_ONCE_NC ||
-      c == OP_COND)
+      c == OP_COND || c == OP_SCOND)
    {
    BOOL empty_branch;
    if (GET(code, 1) == 0) return TRUE;    /* Hit unclosed bracket */
@ -3886,11 +3905,11 @@ didn't consider this to be a POSIX class. Likewise for [:1234:].
 The problem in trying to be exactly like Perl is in the handling of escapes. We
 have to be sure that [abc[:x\]pqr] is *not* treated as containing a POSIX
 class, but [abc[:x\]pqr:]] is (so that an error can be generated). The code
-below handles the special case of \], but does not try to do any other escape
-processing. This makes it different from Perl for cases such as [:l\ower:]
-where Perl recognizes it as the POSIX class "lower" but PCRE does not recognize
-"l\ower". This is a lesser evil than not diagnosing bad classes when Perl does,
-I think.
+below handles the special cases \\ and \], but does not try to do any other
+escape processing. This makes it different from Perl for cases such as
+[:l\ower:] where Perl recognizes it as the POSIX class "lower" but PCRE does
+not recognize "l\ower". This is a lesser evil than not diagnosing bad classes
+when Perl does, I think.
 A user pointed out that PCRE was rejecting [:a[:digit:]] whereas Perl was not.
 It seems that the appearance of a nested POSIX class supersedes an apparent
@ -3917,21 +3936,16 @@ pcre_uchar terminator;          /* Don't combine these lines; the Solaris cc */
 terminator = *(++ptr);   /* compiler warns about "non-constant" initializer. */
 for (++ptr; *ptr != CHAR_NULL; ptr++)
  {
-  if (*ptr == CHAR_BACKSLASH && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
+  if (*ptr == CHAR_BACKSLASH &&
+      (ptr[1] == CHAR_RIGHT_SQUARE_BRACKET ||
+       ptr[1] == CHAR_BACKSLASH))
     ptr++;
-  else if (*ptr == CHAR_RIGHT_SQUARE_BRACKET) return FALSE;
-  else
+  else if ((*ptr == CHAR_LEFT_SQUARE_BRACKET && ptr[1] == terminator) ||
+            *ptr == CHAR_RIGHT_SQUARE_BRACKET) return FALSE;
+  else if (*ptr == terminator && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
     {
-    if (*ptr == terminator && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
-      {
-      *endptr = ptr;
-      return TRUE;
-      }
-    if (*ptr == CHAR_LEFT_SQUARE_BRACKET &&
-         (ptr[1] == CHAR_COLON || ptr[1] == CHAR_DOT ||
-          ptr[1] == CHAR_EQUALS_SIGN) &&
-        check_posix_syntax(ptr, endptr))
-      return FALSE;
+    *endptr = ptr;
+    return TRUE;
     }
   }
 return FALSE;
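The scanning rule in this hunk is easier to follow outside the diff. The sketch below restates the post-change loop with plain `char` and `bool` in place of `pcre_uchar`, `BOOL` and the `CHAR_*` macros; the function name is changed to make clear it is an illustration, not the library function. It assumes, as the caller in PCRE does, that the character after the opening bracket is one of `:`, `.` or `=`.

```c
#include <stdbool.h>

/* Illustrative restatement of the loop above, using plain C types.
   ptr points at the '[' that opens a candidate such as "[:alpha:]".
   Precondition (assumed): ptr[1] is ':', '.' or '='. On success *endptr
   is set to the terminator character that precedes the closing ']'. */
static bool looks_like_posix_class(const char *ptr, const char **endptr)
{
    char terminator = *(++ptr);

    for (++ptr; *ptr != '\0'; ptr++) {
        if (*ptr == '\\' && (ptr[1] == ']' || ptr[1] == '\\'))
            ptr++;                      /* skip an escaped ] or backslash */
        else if ((*ptr == '[' && ptr[1] == terminator) || *ptr == ']')
            return false;               /* nested class start, or bare ] */
        else if (*ptr == terminator && ptr[1] == ']') {
            *endptr = ptr;
            return true;
        }
    }
    return false;
}
```

Under this rule `[:x\]pqr]` is not a POSIX class but `[:x\]pqr:]` is, matching the comment earlier in the file, and `[:a[:digit:]]` is rejected at the nested `[:` without the recursive call the old code made.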
@ -3985,11 +3999,12 @@ have their offsets adjusted. That one of the jobs of this function. Before it
 is called, the partially compiled regex must be temporarily terminated with
 OP_END.
-This function has been extended with the possibility of forward references for
-recursions and subroutine calls. It must also check the list of such references
-for the group we are dealing with. If it finds that one of the recursions in
-the current group is on this list, it adjusts the offset in the list, not the
-value in the reference (which is a group number).
+This function has been extended to cope with forward references for recursions
+and subroutine calls. It must check the list of such references for the
+group we are dealing with. If it finds that one of the recursions in the
+current group is on this list, it does not adjust the value in the reference
+(which is a group number). After the group has been scanned, all the offsets in
+the forward reference list for the group are adjusted.
 Arguments:
  group      points to the start of the group
@ -4005,29 +4020,21 @@ static void
 adjust_recurse(pcre_uchar *group, int adjust, BOOL utf, compile_data *cd,
  size_t save_hwm_offset)
 {
 int offset;
 pcre_uchar *hc;
 pcre_uchar *ptr = group;
 while ((ptr = (pcre_uchar *)find_recurse(ptr, utf)) != NULL)
  {
  int offset;
  pcre_uchar *hc;
  /* See if this recursion is on the forward reference list. If so, adjust the
  reference. */
  for (hc = (pcre_uchar *)cd->start_workspace + save_hwm_offset; hc < cd->hwm;
       hc += LINK_SIZE)
    {
    offset = (int)GET(hc, 0);
-    if (cd->start_code + offset == ptr + 1)
+    if (cd->start_code + offset == ptr + 1) break;
      {
      PUT(hc, 0, offset + adjust);
      break;
      }
    }
-  /* Otherwise, adjust the recursion offset if it's after the start of this
+  /* If we have not found this recursion on the forward reference list, adjust
-  group. */
+  the recursion's offset if it's after the start of this group. */
  if (hc >= cd->hwm)
    {
@ -4037,6 +4044,15 @@ while ((ptr = (pcre_uchar *)find_recurse(ptr, utf)) != NULL)
  ptr += 1 + LINK_SIZE;
  }
 /* Now adjust all forward reference offsets for the group. */
 for (hc = (pcre_uchar *)cd->start_workspace + save_hwm_offset; hc < cd->hwm;
     hc += LINK_SIZE)
  {
  offset = (int)GET(hc, 0);
  PUT(hc, 0, offset + adjust);
  }
 }
@ -4465,7 +4481,7 @@ const pcre_uchar *tempptr;
 const pcre_uchar *nestptr = NULL;
 pcre_uchar *previous = NULL;
 pcre_uchar *previous_callout = NULL;
-size_t save_hwm_offset = 0;
+size_t item_hwm_offset = 0;
 pcre_uint8 classbits[32];
 /* We can fish out the UTF-8 setting once and for all into a BOOL, but we
@ -4623,8 +4639,7 @@ for (;; ptr++)
  /* In the real compile phase, just check the workspace used by the forward
  reference list. */
-  else if (cd->hwm > cd->start_workspace + cd->workspace_size -
-           WORK_SIZE_SAFETY_MARGIN)
+  else if (cd->hwm > cd->start_workspace + cd->workspace_size)
    {
    *errorcodeptr = ERR52;
    goto FAILED;
@ -4767,6 +4782,7 @@ for (;; ptr++)
    zeroreqchar = reqchar;
    zeroreqcharflags = reqcharflags;
    previous = code;
    item_hwm_offset = cd->hwm - cd->start_workspace;
    *code++ = ((options & PCRE_DOTALL) != 0)? OP_ALLANY: OP_ANY;
    break;
@ -4818,6 +4834,7 @@ for (;; ptr++)
    /* Handle a real character class. */
    previous = code;
    item_hwm_offset = cd->hwm - cd->start_workspace;
    /* PCRE supports POSIX class stuff inside a class. Perl gives an error if
    they are encountered at the top level, so we'll do that too. */
@ -4923,9 +4940,10 @@ for (;; ptr++)
      (which is on the stack). We have to remember that there was XCLASS data,
      however. */
      if (class_uchardata > class_uchardata_base) xclass = TRUE;
      if (lengthptr != NULL && class_uchardata > class_uchardata_base)
        {
        xclass = TRUE;
        *lengthptr += (int)(class_uchardata - class_uchardata_base);
        class_uchardata = class_uchardata_base;
        }
@ -5028,10 +5046,26 @@ for (;; ptr++)
            ptr = tempptr + 1;
            continue;
-            /* For all other POSIX classes, no special action is taken in UCP
+            /* For the other POSIX classes (ascii, xdigit) we are going to fall
-            mode. Fall through to the non_UCP case. */
+            through to the non-UCP case and build a bit map for characters with
            code points less than 256. If we are in a negated POSIX class
            within a non-negated overall class, characters with code points
            greater than 255 must all match. In the special case where we have
            not yet generated any xclass data, and this is the final item in
            the overall class, we need do nothing: later on, the opcode
            OP_NCLASS will be used to indicate that characters greater than 255
            are acceptable. If we have already seen an xclass item or one may
            follow (we have to assume that it might if this is not the end of
            the class), explicitly match all wide codepoints. */
            default:
            if (!negate_class && local_negate &&
                (xclass || tempptr[2] != CHAR_RIGHT_SQUARE_BRACKET))
              {
              *class_uchardata++ = XCL_RANGE;
              class_uchardata += PRIV(ord2utf)(0x100, class_uchardata);
              class_uchardata += PRIV(ord2utf)(0x10ffff, class_uchardata);
              }
            break;
            }
          }
@ -5195,9 +5229,9 @@ for (;; ptr++)
              cd, PRIV(vspace_list));
            continue;
-#ifdef SUPPORT_UCP
             case ESC_p:
             case ESC_P:
+#ifdef SUPPORT_UCP
              {
              BOOL negated;
              unsigned int ptype = 0, pdata = 0;
@ -5211,6 +5245,9 @@ for (;; ptr++)
              class_has_8bitchar--;                /* Undo! */
              continue;
              }
 #else
            *errorcodeptr = ERR45;
            goto FAILED;
 #endif
            /* Unrecognized escapes are faulted if PCRE is running in its
            strict mode. By default, for compatibility with Perl, they are
@ -5367,16 +5404,20 @@ for (;; ptr++)
      CLASS_SINGLE_CHARACTER:
      if (class_one_char < 2) class_one_char++;
-      /* If class_one_char is 1, we have the first single character in the
-      class, and there have been no prior ranges, or XCLASS items generated by
-      escapes. If this is the final character in the class, we can optimize by
-      turning the item into a 1-character OP_CHAR[I] if it's positive, or
-      OP_NOT[I] if it's negative. In the positive case, it can cause firstchar
-      to be set. Otherwise, there can be no first char if this item is first,
-      whatever repeat count may follow. In the case of reqchar, save the
-      previous value for reinstating. */
-      if (!inescq && class_one_char == 1 && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
+      /* If xclass_has_prop is false and class_one_char is 1, we have the first
+      single character in the class, and there have been no prior ranges, or
+      XCLASS items generated by escapes. If this is the final character in the
+      class, we can optimize by turning the item into a 1-character OP_CHAR[I]
+      if it's positive, or OP_NOT[I] if it's negative. In the positive case, it
+      can cause firstchar to be set. Otherwise, there can be no first char if
+      this item is first, whatever repeat count may follow. In the case of
+      reqchar, save the previous value for reinstating. */
+      if (!inescq &&
+#ifdef SUPPORT_UCP
+          !xclass_has_prop &&
+#endif
+          class_one_char == 1 && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
        {
        ptr++;
        zeroreqchar = reqchar;
@ -5492,9 +5533,10 @@ for (;; ptr++)
    actual compiled code. */
 #ifdef SUPPORT_UTF
-    if (xclass && (!should_flip_negation || (options & PCRE_UCP) != 0))
+    if (xclass && (xclass_has_prop || !should_flip_negation ||
+        (options & PCRE_UCP) != 0))
 #elif !defined COMPILE_PCRE8
-    if (xclass && !should_flip_negation)
+    if (xclass && (xclass_has_prop || !should_flip_negation))
 #endif
 #if defined SUPPORT_UTF || !defined COMPILE_PCRE8
      {
@ -5930,7 +5972,7 @@ for (;; ptr++)
      {
      register int i;
      int len = (int)(code - previous);
-      size_t base_hwm_offset = save_hwm_offset;
+      size_t base_hwm_offset = item_hwm_offset;
      pcre_uchar *bralink = NULL;
      pcre_uchar *brazeroptr = NULL;
@ -5985,7 +6027,7 @@ for (;; ptr++)
        if (repeat_max <= 1)    /* Covers 0, 1, and unlimited */
          {
          *code = OP_END;
-          adjust_recurse(previous, 1, utf, cd, save_hwm_offset);
+          adjust_recurse(previous, 1, utf, cd, item_hwm_offset);
          memmove(previous + 1, previous, IN_UCHARS(len));
          code++;
          if (repeat_max == 0)
@ -6009,7 +6051,7 @@ for (;; ptr++)
          {
          int offset;
          *code = OP_END;
-          adjust_recurse(previous, 2 + LINK_SIZE, utf, cd, save_hwm_offset);
+          adjust_recurse(previous, 2 + LINK_SIZE, utf, cd, item_hwm_offset);
          memmove(previous + 2 + LINK_SIZE, previous, IN_UCHARS(len));
          code += 2 + LINK_SIZE;
          *previous++ = OP_BRAZERO + repeat_type;
@ -6254,6 +6296,12 @@ for (;; ptr++)
            while (*scode == OP_ALT);
            }
          /* A conditional group with only one branch has an implicit empty
          alternative branch. */
          if (*bracode == OP_COND && bracode[GET(bracode,1)] != OP_ALT)
            *bracode = OP_SCOND;
          /* Handle possessive quantifiers. */
          if (possessive_quantifier)
@ -6267,11 +6315,11 @@ for (;; ptr++)
              {
              int nlen = (int)(code - bracode);
              *code = OP_END;
-              adjust_recurse(bracode, 1 + LINK_SIZE, utf, cd, save_hwm_offset);
+              adjust_recurse(bracode, 1 + LINK_SIZE, utf, cd, item_hwm_offset);
              memmove(bracode + 1 + LINK_SIZE, bracode, IN_UCHARS(nlen));
              code += 1 + LINK_SIZE;
              nlen += 1 + LINK_SIZE;
-              *bracode = OP_BRAPOS;
+              *bracode = (*bracode == OP_COND)? OP_BRAPOS : OP_SBRAPOS;
              *code++ = OP_KETRPOS;
              PUTINC(code, 0, nlen);
              PUT(bracode, 1, nlen);
@ -6401,7 +6449,7 @@ for (;; ptr++)
        else
          {
          *code = OP_END;
-          adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, save_hwm_offset);
+          adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, item_hwm_offset);
          memmove(tempcode + 1 + LINK_SIZE, tempcode, IN_UCHARS(len));
          code += 1 + LINK_SIZE;
          len += 1 + LINK_SIZE;
@ -6450,7 +6498,7 @@ for (;; ptr++)
        default:
        *code = OP_END;
-        adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, save_hwm_offset);
+        adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, item_hwm_offset);
        memmove(tempcode + 1 + LINK_SIZE, tempcode, IN_UCHARS(len));
        code += 1 + LINK_SIZE;
        len += 1 + LINK_SIZE;
@ -6586,9 +6634,17 @@ for (;; ptr++)
              goto FAILED;
              }
            setverb = *code++ = verbs[i].op_arg;
-            *code++ = arglen;
-            memcpy(code, arg, IN_UCHARS(arglen));
-            code += arglen;
+            if (lengthptr != NULL)    /* In pass 1 just add in the length */
+              {                       /* to avoid potential workspace */
+              *lengthptr += arglen;   /* overflow. */
+              *code++ = 0;
+              }
+            else
+              {
+              *code++ = arglen;
+              memcpy(code, arg, IN_UCHARS(arglen));
+              code += arglen;
+              }
            *code++ = 0;
            }
@ -6623,7 +6679,7 @@ for (;; ptr++)
    newoptions = options;
    skipbytes = 0;
    bravalue = OP_CBRA;
-    save_hwm_offset = cd->hwm - cd->start_workspace;
+    item_hwm_offset = cd->hwm - cd->start_workspace;
    reset_bracount = FALSE;
    /* Deal with the extended parentheses; all are introduced by '?', and the
@ -6641,6 +6697,7 @@ for (;; ptr++)
        /* ------------------------------------------------------------ */
        case CHAR_VERTICAL_LINE:  /* Reset capture count for each branch */
        reset_bracount = TRUE;
+        cd->dupgroups = TRUE;     /* Record (?| encountered */
        /* Fall through */
        /* ------------------------------------------------------------ */
@ -6741,6 +6798,12 @@ for (;; ptr++)
          {
          while (IS_DIGIT(*ptr))
            {
+            if (recno > INT_MAX / 10 - 1)  /* Integer overflow */
+              {
+              while (IS_DIGIT(*ptr)) ptr++;
+              *errorcodeptr = ERR61;
+              goto FAILED;
+              }
            recno = recno * 10 + (int)(*ptr - CHAR_0);
            ptr++;
            }
@ -6769,7 +6832,7 @@ for (;; ptr++)
            ptr++;
            }
          namelen = (int)(ptr - name);
-          if (lengthptr != NULL) *lengthptr += IMM2_SIZE;
+          if (lengthptr != NULL) skipbytes += IMM2_SIZE;
          }
        /* Check the terminator */
@ -6875,6 +6938,11 @@ for (;; ptr++)
              *errorcodeptr = ERR15;
              goto FAILED;
              }
+            if (recno > INT_MAX / 10 - 1)   /* Integer overflow */
+              {
+              *errorcodeptr = ERR61;
+              goto FAILED;
+              }
            recno = recno * 10 + name[i] - CHAR_0;
            }
          if (recno == 0) recno = RREF_ANY;
@ -7151,6 +7219,7 @@ for (;; ptr++)
        if (lengthptr != NULL)
          {
          named_group *ng;
+          recno = 0;
          if (namelen == 0)
            {
@ -7168,20 +7237,6 @@ for (;; ptr++)
            goto FAILED;
            }
-          /* The name table does not exist in the first pass; instead we must
-          scan the list of names encountered so far in order to get the
-          number. If the name is not found, set the value to 0 for a forward
-          reference. */
-          ng = cd->named_groups;
-          for (i = 0; i < cd->names_found; i++, ng++)
-            {
-            if (namelen == ng->length &&
-                STRNCMP_UC_UC(name, ng->name, namelen) == 0)
-              break;
-            }
-          recno = (i < cd->names_found)? ng->number : 0;
          /* Count named back references. */
          if (!is_recurse) cd->namedrefcount++;
@ -7191,6 +7246,56 @@ for (;; ptr++)
          16-bit data item. */
          *lengthptr += IMM2_SIZE;
          /* If this is a forward reference and we are within a (?|...) group,
          the reference may end up as the number of a group which we are
          currently inside, that is, it could be a recursive reference. In the
          real compile this will be picked up and the reference wrapped with
          OP_ONCE to make it atomic, so we must space in case this occurs. */
          /* In fact, this can happen for a non-forward reference because
          another group with the same number might be created later. This
          issue is fixed "properly" in PCRE2. As PCRE1 is now in maintenance
          only mode, we finesse the bug by allowing more memory always. */
          *lengthptr += 2 + 2*LINK_SIZE;
          /* It is even worse than that. The current reference may be to an
          existing named group with a different number (so apparently not
          recursive) but which later on is also attached to a group with the
          current number. This can only happen if $(| has been previous
          encountered. In that case, we allow yet more memory, just in case.
          (Again, this is fixed "properly" in PCRE2. */
          if (cd->dupgroups) *lengthptr += 4 + 4*LINK_SIZE;
          /* Otherwise, check for recursion here. The name table does not exist
          in the first pass; instead we must scan the list of names encountered
          so far in order to get the number. If the name is not found, leave
          the value of recno as 0 for a forward reference. */
          else
            {
            ng = cd->named_groups;
            for (i = 0; i < cd->names_found; i++, ng++)
              {
              if (namelen == ng->length &&
                  STRNCMP_UC_UC(name, ng->name, namelen) == 0)
                {
                open_capitem *oc;
                recno = ng->number;
                if (is_recurse) break;
                for (oc = cd->open_caps; oc != NULL; oc = oc->next)
                  {
                  if (oc->number == recno)
                    {
                    oc->flag = TRUE;
                    break;
                    }
                  }
                }
              }
            }
          }
        /* In the real compile, search the name table. We check the name
@ -7237,8 +7342,6 @@ for (;; ptr++)
          for (i++; i < cd->names_found; i++)
            {
            if (STRCMP_UC_UC(slot + IMM2_SIZE, cslot + IMM2_SIZE) != 0) break;
            count++;
            cslot += cd->name_entry_size;
            }
@ -7247,6 +7350,7 @@ for (;; ptr++)
            {
            if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE;
            previous = code;
+            item_hwm_offset = cd->hwm - cd->start_workspace;
            *code++ = ((options & PCRE_CASELESS) != 0)? OP_DNREFI : OP_DNREF;
            PUT2INC(code, 0, index);
            PUT2INC(code, 0, count);
@ -7284,9 +7388,14 @@ for (;; ptr++)
        /* ------------------------------------------------------------ */
-        case CHAR_R:              /* Recursion */
-        ptr++;                    /* Same as (?0)      */
-        /* Fall through */
+        case CHAR_R:              /* Recursion, same as (?0) */
+        recno = 0;
+        if (*(++ptr) != CHAR_RIGHT_PARENTHESIS)
+          {
+          *errorcodeptr = ERR29;
+          goto FAILED;
+          }
+        goto HANDLE_RECURSION;
        /* ------------------------------------------------------------ */
@ -7323,7 +7432,15 @@ for (;; ptr++)
          recno = 0;
          while(IS_DIGIT(*ptr))
            {
            if (recno > INT_MAX / 10 - 1) /* Integer overflow */
              {
              while (IS_DIGIT(*ptr)) ptr++;
              *errorcodeptr = ERR61;
              goto FAILED;
              }
            recno = recno * 10 + *ptr++ - CHAR_0;
            }
          if (*ptr != (pcre_uchar)terminator)
            {
@ -7360,6 +7477,7 @@ for (;; ptr++)
          HANDLE_RECURSION:
          previous = code;
+          item_hwm_offset = cd->hwm - cd->start_workspace;
          called = cd->start_code;
          /* When we are actually compiling, find the bracket that is being
@ -7561,7 +7679,11 @@ for (;; ptr++)
      previous = NULL;
      cd->iscondassert = FALSE;
      }
-    else previous = code;
+    else
+      {
+      previous = code;
+      item_hwm_offset = cd->hwm - cd->start_workspace;
+      }
    *code = bravalue;
    tempcode = code;
@ -7809,7 +7931,7 @@ for (;; ptr++)
        const pcre_uchar *p;
        pcre_uint32 cf;
-        save_hwm_offset = cd->hwm - cd->start_workspace;   /* Normally this is set when '(' is read */
+        item_hwm_offset = cd->hwm - cd->start_workspace;   /* Normally this is set when '(' is read */
        terminator = (*(++ptr) == CHAR_LESS_THAN_SIGN)?
          CHAR_GREATER_THAN_SIGN : CHAR_APOSTROPHE;
@ -7838,7 +7960,7 @@ for (;; ptr++)
        if (*p != (pcre_uchar)terminator)
          {
          *errorcodeptr = ERR57;
-          break;
+          goto FAILED;
          }
        ptr++;
        goto HANDLE_NUMERICAL_RECURSION;
@ -7853,7 +7975,7 @@ for (;; ptr++)
          ptr[1] != CHAR_APOSTROPHE && ptr[1] != CHAR_LEFT_CURLY_BRACKET))
          {
          *errorcodeptr = ERR69;
-          break;
+          goto FAILED;
          }
        is_recurse = FALSE;
        terminator = (*(++ptr) == CHAR_LESS_THAN_SIGN)?
@ -7877,6 +7999,7 @@ for (;; ptr++)
        HANDLE_REFERENCE:
        if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE;
        previous = code;
+        item_hwm_offset = cd->hwm - cd->start_workspace;
        *code++ = ((options & PCRE_CASELESS) != 0)? OP_REFI : OP_REF;
        PUT2INC(code, 0, recno);
        cd->backref_map |= (recno < 32)? (1 << recno) : 1;
@ -7906,6 +8029,7 @@ for (;; ptr++)
        if (!get_ucp(&ptr, &negated, &ptype, &pdata, errorcodeptr))
          goto FAILED;
        previous = code;
+        item_hwm_offset = cd->hwm - cd->start_workspace;
        *code++ = ((escape == ESC_p) != negated)? OP_PROP : OP_NOTPROP;
        *code++ = ptype;
        *code++ = pdata;
@ -7946,6 +8070,7 @@ for (;; ptr++)
          {
          previous = (escape > ESC_b && escape < ESC_Z)? code : NULL;
+          item_hwm_offset = cd->hwm - cd->start_workspace;
          *code++ = (!utf && escape == ESC_C)? OP_ALLANY : escape;
          }
        }
@ -7989,6 +8114,7 @@ for (;; ptr++)
    ONE_CHAR:
    previous = code;
+    item_hwm_offset = cd->hwm - cd->start_workspace;
    /* For caseless UTF-8 mode when UCP support is available, check whether
    this character has more than one other case. If so, generate a special
@ -9164,6 +9290,7 @@ cd->names_found = 0;
 cd->name_entry_size = 0;
 cd->name_table = NULL;
 cd->dupnames = FALSE;
+cd->dupgroups = FALSE;
 cd->namedrefcount = 0;
 cd->start_code = cworkspace;
 cd->hwm = cworkspace;
@ -9336,6 +9463,16 @@ if (cd->hwm > cd->start_workspace)
    int offset, recno;
    cd->hwm -= LINK_SIZE;
    offset = GET(cd->hwm, 0);
+    /* Check that the hwm handling hasn't gone wrong. This whole area is
+    rewritten in PCRE2 because there are some obscure cases. */
+    if (offset == 0 || codestart[offset-1] != OP_RECURSE)
+      {
+      errorcode = ERR10;
+      break;
+      }
    recno = GET(codestart, offset);
    if (recno != prev_recno)
      {
@ -9366,7 +9503,7 @@ used in this code because at least one compiler gives a warning about loss of
 "const" attribute if the cast (pcre_uchar *)codestart is used directly in the
 function call. */
-if ((options & PCRE_NO_AUTO_POSSESS) == 0)
+if (errorcode == 0 && (options & PCRE_NO_AUTO_POSSESS) == 0)
  {
  pcre_uchar *temp = (pcre_uchar *)codestart;
  auto_possessify(temp, utf, cd);
@ -9380,7 +9517,7 @@ OP_RECURSE that are not fixed length get a diagnosic with a useful offset. The
 exceptional ones forgo this. We scan the pattern to check that they are fixed
 length, and set their lengths. */
-if (cd->check_lookbehind)
+if (errorcode == 0 && cd->check_lookbehind)
  {
  pcre_uchar *cc = (pcre_uchar *)codestart;
@ -9593,4 +9730,3 @@ return (pcre32 *)re;
 }
 /* End of pcre_compile.c */
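
Editorial note on the `recno > INT_MAX / 10 - 1` checks added above: the guard runs before each multiply-and-add, so the accumulator can never wrap. The following is a minimal standalone sketch of that pattern, not code from PCRE. The function name `parse_group_number` and its return convention are assumptions made for illustration; only the bound itself and the "skip the remaining digits" step are taken from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper (not a PCRE function): parse a run of ASCII digits
   starting at *pp into *out. Returns 1 on success and 0 on overflow. In both
   cases *pp is advanced past the whole run of digits, mirroring the
   `while (IS_DIGIT(*ptr)) ptr++;` line in the patch. */
static int parse_group_number(const char **pp, int *out)
{
  const char *p;
  int n = 0;

  assert(pp != NULL && *pp != NULL && out != NULL);
  p = *pp;

  while (*p >= '0' && *p <= '9')
    {
    /* Same conservative bound as the patch: if n passes this test, then
       n * 10 + 9 still fits in an int. */
    if (n > INT_MAX / 10 - 1)
      {
      while (*p >= '0' && *p <= '9') p++;   /* consume the rest */
      *pp = p;
      return 0;
      }
    n = n * 10 + (*p - '0');
    p++;
    }

  *pp = p;
  *out = n;
  return 1;
}
```

The bound is slightly stricter than necessary (a few values just below `INT_MAX` are rejected), which keeps the test to a single comparison per digit.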
--- a/pcre/pcre_exec.c
+++ b/pcre/pcre_exec.c
@ -6685,7 +6685,8 @@ if (md->offset_vector != NULL)
  register int *iend = iptr - re->top_bracket;
  if (iend < md->offset_vector + 2) iend = md->offset_vector + 2;
  while (--iptr >= iend) *iptr = -1;
-  md->offset_vector[0] = md->offset_vector[1] = -1;
+  if (offsetcount > 0) md->offset_vector[0] = -1;
+  if (offsetcount > 1) md->offset_vector[1] = -1;
  }
 /* Set up the first character to match, if available. The first_char value is
--- a/pcre/pcre_internal.h
+++ b/pcre/pcre_internal.h
@ -984,7 +984,7 @@ other. NOTE: The values also appear in pcre_jit_compile.c. */
 #ifndef EBCDIC
 #define HSPACE_LIST \
-  CHAR_HT, CHAR_SPACE, 0xa0, \
+  CHAR_HT, CHAR_SPACE, CHAR_NBSP, \
  0x1680, 0x180e, 0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, \
  0x2006, 0x2007, 0x2008, 0x2009, 0x200A, 0x202f, 0x205f, 0x3000, \
  NOTACHAR
@ -1010,7 +1010,7 @@ other. NOTE: The values also appear in pcre_jit_compile.c. */
 #define HSPACE_BYTE_CASES \
  case CHAR_HT: \
  case CHAR_SPACE: \
-  case 0xa0     /* NBSP */
+  case CHAR_NBSP
 #define HSPACE_CASES \
  HSPACE_BYTE_CASES: \
@ -1037,11 +1037,12 @@ other. NOTE: The values also appear in pcre_jit_compile.c. */
 /* ------ EBCDIC environments ------ */
 #else
-#define HSPACE_LIST CHAR_HT, CHAR_SPACE
+#define HSPACE_LIST CHAR_HT, CHAR_SPACE, CHAR_NBSP, NOTACHAR
 #define HSPACE_BYTE_CASES \
  case CHAR_HT: \
-  case CHAR_SPACE
+  case CHAR_SPACE: \
+  case CHAR_NBSP
 #define HSPACE_CASES HSPACE_BYTE_CASES
@ -1215,6 +1216,7 @@ same code point. */
 #define CHAR_ESC                    '\047'
 #define CHAR_DEL                    '\007'
+#define CHAR_NBSP                   '\x41'
 #define STR_ESC                     "\047"
 #define STR_DEL                     "\007"
@ -1229,6 +1231,7 @@ a positive value. */
 #define CHAR_NEL                    ((unsigned char)'\x85')
 #define CHAR_ESC                    '\033'
 #define CHAR_DEL                    '\177'
+#define CHAR_NBSP                   ((unsigned char)'\xa0')
 #define STR_LF                      "\n"
 #define STR_NL                      STR_LF
@ -1606,6 +1609,7 @@ only. */
 #define CHAR_VERTICAL_LINE          '\174'
 #define CHAR_RIGHT_CURLY_BRACKET    '\175'
 #define CHAR_TILDE                  '\176'
+#define CHAR_NBSP                   ((unsigned char)'\xa0')
 #define STR_HT                      "\011"
 #define STR_VT                      "\013"
@ -1762,6 +1766,10 @@ only. */
 /* Escape items that are just an encoding of a particular data value. */
+#ifndef ESC_a
+#define ESC_a CHAR_BEL
+#endif
 #ifndef ESC_e
 #define ESC_e CHAR_ESC
 #endif
@ -2446,6 +2454,7 @@ typedef struct compile_data {
  BOOL had_pruneorskip;             /* (*PRUNE) or (*SKIP) encountered */
  BOOL check_lookbehind;            /* Lookbehinds need later checking */
  BOOL dupnames;                    /* Duplicate names exist */
+  BOOL dupgroups;                   /* Duplicate groups exist: (?| found */
  BOOL iscondassert;                /* Next assert is a condition */
  int  nltype;                      /* Newline type */
  int  nllen;                       /* Newline string length */
--- a/pcre/pcre_jit_compile.c
+++ b/pcre/pcre_jit_compile.c
@ -1064,6 +1064,7 @@ pcre_uchar *alternative;
 pcre_uchar *end = NULL;
 int private_data_ptr = *private_data_start;
 int space, size, bracketlen;
+BOOL repeat_check = TRUE;
 while (cc < ccend)
  {
@ -1071,9 +1072,10 @@ while (cc < ccend)
  size = 0;
  bracketlen = 0;
  if (private_data_ptr > SLJIT_MAX_LOCAL_SIZE)
-    return;
+    break;
-  if (*cc == OP_ONCE || *cc == OP_ONCE_NC || *cc == OP_BRA || *cc == OP_CBRA || *cc == OP_COND)
+  if (repeat_check && (*cc == OP_ONCE || *cc == OP_ONCE_NC || *cc == OP_BRA || *cc == OP_CBRA || *cc == OP_COND))
    {
    if (detect_repeat(common, cc))
      {
      /* These brackets are converted to repeats, so no global
@ -1081,6 +1083,8 @@ while (cc < ccend)
      if (cc >= end)
        end = bracketend(cc);
      }
    }
+  repeat_check = TRUE;
  switch(*cc)
    {
@ -1136,6 +1140,13 @@ while (cc < ccend)
    bracketlen = 1 + LINK_SIZE + IMM2_SIZE;
    break;
+    case OP_BRAZERO:
+    case OP_BRAMINZERO:
+    case OP_BRAPOSZERO:
+    repeat_check = FALSE;
+    size = 1;
+    break;
    CASE_ITERATOR_PRIVATE_DATA_1
    space = 1;
    size = -2;
@ -1162,12 +1173,17 @@ while (cc < ccend)
    size = 1;
    break;
-    CASE_ITERATOR_TYPE_PRIVATE_DATA_2B
+    case OP_TYPEUPTO:
    if (cc[1 + IMM2_SIZE] != OP_ANYNL && cc[1 + IMM2_SIZE] != OP_EXTUNI)
      space = 2;
    size = 1 + IMM2_SIZE;
    break;
+    case OP_TYPEMINUPTO:
+    space = 2;
+    size = 1 + IMM2_SIZE;
+    break;
    case OP_CLASS:
    case OP_NCLASS:
    size += 1 + 32 / sizeof(pcre_uchar);
@ -1316,6 +1332,13 @@ while (cc < ccend)
    cc += 1 + LINK_SIZE + IMM2_SIZE;
    break;
+    case OP_THEN:
+    stack_restore = TRUE;
+    if (common->control_head_ptr != 0)
+      *needs_control_head = TRUE;
+    cc ++;
+    break;
    default:
    stack_restore = TRUE;
    /* Fall through. */
@ -2220,6 +2243,7 @@ while (current != NULL)
    SLJIT_ASSERT_STOP();
    break;
    }
  SLJIT_ASSERT(current > (sljit_sw*)current[-1]);
  current = (sljit_sw*)current[-1];
  }
 return -1;
@ -3209,7 +3233,7 @@ bytes[len] = byte;
 bytes[0] = len;
 }
-static int scan_prefix(compiler_common *common, pcre_uchar *cc, pcre_uint32 *chars, pcre_uint8 *bytes, int max_chars)
+static int scan_prefix(compiler_common *common, pcre_uchar *cc, pcre_uint32 *chars, pcre_uint8 *bytes, int max_chars, pcre_uint32 *rec_count)
 {
 /* Recursive function, which scans prefix literals. */
 BOOL last, any, caseless;
@ -3227,9 +3251,14 @@ pcre_uchar othercase[1];
 repeat = 1;
 while (TRUE)
  {
+  if (*rec_count == 0)
+    return 0;
+  (*rec_count)--;
  last = TRUE;
  any = FALSE;
  caseless = FALSE;
  switch (*cc)
    {
    case OP_CHARI:
@ -3291,7 +3320,7 @@ while (TRUE)
 #ifdef SUPPORT_UTF
    if (common->utf && HAS_EXTRALEN(*cc)) len += GET_EXTRALEN(*cc);
 #endif
-    max_chars = scan_prefix(common, cc + len, chars, bytes, max_chars);
+    max_chars = scan_prefix(common, cc + len, chars, bytes, max_chars, rec_count);
    if (max_chars == 0)
      return consumed;
    last = FALSE;
@ -3314,7 +3343,7 @@ while (TRUE)
    alternative = cc + GET(cc, 1);
    while (*alternative == OP_ALT)
      {
-      max_chars = scan_prefix(common, alternative + 1 + LINK_SIZE, chars, bytes, max_chars);
+      max_chars = scan_prefix(common, alternative + 1 + LINK_SIZE, chars, bytes, max_chars, rec_count);
      if (max_chars == 0)
        return consumed;
      alternative += GET(alternative, 1);
@ -3556,6 +3585,7 @@ int i, max, from;
 int range_right = -1, range_len = 3 - 1;
 sljit_ub *update_table = NULL;
 BOOL in_range;
+pcre_uint32 rec_count;
 for (i = 0; i < MAX_N_CHARS; i++)
  {
@ -3564,7 +3594,8 @@ for (i = 0; i < MAX_N_CHARS; i++)
  bytes[i * MAX_N_BYTES] = 0;
  }
-max = scan_prefix(common, common->start, chars, bytes, MAX_N_CHARS);
+rec_count = 10000;
+max = scan_prefix(common, common->start, chars, bytes, MAX_N_CHARS, &rec_count);
 if (max <= 1)
  return FALSE;
@ -4311,8 +4342,10 @@ switch(length)
  case 4:
  if ((ranges[1] - ranges[0]) == (ranges[3] - ranges[2])
      && (ranges[0] | (ranges[2] - ranges[0])) == ranges[2]
      && (ranges[1] & (ranges[2] - ranges[0])) == 0
      && is_powerof2(ranges[2] - ranges[0]))
    {
    SLJIT_ASSERT((ranges[0] & (ranges[2] - ranges[0])) == 0 && (ranges[2] & ranges[3] & (ranges[2] - ranges[0])) != 0);
    OP2(SLJIT_OR, TMP1, 0, TMP1, 0, SLJIT_IMM, ranges[2] - ranges[0]);
    if (ranges[2] + 1 != ranges[3])
      {
@ -4900,9 +4933,10 @@ else if ((cc[-1] & XCL_MAP) != 0)
  if (!check_class_ranges(common, (const pcre_uint8 *)cc, FALSE, TRUE, list))
    {
 #ifdef COMPILE_PCRE8
-    SLJIT_ASSERT(common->utf);
+    jump = NULL;
+    if (common->utf)
 #endif
-    jump = CMP(SLJIT_GREATER, TMP1, 0, SLJIT_IMM, 255);
+      jump = CMP(SLJIT_GREATER, TMP1, 0, SLJIT_IMM, 255);
    OP2(SLJIT_AND, TMP2, 0, TMP1, 0, SLJIT_IMM, 0x7);
    OP2(SLJIT_LSHR, TMP1, 0, TMP1, 0, SLJIT_IMM, 3);
@ -4911,7 +4945,10 @@ else if ((cc[-1] & XCL_MAP) != 0)
    OP2(SLJIT_AND | SLJIT_SET_E, SLJIT_UNUSED, 0, TMP1, 0, TMP2, 0);
    add_jump(compiler, list, JUMP(SLJIT_NOT_ZERO));
-    JUMPHERE(jump);
+#ifdef COMPILE_PCRE8
+    if (common->utf)
+#endif
+      JUMPHERE(jump);
    }
  OP1(SLJIT_MOV, TMP1, 0, TMP3, 0);
@ -5219,7 +5256,7 @@ while (*cc != XCL_END)
      OP_FLAGS(SLJIT_MOV, TMP2, 0, SLJIT_UNUSED, 0, SLJIT_LESS_EQUAL);
      SET_CHAR_OFFSET(0);
-      OP2(SLJIT_SUB | SLJIT_SET_U, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0xff);
+      OP2(SLJIT_SUB | SLJIT_SET_U, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x7f);
      OP_FLAGS(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_LESS_EQUAL);
      SET_TYPE_OFFSET(ucp_Pc);
@ -7665,6 +7702,10 @@ while (*cc != OP_KETRPOS)
      OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(0), STR_PTR, 0);
      }
+    /* Even if the match is empty, we need to reset the control head. */
+    if (needs_control_head)
+      OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, SLJIT_MEM1(STACK_TOP), STACK(stack));
    if (opcode == OP_SBRAPOS || opcode == OP_SCBRAPOS)
      add_jump(compiler, &emptymatch, CMP(SLJIT_EQUAL, TMP1, 0, STR_PTR, 0));
@ -7692,6 +7733,10 @@ while (*cc != OP_KETRPOS)
      OP1(SLJIT_MOV, SLJIT_MEM1(TMP2), (framesize + 1) * sizeof(sljit_sw), STR_PTR, 0);
      }
+    /* Even if the match is empty, we need to reset the control head. */
+    if (needs_control_head)
+      OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, SLJIT_MEM1(STACK_TOP), STACK(stack));
    if (opcode == OP_SBRAPOS || opcode == OP_SCBRAPOS)
      add_jump(compiler, &emptymatch, CMP(SLJIT_EQUAL, TMP1, 0, STR_PTR, 0));
@ -7704,9 +7749,6 @@ while (*cc != OP_KETRPOS)
      }
    }
-  if (needs_control_head)
-    OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, SLJIT_MEM1(STACK_TOP), STACK(stack));
  JUMPTO(SLJIT_JUMP, loop);
  flush_stubs(common);
@ -8441,8 +8483,7 @@ while (cc < ccend)
      OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(1), STR_PTR, 0);
      }
    BACKTRACK_AS(braminzero_backtrack)->matchingpath = LABEL();
-    if (cc[1] > OP_ASSERTBACK_NOT)
-      count_match(common);
+    count_match(common);
    break;
    case OP_ONCE:
@ -9624,7 +9665,7 @@ static SLJIT_INLINE void compile_recurse(compiler_common *common)
 DEFINE_COMPILER;
 pcre_uchar *cc = common->start + common->currententry->start;
 pcre_uchar *ccbegin = cc + 1 + LINK_SIZE + (*cc == OP_BRA ? 0 : IMM2_SIZE);
-pcre_uchar *ccend = bracketend(cc);
+pcre_uchar *ccend = bracketend(cc) - (1 + LINK_SIZE);
 BOOL needs_control_head;
 int framesize = get_framesize(common, cc, NULL, TRUE, &needs_control_head);
 int private_data_size = get_private_data_copy_length(common, ccbegin, ccend, needs_control_head);
@ -9648,6 +9689,7 @@ set_jumps(common->currententry->calls, common->currententry->entry);
 sljit_emit_fast_enter(compiler, TMP2, 0);
 allocate_stack(common, private_data_size + framesize + alternativesize);
+count_match(common);
 OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(private_data_size + framesize + alternativesize - 1), TMP2, 0);
 copy_private_data(common, ccbegin, ccend, TRUE, private_data_size + framesize + alternativesize, framesize + alternativesize, needs_control_head);
 if (needs_control_head)
@ -9992,6 +10034,7 @@ OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, stack));
 OP1(SLJIT_MOV_UI, TMP1, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, limit_match));
 OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(struct sljit_stack, base));
 OP1(SLJIT_MOV, STACK_LIMIT, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(struct sljit_stack, limit));
+OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, 1);
 OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LIMIT_MATCH, TMP1, 0);
 if (mode == JIT_PARTIAL_SOFT_COMPILE)
--- a/pcre/pcre_jit_test.c
+++ b/pcre/pcre_jit_test.c
@ -182,6 +182,7 @@ static struct regression_test_case regression_test_cases[] = {
 	{ CMUAP, 0, "\xf0\x90\x90\x80{2}", "\xf0\x90\x90\x80#\xf0\x90\x90\xa8\xf0\x90\x90\x80" },
 	{ CMUAP, 0, "\xf0\x90\x90\xa8{2}", "\xf0\x90\x90\x80#\xf0\x90\x90\xa8\xf0\x90\x90\x80" },
 	{ CMUAP, 0, "\xe1\xbd\xb8\xe1\xbf\xb8", "\xe1\xbf\xb8\xe1\xbd\xb8" },
 	{ MA, 0, "[3-57-9]", "5" },
 	/* Assertions. */
 	{ MUA, 0, "\\b[^A]", "A_B#" },
--- a/pcre/pcre_study.c
+++ b/pcre/pcre_study.c
@ -71,6 +71,7 @@ Arguments:
  startcode       pointer to start of the whole pattern's code
  options         the compiling options
  recurses        chain of recurse_check to catch mutual recursion
+  countptr        pointer to call count (to catch over complexity)
 Returns:   the minimum length
           -1 if \C in UTF-8 mode or (*ACCEPT) was encountered
@ -80,7 +81,8 @@ Returns:   the minimum length
 static int
 find_minlength(const REAL_PCRE *re, const pcre_uchar *code,
-  const pcre_uchar *startcode, int options, recurse_check *recurses)
+  const pcre_uchar *startcode, int options, recurse_check *recurses,
+  int *countptr)
 {
 int length = -1;
 /* PCRE_UTF16 has the same value as PCRE_UTF8. */
@ -90,6 +92,8 @@ recurse_check this_recurse;
 register int branchlength = 0;
 register pcre_uchar *cc = (pcre_uchar *)code + 1 + LINK_SIZE;
+if ((*countptr)++ > 1000) return -1;   /* too complex */
 if (*code == OP_CBRA || *code == OP_SCBRA ||
    *code == OP_CBRAPOS || *code == OP_SCBRAPOS) cc += IMM2_SIZE;
@ -131,7 +135,7 @@ for (;;)
    case OP_SBRAPOS:
    case OP_ONCE:
    case OP_ONCE_NC:
-    d = find_minlength(re, cc, startcode, options, recurses);
+    d = find_minlength(re, cc, startcode, options, recurses, countptr);
    if (d < 0) return d;
    branchlength += d;
    do cc += GET(cc, 1); while (*cc == OP_ALT);
@ -415,7 +419,8 @@ for (;;)
            int dd;
            this_recurse.prev = recurses;
            this_recurse.group = cs;
-            dd = find_minlength(re, cs, startcode, options, &this_recurse);
+            dd = find_minlength(re, cs, startcode, options, &this_recurse,
+              countptr);
            if (dd < d) d = dd;
            }
          }
@ -451,7 +456,8 @@ for (;;)
          {
          this_recurse.prev = recurses;
          this_recurse.group = cs;
-          d = find_minlength(re, cs, startcode, options, &this_recurse);
+          d = find_minlength(re, cs, startcode, options, &this_recurse,
+            countptr);
          }
        }
      }
@ -514,7 +520,7 @@ for (;;)
        this_recurse.prev = recurses;
        this_recurse.group = cs;
        branchlength += find_minlength(re, cs, startcode, options,
-          &this_recurse);
+          &this_recurse, countptr);
        }
      }
    cc += 1 + LINK_SIZE;
@ -1453,6 +1459,7 @@ pcre32_study(const pcre32 *external_re, int options, const char **errorptr)
 #endif
 {
 int min;
+int count = 0;
 BOOL bits_set = FALSE;
 pcre_uint8 start_bits[32];
 PUBL(extra) *extra = NULL;
@ -1539,7 +1546,7 @@ if ((re->options & PCRE_ANCHORED) == 0 &&
 /* Find the minimum length of subject string. */
-switch(min = find_minlength(re, code, code, re->options, NULL))
+switch(min = find_minlength(re, code, code, re->options, NULL, &count))
  {
  case -2: *errorptr = "internal error: missing capturing bracket"; return NULL;
  case -3: *errorptr = "internal error: opcode not recognized"; return NULL;
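
Editorial note on the `countptr` parameter threaded through `find_minlength()` above: a single counter, owned by the top-level caller and shared by every recursive call, bounds the total work regardless of how the recursion branches. The sketch below shows the idea on a toy tree and is not PCRE code. The `node` type, `min_length`, and `MAX_CALLS` are hypothetical names; only the `(*countptr)++ > 1000` test and the `-1` "too complex" result come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a compiled pattern: a node is either a literal
   of fixed length (nchildren == 0) or a group holding a sequence of children. */
typedef struct node {
  int literal_len;               /* used only when nchildren == 0 */
  const struct node *children;   /* array of child nodes for a group */
  size_t nchildren;
} node;

#define MAX_CALLS 1000           /* same threshold as the patch */

/* Returns the minimum length matched by the node, or -1 if the shared call
   budget has been used up ("too complex"). */
static int min_length(const node *n, int *countptr)
{
  size_t i;
  int total = 0;

  assert(n != NULL && countptr != NULL);
  if ((*countptr)++ > MAX_CALLS) return -1;   /* budget check on every call */

  if (n->nchildren == 0) return n->literal_len;

  for (i = 0; i < n->nchildren; i++)
    {
    int d = min_length(&n->children[i], countptr);
    if (d < 0) return d;                      /* propagate "too complex" */
    total += d;
    }
  return total;
}
```

As in `pcre_study()`, the caller initialises the counter to 0 once and passes its address down; a depth limit alone would not catch patterns that are shallow but call the same subpatterns many times.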
--- a/pcre/pcre_xclass.c
+++ b/pcre/pcre_xclass.c
@ -246,7 +246,7 @@ while ((t = *data++) != XCL_END)
      case PT_PXPUNCT:
      if ((PRIV(ucp_gentype)[prop->chartype] == ucp_P ||
-            (c < 256 && PRIV(ucp_gentype)[prop->chartype] == ucp_S)) == isprop)
+            (c < 128 && PRIV(ucp_gentype)[prop->chartype] == ucp_S)) == isprop)
        return !negated;
      break;
--- a/pcre/pcregrep.c
+++ b/pcre/pcregrep.c
@ -1692,9 +1692,13 @@ while (ptr < endptr)
    if (filenames == FN_NOMATCH_ONLY) return 1;
+    /* If all we want is a yes/no answer, stop now. */
+    if (quiet) return 0;
    /* Just count if just counting is wanted. */
-    if (count_only) count++;
+    else if (count_only) count++;
    /* When handling a binary file and binary-files==binary, the "binary"
    variable will be set true (it's false in all other cases). In this
@ -1715,10 +1719,6 @@ while (ptr < endptr)
      return 0;
      }
-    /* Likewise, if all we want is a yes/no answer. */
-    else if (quiet) return 0;
    /* The --only-matching option prints just the substring that matched,
    and/or one or more captured portions of it, as long as these strings are
    not empty. The --file-offsets and --line-offsets options output offsets for
@ -2089,7 +2089,7 @@ if (filenames == FN_NOMATCH_ONLY)
 /* Print the match count if wanted */
-if (count_only)
+if (count_only && !quiet)
  {
  if (count > 0 || !omit_zero_count)
    {
--- a/pcre/pcretest.c
+++ b/pcre/pcretest.c
@ -4621,9 +4621,9 @@ while (!done)
      else switch ((c = *p++))
        {
-        case 'a': c =    7; break;
+        case 'a': c =  CHAR_BEL; break;
        case 'b': c = '\b'; break;
-        case 'e': c =   27; break;
+        case 'e': c =  CHAR_ESC; break;
        case 'f': c = '\f'; break;
        case 'n': c = '\n'; break;
        case 'r': c = '\r'; break;
--- a/pcre/testdata/grepoutput
+++ b/pcre/testdata/grepoutput
@ -751,3 +751,7 @@ RC=0
 2:3,1
 2:4,1
 RC=0
 ---------------------------- Test 108 ------------------------------
 RC=0
 ---------------------------- Test 109 -----------------------------
 RC=0
--- a/pcre/testdata/testinput1
+++ b/pcre/testdata/testinput1
@ -5730,4 +5730,7 @@ AbcdCBefgBhiBqz
 "(?1)(?#?'){8}(a)"
    baaaaaaaaac
 "(?|(\k'Pm')|(?'Pm'))"
    abcd
 /-- End of testinput1 --/
--- a/pcre/testdata/testinput11
+++ b/pcre/testdata/testinput11
@ -136,4 +136,6 @@ is required for these tests. --/
 /((?+1)(\1))/B
 /.((?2)(?R)\1)()/B
 /-- End of testinput11 --/
--- a/pcre/testdata/testinput12
+++ b/pcre/testdata/testinput12
--- a/pcre/testdata/testinput14
+++ b/pcre/testdata/testinput14
@ -340,4 +340,6 @@ not matter. --/
 /[^\s]*\s* [^\W]+\W+ [^\d]*?\d0 [^\d\w]{4,6}?\w*A/BZ
 /(?'ABC'[bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar](*THEN:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/
 /-- End of testinput14 --/
--- a/pcre/testdata/testinput17
+++ b/pcre/testdata/testinput17
--- a/pcre/testdata/testinput2
+++ b/pcre/testdata/testinput2
--- a/pcre/testdata/testinput6
+++ b/pcre/testdata/testinput6
@ -1502,4 +1502,55 @@
 /\C\X*QT/8
    Ӆ\x0aT
 /[\pS#moq]/
    =
 /[[:punct:]]/8W
    \xc2\xb4
    \x{b4} 
 /[[:^ascii:]]/8W
    \x{100}
    \x{200}
    \x{300}
    \x{37e}
    a
    9
    g
 /[[:^ascii:]\w]/8W
    a
    9
    g
    \x{100}
    \x{200}
    \x{300}
    \x{37e}
 /[\w[:^ascii:]]/8W
    a
    9
    g
    \x{100}
    \x{200}
    \x{300}
    \x{37e}
 /[^[:ascii:]\W]/8W
    a
    9
    g
    \x{100}
    \x{200}
    \x{300}
    \x{37e}
 /[[:^ascii:]a]/8W
    a
    9
    g
    \x{100}
    \x{200}
    \x{37e}
 /-- End of testinput6 --/
--- a/pcre/testdata/testinput7
+++ b/pcre/testdata/testinput7
@ -838,4 +838,19 @@ of case for anything other than the ASCII letters. --/
 /^s?c/mi8I
    scat
 /[\W\p{Any}]/BZ
    abc
    123 
 /[\W\pL]/BZ
    abc
    ** Failers 
    123     
 /a[[:punct:]b]/WBZ
 /a[[:punct:]b]/8WBZ
 /a[b[:punct:]]/8WBZ
 /-- End of testinput7 --/
--- a/pcre/testdata/testinputEBC
+++ b/pcre/testdata/testinputEBC
@ -29,13 +29,16 @@ in EBCDIC, but can be specified as escapes. --/
 /^A\ˆ/
    A B
    A\x41B
 /-- Test \H --/
 /^A\È/
    AB
    A\x42B
    ** Fail
    A B
    A\x41B
 /-- Test \R --/
--- a/pcre/testdata/testoutput1
+++ b/pcre/testdata/testoutput1
@ -9429,4 +9429,9 @@ No match
 0: aaaaaaaaa
 1: a
 "(?|(\k'Pm')|(?'Pm'))"
    abcd
 0: 
 1: 
 /-- End of testinput1 --/
--- a/pcre/testdata/testoutput11-16
+++ b/pcre/testdata/testoutput11-16
@ -231,7 +231,7 @@ Memory allocation (code space): 73
 ------------------------------------------------------------------
 /(?P<a>a)...(?P=a)bbb(?P>a)d/BM
-Memory allocation (code space): 61
+Memory allocation (code space): 77
 ------------------------------------------------------------------
  0  24 Bra
  2   5 CBra 1
@ -650,18 +650,18 @@ Memory allocation (code space): 14
 /[[:^alpha:][:^cntrl:]]+/8WB
 ------------------------------------------------------------------
-  0  26 Bra
+  0  30 Bra
-  2     [ -~\x80-\xff\P{L}]++
+  2     [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
- 26  26 Ket
+ 30  30 Ket
- 28     End
+ 32     End
 ------------------------------------------------------------------
 /[[:^cntrl:][:^alpha:]]+/8WB
 ------------------------------------------------------------------
-  0  26 Bra
+  0  30 Bra
-  2     [ -~\x80-\xff\P{L}]++
+  2     [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
- 26  26 Ket
+ 30  30 Ket
- 28     End
+ 32     End
 ------------------------------------------------------------------
 /[[:alpha:]]+/8WB
@ -748,4 +748,21 @@ Memory allocation (code space): 14
 22     End
 ------------------------------------------------------------------
 /.((?2)(?R)\1)()/B
 ------------------------------------------------------------------
  0  23 Bra
  2     Any
  3  13 Once
  5   9 CBra 1
  8  18 Recurse
 10   0 Recurse
 12     \1
 14   9 Ket
 16  13 Ket
 18   3 CBra 2
 21   3 Ket
 23  23 Ket
 25     End
 ------------------------------------------------------------------
 /-- End of testinput11 --/
--- a/pcre/testdata/testoutput11-32
+++ b/pcre/testdata/testoutput11-32
@ -231,7 +231,7 @@ Memory allocation (code space): 155
 ------------------------------------------------------------------
 /(?P<a>a)...(?P=a)bbb(?P>a)d/BM
-Memory allocation (code space): 125
+Memory allocation (code space): 157
 ------------------------------------------------------------------
  0  24 Bra
  2   5 CBra 1
@ -650,18 +650,18 @@ Memory allocation (code space): 28
 /[[:^alpha:][:^cntrl:]]+/8WB
 ------------------------------------------------------------------
-  0  18 Bra
+  0  21 Bra
-  2     [ -~\x80-\xff\P{L}]++
+  2     [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
- 18  18 Ket
+ 21  21 Ket
- 20     End
+ 23     End
 ------------------------------------------------------------------
 /[[:^cntrl:][:^alpha:]]+/8WB
 ------------------------------------------------------------------
-  0  18 Bra
+  0  21 Bra
-  2     [ -~\x80-\xff\P{L}]++
+  2     [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
- 18  18 Ket
+ 21  21 Ket
- 20     End
+ 23     End
 ------------------------------------------------------------------
 /[[:alpha:]]+/8WB
@ -748,4 +748,21 @@ Memory allocation (code space): 28
 22     End
 ------------------------------------------------------------------
 /.((?2)(?R)\1)()/B
 ------------------------------------------------------------------
  0  23 Bra
  2     Any
  3  13 Once
  5   9 CBra 1
  8  18 Recurse
 10   0 Recurse
 12     \1
 14   9 Ket
 16  13 Ket
 18   3 CBra 2
 21   3 Ket
 23  23 Ket
 25     End
 ------------------------------------------------------------------
 /-- End of testinput11 --/
--- a/pcre/testdata/testoutput11-8
+++ b/pcre/testdata/testoutput11-8
@ -231,7 +231,7 @@ Memory allocation (code space): 45
 ------------------------------------------------------------------
 /(?P<a>a)...(?P=a)bbb(?P>a)d/BM
-Memory allocation (code space): 38
+Memory allocation (code space): 50
 ------------------------------------------------------------------
  0  30 Bra
  3   7 CBra 1
@ -650,18 +650,18 @@ Memory allocation (code space): 10
 /[[:^alpha:][:^cntrl:]]+/8WB
 ------------------------------------------------------------------
-  0  44 Bra
+  0  51 Bra
-  3     [ -~\x80-\xff\P{L}]++
+  3     [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
- 44  44 Ket
+ 51  51 Ket
- 47     End
+ 54     End
 ------------------------------------------------------------------
 /[[:^cntrl:][:^alpha:]]+/8WB
 ------------------------------------------------------------------
-  0  44 Bra
+  0  51 Bra
-  3     [ -~\x80-\xff\P{L}]++
+  3     [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
- 44  44 Ket
+ 51  51 Ket
- 47     End
+ 54     End
 ------------------------------------------------------------------
 /[[:alpha:]]+/8WB
@ -748,4 +748,21 @@ Memory allocation (code space): 10
 34     End
 ------------------------------------------------------------------
 /.((?2)(?R)\1)()/B
 ------------------------------------------------------------------
  0  35 Bra
  3     Any
  4  20 Once
  7  14 CBra 1
 12  27 Recurse
 15   0 Recurse
 18     \1
 21  14 Ket
 24  20 Ket
 27   5 CBra 2
 32   5 Ket
 35  35 Ket
 38     End
 ------------------------------------------------------------------
 /-- End of testinput11 --/
--- a/pcre/testdata/testoutput12
+++ b/pcre/testdata/testoutput12
--- a/pcre/testdata/testoutput14
+++ b/pcre/testdata/testoutput14
@ -527,4 +527,6 @@ Failed: character value in \u.... sequence is too large at offset 6
        End
 ------------------------------------------------------------------
 /(?'ABC'[bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar](*THEN:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/
 /-- End of testinput14 --/
--- a/pcre/testdata/testoutput17
+++ b/pcre/testdata/testoutput17
--- a/pcre/testdata/testoutput2
+++ b/pcre/testdata/testoutput2
--- a/pcre/testdata/testoutput6
+++ b/pcre/testdata/testoutput6
@ -2469,4 +2469,92 @@ No match
    Ӆ\x0aT
 No match
 /[\pS#moq]/
    =
 0: =
 /[[:punct:]]/8W
    \xc2\xb4
 No match
    \x{b4} 
 No match
 /[[:^ascii:]]/8W
    \x{100}
 0: \x{100}
    \x{200}
 0: \x{200}
    \x{300}
 0: \x{300}
    \x{37e}
 0: \x{37e}
    a
 No match
    9
 No match
    g
 No match
 /[[:^ascii:]\w]/8W
    a
 0: a
    9
 0: 9
    g
 0: g
    \x{100}
 0: \x{100}
    \x{200}
 0: \x{200}
    \x{300}
 0: \x{300}
    \x{37e}
 0: \x{37e}
 /[\w[:^ascii:]]/8W
    a
 0: a
    9
 0: 9
    g
 0: g
    \x{100}
 0: \x{100}
    \x{200}
 0: \x{200}
    \x{300}
 0: \x{300}
    \x{37e}
 0: \x{37e}
 /[^[:ascii:]\W]/8W
    a
 No match
    9
 No match
    g
 No match
    \x{100}
 0: \x{100}
    \x{200}
 0: \x{200}
    \x{300}
 No match
    \x{37e}
 No match
 /[[:^ascii:]a]/8W
    a
 0: a
    9
 No match
    g
 No match
    \x{100}
 0: \x{100}
    \x{200}
 0: \x{200}
    \x{37e}
 0: \x{37e}
 /-- End of testinput6 --/
--- a/pcre/testdata/testoutput7
+++ b/pcre/testdata/testoutput7
@ -949,7 +949,7 @@ No match
 /[[:^alpha:][:^cntrl:]]+/8WBZ
 ------------------------------------------------------------------
        Bra
-        [ -~\x80-\xff\P{L}]++
+        [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
        Ket
        End
 ------------------------------------------------------------------
@ -961,7 +961,7 @@ No match
 /[[:^cntrl:][:^alpha:]]+/8WBZ
 ------------------------------------------------------------------
        Bra
-        [ -~\x80-\xff\P{L}]++
+        [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
        Ket
        End
 ------------------------------------------------------------------
@ -2295,4 +2295,57 @@ Need char = 'c' (caseless)
    scat
 0: sc
 /[\W\p{Any}]/BZ
 ------------------------------------------------------------------
        Bra
        [\x00-/:-@[-^`{-\xff\p{Any}]
        Ket
        End
 ------------------------------------------------------------------
    abc
 0: a
    123 
 0: 1
 /[\W\pL]/BZ
 ------------------------------------------------------------------
        Bra
        [\x00-/:-@[-^`{-\xff\p{L}]
        Ket
        End
 ------------------------------------------------------------------
    abc
 0: a
    ** Failers 
 0: *
    123     
 No match
 /a[[:punct:]b]/WBZ
 ------------------------------------------------------------------
        Bra
        a
        [b[:punct:]]
        Ket
        End
 ------------------------------------------------------------------
 /a[[:punct:]b]/8WBZ
 ------------------------------------------------------------------
        Bra
        a
        [b[:punct:]]
        Ket
        End
 ------------------------------------------------------------------
 /a[b[:punct:]]/8WBZ
 ------------------------------------------------------------------
        Bra
        a
        [b[:punct:]]
        Ket
        End
 ------------------------------------------------------------------
 /-- End of testinput7 --/
--- a/pcre/testdata/testoutputEBC
+++ b/pcre/testdata/testoutputEBC
@ -41,16 +41,22 @@ No match
 /^A\ˆ/
    A B
 0: A\x20
    A\x41B
 0: AA
 /-- Test \H --/
 /^A\È/
    AB
 0: AB
    A\x42B
 0: AB
    ** Fail
 No match
    A B
 No match
    A\x41B
 No match
 /-- Test \R --/