Merge branch 'merge/merge-pcre' into 10.0

2015-12-13 16:25:57 +01:00 · 2015-12-13 16:25:57 +01:00 · 095b7b92d1
commit 095b7b92d1
parent 359ae59ac0 e7591a1ba9
39 changed files with 2224 additions and 1263 deletions
--- a/pcre/ChangeLog
+++ b/pcre/ChangeLog
@ -1,6 +1,182 @@
 ChangeLog for PCRE
 ------------------

+Note that the PCRE 8.xx series (PCRE1) is now in a bugfix-only state. All
+development is happening in the PCRE2 10.xx series.
+
+Version 8.38 23-November-2015
+-----------------------------
+
+1.  If a group that contained a recursive back reference also contained a
+    forward reference subroutine call followed by a non-forward-reference
+    subroutine call, for example /.((?2)(?R)\1)()/, pcre_compile() failed to
+    compile correct code, leading to undefined behaviour or an internally
+    detected error. This bug was discovered by the LLVM fuzzer.
+
+2.  Quantification of certain items (e.g. atomic back references) could cause
+    incorrect code to be compiled when recursive forward references were
+    involved. For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/.
+    This bug was discovered by the LLVM fuzzer.
+
+3.  A repeated conditional group whose condition was a reference by name caused
+    a buffer overflow if there was more than one group with the given name.
+    This bug was discovered by the LLVM fuzzer.
+
+4.  A recursive back reference by name within a group that had the same name as
+    another group caused a buffer overflow. For example:
+    /(?J)(?'d'(?'d'\g{d}))/. This bug was discovered by the LLVM fuzzer.
+
+5.  A forward reference by name to a group whose number is the same as the
+    current group, for example in this pattern: /(?|(\k'Pm')|(?'Pm'))/, caused
+    a buffer overflow at compile time. This bug was discovered by the LLVM
+    fuzzer.
+
+6.  A lookbehind assertion within a set of mutually recursive subpatterns could
+    provoke a buffer overflow. This bug was discovered by the LLVM fuzzer.
+
+7.  Another buffer overflow bug involved duplicate named groups with a
+    reference between their definitions, with a group that reset capture
+    numbers, for example: /(?J:(?|(?'R')(\k'R')|((?'R'))))/. This has been
+    fixed by always allowing for more memory, even if not needed. (A proper fix
+    is implemented in PCRE2, but it involves more refactoring.)
+
+8.  There was no check for integer overflow in subroutine calls such as (?123).
+
+9.  The table entry for \l in EBCDIC environments was incorrect, leading to its
+    being treated as a literal 'l' instead of causing an error.
+
+10. There was a buffer overflow if pcre_exec() was called with an ovector of
+    size 1. This bug was found by american fuzzy lop.
+
+11. If a non-capturing group containing a conditional group that could match
+    an empty string was repeated, it was not identified as matching an empty
+    string itself. For example: /^(?:(?(1)x|)+)+$()/.
+
+12. In an EBCDIC environment, pcretest was mishandling the escape sequences
+    \a and \e in test subject lines.
+
+13. In an EBCDIC environment, \a in a pattern was converted to the ASCII
+    instead of the EBCDIC value.
+
+14. The handling of \c in an EBCDIC environment has been revised so that it is
+    now compatible with the specification in Perl's perlebcdic page.
+
+15. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in
+    ASCII/Unicode. This has now been added to the list of characters that are
+    recognized as white space in EBCDIC.
+
+16. When PCRE was compiled without UCP support, the use of \p and \P gave an
+    error (correctly) when used outside a class, but did not give an error
+    within a class.
+
+17. \h within a class was incorrectly compiled in EBCDIC environments.
+
+18. A pattern with an unmatched closing parenthesis that contained a backward
+    assertion which itself contained a forward reference caused buffer
+    overflow. An example pattern is: /(?=di(?<=(?1))|(?=(.))))/.
+
+19. JIT should return with error when the compiled pattern requires more stack
+    space than the maximum.
+
+20. A possessively repeated conditional group that could match an empty string,
+    for example, /(?(R))*+/, was incorrectly compiled.
+
+21. Fix infinite recursion in the JIT compiler when certain patterns such as
+    /(?:|a|){100}x/ are analysed.
+
+22. Some patterns with character classes involving [: and \\ were incorrectly
+    compiled and could cause reading from uninitialized memory or an incorrect
+    error diagnosis.
+
+23. Pathological patterns containing many nested occurrences of [: caused
+    pcre_compile() to run for a very long time.
+
+24. A conditional group with only one branch has an implicit empty alternative
+    branch and must therefore be treated as potentially matching an empty
+    string.
+
+25. If (?R was followed by - or + incorrect behaviour happened instead of a
+    diagnostic.
+
+26. Arrange to give up on finding the minimum matching length for overly
+    complex patterns.
+
+27. Similar to (4) above: in a pattern with duplicated named groups and an
+    occurrence of (?| it is possible for an apparently non-recursive back
+    reference to become recursive if a later named group with the relevant
+    number is encountered. This could lead to a buffer overflow. Wen Guanxing
+    from Venustech ADLAB discovered this bug.
+
+28. If pcregrep was given the -q option with -c or -l, or when handling a
+    binary file, it incorrectly wrote output to stdout.
+
+29. The JIT compiler did not restore the control verb head in case of *THEN
+    control verbs. This issue was found by Karl Skomski with a custom LLVM
+    fuzzer.
+
+30. Error messages for syntax errors following \g and \k were giving inaccurate
+    offsets in the pattern.
+
+31. Added a check for integer overflow in conditions (?(<digits>) and
+    (?(R<digits>). This omission was discovered by Karl Skomski with the LLVM
+    fuzzer.
+
+32. Handling recursive references such as (?2) when the reference is to a group
+    later in the pattern uses code that is very hacked about and error-prone.
+    It has been re-written for PCRE2. Here in PCRE1, a check has been added to
+    give an internal error if it is obvious that compiling has gone wrong.
+
+33. The JIT compiler should not check repeats after a {0,1} repeat byte code.
+    This issue was found by Karl Skomski with a custom LLVM fuzzer.
+
+34. The JIT compiler should restore the control chain for empty possessive
+    repeats. This issue was found by Karl Skomski with a custom LLVM fuzzer.
+
+35. Match limit check added to JIT recursion. This issue was found by Karl
+    Skomski with a custom LLVM fuzzer.
+
+36. Yet another case similar to 27 above has been circumvented by an
+    unconditional allocation of extra memory. This issue is fixed "properly" in
+    PCRE2 by refactoring the way references are handled. Wen Guanxing
+    from Venustech ADLAB discovered this bug.
+
+37. Fix two assertion fails in JIT. These issues were found by Karl Skomski
+    with a custom LLVM fuzzer.
+
+38. Fixed a corner case of range optimization in JIT.
+
+39. An incorrect error "overran compiling workspace" was given if there were
+    exactly enough group forward references such that the last one extended
+    into the workspace safety margin. The next one would have expanded the
+    workspace. The test for overflow was not including the safety margin.
+
+40. A match limit issue is fixed in JIT which was found by Karl Skomski
+    with a custom LLVM fuzzer.
+
+41. Remove the use of /dev/null in testdata/testinput2, because it doesn't
+    work under Windows. (Why has it taken so long for anyone to notice?)
+
+42. In a character class such as [\W\p{Any}] where both a negative-type escape
+    ("not a word character") and a property escape were present, the property
+    escape was being ignored.
+
+43. Fix crash caused by very long (*MARK) or (*THEN) names.
+
+44. A sequence such as [[:punct:]b] that is, a POSIX character class followed
+    by a single ASCII character in a class item, was incorrectly compiled in
+    UCP mode. The POSIX class got lost, but only if the single character
+    followed it.
+
+45. [:punct:] in UCP mode was matching some characters in the range 128-255
+    that should not have been matched.
+
+46. If [:^ascii:] or [:^xdigit:] or [:^cntrl:] are present in a non-negated
+    class, all characters with code points greater than 255 are in the class.
+    When a Unicode property was also in the class (if PCRE_UCP is set, escapes
+    such as \w are turned into Unicode properties), wide characters were not
+    correctly handled, and could fail to match.
+
+
 Version 8.37 28-April-2015
 --------------------------
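Note on items 8 and 31 of the 8.38 list above: both add an integer-overflow
check where a digit string such as the 123 in (?123) is read. A minimal
standalone sketch of that kind of check follows. The helper name
read_group_number and its return convention are hypothetical, chosen for
illustration only; this is not the code in pcre_compile.c.

```c
#include <limits.h>

/* Hypothetical helper (not a PCRE function): read the decimal digits at *pp
   into *out. Returns 0 on success and advances *pp past the digits; returns
   -1 without touching *pp or *out if the value would exceed INT_MAX. */
static int read_group_number(const char **pp, int *out)
{
  const char *p = *pp;
  int n = 0;

  while (*p >= '0' && *p <= '9')
    {
    int d = *p - '0';
    /* n * 10 + d fits in an int exactly when n <= (INT_MAX - d) / 10 */
    if (n > (INT_MAX - d) / 10) return -1;
    n = n * 10 + d;
    p++;
    }

  *pp = p;
  *out = n;
  return 0;
}
```

Called on "123)" this yields 123 and leaves the pointer at the closing
parenthesis; called on a digit string too long for an int it reports failure
instead of wrapping around.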

--- a/pcre/NEWS
+++ b/pcre/NEWS
@ -1,6 +1,14 @@
 News about PCRE releases
 ------------------------

+Release 8.38 23-November-2015
+-----------------------------
+
+This is a bug-fix release. Note that this library (now called PCRE1) is now
+being maintained for bug fixes only. New projects are advised to use the new
+PCRE2 libraries.
+
+
 Release 8.37 28-April-2015
 --------------------------

--- a/pcre/NON-AUTOTOOLS-BUILD
+++ b/pcre/NON-AUTOTOOLS-BUILD
@ -764,9 +764,9 @@ required. For details, please see this web site:

  http://www.zaconsultants.net

-There is also a mirror here:
-
-  http://www.vsoft-software.com/downloads.html
+You may download PCRE from WWW.CBTTAPE.ORG, file 882.  Everything, source and
+executable, is in EBCDIC and native z/OS file formats and this is the
+recommended download site.

 ==========================
-Last Updated: 10 February 2015
+Last Updated: 25 June 2015
--- a/pcre/RunGrepTest
+++ b/pcre/RunGrepTest
@ -512,6 +512,14 @@ echo "aaaaa" >>testtemp1grep
 (cd $srcdir; $valgrind $pcregrep  --line-offsets '(?<=\Ka)' $builddir/testtemp1grep) >>testtrygrep 2>&1
 echo "RC=$?" >>testtrygrep

+echo "---------------------------- Test 108 ------------------------------" >>testtrygrep
+(cd $srcdir; $valgrind $pcregrep -lq PATTERN ./testdata/grepinput ./testdata/grepinputx) >>testtrygrep
+echo "RC=$?" >>testtrygrep
+
+echo "---------------------------- Test 109 -----------------------------" >>testtrygrep
+(cd $srcdir; $valgrind $pcregrep -cq lazy ./testdata/grepinput*) >>testtrygrep
+echo "RC=$?" >>testtrygrep
+
 # Now compare the results.

 $cf $srcdir/testdata/grepoutput testtrygrep
--- a/pcre/configure.ac
+++ b/pcre/configure.ac
@ -9,17 +9,17 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
 dnl be defined as -RC2, for example. For real releases, it should be empty.

 m4_define(pcre_major, [8])
-m4_define(pcre_minor, [37])
+m4_define(pcre_minor, [38])
 m4_define(pcre_prerelease, [])
-m4_define(pcre_date, [2015-04-28])
+m4_define(pcre_date, [2015-11-23])

 # NOTE: The CMakeLists.txt file searches for the above variables in the first
 # 50 lines of this file. Please update that if the variables above are moved.

 # Libtool shared library interface versions (current:revision:age)
-m4_define(libpcre_version, [3:5:2])
-m4_define(libpcre16_version, [2:5:2])
-m4_define(libpcre32_version, [0:5:0])
+m4_define(libpcre_version, [3:6:2])
+m4_define(libpcre16_version, [2:6:2])
+m4_define(libpcre32_version, [0:6:0])
 m4_define(libpcreposix_version, [0:3:0])
 m4_define(libpcrecpp_version, [0:1:0])
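Note on the libtool triples above: only the revision field changes (5 to 6),
which by libtool convention marks an implementation change with no interface
change. On Linux-style platforms libtool names the shared object with the
suffix (current - age).age.revision. The sketch below applies that rule. The
names lt_version, lt_parse and lt_linux_suffix are hypothetical, and other
platforms map the triple differently.

```c
#include <stdio.h>

/* Hypothetical holder for a libtool current:revision:age triple. */
struct lt_version { int current, revision, age; };

/* Parse "C:R:A"; returns 0 on success, -1 on malformed input. */
static int lt_parse(const char *s, struct lt_version *v)
{
  return sscanf(s, "%d:%d:%d", &v->current, &v->revision, &v->age) == 3
    ? 0 : -1;
}

/* Write "major.age.revision" into buf, where major = current - age.
   This is the Linux-style mapping only. Returns snprintf's result. */
static int lt_linux_suffix(const struct lt_version *v, char *buf, size_t len)
{
  return snprintf(buf, len, "%d.%d.%d",
                  v->current - v->age, v->age, v->revision);
}
```

Under that assumption, 3:6:2 maps to the suffix 1.2.6 and 0:6:0 maps to 0.0.6.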

--- a/pcre/doc/html/NON-AUTOTOOLS-BUILD.txt
+++ b/pcre/doc/html/NON-AUTOTOOLS-BUILD.txt
@ -764,9 +764,9 @@ required. For details, please see this web site:

  http://www.zaconsultants.net

-There is also a mirror here:
-
-  http://www.vsoft-software.com/downloads.html
+You may download PCRE from WWW.CBTTAPE.ORG, file 882.  Everything, source and
+executable, is in EBCDIC and native z/OS file formats and this is the
+recommended download site.

 ==========================
-Last Updated: 10 February 2015
+Last Updated: 25 June 2015
--- a/pcre/doc/html/pcrepattern.html
+++ b/pcre/doc/html/pcrepattern.html
@ -329,7 +329,8 @@ A second use of backslash provides a way of encoding non-printing characters
 in patterns in a visible manner. There is no restriction on the appearance of
 non-printing characters, apart from the binary zero that terminates a pattern,
 but when a pattern is being prepared by text editing, it is often easier to use
-one of the following escape sequences than the binary character it represents:
+one of the following escape sequences than the binary character it represents.
+In an ASCII or Unicode environment, these escapes are as follows:
 <pre>
  \a        alarm, that is, the BEL character (hex 07)
  \cx       "control-x", where x is any ASCII character
@ -353,19 +354,33 @@ data item (byte or 16-bit value) following \c has a value greater than 127, a
 compile-time error occurs. This locks out non-ASCII characters in all modes.
 </P>
 <P>
-The \c facility was designed for use with ASCII characters, but with the
-extension to Unicode it is even less useful than it once was. It is, however,
-recognized when PCRE is compiled in EBCDIC mode, where data items are always
-bytes. In this mode, all values are valid after \c. If the next character is a
-lower case letter, it is converted to upper case. Then the 0xc0 bits of the
-byte are inverted. Thus \cA becomes hex 01, as in ASCII (A is C1), but because
-the EBCDIC letters are disjoint, \cZ becomes hex 29 (Z is E9), and other
-characters also generate different values.
+When PCRE is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t
+generate the appropriate EBCDIC code values. The \c escape is processed
+as specified for Perl in the <b>perlebcdic</b> document. The only characters
+that are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. Any
+other character provokes a compile-time error. The sequence \c@ encodes
+character code 0; the letters (in either case) encode characters 1-26 (hex 01
+to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
+\c? becomes either 255 (hex FF) or 95 (hex 5F).
+</P>
+<P>
+Thus, apart from \c?, these escapes generate the same character code values as
+they do in an ASCII environment, though the meanings of the values mostly
+differ. For example, \cG always generates code value 7, which is BEL in ASCII
+but DEL in EBCDIC.
+</P>
+<P>
+The sequence \c? generates DEL (127, hex 7F) in an ASCII environment, but
+because 127 is not a control character in EBCDIC, Perl makes it generate the
+APC character. Unfortunately, there are several variants of EBCDIC. In most of
+them the APC character has the value 255 (hex FF), but in the one Perl calls
+POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
+values, PCRE makes \c? generate 95; otherwise it generates 255.
 </P>
 <P>
 After \0 up to two further octal digits are read. If there are fewer than two
-digits, just those that are present are used. Thus the sequence \0\x\07
-specifies two binary zeros followed by a BEL character (code value 7). Make
+digits, just those that are present are used. Thus the sequence \0\x\015
+specifies two binary zeros followed by a CR character (code value 13). Make
 sure you supply two digits after the initial zero if the pattern character that
 follows is itself an octal digit.
 </P>
@ -3249,9 +3264,9 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 08 January 2014
+Last updated: 14 June 2015
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2015 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.
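Note on the \c text in this file: the ASCII rule (fold to upper case, then
invert bit 0x40) and the table rule described for EBCDIC give the same code
values for the characters the table allows. The sketch below models both on
an ASCII host. The function names are hypothetical, the case folding assumes
ASCII letter codes, and the separate handling of \c? in EBCDIC is not
modelled.

```c
#include <string.h>

/* ASCII rule: fold lower case to upper case, then invert bit 0x40. */
static int ctrl_ascii(int c)
{
  if (c >= 'a' && c <= 'z') c -= 32;
  return c ^ 0x40;
}

/* Table rule: the result is the position of the (upper-cased) character in
   a 32-entry list. A return of -1 stands in for the compile-time error. */
static int ctrl_table(int c)
{
  static const char table[] = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_";
  const char *p;

  if (c >= 'a' && c <= 'z') c -= 32;
  if (c == '\0') return -1;       /* strchr would match the terminator */
  p = strchr(table, c);
  return (p == NULL) ? -1 : (int)(p - table);
}
```

For example, both functions map G and g to 7. The ASCII rule also accepts
characters outside the table, such as { (giving hex 3B), which the table rule
rejects.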
--- a/pcre/doc/pcre.txt
+++ b/pcre/doc/pcre.txt
--- a/pcre/doc/pcrepattern.3
+++ b/pcre/doc/pcrepattern.3
@ -1,4 +1,4 @@
-.TH PCREPATTERN 3 "08 January 2014" "PCRE 8.35"
+.TH PCREPATTERN 3 "14 June 2015" "PCRE 8.38"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE REGULAR EXPRESSION DETAILS"
@ -308,7 +308,8 @@ A second use of backslash provides a way of encoding non-printing characters
 in patterns in a visible manner. There is no restriction on the appearance of
 non-printing characters, apart from the binary zero that terminates a pattern,
 but when a pattern is being prepared by text editing, it is often easier to use
-one of the following escape sequences than the binary character it represents:
+one of the following escape sequences than the binary character it represents.
+In an ASCII or Unicode environment, these escapes are as follows:
 .sp
  \ea        alarm, that is, the BEL character (hex 07)
  \ecx       "control-x", where x is any ASCII character
@ -331,18 +332,30 @@ but \ec{ becomes hex 3B ({ is 7B), and \ec; becomes hex 7B (; is 3B). If the
 data item (byte or 16-bit value) following \ec has a value greater than 127, a
 compile-time error occurs. This locks out non-ASCII characters in all modes.
 .P
-The \ec facility was designed for use with ASCII characters, but with the
-extension to Unicode it is even less useful than it once was. It is, however,
-recognized when PCRE is compiled in EBCDIC mode, where data items are always
-bytes. In this mode, all values are valid after \ec. If the next character is a
-lower case letter, it is converted to upper case. Then the 0xc0 bits of the
-byte are inverted. Thus \ecA becomes hex 01, as in ASCII (A is C1), but because
-the EBCDIC letters are disjoint, \ecZ becomes hex 29 (Z is E9), and other
-characters also generate different values.
+When PCRE is compiled in EBCDIC mode, \ea, \ee, \ef, \en, \er, and \et
+generate the appropriate EBCDIC code values. The \ec escape is processed
+as specified for Perl in the \fBperlebcdic\fP document. The only characters
+that are allowed after \ec are A-Z, a-z, or one of @, [, \e, ], ^, _, or ?. Any
+other character provokes a compile-time error. The sequence \ec@ encodes
+character code 0; the letters (in either case) encode characters 1-26 (hex 01
+to hex 1A); [, \e, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
+\ec? becomes either 255 (hex FF) or 95 (hex 5F).
+.P
+Thus, apart from \ec?, these escapes generate the same character code values as
+they do in an ASCII environment, though the meanings of the values mostly
+differ. For example, \ecG always generates code value 7, which is BEL in ASCII
+but DEL in EBCDIC.
+.P
+The sequence \ec? generates DEL (127, hex 7F) in an ASCII environment, but
+because 127 is not a control character in EBCDIC, Perl makes it generate the
+APC character. Unfortunately, there are several variants of EBCDIC. In most of
+them the APC character has the value 255 (hex FF), but in the one Perl calls
+POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
+values, PCRE makes \ec? generate 95; otherwise it generates 255.
 .P
 After \e0 up to two further octal digits are read. If there are fewer than two
-digits, just those that are present are used. Thus the sequence \e0\ex\e07
-specifies two binary zeros followed by a BEL character (code value 7). Make
+digits, just those that are present are used. Thus the sequence \e0\ex\e015
+specifies two binary zeros followed by a CR character (code value 13). Make
 sure you supply two digits after the initial zero if the pattern character that
 follows is itself an octal digit.
 .P
@ -3283,6 +3296,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 08 January 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 14 June 2015
+Copyright (c) 1997-2015 University of Cambridge.
 .fi
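Note on the \0 rule in this file: after the initial zero, at most two further
octal digits are consumed, and reading stops early at the first character
that is not an octal digit. A minimal sketch of that rule follows. The name
read_backslash_zero is hypothetical, and the function assumes p points at the
'0' that follows the backslash.

```c
/* Hypothetical helper: p[0] is the '0' after the backslash. Reads up to two
   further octal digits, stores the number of characters consumed (including
   the '0') in *len, and returns the resulting code value. */
static int read_backslash_zero(const char *p, int *len)
{
  int value = 0;
  int i = 1;                       /* skip the initial '0' */

  while (i < 3 && p[i] >= '0' && p[i] <= '7')
    {
    value = value * 8 + (p[i] - '0');
    i++;
    }

  *len = i;
  return value;
}
```

With the input 015 this returns 13 (CR) and consumes three characters. With
0123 it returns 10 and leaves the final 3 as a literal, which is why two
digits should be supplied whenever the next pattern character is itself an
octal digit.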
--- a/pcre/pcre_compile.c
+++ b/pcre/pcre_compile.c
@ -174,7 +174,7 @@ static const short int escapes[] = {
     -ESC_Z,                  CHAR_LEFT_SQUARE_BRACKET,
     CHAR_BACKSLASH,          CHAR_RIGHT_SQUARE_BRACKET,
     CHAR_CIRCUMFLEX_ACCENT,  CHAR_UNDERSCORE,
-     CHAR_GRAVE_ACCENT,       7,
+     CHAR_GRAVE_ACCENT,       ESC_a,
     -ESC_b,                  0,
     -ESC_d,                  ESC_e,
     ESC_f,                   0,
@ -202,9 +202,9 @@ static const short int escapes[] = {
 /*  68 */     0,     0,    '|',     ',',    '%',   '_',    '>',    '?',
 /*  70 */     0,     0,      0,       0,      0,     0,      0,      0,
 /*  78 */     0,   '`',    ':',     '#',    '@',  '\'',    '=',    '"',
-/*  80 */     0,     7, -ESC_b,       0, -ESC_d, ESC_e,  ESC_f,      0,
+/*  80 */     0, ESC_a, -ESC_b,       0, -ESC_d, ESC_e,  ESC_f,      0,
 /*  88 */-ESC_h,     0,      0,     '{',      0,     0,      0,      0,
-/*  90 */     0,     0, -ESC_k,     'l',      0, ESC_n,      0, -ESC_p,
+/*  90 */     0,     0, -ESC_k,       0,      0, ESC_n,      0, -ESC_p,
 /*  98 */     0, ESC_r,      0,     '}',      0,     0,      0,      0,
 /*  A0 */     0,   '~', -ESC_s, ESC_tee,      0,-ESC_v, -ESC_w,      0,
 /*  A8 */     0,-ESC_z,      0,       0,      0,   '[',      0,      0,
@ -219,6 +219,12 @@ static const short int escapes[] = {
 /*  F0 */     0,     0,      0,       0,      0,     0,      0,      0,
 /*  F8 */     0,     0,      0,       0,      0,     0,      0,      0
 };
+
+/* We also need a table of characters that may follow \c in an EBCDIC
+environment for characters 0-31. */
+
+static unsigned char ebcdic_escape_c[] = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_";
+
 #endif


@ -458,7 +464,7 @@ static const char error_texts[] =
  "range out of order in character class\0"
  "nothing to repeat\0"
  /* 10 */
-  "operand of unlimited repeat could match the empty string\0"  /** DEAD **/
+  "internal error: invalid forward reference offset\0"
  "internal error: unexpected repeat\0"
  "unrecognized character after (? or (?-\0"
  "POSIX named classes are supported only within a class\0"
@ -527,7 +533,11 @@ static const char error_texts[] =
  "different names for subpatterns of the same number are not allowed\0"
  "(*MARK) must have an argument\0"
  "this version of PCRE is not compiled with Unicode property support\0"
+#ifndef EBCDIC
  "\\c must be followed by an ASCII character\0"
+#else
+  "\\c must be followed by a letter or one of [\\]^_?\0"
+#endif
  "\\k is not followed by a braced, angle-bracketed, or quoted name\0"
  /* 70 */
  "internal error: unknown opcode in find_fixedlength()\0"
@ -1425,7 +1435,16 @@ else
    c ^= 0x40;
 #else             /* EBCDIC coding */
    if (c >= CHAR_a && c <= CHAR_z) c += 64;
-    c ^= 0xC0;
+    if (c == CHAR_QUESTION_MARK)
+      c = ('\\' == 188 && '`' == 74)? 0x5f : 0xff;
+    else
+      {
+      for (i = 0; i < 32; i++)
+        {
+        if (c == ebcdic_escape_c[i]) break;
+        }
+      if (i < 32) c = i; else *errorcodeptr = ERR68;
+      }
 #endif
    break;

@ -1799,7 +1818,7 @@ for (;;)
    case OP_ASSERTBACK:
    case OP_ASSERTBACK_NOT:
    do cc += GET(cc, 1); while (*cc == OP_ALT);
-    cc += PRIV(OP_lengths)[*cc];
+    cc += 1 + LINK_SIZE;
    break;

    /* Skip over things that don't match chars */
@ -2487,7 +2506,7 @@ for (code = first_significant_code(code + PRIV(OP_lengths)[*code], TRUE);
  if (c == OP_BRA  || c == OP_BRAPOS ||
      c == OP_CBRA || c == OP_CBRAPOS ||
      c == OP_ONCE || c == OP_ONCE_NC ||
-      c == OP_COND)
+      c == OP_COND || c == OP_SCOND)
    {
    BOOL empty_branch;
    if (GET(code, 1) == 0) return TRUE;    /* Hit unclosed bracket */
@ -3886,11 +3905,11 @@ didn't consider this to be a POSIX class. Likewise for [:1234:].
 The problem in trying to be exactly like Perl is in the handling of escapes. We
 have to be sure that [abc[:x\]pqr] is *not* treated as containing a POSIX
 class, but [abc[:x\]pqr:]] is (so that an error can be generated). The code
-below handles the special case of \], but does not try to do any other escape
-processing. This makes it different from Perl for cases such as [:l\ower:]
-where Perl recognizes it as the POSIX class "lower" but PCRE does not recognize
-"l\ower". This is a lesser evil than not diagnosing bad classes when Perl does,
-I think.
+below handles the special cases \\ and \], but does not try to do any other
+escape processing. This makes it different from Perl for cases such as
+[:l\ower:] where Perl recognizes it as the POSIX class "lower" but PCRE does
+not recognize "l\ower". This is a lesser evil than not diagnosing bad classes
+when Perl does, I think.

 A user pointed out that PCRE was rejecting [:a[:digit:]] whereas Perl was not.
 It seems that the appearance of a nested POSIX class supersedes an apparent
@ -3917,21 +3936,16 @@ pcre_uchar terminator;          /* Don't combine these lines; the Solaris cc */
 terminator = *(++ptr);   /* compiler warns about "non-constant" initializer. */
 for (++ptr; *ptr != CHAR_NULL; ptr++)
  {
-  if (*ptr == CHAR_BACKSLASH && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
+  if (*ptr == CHAR_BACKSLASH &&
+      (ptr[1] == CHAR_RIGHT_SQUARE_BRACKET ||
+       ptr[1] == CHAR_BACKSLASH))
    ptr++;
-  else if (*ptr == CHAR_RIGHT_SQUARE_BRACKET) return FALSE;
-  else
+  else if ((*ptr == CHAR_LEFT_SQUARE_BRACKET && ptr[1] == terminator) ||
+            *ptr == CHAR_RIGHT_SQUARE_BRACKET) return FALSE;
+  else if (*ptr == terminator && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
    {
-    if (*ptr == terminator && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
-      {
-      *endptr = ptr;
-      return TRUE;
-      }
-    if (*ptr == CHAR_LEFT_SQUARE_BRACKET &&
-         (ptr[1] == CHAR_COLON || ptr[1] == CHAR_DOT ||
-          ptr[1] == CHAR_EQUALS_SIGN) &&
-        check_posix_syntax(ptr, endptr))
-      return FALSE;
+    *endptr = ptr;
+    return TRUE;
    }
  }
 return FALSE;
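Note on the rewritten check_posix_syntax() loop above: it can be exercised in
isolation. The following is a standalone restatement that uses plain char and
character literals in place of pcre_uchar and the CHAR_ macros. The name
posix_syntax_sketch is hypothetical and this is not a drop-in replacement for
the real function.

```c
/* ptr points at the '[' of a candidate such as "[:alpha:]". Returns 1 and
   sets *endptr to the closing terminator if a terminator followed by ']' is
   found before any bare ']' or nested "[<terminator>"; otherwise returns 0. */
static int posix_syntax_sketch(const char *ptr, const char **endptr)
{
  char terminator = *(++ptr);      /* ':', '.' or '=' */

  for (++ptr; *ptr != '\0'; ptr++)
    {
    if (*ptr == '\\' && (ptr[1] == ']' || ptr[1] == '\\'))
      ptr++;                       /* skip an escaped ] or backslash */
    else if ((*ptr == '[' && ptr[1] == terminator) || *ptr == ']')
      return 0;                    /* nested opener or premature ] */
    else if (*ptr == terminator && ptr[1] == ']')
      {
      *endptr = ptr;
      return 1;
      }
    }
  return 0;
}
```

In this sketch [:alpha:] is accepted, [:x\]pqr] is rejected, [:x\]pqr:] is
accepted, and [:a[:digit:]] is rejected at the nested opener without any
recursive call.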
@ -3985,11 +3999,12 @@ have their offsets adjusted. That one of the jobs of this function. Before it
 is called, the partially compiled regex must be temporarily terminated with
 OP_END.

-This function has been extended with the possibility of forward references for
-recursions and subroutine calls. It must also check the list of such references
-for the group we are dealing with. If it finds that one of the recursions in
-the current group is on this list, it adjusts the offset in the list, not the
-value in the reference (which is a group number).
+This function has been extended to cope with forward references for recursions
+and subroutine calls. It must check the list of such references for the
+group we are dealing with. If it finds that one of the recursions in the
+current group is on this list, it does not adjust the value in the reference
+(which is a group number). After the group has been scanned, all the offsets in
+the forward reference list for the group are adjusted.

 Arguments:
  group      points to the start of the group
@ -4005,29 +4020,21 @@ static void
 adjust_recurse(pcre_uchar *group, int adjust, BOOL utf, compile_data *cd,
  size_t save_hwm_offset)
 {
+int offset;
+pcre_uchar *hc;
 pcre_uchar *ptr = group;

 while ((ptr = (pcre_uchar *)find_recurse(ptr, utf)) != NULL)
  {
-  int offset;
-  pcre_uchar *hc;
-
-  /* See if this recursion is on the forward reference list. If so, adjust the
-  reference. */
-
  for (hc = (pcre_uchar *)cd->start_workspace + save_hwm_offset; hc < cd->hwm;
       hc += LINK_SIZE)
    {
    offset = (int)GET(hc, 0);
-    if (cd->start_code + offset == ptr + 1)
-      {
-      PUT(hc, 0, offset + adjust);
-      break;
-      }
+    if (cd->start_code + offset == ptr + 1) break;
    }

-  /* Otherwise, adjust the recursion offset if it's after the start of this
-  group. */
+  /* If we have not found this recursion on the forward reference list, adjust
+  the recursion's offset if it's after the start of this group. */

  if (hc >= cd->hwm)
    {
@ -4037,6 +4044,15 @@ while ((ptr = (pcre_uchar *)find_recurse(ptr, utf)) != NULL)

  ptr += 1 + LINK_SIZE;
  }
+
+/* Now adjust all forward reference offsets for the group. */
+
+for (hc = (pcre_uchar *)cd->start_workspace + save_hwm_offset; hc < cd->hwm;
+     hc += LINK_SIZE)
+  {
+  offset = (int)GET(hc, 0);
+  PUT(hc, 0, offset + adjust);
+  }
 }
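Note on the adjust_recurse() change above: the lookup of a recursion in the
forward reference list and the adjustment of that list are now separate
steps, so every list entry for the group is shifted exactly once after the
scan. A minimal sketch of the two steps on a plain int array follows. The
function names and the array representation are hypothetical; the real code
reads and writes LINK_SIZE units in the compile workspace.

```c
#include <stddef.h>

/* Step 1 (lookup only): return the index of the entry equal to target in
   offsets[first..count-1], or -1 if the recursion is not a forward ref. */
static long find_forward_ref(const int *offsets, size_t first, size_t count,
                             int target)
{
  size_t i;
  for (i = first; i < count; i++)
    if (offsets[i] == target) return (long)i;
  return -1;
}

/* Step 2 (after the whole group has been scanned): shift every recorded
   forward-reference offset for the group by adjust. */
static void shift_forward_refs(int *offsets, size_t first, size_t count,
                               int adjust)
{
  size_t i;
  for (i = first; i < count; i++) offsets[i] += adjust;
}
```

Keeping the shift out of the lookup loop means an entry is adjusted even when
no recursion inside the scanned group happens to match it.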


@ -4465,7 +4481,7 @@ const pcre_uchar *tempptr;
 const pcre_uchar *nestptr = NULL;
 pcre_uchar *previous = NULL;
 pcre_uchar *previous_callout = NULL;
-size_t save_hwm_offset = 0;
+size_t item_hwm_offset = 0;
 pcre_uint8 classbits[32];

 /* We can fish out the UTF-8 setting once and for all into a BOOL, but we
@ -4623,8 +4639,7 @@ for (;; ptr++)
  /* In the real compile phase, just check the workspace used by the forward
  reference list. */

-  else if (cd->hwm > cd->start_workspace + cd->workspace_size -
-           WORK_SIZE_SAFETY_MARGIN)
+  else if (cd->hwm > cd->start_workspace + cd->workspace_size)
    {
    *errorcodeptr = ERR52;
    goto FAILED;
@ -4767,6 +4782,7 @@ for (;; ptr++)
    zeroreqchar = reqchar;
    zeroreqcharflags = reqcharflags;
    previous = code;
+    item_hwm_offset = cd->hwm - cd->start_workspace;
    *code++ = ((options & PCRE_DOTALL) != 0)? OP_ALLANY: OP_ANY;
    break;

@ -4818,6 +4834,7 @@ for (;; ptr++)
    /* Handle a real character class. */

    previous = code;
+    item_hwm_offset = cd->hwm - cd->start_workspace;

    /* PCRE supports POSIX class stuff inside a class. Perl gives an error if
    they are encountered at the top level, so we'll do that too. */
@ -4923,9 +4940,10 @@ for (;; ptr++)
      (which is on the stack). We have to remember that there was XCLASS data,
      however. */

+      if (class_uchardata > class_uchardata_base) xclass = TRUE;
+
      if (lengthptr != NULL && class_uchardata > class_uchardata_base)
        {
-        xclass = TRUE;
        *lengthptr += (int)(class_uchardata - class_uchardata_base);
        class_uchardata = class_uchardata_base;
        }
@ -5028,10 +5046,26 @@ for (;; ptr++)
            ptr = tempptr + 1;
            continue;

-            /* For all other POSIX classes, no special action is taken in UCP
-            mode. Fall through to the non_UCP case. */
+            /* For the other POSIX classes (ascii, xdigit) we are going to fall
+            through to the non-UCP case and build a bit map for characters with
+            code points less than 256. If we are in a negated POSIX class
+            within a non-negated overall class, characters with code points
+            greater than 255 must all match. In the special case where we have
+            not yet generated any xclass data, and this is the final item in
+            the overall class, we need do nothing: later on, the opcode
+            OP_NCLASS will be used to indicate that characters greater than 255
+            are acceptable. If we have already seen an xclass item or one may
+            follow (we have to assume that it might if this is not the end of
+            the class), explicitly match all wide codepoints. */

            default:
+            if (!negate_class && local_negate &&
+                (xclass || tempptr[2] != CHAR_RIGHT_SQUARE_BRACKET))
+              {
+              *class_uchardata++ = XCL_RANGE;
+              class_uchardata += PRIV(ord2utf)(0x100, class_uchardata);
+              class_uchardata += PRIV(ord2utf)(0x10ffff, class_uchardata);
+              }
            break;
            }
          }
@ -5195,9 +5229,9 @@ for (;; ptr++)
              cd, PRIV(vspace_list));
            continue;

-#ifdef SUPPORT_UCP
            case ESC_p:
            case ESC_P:
+#ifdef SUPPORT_UCP
              {
              BOOL negated;
              unsigned int ptype = 0, pdata = 0;
@ -5211,6 +5245,9 @@ for (;; ptr++)
              class_has_8bitchar--;                /* Undo! */
              continue;
              }
+#else
+            *errorcodeptr = ERR45;
+            goto FAILED;
 #endif
            /* Unrecognized escapes are faulted if PCRE is running in its
            strict mode. By default, for compatibility with Perl, they are
@ -5367,16 +5404,20 @@ for (;; ptr++)
      CLASS_SINGLE_CHARACTER:
      if (class_one_char < 2) class_one_char++;

-      /* If class_one_char is 1, we have the first single character in the
-      class, and there have been no prior ranges, or XCLASS items generated by
-      escapes. If this is the final character in the class, we can optimize by
-      turning the item into a 1-character OP_CHAR[I] if it's positive, or
-      OP_NOT[I] if it's negative. In the positive case, it can cause firstchar
-      to be set. Otherwise, there can be no first char if this item is first,
-      whatever repeat count may follow. In the case of reqchar, save the
-      previous value for reinstating. */
+      /* If xclass_has_prop is false and class_one_char is 1, we have the first
+      single character in the class, and there have been no prior ranges, or
+      XCLASS items generated by escapes. If this is the final character in the
+      class, we can optimize by turning the item into a 1-character OP_CHAR[I]
+      if it's positive, or OP_NOT[I] if it's negative. In the positive case, it
+      can cause firstchar to be set. Otherwise, there can be no first char if
+      this item is first, whatever repeat count may follow. In the case of
+      reqchar, save the previous value for reinstating. */

-      if (!inescq && class_one_char == 1 && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
+      if (!inescq &&
+#ifdef SUPPORT_UCP
+          !xclass_has_prop &&
+#endif
+          class_one_char == 1 && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
        {
        ptr++;
        zeroreqchar = reqchar;
@ -5492,9 +5533,10 @@ for (;; ptr++)
    actual compiled code. */

 #ifdef SUPPORT_UTF
-    if (xclass && (!should_flip_negation || (options & PCRE_UCP) != 0))
+    if (xclass && (xclass_has_prop || !should_flip_negation ||
+        (options & PCRE_UCP) != 0))
 #elif !defined COMPILE_PCRE8
-    if (xclass && !should_flip_negation)
+    if (xclass && (xclass_has_prop || !should_flip_negation))
 #endif
 #if defined SUPPORT_UTF || !defined COMPILE_PCRE8
      {
@ -5930,7 +5972,7 @@ for (;; ptr++)
      {
      register int i;
      int len = (int)(code - previous);
-      size_t base_hwm_offset = save_hwm_offset;
+      size_t base_hwm_offset = item_hwm_offset;
      pcre_uchar *bralink = NULL;
      pcre_uchar *brazeroptr = NULL;

@ -5985,7 +6027,7 @@ for (;; ptr++)
        if (repeat_max <= 1)    /* Covers 0, 1, and unlimited */
          {
          *code = OP_END;
-          adjust_recurse(previous, 1, utf, cd, save_hwm_offset);
+          adjust_recurse(previous, 1, utf, cd, item_hwm_offset);
          memmove(previous + 1, previous, IN_UCHARS(len));
          code++;
          if (repeat_max == 0)
@ -6009,7 +6051,7 @@ for (;; ptr++)
          {
          int offset;
          *code = OP_END;
-          adjust_recurse(previous, 2 + LINK_SIZE, utf, cd, save_hwm_offset);
+          adjust_recurse(previous, 2 + LINK_SIZE, utf, cd, item_hwm_offset);
          memmove(previous + 2 + LINK_SIZE, previous, IN_UCHARS(len));
          code += 2 + LINK_SIZE;
          *previous++ = OP_BRAZERO + repeat_type;
@ -6254,6 +6296,12 @@ for (;; ptr++)
            while (*scode == OP_ALT);
            }

+          /* A conditional group with only one branch has an implicit empty
+          alternative branch. */
+
+          if (*bracode == OP_COND && bracode[GET(bracode,1)] != OP_ALT)
+            *bracode = OP_SCOND;
+
          /* Handle possessive quantifiers. */

          if (possessive_quantifier)
@ -6267,11 +6315,11 @@ for (;; ptr++)
              {
              int nlen = (int)(code - bracode);
              *code = OP_END;
-              adjust_recurse(bracode, 1 + LINK_SIZE, utf, cd, save_hwm_offset);
+              adjust_recurse(bracode, 1 + LINK_SIZE, utf, cd, item_hwm_offset);
              memmove(bracode + 1 + LINK_SIZE, bracode, IN_UCHARS(nlen));
              code += 1 + LINK_SIZE;
              nlen += 1 + LINK_SIZE;
-              *bracode = OP_BRAPOS;
+              *bracode = (*bracode == OP_COND)? OP_BRAPOS : OP_SBRAPOS;
              *code++ = OP_KETRPOS;
              PUTINC(code, 0, nlen);
              PUT(bracode, 1, nlen);
@ -6401,7 +6449,7 @@ for (;; ptr++)
        else
          {
          *code = OP_END;
-          adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, save_hwm_offset);
+          adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, item_hwm_offset);
          memmove(tempcode + 1 + LINK_SIZE, tempcode, IN_UCHARS(len));
          code += 1 + LINK_SIZE;
          len += 1 + LINK_SIZE;
@ -6450,7 +6498,7 @@ for (;; ptr++)

        default:
        *code = OP_END;
-        adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, save_hwm_offset);
+        adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, item_hwm_offset);
        memmove(tempcode + 1 + LINK_SIZE, tempcode, IN_UCHARS(len));
        code += 1 + LINK_SIZE;
        len += 1 + LINK_SIZE;
@ -6586,9 +6634,17 @@ for (;; ptr++)
              goto FAILED;
              }
            setverb = *code++ = verbs[i].op_arg;
-            *code++ = arglen;
-            memcpy(code, arg, IN_UCHARS(arglen));
-            code += arglen;
+            if (lengthptr != NULL)    /* In pass 1 just add in the length */
+              {                       /* to avoid potential workspace */
+              *lengthptr += arglen;   /* overflow. */
+              *code++ = 0;
+              }
+            else
+              {
+              *code++ = arglen;
+              memcpy(code, arg, IN_UCHARS(arglen));
+              code += arglen;
+              }
            *code++ = 0;
            }

@ -6623,7 +6679,7 @@ for (;; ptr++)
    newoptions = options;
    skipbytes = 0;
    bravalue = OP_CBRA;
-    save_hwm_offset = cd->hwm - cd->start_workspace;
+    item_hwm_offset = cd->hwm - cd->start_workspace;
    reset_bracount = FALSE;

    /* Deal with the extended parentheses; all are introduced by '?', and the
@ -6641,6 +6697,7 @@ for (;; ptr++)
        /* ------------------------------------------------------------ */
        case CHAR_VERTICAL_LINE:  /* Reset capture count for each branch */
        reset_bracount = TRUE;
+        cd->dupgroups = TRUE;     /* Record (?| encountered */
        /* Fall through */

        /* ------------------------------------------------------------ */
@ -6741,6 +6798,12 @@ for (;; ptr++)
          {
          while (IS_DIGIT(*ptr))
            {
+            if (recno > INT_MAX / 10 - 1)  /* Integer overflow */
+              {
+              while (IS_DIGIT(*ptr)) ptr++;
+              *errorcodeptr = ERR61;
+              goto FAILED;
+              }
            recno = recno * 10 + (int)(*ptr - CHAR_0);
            ptr++;
            }
@ -6769,7 +6832,7 @@ for (;; ptr++)
            ptr++;
            }
          namelen = (int)(ptr - name);
-          if (lengthptr != NULL) *lengthptr += IMM2_SIZE;
+          if (lengthptr != NULL) skipbytes += IMM2_SIZE;
          }

        /* Check the terminator */
@ -6875,6 +6938,11 @@ for (;; ptr++)
              *errorcodeptr = ERR15;
              goto FAILED;
              }
+            if (recno > INT_MAX / 10 - 1)   /* Integer overflow */
+              {
+              *errorcodeptr = ERR61;
+              goto FAILED;
+              }
            recno = recno * 10 + name[i] - CHAR_0;
            }
          if (recno == 0) recno = RREF_ANY;
@ -7151,6 +7219,7 @@ for (;; ptr++)
        if (lengthptr != NULL)
          {
          named_group *ng;
+          recno = 0;

          if (namelen == 0)
            {
@ -7168,20 +7237,6 @@ for (;; ptr++)
            goto FAILED;
            }

-          /* The name table does not exist in the first pass; instead we must
-          scan the list of names encountered so far in order to get the
-          number. If the name is not found, set the value to 0 for a forward
-          reference. */
-
-          ng = cd->named_groups;
-          for (i = 0; i < cd->names_found; i++, ng++)
-            {
-            if (namelen == ng->length &&
-                STRNCMP_UC_UC(name, ng->name, namelen) == 0)
-              break;
-            }
-          recno = (i < cd->names_found)? ng->number : 0;
-
          /* Count named back references. */

          if (!is_recurse) cd->namedrefcount++;
@ -7191,6 +7246,56 @@ for (;; ptr++)
          16-bit data item. */

          *lengthptr += IMM2_SIZE;
+
+          /* If this is a forward reference and we are within a (?|...) group,
+          the reference may end up as the number of a group which we are
+          currently inside, that is, it could be a recursive reference. In the
+          real compile this will be picked up and the reference wrapped with
+          OP_ONCE to make it atomic, so we must allow space in case this occurs. */
+
+          /* In fact, this can happen for a non-forward reference because
+          another group with the same number might be created later. This
+          issue is fixed "properly" in PCRE2. As PCRE1 is now in maintenance
+          only mode, we finesse the bug by allowing more memory always. */
+
+          *lengthptr += 2 + 2*LINK_SIZE;
+
+          /* It is even worse than that. The current reference may be to an
+          existing named group with a different number (so apparently not
+          recursive) but which later on is also attached to a group with the
+          current number. This can only happen if (?| has been previously
+          encountered. In that case, we allow yet more memory, just in case.
+          (Again, this is fixed "properly" in PCRE2.) */
+
+          if (cd->dupgroups) *lengthptr += 4 + 4*LINK_SIZE;
+
+          /* Otherwise, check for recursion here. The name table does not exist
+          in the first pass; instead we must scan the list of names encountered
+          so far in order to get the number. If the name is not found, leave
+          the value of recno as 0 for a forward reference. */
+
+          else
+            {
+            ng = cd->named_groups;
+            for (i = 0; i < cd->names_found; i++, ng++)
+              {
+              if (namelen == ng->length &&
+                  STRNCMP_UC_UC(name, ng->name, namelen) == 0)
+                {
+                open_capitem *oc;
+                recno = ng->number;
+                if (is_recurse) break;
+                for (oc = cd->open_caps; oc != NULL; oc = oc->next)
+                  {
+                  if (oc->number == recno)
+                    {
+                    oc->flag = TRUE;
+                    break;
+                    }
+                  }
+                }
+              }
+            }
          }

        /* In the real compile, search the name table. We check the name
@ -7237,8 +7342,6 @@ for (;; ptr++)
          for (i++; i < cd->names_found; i++)
            {
            if (STRCMP_UC_UC(slot + IMM2_SIZE, cslot + IMM2_SIZE) != 0) break;
-
-
            count++;
            cslot += cd->name_entry_size;
            }
@ -7247,6 +7350,7 @@ for (;; ptr++)
            {
            if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE;
            previous = code;
+            item_hwm_offset = cd->hwm - cd->start_workspace;
            *code++ = ((options & PCRE_CASELESS) != 0)? OP_DNREFI : OP_DNREF;
            PUT2INC(code, 0, index);
            PUT2INC(code, 0, count);
@ -7284,9 +7388,14 @@ for (;; ptr++)


        /* ------------------------------------------------------------ */
-        case CHAR_R:              /* Recursion */
-        ptr++;                    /* Same as (?0)      */
-        /* Fall through */
+        case CHAR_R:              /* Recursion, same as (?0) */
+        recno = 0;
+        if (*(++ptr) != CHAR_RIGHT_PARENTHESIS)
+          {
+          *errorcodeptr = ERR29;
+          goto FAILED;
+          }
+        goto HANDLE_RECURSION;


        /* ------------------------------------------------------------ */
@ -7323,7 +7432,15 @@ for (;; ptr++)

          recno = 0;
          while(IS_DIGIT(*ptr))
+            {
+            if (recno > INT_MAX / 10 - 1) /* Integer overflow */
+              {
+              while (IS_DIGIT(*ptr)) ptr++;
+              *errorcodeptr = ERR61;
+              goto FAILED;
+              }
            recno = recno * 10 + *ptr++ - CHAR_0;
+            }

          if (*ptr != (pcre_uchar)terminator)
            {
@ -7360,6 +7477,7 @@ for (;; ptr++)
          HANDLE_RECURSION:

          previous = code;
+          item_hwm_offset = cd->hwm - cd->start_workspace;
          called = cd->start_code;

          /* When we are actually compiling, find the bracket that is being
@ -7561,7 +7679,11 @@ for (;; ptr++)
      previous = NULL;
      cd->iscondassert = FALSE;
      }
-    else previous = code;
+    else
+      {
+      previous = code;
+      item_hwm_offset = cd->hwm - cd->start_workspace;
+      }

    *code = bravalue;
    tempcode = code;
@ -7809,7 +7931,7 @@ for (;; ptr++)
        const pcre_uchar *p;
        pcre_uint32 cf;

-        save_hwm_offset = cd->hwm - cd->start_workspace;   /* Normally this is set when '(' is read */
+        item_hwm_offset = cd->hwm - cd->start_workspace;   /* Normally this is set when '(' is read */
        terminator = (*(++ptr) == CHAR_LESS_THAN_SIGN)?
          CHAR_GREATER_THAN_SIGN : CHAR_APOSTROPHE;

@ -7838,7 +7960,7 @@ for (;; ptr++)
        if (*p != (pcre_uchar)terminator)
          {
          *errorcodeptr = ERR57;
-          break;
+          goto FAILED;
          }
        ptr++;
        goto HANDLE_NUMERICAL_RECURSION;
@ -7853,7 +7975,7 @@ for (;; ptr++)
          ptr[1] != CHAR_APOSTROPHE && ptr[1] != CHAR_LEFT_CURLY_BRACKET))
          {
          *errorcodeptr = ERR69;
-          break;
+          goto FAILED;
          }
        is_recurse = FALSE;
        terminator = (*(++ptr) == CHAR_LESS_THAN_SIGN)?
@ -7877,6 +7999,7 @@ for (;; ptr++)
        HANDLE_REFERENCE:
        if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE;
        previous = code;
+        item_hwm_offset = cd->hwm - cd->start_workspace;
        *code++ = ((options & PCRE_CASELESS) != 0)? OP_REFI : OP_REF;
        PUT2INC(code, 0, recno);
        cd->backref_map |= (recno < 32)? (1 << recno) : 1;
@ -7906,6 +8029,7 @@ for (;; ptr++)
        if (!get_ucp(&ptr, &negated, &ptype, &pdata, errorcodeptr))
          goto FAILED;
        previous = code;
+        item_hwm_offset = cd->hwm - cd->start_workspace;
        *code++ = ((escape == ESC_p) != negated)? OP_PROP : OP_NOTPROP;
        *code++ = ptype;
        *code++ = pdata;
@ -7946,6 +8070,7 @@ for (;; ptr++)

          {
          previous = (escape > ESC_b && escape < ESC_Z)? code : NULL;
+          item_hwm_offset = cd->hwm - cd->start_workspace;
          *code++ = (!utf && escape == ESC_C)? OP_ALLANY : escape;
          }
        }
@ -7989,6 +8114,7 @@ for (;; ptr++)

    ONE_CHAR:
    previous = code;
+    item_hwm_offset = cd->hwm - cd->start_workspace;

    /* For caseless UTF-8 mode when UCP support is available, check whether
    this character has more than one other case. If so, generate a special
@ -9164,6 +9290,7 @@ cd->names_found = 0;
 cd->name_entry_size = 0;
 cd->name_table = NULL;
 cd->dupnames = FALSE;
+cd->dupgroups = FALSE;
 cd->namedrefcount = 0;
 cd->start_code = cworkspace;
 cd->hwm = cworkspace;
@ -9336,6 +9463,16 @@ if (cd->hwm > cd->start_workspace)
    int offset, recno;
    cd->hwm -= LINK_SIZE;
    offset = GET(cd->hwm, 0);
+
+    /* Check that the hwm handling hasn't gone wrong. This whole area is
+    rewritten in PCRE2 because there are some obscure cases. */
+
+    if (offset == 0 || codestart[offset-1] != OP_RECURSE)
+      {
+      errorcode = ERR10;
+      break;
+      }
+
    recno = GET(codestart, offset);
    if (recno != prev_recno)
      {
@ -9366,7 +9503,7 @@ used in this code because at least one compiler gives a warning about loss of
 "const" attribute if the cast (pcre_uchar *)codestart is used directly in the
 function call. */

-if ((options & PCRE_NO_AUTO_POSSESS) == 0)
+if (errorcode == 0 && (options & PCRE_NO_AUTO_POSSESS) == 0)
  {
  pcre_uchar *temp = (pcre_uchar *)codestart;
  auto_possessify(temp, utf, cd);
@ -9380,7 +9517,7 @@ OP_RECURSE that are not fixed length get a diagnostic with a useful offset. The
 exceptional ones forgo this. We scan the pattern to check that they are fixed
 length, and set their lengths. */

-if (cd->check_lookbehind)
+if (errorcode == 0 && cd->check_lookbehind)
  {
  pcre_uchar *cc = (pcre_uchar *)codestart;

@ -9593,4 +9730,3 @@ return (pcre32 *)re;
 }

 /* End of pcre_compile.c */
-
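Note on the `recno > INT_MAX / 10 - 1` checks added in the pcre_compile.c hunks above: the test runs before each multiply-and-add, so the accumulator can never overflow. The bound is conservative and rejects a few values that would still fit in an int. A minimal standalone sketch of the same guard follows; `parse_group_number` is a hypothetical helper written for illustration, not a PCRE function, and it reports overflow with -1 rather than setting ERR61.

```c
#include <limits.h>

/* Hypothetical helper (not part of PCRE): parse leading ASCII digits of s.
   Returns the parsed value, or -1 if accumulating another digit could
   overflow an int. The bound mirrors the test used in the patch. */
int parse_group_number(const char *s)
{
int n = 0;
while (*s >= '0' && *s <= '9')
  {
  /* If n <= INT_MAX/10 - 1 then n*10 + 9 < INT_MAX, so the update below
     is always safe. */
  if (n > INT_MAX / 10 - 1) return -1;
  n = n * 10 + (*s - '0');
  s++;
  }
return n;
}
```

Checking before the arithmetic matters because signed overflow is undefined behaviour in C, so a check made after the multiplication cannot be relied on.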
--- a/pcre/pcre_exec.c
+++ b/pcre/pcre_exec.c
@ -6685,7 +6685,8 @@ if (md->offset_vector != NULL)
  register int *iend = iptr - re->top_bracket;
  if (iend < md->offset_vector + 2) iend = md->offset_vector + 2;
  while (--iptr >= iend) *iptr = -1;
-  md->offset_vector[0] = md->offset_vector[1] = -1;
+  if (offsetcount > 0) md->offset_vector[0] = -1;
+  if (offsetcount > 1) md->offset_vector[1] = -1;
  }

 /* Set up the first character to match, if available. The first_char value is
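Note on the pcre_exec.c hunk above: the unconditional write to both `offset_vector[0]` and `offset_vector[1]` is replaced by two writes that each check `offsetcount` first, presumably because the caller's vector may hold fewer than two entries. A small sketch of the same pattern, using a hypothetical function name and plain int arrays rather than PCRE's match data:

```c
#include <stddef.h>

/* Hypothetical helper (not a PCRE function): mark the overall match offsets
   as unset, touching only the slots that actually exist. Returns the number
   of slots written. */
int reset_match_offsets(int *offsets, int offsetcount)
{
int written = 0;
if (offsets == NULL) return 0;
if (offsetcount > 0) { offsets[0] = -1; written++; }
if (offsetcount > 1) { offsets[1] = -1; written++; }
return written;
}
```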
--- a/pcre/pcre_internal.h
+++ b/pcre/pcre_internal.h
@ -984,7 +984,7 @@ other. NOTE: The values also appear in pcre_jit_compile.c. */
 #ifndef EBCDIC

 #define HSPACE_LIST \
-  CHAR_HT, CHAR_SPACE, 0xa0, \
+  CHAR_HT, CHAR_SPACE, CHAR_NBSP, \
  0x1680, 0x180e, 0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, \
  0x2006, 0x2007, 0x2008, 0x2009, 0x200A, 0x202f, 0x205f, 0x3000, \
  NOTACHAR
@ -1010,7 +1010,7 @@ other. NOTE: The values also appear in pcre_jit_compile.c. */
 #define HSPACE_BYTE_CASES \
  case CHAR_HT: \
  case CHAR_SPACE: \
-  case 0xa0     /* NBSP */
+  case CHAR_NBSP

 #define HSPACE_CASES \
  HSPACE_BYTE_CASES: \
@ -1037,11 +1037,12 @@ other. NOTE: The values also appear in pcre_jit_compile.c. */
 /* ------ EBCDIC environments ------ */

 #else
-#define HSPACE_LIST CHAR_HT, CHAR_SPACE
+#define HSPACE_LIST CHAR_HT, CHAR_SPACE, CHAR_NBSP, NOTACHAR

 #define HSPACE_BYTE_CASES \
  case CHAR_HT: \
-  case CHAR_SPACE
+  case CHAR_SPACE: \
+  case CHAR_NBSP

 #define HSPACE_CASES HSPACE_BYTE_CASES

@ -1215,6 +1216,7 @@ same code point. */

 #define CHAR_ESC                    '\047'
 #define CHAR_DEL                    '\007'
+#define CHAR_NBSP                   '\x41'
 #define STR_ESC                     "\047"
 #define STR_DEL                     "\007"

@ -1229,6 +1231,7 @@ a positive value. */
 #define CHAR_NEL                    ((unsigned char)'\x85')
 #define CHAR_ESC                    '\033'
 #define CHAR_DEL                    '\177'
+#define CHAR_NBSP                   ((unsigned char)'\xa0')

 #define STR_LF                      "\n"
 #define STR_NL                      STR_LF
@ -1606,6 +1609,7 @@ only. */
 #define CHAR_VERTICAL_LINE          '\174'
 #define CHAR_RIGHT_CURLY_BRACKET    '\175'
 #define CHAR_TILDE                  '\176'
+#define CHAR_NBSP                   ((unsigned char)'\xa0')

 #define STR_HT                      "\011"
 #define STR_VT                      "\013"
@ -1762,6 +1766,10 @@ only. */

 /* Escape items that are just an encoding of a particular data value. */

+#ifndef ESC_a
+#define ESC_a CHAR_BEL
+#endif
+
 #ifndef ESC_e
 #define ESC_e CHAR_ESC
 #endif
@ -2446,6 +2454,7 @@ typedef struct compile_data {
  BOOL had_pruneorskip;             /* (*PRUNE) or (*SKIP) encountered */
  BOOL check_lookbehind;            /* Lookbehinds need later checking */
  BOOL dupnames;                    /* Duplicate names exist */
+  BOOL dupgroups;                   /* Duplicate groups exist: (?| found */
  BOOL iscondassert;                /* Next assert is a condition */
  int  nltype;                      /* Newline type */
  int  nllen;                       /* Newline string length */
--- a/pcre/pcre_jit_compile.c
+++ b/pcre/pcre_jit_compile.c
@ -1064,6 +1064,7 @@ pcre_uchar *alternative;
 pcre_uchar *end = NULL;
 int private_data_ptr = *private_data_start;
 int space, size, bracketlen;
+BOOL repeat_check = TRUE;

 while (cc < ccend)
  {
@ -1071,9 +1072,10 @@ while (cc < ccend)
  size = 0;
  bracketlen = 0;
  if (private_data_ptr > SLJIT_MAX_LOCAL_SIZE)
-    return;
+    break;

-  if (*cc == OP_ONCE || *cc == OP_ONCE_NC || *cc == OP_BRA || *cc == OP_CBRA || *cc == OP_COND)
+  if (repeat_check && (*cc == OP_ONCE || *cc == OP_ONCE_NC || *cc == OP_BRA || *cc == OP_CBRA || *cc == OP_COND))
+    {
    if (detect_repeat(common, cc))
      {
      /* These brackets are converted to repeats, so no global
@ -1081,6 +1083,8 @@ while (cc < ccend)
      if (cc >= end)
        end = bracketend(cc);
      }
+    }
+  repeat_check = TRUE;

  switch(*cc)
    {
@ -1136,6 +1140,13 @@ while (cc < ccend)
    bracketlen = 1 + LINK_SIZE + IMM2_SIZE;
    break;

+    case OP_BRAZERO:
+    case OP_BRAMINZERO:
+    case OP_BRAPOSZERO:
+    repeat_check = FALSE;
+    size = 1;
+    break;
+
    CASE_ITERATOR_PRIVATE_DATA_1
    space = 1;
    size = -2;
@ -1162,12 +1173,17 @@ while (cc < ccend)
    size = 1;
    break;

-    CASE_ITERATOR_TYPE_PRIVATE_DATA_2B
+    case OP_TYPEUPTO:
    if (cc[1 + IMM2_SIZE] != OP_ANYNL && cc[1 + IMM2_SIZE] != OP_EXTUNI)
      space = 2;
    size = 1 + IMM2_SIZE;
    break;

+    case OP_TYPEMINUPTO:
+    space = 2;
+    size = 1 + IMM2_SIZE;
+    break;
+
    case OP_CLASS:
    case OP_NCLASS:
    size += 1 + 32 / sizeof(pcre_uchar);
@ -1316,6 +1332,13 @@ while (cc < ccend)
    cc += 1 + LINK_SIZE + IMM2_SIZE;
    break;

+    case OP_THEN:
+    stack_restore = TRUE;
+    if (common->control_head_ptr != 0)
+      *needs_control_head = TRUE;
+    cc ++;
+    break;
+
    default:
    stack_restore = TRUE;
    /* Fall through. */
@ -2220,6 +2243,7 @@ while (current != NULL)
    SLJIT_ASSERT_STOP();
    break;
    }
+  SLJIT_ASSERT(current > (sljit_sw*)current[-1]);
  current = (sljit_sw*)current[-1];
  }
 return -1;
@ -3209,7 +3233,7 @@ bytes[len] = byte;
 bytes[0] = len;
 }

-static int scan_prefix(compiler_common *common, pcre_uchar *cc, pcre_uint32 *chars, pcre_uint8 *bytes, int max_chars)
+static int scan_prefix(compiler_common *common, pcre_uchar *cc, pcre_uint32 *chars, pcre_uint8 *bytes, int max_chars, pcre_uint32 *rec_count)
 {
 /* Recursive function, which scans prefix literals. */
 BOOL last, any, caseless;
@ -3227,9 +3251,14 @@ pcre_uchar othercase[1];
 repeat = 1;
 while (TRUE)
  {
+  if (*rec_count == 0)
+    return 0;
+  (*rec_count)--;
+
  last = TRUE;
  any = FALSE;
  caseless = FALSE;
+
  switch (*cc)
    {
    case OP_CHARI:
@ -3291,7 +3320,7 @@ while (TRUE)
 #ifdef SUPPORT_UTF
    if (common->utf && HAS_EXTRALEN(*cc)) len += GET_EXTRALEN(*cc);
 #endif
-    max_chars = scan_prefix(common, cc + len, chars, bytes, max_chars);
+    max_chars = scan_prefix(common, cc + len, chars, bytes, max_chars, rec_count);
    if (max_chars == 0)
      return consumed;
    last = FALSE;
@ -3314,7 +3343,7 @@ while (TRUE)
    alternative = cc + GET(cc, 1);
    while (*alternative == OP_ALT)
      {
-      max_chars = scan_prefix(common, alternative + 1 + LINK_SIZE, chars, bytes, max_chars);
+      max_chars = scan_prefix(common, alternative + 1 + LINK_SIZE, chars, bytes, max_chars, rec_count);
      if (max_chars == 0)
        return consumed;
      alternative += GET(alternative, 1);
@ -3556,6 +3585,7 @@ int i, max, from;
 int range_right = -1, range_len = 3 - 1;
 sljit_ub *update_table = NULL;
 BOOL in_range;
+pcre_uint32 rec_count;

 for (i = 0; i < MAX_N_CHARS; i++)
  {
@ -3564,7 +3594,8 @@ for (i = 0; i < MAX_N_CHARS; i++)
  bytes[i * MAX_N_BYTES] = 0;
  }

-max = scan_prefix(common, common->start, chars, bytes, MAX_N_CHARS);
+rec_count = 10000;
+max = scan_prefix(common, common->start, chars, bytes, MAX_N_CHARS, &rec_count);

 if (max <= 1)
  return FALSE;
@ -4311,8 +4342,10 @@ switch(length)
  case 4:
  if ((ranges[1] - ranges[0]) == (ranges[3] - ranges[2])
      && (ranges[0] | (ranges[2] - ranges[0])) == ranges[2]
+      && (ranges[1] & (ranges[2] - ranges[0])) == 0
      && is_powerof2(ranges[2] - ranges[0]))
    {
+    SLJIT_ASSERT((ranges[0] & (ranges[2] - ranges[0])) == 0 && (ranges[2] & ranges[3] & (ranges[2] - ranges[0])) != 0);
    OP2(SLJIT_OR, TMP1, 0, TMP1, 0, SLJIT_IMM, ranges[2] - ranges[0]);
    if (ranges[2] + 1 != ranges[3])
      {
@ -4900,9 +4933,10 @@ else if ((cc[-1] & XCL_MAP) != 0)
  if (!check_class_ranges(common, (const pcre_uint8 *)cc, FALSE, TRUE, list))
    {
 #ifdef COMPILE_PCRE8
-    SLJIT_ASSERT(common->utf);
+    jump = NULL;
+    if (common->utf)
 #endif
-    jump = CMP(SLJIT_GREATER, TMP1, 0, SLJIT_IMM, 255);
+      jump = CMP(SLJIT_GREATER, TMP1, 0, SLJIT_IMM, 255);

    OP2(SLJIT_AND, TMP2, 0, TMP1, 0, SLJIT_IMM, 0x7);
    OP2(SLJIT_LSHR, TMP1, 0, TMP1, 0, SLJIT_IMM, 3);
@ -4911,7 +4945,10 @@ else if ((cc[-1] & XCL_MAP) != 0)
    OP2(SLJIT_AND | SLJIT_SET_E, SLJIT_UNUSED, 0, TMP1, 0, TMP2, 0);
    add_jump(compiler, list, JUMP(SLJIT_NOT_ZERO));

-    JUMPHERE(jump);
+#ifdef COMPILE_PCRE8
+    if (common->utf)
+#endif
+      JUMPHERE(jump);
    }

  OP1(SLJIT_MOV, TMP1, 0, TMP3, 0);
@ -5219,7 +5256,7 @@ while (*cc != XCL_END)
      OP_FLAGS(SLJIT_MOV, TMP2, 0, SLJIT_UNUSED, 0, SLJIT_LESS_EQUAL);

      SET_CHAR_OFFSET(0);
-      OP2(SLJIT_SUB | SLJIT_SET_U, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0xff);
+      OP2(SLJIT_SUB | SLJIT_SET_U, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x7f);
      OP_FLAGS(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_LESS_EQUAL);

      SET_TYPE_OFFSET(ucp_Pc);
@ -7665,6 +7702,10 @@ while (*cc != OP_KETRPOS)
      OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(0), STR_PTR, 0);
      }

+    /* Even if the match is empty, we need to reset the control head. */
+    if (needs_control_head)
+      OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, SLJIT_MEM1(STACK_TOP), STACK(stack));
+
    if (opcode == OP_SBRAPOS || opcode == OP_SCBRAPOS)
      add_jump(compiler, &emptymatch, CMP(SLJIT_EQUAL, TMP1, 0, STR_PTR, 0));

@ -7692,6 +7733,10 @@ while (*cc != OP_KETRPOS)
      OP1(SLJIT_MOV, SLJIT_MEM1(TMP2), (framesize + 1) * sizeof(sljit_sw), STR_PTR, 0);
      }

+    /* Even if the match is empty, we need to reset the control head. */
+    if (needs_control_head)
+      OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, SLJIT_MEM1(STACK_TOP), STACK(stack));
+
    if (opcode == OP_SBRAPOS || opcode == OP_SCBRAPOS)
      add_jump(compiler, &emptymatch, CMP(SLJIT_EQUAL, TMP1, 0, STR_PTR, 0));

@ -7704,9 +7749,6 @@ while (*cc != OP_KETRPOS)
      }
    }

-  if (needs_control_head)
-    OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, SLJIT_MEM1(STACK_TOP), STACK(stack));
-
  JUMPTO(SLJIT_JUMP, loop);
  flush_stubs(common);

@ -8441,8 +8483,7 @@ while (cc < ccend)
      OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(1), STR_PTR, 0);
      }
    BACKTRACK_AS(braminzero_backtrack)->matchingpath = LABEL();
-    if (cc[1] > OP_ASSERTBACK_NOT)
-      count_match(common);
+    count_match(common);
    break;

    case OP_ONCE:
@ -9624,7 +9665,7 @@ static SLJIT_INLINE void compile_recurse(compiler_common *common)
 DEFINE_COMPILER;
 pcre_uchar *cc = common->start + common->currententry->start;
 pcre_uchar *ccbegin = cc + 1 + LINK_SIZE + (*cc == OP_BRA ? 0 : IMM2_SIZE);
-pcre_uchar *ccend = bracketend(cc);
+pcre_uchar *ccend = bracketend(cc) - (1 + LINK_SIZE);
 BOOL needs_control_head;
 int framesize = get_framesize(common, cc, NULL, TRUE, &needs_control_head);
 int private_data_size = get_private_data_copy_length(common, ccbegin, ccend, needs_control_head);
@ -9648,6 +9689,7 @@ set_jumps(common->currententry->calls, common->currententry->entry);

 sljit_emit_fast_enter(compiler, TMP2, 0);
 allocate_stack(common, private_data_size + framesize + alternativesize);
+count_match(common);
 OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(private_data_size + framesize + alternativesize - 1), TMP2, 0);
 copy_private_data(common, ccbegin, ccend, TRUE, private_data_size + framesize + alternativesize, framesize + alternativesize, needs_control_head);
 if (needs_control_head)
@ -9992,6 +10034,7 @@ OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, stack));
 OP1(SLJIT_MOV_UI, TMP1, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, limit_match));
 OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(struct sljit_stack, base));
 OP1(SLJIT_MOV, STACK_LIMIT, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(struct sljit_stack, limit));
+OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, 1);
 OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LIMIT_MATCH, TMP1, 0);

 if (mode == JIT_PARTIAL_SOFT_COMPILE)
--- a/pcre/pcre_jit_test.c
+++ b/pcre/pcre_jit_test.c
@ -182,6 +182,7 @@ static struct regression_test_case regression_test_cases[] = {
 	{ CMUAP, 0, "\xf0\x90\x90\x80{2}", "\xf0\x90\x90\x80#\xf0\x90\x90\xa8\xf0\x90\x90\x80" },
 	{ CMUAP, 0, "\xf0\x90\x90\xa8{2}", "\xf0\x90\x90\x80#\xf0\x90\x90\xa8\xf0\x90\x90\x80" },
 	{ CMUAP, 0, "\xe1\xbd\xb8\xe1\xbf\xb8", "\xe1\xbf\xb8\xe1\xbd\xb8" },
+	{ MA, 0, "[3-57-9]", "5" },

 	/* Assertions. */
 	{ MUA, 0, "\\b[^A]", "A_B#" },
--- a/pcre/pcre_study.c
+++ b/pcre/pcre_study.c
@ -71,6 +71,7 @@ Arguments:
  startcode       pointer to start of the whole pattern's code
  options         the compiling options
  recurses        chain of recurse_check to catch mutual recursion
+  countptr        pointer to call count (to catch over complexity)

 Returns:   the minimum length
           -1 if \C in UTF-8 mode or (*ACCEPT) was encountered
@ -80,7 +81,8 @@ Returns:   the minimum length

 static int
 find_minlength(const REAL_PCRE *re, const pcre_uchar *code,
-  const pcre_uchar *startcode, int options, recurse_check *recurses)
+  const pcre_uchar *startcode, int options, recurse_check *recurses,
+  int *countptr)
 {
 int length = -1;
 /* PCRE_UTF16 has the same value as PCRE_UTF8. */
@ -90,6 +92,8 @@ recurse_check this_recurse;
 register int branchlength = 0;
 register pcre_uchar *cc = (pcre_uchar *)code + 1 + LINK_SIZE;

+if ((*countptr)++ > 1000) return -1;   /* too complex */
+
 if (*code == OP_CBRA || *code == OP_SCBRA ||
    *code == OP_CBRAPOS || *code == OP_SCBRAPOS) cc += IMM2_SIZE;

@ -131,7 +135,7 @@ for (;;)
    case OP_SBRAPOS:
    case OP_ONCE:
    case OP_ONCE_NC:
-    d = find_minlength(re, cc, startcode, options, recurses);
+    d = find_minlength(re, cc, startcode, options, recurses, countptr);
    if (d < 0) return d;
    branchlength += d;
    do cc += GET(cc, 1); while (*cc == OP_ALT);
@ -415,7 +419,8 @@ for (;;)
            int dd;
            this_recurse.prev = recurses;
            this_recurse.group = cs;
-            dd = find_minlength(re, cs, startcode, options, &this_recurse);
+            dd = find_minlength(re, cs, startcode, options, &this_recurse,
+              countptr);
            if (dd < d) d = dd;
            }
          }
@ -451,7 +456,8 @@ for (;;)
          {
          this_recurse.prev = recurses;
          this_recurse.group = cs;
-          d = find_minlength(re, cs, startcode, options, &this_recurse);
+          d = find_minlength(re, cs, startcode, options, &this_recurse,
+            countptr);
          }
        }
      }
@ -514,7 +520,7 @@ for (;;)
        this_recurse.prev = recurses;
        this_recurse.group = cs;
        branchlength += find_minlength(re, cs, startcode, options,
-          &this_recurse);
+          &this_recurse, countptr);
        }
      }
    cc += 1 + LINK_SIZE;
@ -1453,6 +1459,7 @@ pcre32_study(const pcre32 *external_re, int options, const char **errorptr)
 #endif
 {
 int min;
+int count = 0;
 BOOL bits_set = FALSE;
 pcre_uint8 start_bits[32];
 PUBL(extra) *extra = NULL;
@ -1539,7 +1546,7 @@ if ((re->options & PCRE_ANCHORED) == 0 &&

 /* Find the minimum length of subject string. */

-switch(min = find_minlength(re, code, code, re->options, NULL))
+switch(min = find_minlength(re, code, code, re->options, NULL, &count))
  {
  case -2: *errorptr = "internal error: missing capturing bracket"; return NULL;
  case -3: *errorptr = "internal error: opcode not recognized"; return NULL;
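Note on the `countptr` argument added to find_minlength() above: one counter is shared by every recursive call, so it bounds the total number of calls rather than the nesting depth, and the function gives up with -1 once the counter passes the limit. The sketch below shows the same idea on a simplified structure. The `node` type, the `min_length` function and the `limit` parameter are assumptions made for illustration; PCRE walks compiled opcodes and hard-codes the limit at 1000.

```c
#include <stddef.h>

/* Hypothetical stand-in for a compiled pattern item. */
typedef struct node {
  int min_len;                 /* minimum length contributed by this item */
  const struct node *child;    /* nested group, or NULL */
  const struct node *next;     /* following item, or NULL */
} node;

/* Returns the minimum length of the chain starting at n, or -1 if more than
   `limit` calls had already been made ("too complex"). *countptr is shared
   by all recursive calls. */
int min_length(const node *n, int *countptr, int limit)
{
int total = 0;
if ((*countptr)++ > limit) return -1;
for (; n != NULL; n = n->next)
  {
  total += n->min_len;
  if (n->child != NULL)
    {
    int d = min_length(n->child, countptr, limit);
    if (d < 0) return d;       /* propagate "too complex" */
    total += d;
    }
  }
return total;
}
```

Passing a pointer to a single counter, as the patch does with `&count` in pcre_study(), keeps the budget global across sibling and nested calls.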
--- a/pcre/pcre_xclass.c
+++ b/pcre/pcre_xclass.c
@ -246,7 +246,7 @@ while ((t = *data++) != XCL_END)

      case PT_PXPUNCT:
      if ((PRIV(ucp_gentype)[prop->chartype] == ucp_P ||
-            (c < 256 && PRIV(ucp_gentype)[prop->chartype] == ucp_S)) == isprop)
+            (c < 128 && PRIV(ucp_gentype)[prop->chartype] == ucp_S)) == isprop)
        return !negated;
      break;

--- a/pcre/pcregrep.c
+++ b/pcre/pcregrep.c
@ -1692,9 +1692,13 @@ while (ptr < endptr)

    if (filenames == FN_NOMATCH_ONLY) return 1;

+    /* If all we want is a yes/no answer, stop now. */
+
+    if (quiet) return 0;
+
    /* Just count if just counting is wanted. */

-    if (count_only) count++;
+    else if (count_only) count++;

    /* When handling a binary file and binary-files==binary, the "binary"
    variable will be set true (it's false in all other cases). In this
@ -1715,10 +1719,6 @@ while (ptr < endptr)
      return 0;
      }

-    /* Likewise, if all we want is a yes/no answer. */
-
-    else if (quiet) return 0;
-
    /* The --only-matching option prints just the substring that matched,
    and/or one or more captured portions of it, as long as these strings are
    not empty. The --file-offsets and --line-offsets options output offsets for
@ -2089,7 +2089,7 @@ if (filenames == FN_NOMATCH_ONLY)

 /* Print the match count if wanted */

-if (count_only)
+if (count_only && !quiet)
  {
  if (count > 0 || !omit_zero_count)
    {
--- a/pcre/pcretest.c
+++ b/pcre/pcretest.c
@ -4621,9 +4621,9 @@ while (!done)

      else switch ((c = *p++))
        {
-        case 'a': c =    7; break;
+        case 'a': c =  CHAR_BEL; break;
        case 'b': c = '\b'; break;
-        case 'e': c =   27; break;
+        case 'e': c =  CHAR_ESC; break;
        case 'f': c = '\f'; break;
        case 'n': c = '\n'; break;
        case 'r': c = '\r'; break;
--- a/pcre/testdata/grepoutput
+++ b/pcre/testdata/grepoutput
@ -751,3 +751,7 @@ RC=0
 2:3,1
 2:4,1
 RC=0
+---------------------------- Test 108 ------------------------------
+RC=0
+---------------------------- Test 109 -----------------------------
+RC=0
--- a/pcre/testdata/testinput1
+++ b/pcre/testdata/testinput1
@ -5730,4 +5730,7 @@ AbcdCBefgBhiBqz
 "(?1)(?#?'){8}(a)"
    baaaaaaaaac

+"(?|(\k'Pm')|(?'Pm'))"
+    abcd
+
 /-- End of testinput1 --/
--- a/pcre/testdata/testinput11
+++ b/pcre/testdata/testinput11
@ -136,4 +136,6 @@ is required for these tests. --/

 /((?+1)(\1))/B

+/.((?2)(?R)\1)()/B
+
 /-- End of testinput11 --/
--- a/pcre/testdata/testinput12
+++ b/pcre/testdata/testinput12
--- a/pcre/testdata/testinput14
+++ b/pcre/testdata/testinput14
@ -340,4 +340,6 @@ not matter. --/

 /[^\s]*\s* [^\W]+\W+ [^\d]*?\d0 [^\d\w]{4,6}?\w*A/BZ

+/(?'ABC'[bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar](*THEN:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/
+
 /-- End of testinput14 --/
--- a/pcre/testdata/testinput17
+++ b/pcre/testdata/testinput17
--- a/pcre/testdata/testinput2
+++ b/pcre/testdata/testinput2
--- a/pcre/testdata/testinput6
+++ b/pcre/testdata/testinput6
@ -1502,4 +1502,55 @@
 /\C\X*QT/8
    Ӆ\x0aT

+/[\pS#moq]/
+    =
+
+/[[:punct:]]/8W
+    \xc2\xb4
+    \x{b4} 
+
+/[[:^ascii:]]/8W
+    \x{100}
+    \x{200}
+    \x{300}
+    \x{37e}
+    a
+    9
+    g
+
+/[[:^ascii:]\w]/8W
+    a
+    9
+    g
+    \x{100}
+    \x{200}
+    \x{300}
+    \x{37e}
+
+/[\w[:^ascii:]]/8W
+    a
+    9
+    g
+    \x{100}
+    \x{200}
+    \x{300}
+    \x{37e}
+
+/[^[:ascii:]\W]/8W
+    a
+    9
+    g
+    \x{100}
+    \x{200}
+    \x{300}
+    \x{37e}
+
+/[[:^ascii:]a]/8W
+    a
+    9
+    g
+    \x{100}
+    \x{200}
+    \x{37e}
+
 /-- End of testinput6 --/
--- a/pcre/testdata/testinput7
+++ b/pcre/testdata/testinput7
@ -838,4 +838,19 @@ of case for anything other than the ASCII letters. --/
 /^s?c/mi8I
    scat

+/[\W\p{Any}]/BZ
+    abc
+    123 
+
+/[\W\pL]/BZ
+    abc
+    ** Failers 
+    123     
+
+/a[[:punct:]b]/WBZ
+
+/a[[:punct:]b]/8WBZ
+
+/a[b[:punct:]]/8WBZ
+
 /-- End of testinput7 --/
--- a/pcre/testdata/testinputEBC
+++ b/pcre/testdata/testinputEBC
@ -29,13 +29,16 @@ in EBCDIC, but can be specified as escapes. --/

 /^A\ˆ/
    A B
+    A\x41B

 /-- Test \H --/

 /^A\È/
    AB
+    A\x42B
    ** Fail
    A B
+    A\x41B

 /-- Test \R --/

--- a/pcre/testdata/testoutput1
+++ b/pcre/testdata/testoutput1
@ -9429,4 +9429,9 @@ No match
 0: aaaaaaaaa
 1: a

+"(?|(\k'Pm')|(?'Pm'))"
+    abcd
+ 0: 
+ 1: 
+
 /-- End of testinput1 --/
--- a/pcre/testdata/testoutput11-16
+++ b/pcre/testdata/testoutput11-16
@ -231,7 +231,7 @@ Memory allocation (code space): 73
 ------------------------------------------------------------------

 /(?P<a>a)...(?P=a)bbb(?P>a)d/BM
-Memory allocation (code space): 61
+Memory allocation (code space): 77
 ------------------------------------------------------------------
  0  24 Bra
  2   5 CBra 1
@ -650,18 +650,18 @@ Memory allocation (code space): 14

 /[[:^alpha:][:^cntrl:]]+/8WB
 ------------------------------------------------------------------
-  0  26 Bra
-  2     [ -~\x80-\xff\P{L}]++
- 26  26 Ket
- 28     End
+  0  30 Bra
+  2     [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
+ 30  30 Ket
+ 32     End
 ------------------------------------------------------------------

 /[[:^cntrl:][:^alpha:]]+/8WB
 ------------------------------------------------------------------
-  0  26 Bra
-  2     [ -~\x80-\xff\P{L}]++
- 26  26 Ket
- 28     End
+  0  30 Bra
+  2     [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
+ 30  30 Ket
+ 32     End
 ------------------------------------------------------------------

 /[[:alpha:]]+/8WB
@ -748,4 +748,21 @@ Memory allocation (code space): 14
 22     End
 ------------------------------------------------------------------

+/.((?2)(?R)\1)()/B
+------------------------------------------------------------------
+  0  23 Bra
+  2     Any
+  3  13 Once
+  5   9 CBra 1
+  8  18 Recurse
+ 10   0 Recurse
+ 12     \1
+ 14   9 Ket
+ 16  13 Ket
+ 18   3 CBra 2
+ 21   3 Ket
+ 23  23 Ket
+ 25     End
+------------------------------------------------------------------
+
 /-- End of testinput11 --/
--- a/pcre/testdata/testoutput11-32
+++ b/pcre/testdata/testoutput11-32
@ -231,7 +231,7 @@ Memory allocation (code space): 155
 ------------------------------------------------------------------

 /(?P<a>a)...(?P=a)bbb(?P>a)d/BM
-Memory allocation (code space): 125
+Memory allocation (code space): 157
 ------------------------------------------------------------------
  0  24 Bra
  2   5 CBra 1
@ -650,18 +650,18 @@ Memory allocation (code space): 28

 /[[:^alpha:][:^cntrl:]]+/8WB
 ------------------------------------------------------------------
-  0  18 Bra
-  2     [ -~\x80-\xff\P{L}]++
- 18  18 Ket
- 20     End
+  0  21 Bra
+  2     [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
+ 21  21 Ket
+ 23     End
 ------------------------------------------------------------------

 /[[:^cntrl:][:^alpha:]]+/8WB
 ------------------------------------------------------------------
-  0  18 Bra
-  2     [ -~\x80-\xff\P{L}]++
- 18  18 Ket
- 20     End
+  0  21 Bra
+  2     [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
+ 21  21 Ket
+ 23     End
 ------------------------------------------------------------------

 /[[:alpha:]]+/8WB
@ -748,4 +748,21 @@ Memory allocation (code space): 28
 22     End
 ------------------------------------------------------------------

+/.((?2)(?R)\1)()/B
+------------------------------------------------------------------
+  0  23 Bra
+  2     Any
+  3  13 Once
+  5   9 CBra 1
+  8  18 Recurse
+ 10   0 Recurse
+ 12     \1
+ 14   9 Ket
+ 16  13 Ket
+ 18   3 CBra 2
+ 21   3 Ket
+ 23  23 Ket
+ 25     End
+------------------------------------------------------------------
+
 /-- End of testinput11 --/
--- a/pcre/testdata/testoutput11-8
+++ b/pcre/testdata/testoutput11-8
@ -231,7 +231,7 @@ Memory allocation (code space): 45
 ------------------------------------------------------------------

 /(?P<a>a)...(?P=a)bbb(?P>a)d/BM
-Memory allocation (code space): 38
+Memory allocation (code space): 50
 ------------------------------------------------------------------
  0  30 Bra
  3   7 CBra 1
@ -650,18 +650,18 @@ Memory allocation (code space): 10

 /[[:^alpha:][:^cntrl:]]+/8WB
 ------------------------------------------------------------------
-  0  44 Bra
-  3     [ -~\x80-\xff\P{L}]++
- 44  44 Ket
- 47     End
+  0  51 Bra
+  3     [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
+ 51  51 Ket
+ 54     End
 ------------------------------------------------------------------

 /[[:^cntrl:][:^alpha:]]+/8WB
 ------------------------------------------------------------------
-  0  44 Bra
-  3     [ -~\x80-\xff\P{L}]++
- 44  44 Ket
- 47     End
+  0  51 Bra
+  3     [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
+ 51  51 Ket
+ 54     End
 ------------------------------------------------------------------

 /[[:alpha:]]+/8WB
@ -748,4 +748,21 @@ Memory allocation (code space): 10
 34     End
 ------------------------------------------------------------------

+/.((?2)(?R)\1)()/B
+------------------------------------------------------------------
+  0  35 Bra
+  3     Any
+  4  20 Once
+  7  14 CBra 1
+ 12  27 Recurse
+ 15   0 Recurse
+ 18     \1
+ 21  14 Ket
+ 24  20 Ket
+ 27   5 CBra 2
+ 32   5 Ket
+ 35  35 Ket
+ 38     End
+------------------------------------------------------------------
+
 /-- End of testinput11 --/
--- a/pcre/testdata/testoutput12
+++ b/pcre/testdata/testoutput12
--- a/pcre/testdata/testoutput14
+++ b/pcre/testdata/testoutput14
@ -527,4 +527,6 @@ Failed: character value in \u.... sequence is too large at offset 6
        End
 ------------------------------------------------------------------

+/(?'ABC'[bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar](*THEN:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/
+
 /-- End of testinput14 --/
--- a/pcre/testdata/testoutput17
+++ b/pcre/testdata/testoutput17
--- a/pcre/testdata/testoutput2
+++ b/pcre/testdata/testoutput2
--- a/pcre/testdata/testoutput6
+++ b/pcre/testdata/testoutput6
@ -2469,4 +2469,92 @@ No match
    Ӆ\x0aT
 No match

+/[\pS#moq]/
+    =
+ 0: =
+
+/[[:punct:]]/8W
+    \xc2\xb4
+No match
+    \x{b4} 
+No match
+
+/[[:^ascii:]]/8W
+    \x{100}
+ 0: \x{100}
+    \x{200}
+ 0: \x{200}
+    \x{300}
+ 0: \x{300}
+    \x{37e}
+ 0: \x{37e}
+    a
+No match
+    9
+No match
+    g
+No match
+
+/[[:^ascii:]\w]/8W
+    a
+ 0: a
+    9
+ 0: 9
+    g
+ 0: g
+    \x{100}
+ 0: \x{100}
+    \x{200}
+ 0: \x{200}
+    \x{300}
+ 0: \x{300}
+    \x{37e}
+ 0: \x{37e}
+
+/[\w[:^ascii:]]/8W
+    a
+ 0: a
+    9
+ 0: 9
+    g
+ 0: g
+    \x{100}
+ 0: \x{100}
+    \x{200}
+ 0: \x{200}
+    \x{300}
+ 0: \x{300}
+    \x{37e}
+ 0: \x{37e}
+
+/[^[:ascii:]\W]/8W
+    a
+No match
+    9
+No match
+    g
+No match
+    \x{100}
+ 0: \x{100}
+    \x{200}
+ 0: \x{200}
+    \x{300}
+No match
+    \x{37e}
+No match
+
+/[[:^ascii:]a]/8W
+    a
+ 0: a
+    9
+No match
+    g
+No match
+    \x{100}
+ 0: \x{100}
+    \x{200}
+ 0: \x{200}
+    \x{37e}
+ 0: \x{37e}
+
 /-- End of testinput6 --/
--- a/pcre/testdata/testoutput7
+++ b/pcre/testdata/testoutput7
@ -949,7 +949,7 @@ No match
 /[[:^alpha:][:^cntrl:]]+/8WBZ
 ------------------------------------------------------------------
        Bra
-        [ -~\x80-\xff\P{L}]++
+        [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
        Ket
        End
 ------------------------------------------------------------------
@ -961,7 +961,7 @@ No match
 /[[:^cntrl:][:^alpha:]]+/8WBZ
 ------------------------------------------------------------------
        Bra
-        [ -~\x80-\xff\P{L}]++
+        [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
        Ket
        End
 ------------------------------------------------------------------
@ -2295,4 +2295,57 @@ Need char = 'c' (caseless)
    scat
 0: sc

+/[\W\p{Any}]/BZ
+------------------------------------------------------------------
+        Bra
+        [\x00-/:-@[-^`{-\xff\p{Any}]
+        Ket
+        End
+------------------------------------------------------------------
+    abc
+ 0: a
+    123 
+ 0: 1
+
+/[\W\pL]/BZ
+------------------------------------------------------------------
+        Bra
+        [\x00-/:-@[-^`{-\xff\p{L}]
+        Ket
+        End
+------------------------------------------------------------------
+    abc
+ 0: a
+    ** Failers 
+ 0: *
+    123     
+No match
+
+/a[[:punct:]b]/WBZ
+------------------------------------------------------------------
+        Bra
+        a
+        [b[:punct:]]
+        Ket
+        End
+------------------------------------------------------------------
+
+/a[[:punct:]b]/8WBZ
+------------------------------------------------------------------
+        Bra
+        a
+        [b[:punct:]]
+        Ket
+        End
+------------------------------------------------------------------
+
+/a[b[:punct:]]/8WBZ
+------------------------------------------------------------------
+        Bra
+        a
+        [b[:punct:]]
+        Ket
+        End
+------------------------------------------------------------------
+
 /-- End of testinput7 --/
--- a/pcre/testdata/testoutputEBC
+++ b/pcre/testdata/testoutputEBC
@ -41,16 +41,22 @@ No match
 /^A\ˆ/
    A B
 0: A\x20
+    A\x41B
+ 0: AA

 /-- Test \H --/

 /^A\È/
    AB
+ 0: AB
+    A\x42B
 0: AB
    ** Fail
 No match
    A B
 No match
+    A\x41B
+No match

 /-- Test \R --/