Merge branch 'merge/merge-pcre' into 10.0

2015-12-13 16:25:57 +01:00 · 2015-12-13 16:25:57 +01:00 · 095b7b92d1
commit 095b7b92d1
parent 359ae59ac0 e7591a1ba9
39 changed files with 2224 additions and 1263 deletions
--- a/pcre/ChangeLog
+++ b/pcre/ChangeLog
@ -1,6 +1,182 @@
 ChangeLog for PCRE
 ------------------

+Note that the PCRE 8.xx series (PCRE1) is now in a bugfix-only state. All
+development is happening in the PCRE2 10.xx series.
+
+Version 8.38 23-November-2015
+-----------------------------
+
+1.  If a group that contained a recursive back reference also contained a
+    forward reference subroutine call followed by a non-forward-reference
+    subroutine call, for example /.((?2)(?R)\1)()/, pcre2_compile() failed to
+    compile correct code, leading to undefined behaviour or an internally
+    detected error. This bug was discovered by the LLVM fuzzer.
+
+2.  Quantification of certain items (e.g. atomic back references) could cause
+    incorrect code to be compiled when recursive forward references were
+    involved. For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/.
+    This bug was discovered by the LLVM fuzzer.
+
+3.  A repeated conditional group whose condition was a reference by name caused
+    a buffer overflow if there was more than one group with the given name.
+    This bug was discovered by the LLVM fuzzer.
+
+4.  A recursive back reference by name within a group that had the same name as
+    another group caused a buffer overflow. For example:
+    /(?J)(?'d'(?'d'\g{d}))/. This bug was discovered by the LLVM fuzzer.
+
+5.  A forward reference by name to a group whose number is the same as the
+    current group, for example in this pattern: /(?|(\k'Pm')|(?'Pm'))/, caused
+    a buffer overflow at compile time. This bug was discovered by the LLVM
+    fuzzer.
+
+6.  A lookbehind assertion within a set of mutually recursive subpatterns could
+    provoke a buffer overflow. This bug was discovered by the LLVM fuzzer.
+
+7.  Another buffer overflow bug involved duplicate named groups with a
+    reference between their definition, with a group that reset capture
+    numbers, for example: /(?J:(?|(?'R')(\k'R')|((?'R'))))/. This has been
+    fixed by always allowing for more memory, even if not needed. (A proper fix
+    is implemented in PCRE2, but it involves more refactoring.)
+
+8.  There was no check for integer overflow in subroutine calls such as (?123).
+
+9.  The table entry for \l in EBCDIC environments was incorrect, leading to its
+    being treated as a literal 'l' instead of causing an error.
+
+10. There was a buffer overflow if pcre_exec() was called with an ovector of
+    size 1. This bug was found by american fuzzy lop.
+
+11. If a non-capturing group containing a conditional group that could match
+    an empty string was repeated, it was not identified as matching an empty
+    string itself. For example: /^(?:(?(1)x|)+)+$()/.
+
+12. In an EBCDIC environment, pcretest was mishandling the escape sequences
+    \a and \e in test subject lines.
+
+13. In an EBCDIC environment, \a in a pattern was converted to the ASCII
+    instead of the EBCDIC value.
+
+14. The handling of \c in an EBCDIC environment has been revised so that it is
+    now compatible with the specification in Perl's perlebcdic page.
+
+15. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in
+    ASCII/Unicode. This has now been added to the list of characters that are
+    recognized as white space in EBCDIC.
+
+16. When PCRE was compiled without UCP support, the use of \p and \P gave an
+    error (correctly) when used outside a class, but did not give an error
+    within a class.
+
+17. \h within a class was incorrectly compiled in EBCDIC environments.
+
+18. A pattern with an unmatched closing parenthesis that contained a backward
+    assertion which itself contained a forward reference caused buffer
+    overflow. And example pattern is: /(?=di(?<=(?1))|(?=(.))))/.
+
+19. JIT should return with error when the compiled pattern requires more stack
+    space than the maximum.
+
+20. A possessively repeated conditional group that could match an empty string,
+    for example, /(?(R))*+/, was incorrectly compiled.
+
+21. Fix infinite recursion in the JIT compiler when certain patterns such as
+    /(?:|a|){100}x/ are analysed.
+
+22. Some patterns with character classes involving [: and \\ were incorrectly
+    compiled and could cause reading from uninitialized memory or an incorrect
+    error diagnosis.
+
+23. Pathological patterns containing many nested occurrences of [: caused
+    pcre_compile() to run for a very long time.
+
+24. A conditional group with only one branch has an implicit empty alternative
+    branch and must therefore be treated as potentially matching an empty
+    string.
+
+25. If (?R was followed by - or + incorrect behaviour happened instead of a
+    diagnostic.
+
+26. Arrange to give up on finding the minimum matching length for overly
+    complex patterns.
+
+27. Similar to (4) above: in a pattern with duplicated named groups and an
+    occurrence of (?| it is possible for an apparently non-recursive back
+    reference to become recursive if a later named group with the relevant
+    number is encountered. This could lead to a buffer overflow. Wen Guanxing
+    from Venustech ADLAB discovered this bug.
+
+28. If pcregrep was given the -q option with -c or -l, or when handling a
+    binary file, it incorrectly wrote output to stdout.
+
+29. The JIT compiler did not restore the control verb head in case of *THEN
+    control verbs. This issue was found by Karl Skomski with a custom LLVM
+    fuzzer.
+
+30. Error messages for syntax errors following \g and \k were giving inaccurate
+    offsets in the pattern.
+
+31. Added a check for integer overflow in conditions (?(<digits>) and
+    (?(R<digits>). This omission was discovered by Karl Skomski with the LLVM
+    fuzzer.
+
+32. Handling recursive references such as (?2) when the reference is to a group
+    later in the pattern uses code that is very hacked about and error-prone.
+    It has been re-written for PCRE2. Here in PCRE1, a check has been added to
+    give an internal error if it is obvious that compiling has gone wrong.
+
+33. The JIT compiler should not check repeats after a {0,1} repeat byte code.
+    This issue was found by Karl Skomski with a custom LLVM fuzzer.
+
+34. The JIT compiler should restore the control chain for empty possessive
+    repeats. This issue was found by Karl Skomski with a custom LLVM fuzzer.
+
+35. Match limit check added to JIT recursion. This issue was found by Karl
+    Skomski with a custom LLVM fuzzer.
+
+36. Yet another case similar to 27 above has been circumvented by an
+    unconditional allocation of extra memory. This issue is fixed "properly" in
+    PCRE2 by refactoring the way references are handled. Wen Guanxing
+    from Venustech ADLAB discovered this bug.
+
+37. Fix two assertion fails in JIT. These issues were found by Karl Skomski
+    with a custom LLVM fuzzer.
+
+38. Fixed a corner case of range optimization in JIT.
+
+39. An incorrect error "overran compiling workspace" was given if there were
+    exactly enough group forward references such that the last one extended
+    into the workspace safety margin. The next one would have expanded the
+    workspace. The test for overflow was not including the safety margin.
+
+40. A match limit issue is fixed in JIT which was found by Karl Skomski
+    with a custom LLVM fuzzer.
+
+41. Remove the use of /dev/null in testdata/testinput2, because it doesn't
+    work under Windows. (Why has it taken so long for anyone to notice?)
+
+42. In a character class such as [\W\p{Any}] where both a negative-type escape
+    ("not a word character") and a property escape were present, the property
+    escape was being ignored.
+
+43. Fix crash caused by very long (*MARK) or (*THEN) names.
+
+44. A sequence such as [[:punct:]b] that is, a POSIX character class followed
+    by a single ASCII character in a class item, was incorrectly compiled in
+    UCP mode. The POSIX class got lost, but only if the single character
+    followed it.
+
+45. [:punct:] in UCP mode was matching some characters in the range 128-255
+    that should not have been matched.
+
+46. If [:^ascii:] or [:^xdigit:] or [:^cntrl:] are present in a non-negated
+    class, all characters with code points greater than 255 are in the class.
+    When a Unicode property was also in the class (if PCRE_UCP is set, escapes
+    such as \w are turned into Unicode properties), wide characters were not
+    correctly handled, and could fail to match.
+
+
 Version 8.37 28-April-2015
 --------------------------

--- a/pcre/NEWS
+++ b/pcre/NEWS
@ -1,6 +1,14 @@
 News about PCRE releases
 ------------------------

+Release 8.38 23-November-2015
+-----------------------------
+
+This is bug-fix release. Note that this library (now called PCRE1) is now being
+maintained for bug fixes only. New projects are advised to use the new PCRE2
+libraries.
+
+
 Release 8.37 28-April-2015
 --------------------------

--- a/pcre/NON-AUTOTOOLS-BUILD
+++ b/pcre/NON-AUTOTOOLS-BUILD
@ -764,9 +764,9 @@ required. For details, please see this web site:

  http://www.zaconsultants.net

-There is also a mirror here:
-
-  http://www.vsoft-software.com/downloads.html
+You may download PCRE from WWW.CBTTAPE.ORG, file 882.  Everything, source and
+executable, is in EBCDIC and native z/OS file formats and this is the
+recommended download site.

 ==========================
-Last Updated: 10 February 2015
+Last Updated: 25 June 2015
--- a/pcre/RunGrepTest
+++ b/pcre/RunGrepTest
@ -512,6 +512,14 @@ echo "aaaaa" >>testtemp1grep
 (cd $srcdir; $valgrind $pcregrep  --line-offsets '(?<=\Ka)' $builddir/testtemp1grep) >>testtrygrep 2>&1
 echo "RC=$?" >>testtrygrep

+echo "---------------------------- Test 108 ------------------------------" >>testtrygrep
+(cd $srcdir; $valgrind $pcregrep -lq PATTERN ./testdata/grepinput ./testdata/grepinputx) >>testtrygrep
+echo "RC=$?" >>testtrygrep
+
+echo "---------------------------- Test 109 -----------------------------" >>testtrygrep
+(cd $srcdir; $valgrind $pcregrep -cq lazy ./testdata/grepinput*) >>testtrygrep
+echo "RC=$?" >>testtrygrep
+
 # Now compare the results.

 $cf $srcdir/testdata/grepoutput testtrygrep
--- a/pcre/configure.ac
+++ b/pcre/configure.ac
@ -9,17 +9,17 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
 dnl be defined as -RC2, for example. For real releases, it should be empty.

 m4_define(pcre_major, [8])
-m4_define(pcre_minor, [37])
+m4_define(pcre_minor, [38])
 m4_define(pcre_prerelease, [])
-m4_define(pcre_date, [2015-04-28])
+m4_define(pcre_date, [2015-11-23])

 # NOTE: The CMakeLists.txt file searches for the above variables in the first
 # 50 lines of this file. Please update that if the variables above are moved.

 # Libtool shared library interface versions (current:revision:age)
-m4_define(libpcre_version, [3:5:2])
-m4_define(libpcre16_version, [2:5:2])
-m4_define(libpcre32_version, [0:5:0])
+m4_define(libpcre_version, [3:6:2])
+m4_define(libpcre16_version, [2:6:2])
+m4_define(libpcre32_version, [0:6:0])
 m4_define(libpcreposix_version, [0:3:0])
 m4_define(libpcrecpp_version, [0:1:0])

--- a/pcre/doc/html/NON-AUTOTOOLS-BUILD.txt
+++ b/pcre/doc/html/NON-AUTOTOOLS-BUILD.txt
@ -764,9 +764,9 @@ required. For details, please see this web site:

  http://www.zaconsultants.net

-There is also a mirror here:
-
-  http://www.vsoft-software.com/downloads.html
+You may download PCRE from WWW.CBTTAPE.ORG, file 882.  Everything, source and
+executable, is in EBCDIC and native z/OS file formats and this is the
+recommended download site.

 ==========================
-Last Updated: 10 February 2015
+Last Updated: 25 June 2015
--- a/pcre/doc/html/pcrepattern.html
+++ b/pcre/doc/html/pcrepattern.html
@ -329,7 +329,8 @@ A second use of backslash provides a way of encoding non-printing characters
 in patterns in a visible manner. There is no restriction on the appearance of
 non-printing characters, apart from the binary zero that terminates a pattern,
 but when a pattern is being prepared by text editing, it is often easier to use
-one of the following escape sequences than the binary character it represents:
+one of the following escape sequences than the binary character it represents.
+In an ASCII or Unicode environment, these escapes are as follows:
 <pre>
  \a        alarm, that is, the BEL character (hex 07)
  \cx       "control-x", where x is any ASCII character
@ -353,19 +354,33 @@ data item (byte or 16-bit value) following \c has a value greater than 127, a
 compile-time error occurs. This locks out non-ASCII characters in all modes.
 </P>
 <P>
-The \c facility was designed for use with ASCII characters, but with the
-extension to Unicode it is even less useful than it once was. It is, however,
-recognized when PCRE is compiled in EBCDIC mode, where data items are always
-bytes. In this mode, all values are valid after \c. If the next character is a
-lower case letter, it is converted to upper case. Then the 0xc0 bits of the
-byte are inverted. Thus \cA becomes hex 01, as in ASCII (A is C1), but because
-the EBCDIC letters are disjoint, \cZ becomes hex 29 (Z is E9), and other
-characters also generate different values.
+When PCRE is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t
+generate the appropriate EBCDIC code values. The \c escape is processed
+as specified for Perl in the <b>perlebcdic</b> document. The only characters
+that are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. Any
+other character provokes a compile-time error. The sequence \@ encodes
+character code 0; the letters (in either case) encode characters 1-26 (hex 01
+to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
+\? becomes either 255 (hex FF) or 95 (hex 5F).
+</P>
+<P>
+Thus, apart from \?, these escapes generate the same character code values as
+they do in an ASCII environment, though the meanings of the values mostly
+differ. For example, \G always generates code value 7, which is BEL in ASCII
+but DEL in EBCDIC.
+</P>
+<P>
+The sequence \? generates DEL (127, hex 7F) in an ASCII environment, but
+because 127 is not a control character in EBCDIC, Perl makes it generate the
+APC character. Unfortunately, there are several variants of EBCDIC. In most of
+them the APC character has the value 255 (hex FF), but in the one Perl calls
+POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
+values, PCRE makes \? generate 95; otherwise it generates 255.
 </P>
 <P>
 After \0 up to two further octal digits are read. If there are fewer than two
-digits, just those that are present are used. Thus the sequence \0\x\07
-specifies two binary zeros followed by a BEL character (code value 7). Make
+digits, just those that are present are used. Thus the sequence \0\x\015
+specifies two binary zeros followed by a CR character (code value 13). Make
 sure you supply two digits after the initial zero if the pattern character that
 follows is itself an octal digit.
 </P>
@ -3249,9 +3264,9 @@ Cambridge CB2 3QH, England.
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 08 January 2014
+Last updated: 14 June 2015
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2015 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE index page</a>.
--- a/pcre/doc/pcre.txt
+++ b/pcre/doc/pcre.txt
@ -5000,7 +5000,8 @@ BACKSLASH
       appearance  of non-printing characters, apart from the binary zero that
       terminates a pattern, but when a pattern  is  being  prepared  by  text
       editing,  it  is  often  easier  to  use  one  of  the following escape
-       sequences than the binary character it represents:
+       sequences than the binary character it represents.  In an ASCII or Uni-
+       code environment, these escapes are as follows:

         \a        alarm, that is, the BEL character (hex 07)
         \cx       "control-x", where x is any ASCII character
@ -5024,20 +5025,32 @@ BACKSLASH
       has a value greater than 127, a compile-time error occurs.  This  locks
       out non-ASCII characters in all modes.

-       The \c facility was designed for use with ASCII  characters,  but  with
-       the  extension  to  Unicode it is even less useful than it once was. It
-       is, however, recognized when PCRE is compiled  in  EBCDIC  mode,  where
-       data  items  are always bytes. In this mode, all values are valid after
-       \c. If the next character is a lower case letter, it  is  converted  to
-       upper  case.  Then  the  0xc0  bits  of the byte are inverted. Thus \cA
-       becomes hex 01, as in ASCII (A is C1), but because the  EBCDIC  letters
-       are  disjoint,  \cZ becomes hex 29 (Z is E9), and other characters also
-       generate different values.
+       When PCRE is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t gener-
+       ate the appropriate EBCDIC code values. The \c escape is  processed  as
+       specified for Perl in the perlebcdic document. The only characters that
+       are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^,  _,  or  ?.
+       Any  other  character  provokes  a  compile-time error. The sequence \@
+       encodes character code 0; the letters (in either case)  encode  charac-
+       ters 1-26 (hex 01 to hex 1A); [, \, ], ^, and _ encode characters 27-31
+       (hex 1B to hex 1F), and \? becomes either 255 (hex FF) or 95 (hex 5F).
+
+       Thus, apart from \?, these escapes generate  the  same  character  code
+       values  as  they do in an ASCII environment, though the meanings of the
+       values mostly differ. For example, \G always generates  code  value  7,
+       which is BEL in ASCII but DEL in EBCDIC.
+
+       The  sequence  \?  generates DEL (127, hex 7F) in an ASCII environment,
+       but because 127 is not a control character in  EBCDIC,  Perl  makes  it
+       generate  the  APC character. Unfortunately, there are several variants
+       of EBCDIC. In most of them the APC character has  the  value  255  (hex
+       FF),  but  in  the one Perl calls POSIX-BC its value is 95 (hex 5F). If
+       certain other characters have POSIX-BC values, PCRE makes  \?  generate
+       95; otherwise it generates 255.

       After  \0  up  to two further octal digits are read. If there are fewer
       than two digits, just  those  that  are  present  are  used.  Thus  the
-       sequence \0\x\07 specifies two binary zeros followed by a BEL character
-       (code  value 7). Make sure you supply two digits after the initial zero
+       sequence \0\x\015 specifies two binary zeros followed by a CR character
+       (code value 13). Make sure you supply two digits after the initial zero
       if the pattern character that follows is itself an octal digit.

       The  escape \o must be followed by a sequence of octal digits, enclosed
@ -7656,8 +7669,8 @@ AUTHOR

 REVISION

-       Last updated: 08 January 2014
-       Copyright (c) 1997-2014 University of Cambridge.
+       Last updated: 14 June 2015
+       Copyright (c) 1997-2015 University of Cambridge.
 ------------------------------------------------------------------------------


--- a/pcre/doc/pcrepattern.3
+++ b/pcre/doc/pcrepattern.3
@ -1,4 +1,4 @@
-.TH PCREPATTERN 3 "08 January 2014" "PCRE 8.35"
+.TH PCREPATTERN 3 "14 June 2015" "PCRE 8.38"
 .SH NAME
 PCRE - Perl-compatible regular expressions
 .SH "PCRE REGULAR EXPRESSION DETAILS"
@ -308,7 +308,8 @@ A second use of backslash provides a way of encoding non-printing characters
 in patterns in a visible manner. There is no restriction on the appearance of
 non-printing characters, apart from the binary zero that terminates a pattern,
 but when a pattern is being prepared by text editing, it is often easier to use
-one of the following escape sequences than the binary character it represents:
+one of the following escape sequences than the binary character it represents.
+In an ASCII or Unicode environment, these escapes are as follows:
 .sp
  \ea        alarm, that is, the BEL character (hex 07)
  \ecx       "control-x", where x is any ASCII character
@ -331,18 +332,30 @@ but \ec{ becomes hex 3B ({ is 7B), and \ec; becomes hex 7B (; is 3B). If the
 data item (byte or 16-bit value) following \ec has a value greater than 127, a
 compile-time error occurs. This locks out non-ASCII characters in all modes.
 .P
-The \ec facility was designed for use with ASCII characters, but with the
-extension to Unicode it is even less useful than it once was. It is, however,
-recognized when PCRE is compiled in EBCDIC mode, where data items are always
-bytes. In this mode, all values are valid after \ec. If the next character is a
-lower case letter, it is converted to upper case. Then the 0xc0 bits of the
-byte are inverted. Thus \ecA becomes hex 01, as in ASCII (A is C1), but because
-the EBCDIC letters are disjoint, \ecZ becomes hex 29 (Z is E9), and other
-characters also generate different values.
+When PCRE is compiled in EBCDIC mode, \ea, \ee, \ef, \en, \er, and \et
+generate the appropriate EBCDIC code values. The \ec escape is processed
+as specified for Perl in the \fBperlebcdic\fP document. The only characters
+that are allowed after \ec are A-Z, a-z, or one of @, [, \e, ], ^, _, or ?. Any
+other character provokes a compile-time error. The sequence \e@ encodes
+character code 0; the letters (in either case) encode characters 1-26 (hex 01
+to hex 1A); [, \e, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
+\e? becomes either 255 (hex FF) or 95 (hex 5F).
+.P
+Thus, apart from \e?, these escapes generate the same character code values as
+they do in an ASCII environment, though the meanings of the values mostly
+differ. For example, \eG always generates code value 7, which is BEL in ASCII
+but DEL in EBCDIC.
+.P
+The sequence \e? generates DEL (127, hex 7F) in an ASCII environment, but
+because 127 is not a control character in EBCDIC, Perl makes it generate the
+APC character. Unfortunately, there are several variants of EBCDIC. In most of
+them the APC character has the value 255 (hex FF), but in the one Perl calls
+POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
+values, PCRE makes \e? generate 95; otherwise it generates 255.
 .P
 After \e0 up to two further octal digits are read. If there are fewer than two
-digits, just those that are present are used. Thus the sequence \e0\ex\e07
-specifies two binary zeros followed by a BEL character (code value 7). Make
+digits, just those that are present are used. Thus the sequence \e0\ex\e015
+specifies two binary zeros followed by a CR character (code value 13). Make
 sure you supply two digits after the initial zero if the pattern character that
 follows is itself an octal digit.
 .P
@ -3283,6 +3296,6 @@ Cambridge CB2 3QH, England.
 .rs
 .sp
 .nf
-Last updated: 08 January 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 14 June 2015
+Copyright (c) 1997-2015 University of Cambridge.
 .fi
--- a/pcre/pcre_compile.c
+++ b/pcre/pcre_compile.c
@ -174,7 +174,7 @@ static const short int escapes[] = {
     -ESC_Z,                  CHAR_LEFT_SQUARE_BRACKET,
     CHAR_BACKSLASH,          CHAR_RIGHT_SQUARE_BRACKET,
     CHAR_CIRCUMFLEX_ACCENT,  CHAR_UNDERSCORE,
-     CHAR_GRAVE_ACCENT,       7,
+     CHAR_GRAVE_ACCENT,       ESC_a,
     -ESC_b,                  0,
     -ESC_d,                  ESC_e,
     ESC_f,                   0,
@ -202,9 +202,9 @@ static const short int escapes[] = {
 /*  68 */     0,     0,    '|',     ',',    '%',   '_',    '>',    '?',
 /*  70 */     0,     0,      0,       0,      0,     0,      0,      0,
 /*  78 */     0,   '`',    ':',     '#',    '@',  '\'',    '=',    '"',
-/*  80 */     0,     7, -ESC_b,       0, -ESC_d, ESC_e,  ESC_f,      0,
+/*  80 */     0, ESC_a, -ESC_b,       0, -ESC_d, ESC_e,  ESC_f,      0,
 /*  88 */-ESC_h,     0,      0,     '{',      0,     0,      0,      0,
-/*  90 */     0,     0, -ESC_k,     'l',      0, ESC_n,      0, -ESC_p,
+/*  90 */     0,     0, -ESC_k,       0,      0, ESC_n,      0, -ESC_p,
 /*  98 */     0, ESC_r,      0,     '}',      0,     0,      0,      0,
 /*  A0 */     0,   '~', -ESC_s, ESC_tee,      0,-ESC_v, -ESC_w,      0,
 /*  A8 */     0,-ESC_z,      0,       0,      0,   '[',      0,      0,
@ -219,6 +219,12 @@ static const short int escapes[] = {
 /*  F0 */     0,     0,      0,       0,      0,     0,      0,      0,
 /*  F8 */     0,     0,      0,       0,      0,     0,      0,      0
 };
+
+/* We also need a table of characters that may follow \c in an EBCDIC
+environment for characters 0-31. */
+
+static unsigned char ebcdic_escape_c[] = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_";
+
 #endif


@ -458,7 +464,7 @@ static const char error_texts[] =
  "range out of order in character class\0"
  "nothing to repeat\0"
  /* 10 */
-  "operand of unlimited repeat could match the empty string\0"  /** DEAD **/
+  "internal error: invalid forward reference offset\0"
  "internal error: unexpected repeat\0"
  "unrecognized character after (? or (?-\0"
  "POSIX named classes are supported only within a class\0"
@ -527,7 +533,11 @@ static const char error_texts[] =
  "different names for subpatterns of the same number are not allowed\0"
  "(*MARK) must have an argument\0"
  "this version of PCRE is not compiled with Unicode property support\0"
+#ifndef EBCDIC
  "\\c must be followed by an ASCII character\0"
+#else
+  "\\c must be followed by a letter or one of [\\]^_?\0"
+#endif
  "\\k is not followed by a braced, angle-bracketed, or quoted name\0"
  /* 70 */
  "internal error: unknown opcode in find_fixedlength()\0"
@ -1425,7 +1435,16 @@ else
    c ^= 0x40;
 #else             /* EBCDIC coding */
    if (c >= CHAR_a && c <= CHAR_z) c += 64;
-    c ^= 0xC0;
+    if (c == CHAR_QUESTION_MARK)
+      c = ('\\' == 188 && '`' == 74)? 0x5f : 0xff;
+    else
+      {
+      for (i = 0; i < 32; i++)
+        {
+        if (c == ebcdic_escape_c[i]) break;
+        }
+      if (i < 32) c = i; else *errorcodeptr = ERR68;
+      }
 #endif
    break;

@ -1799,7 +1818,7 @@ for (;;)
    case OP_ASSERTBACK:
    case OP_ASSERTBACK_NOT:
    do cc += GET(cc, 1); while (*cc == OP_ALT);
-    cc += PRIV(OP_lengths)[*cc];
+    cc += 1 + LINK_SIZE;
    break;

    /* Skip over things that don't match chars */
@ -2487,7 +2506,7 @@ for (code = first_significant_code(code + PRIV(OP_lengths)[*code], TRUE);
  if (c == OP_BRA  || c == OP_BRAPOS ||
      c == OP_CBRA || c == OP_CBRAPOS ||
      c == OP_ONCE || c == OP_ONCE_NC ||
-      c == OP_COND)
+      c == OP_COND || c == OP_SCOND)
    {
    BOOL empty_branch;
    if (GET(code, 1) == 0) return TRUE;    /* Hit unclosed bracket */
@ -3886,11 +3905,11 @@ didn't consider this to be a POSIX class. Likewise for [:1234:].
 The problem in trying to be exactly like Perl is in the handling of escapes. We
 have to be sure that [abc[:x\]pqr] is *not* treated as containing a POSIX
 class, but [abc[:x\]pqr:]] is (so that an error can be generated). The code
-below handles the special case of \], but does not try to do any other escape
-processing. This makes it different from Perl for cases such as [:l\ower:]
-where Perl recognizes it as the POSIX class "lower" but PCRE does not recognize
-"l\ower". This is a lesser evil than not diagnosing bad classes when Perl does,
-I think.
+below handles the special cases \\ and \], but does not try to do any other
+escape processing. This makes it different from Perl for cases such as
+[:l\ower:] where Perl recognizes it as the POSIX class "lower" but PCRE does
+not recognize "l\ower". This is a lesser evil than not diagnosing bad classes
+when Perl does, I think.

 A user pointed out that PCRE was rejecting [:a[:digit:]] whereas Perl was not.
 It seems that the appearance of a nested POSIX class supersedes an apparent
@ -3917,22 +3936,17 @@ pcre_uchar terminator;          /* Don't combine these lines; the Solaris cc */
 terminator = *(++ptr);   /* compiler warns about "non-constant" initializer. */
 for (++ptr; *ptr != CHAR_NULL; ptr++)
  {
-  if (*ptr == CHAR_BACKSLASH && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
+  if (*ptr == CHAR_BACKSLASH &&
+      (ptr[1] == CHAR_RIGHT_SQUARE_BRACKET ||
+       ptr[1] == CHAR_BACKSLASH))
    ptr++;
-  else if (*ptr == CHAR_RIGHT_SQUARE_BRACKET) return FALSE;
-  else
-    {
-    if (*ptr == terminator && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
+  else if ((*ptr == CHAR_LEFT_SQUARE_BRACKET && ptr[1] == terminator) ||
+            *ptr == CHAR_RIGHT_SQUARE_BRACKET) return FALSE;
+  else if (*ptr == terminator && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
    {
    *endptr = ptr;
    return TRUE;
    }
-    if (*ptr == CHAR_LEFT_SQUARE_BRACKET &&
-         (ptr[1] == CHAR_COLON || ptr[1] == CHAR_DOT ||
-          ptr[1] == CHAR_EQUALS_SIGN) &&
-        check_posix_syntax(ptr, endptr))
-      return FALSE;
-    }
  }
 return FALSE;
 }
@ -3985,11 +3999,12 @@ have their offsets adjusted. That one of the jobs of this function. Before it
 is called, the partially compiled regex must be temporarily terminated with
 OP_END.

-This function has been extended with the possibility of forward references for
-recursions and subroutine calls. It must also check the list of such references
-for the group we are dealing with. If it finds that one of the recursions in
-the current group is on this list, it adjusts the offset in the list, not the
-value in the reference (which is a group number).
+This function has been extended to cope with forward references for recursions
+and subroutine calls. It must check the list of such references for the
+group we are dealing with. If it finds that one of the recursions in the
+current group is on this list, it does not adjust the value in the reference
+(which is a group number). After the group has been scanned, all the offsets in
+the forward reference list for the group are adjusted.

 Arguments:
  group      points to the start of the group
@ -4005,29 +4020,21 @@ static void
 adjust_recurse(pcre_uchar *group, int adjust, BOOL utf, compile_data *cd,
  size_t save_hwm_offset)
 {
+int offset;
+pcre_uchar *hc;
 pcre_uchar *ptr = group;

 while ((ptr = (pcre_uchar *)find_recurse(ptr, utf)) != NULL)
  {
-  int offset;
-  pcre_uchar *hc;
-
-  /* See if this recursion is on the forward reference list. If so, adjust the
-  reference. */
-
  for (hc = (pcre_uchar *)cd->start_workspace + save_hwm_offset; hc < cd->hwm;
       hc += LINK_SIZE)
    {
    offset = (int)GET(hc, 0);
-    if (cd->start_code + offset == ptr + 1)
-      {
-      PUT(hc, 0, offset + adjust);
-      break;
-      }
+    if (cd->start_code + offset == ptr + 1) break;
    }

-  /* Otherwise, adjust the recursion offset if it's after the start of this
-  group. */
+  /* If we have not found this recursion on the forward reference list, adjust
+  the recursion's offset if it's after the start of this group. */

  if (hc >= cd->hwm)
    {
@ -4037,6 +4044,15 @@ while ((ptr = (pcre_uchar *)find_recurse(ptr, utf)) != NULL)

  ptr += 1 + LINK_SIZE;
  }
+
+/* Now adjust all forward reference offsets for the group. */
+
+for (hc = (pcre_uchar *)cd->start_workspace + save_hwm_offset; hc < cd->hwm;
+     hc += LINK_SIZE)
+  {
+  offset = (int)GET(hc, 0);
+  PUT(hc, 0, offset + adjust);
+  }
 }


@ -4465,7 +4481,7 @@ const pcre_uchar *tempptr;
 const pcre_uchar *nestptr = NULL;
 pcre_uchar *previous = NULL;
 pcre_uchar *previous_callout = NULL;
-size_t save_hwm_offset = 0;
+size_t item_hwm_offset = 0;
 pcre_uint8 classbits[32];

 /* We can fish out the UTF-8 setting once and for all into a BOOL, but we
@ -4623,8 +4639,7 @@ for (;; ptr++)
  /* In the real compile phase, just check the workspace used by the forward
  reference list. */

-  else if (cd->hwm > cd->start_workspace + cd->workspace_size -
-           WORK_SIZE_SAFETY_MARGIN)
+  else if (cd->hwm > cd->start_workspace + cd->workspace_size)
    {
    *errorcodeptr = ERR52;
    goto FAILED;
@ -4767,6 +4782,7 @@ for (;; ptr++)
    zeroreqchar = reqchar;
    zeroreqcharflags = reqcharflags;
    previous = code;
+    item_hwm_offset = cd->hwm - cd->start_workspace;
    *code++ = ((options & PCRE_DOTALL) != 0)? OP_ALLANY: OP_ANY;
    break;

@ -4818,6 +4834,7 @@ for (;; ptr++)
    /* Handle a real character class. */

    previous = code;
+    item_hwm_offset = cd->hwm - cd->start_workspace;

    /* PCRE supports POSIX class stuff inside a class. Perl gives an error if
    they are encountered at the top level, so we'll do that too. */
@ -4923,9 +4940,10 @@ for (;; ptr++)
      (which is on the stack). We have to remember that there was XCLASS data,
      however. */

+      if (class_uchardata > class_uchardata_base) xclass = TRUE;
+
      if (lengthptr != NULL && class_uchardata > class_uchardata_base)
        {
-        xclass = TRUE;
        *lengthptr += (int)(class_uchardata - class_uchardata_base);
        class_uchardata = class_uchardata_base;
        }
@ -5028,10 +5046,26 @@ for (;; ptr++)
            ptr = tempptr + 1;
            continue;

-            /* For all other POSIX classes, no special action is taken in UCP
-            mode. Fall through to the non_UCP case. */
+            /* For the other POSIX classes (ascii, xdigit) we are going to fall
+            through to the non-UCP case and build a bit map for characters with
+            code points less than 256. If we are in a negated POSIX class
+            within a non-negated overall class, characters with code points
+            greater than 255 must all match. In the special case where we have
+            not yet generated any xclass data, and this is the final item in
+            the overall class, we need do nothing: later on, the opcode
+            OP_NCLASS will be used to indicate that characters greater than 255
+            are acceptable. If we have already seen an xclass item or one may
+            follow (we have to assume that it might if this is not the end of
+            the class), explicitly match all wide codepoints. */

            default:
+            if (!negate_class && local_negate &&
+                (xclass || tempptr[2] != CHAR_RIGHT_SQUARE_BRACKET))
+              {
+              *class_uchardata++ = XCL_RANGE;
+              class_uchardata += PRIV(ord2utf)(0x100, class_uchardata);
+              class_uchardata += PRIV(ord2utf)(0x10ffff, class_uchardata);
+              }
            break;
            }
          }
@ -5195,9 +5229,9 @@ for (;; ptr++)
              cd, PRIV(vspace_list));
            continue;

-#ifdef SUPPORT_UCP
            case ESC_p:
            case ESC_P:
+#ifdef SUPPORT_UCP
              {
              BOOL negated;
              unsigned int ptype = 0, pdata = 0;
@ -5211,6 +5245,9 @@ for (;; ptr++)
              class_has_8bitchar--;                /* Undo! */
              continue;
              }
+#else
+            *errorcodeptr = ERR45;
+            goto FAILED;
 #endif
            /* Unrecognized escapes are faulted if PCRE is running in its
            strict mode. By default, for compatibility with Perl, they are
@ -5367,16 +5404,20 @@ for (;; ptr++)
      CLASS_SINGLE_CHARACTER:
      if (class_one_char < 2) class_one_char++;

-      /* If class_one_char is 1, we have the first single character in the
-      class, and there have been no prior ranges, or XCLASS items generated by
-      escapes. If this is the final character in the class, we can optimize by
-      turning the item into a 1-character OP_CHAR[I] if it's positive, or
-      OP_NOT[I] if it's negative. In the positive case, it can cause firstchar
-      to be set. Otherwise, there can be no first char if this item is first,
-      whatever repeat count may follow. In the case of reqchar, save the
-      previous value for reinstating. */
+      /* If xclass_has_prop is false and class_one_char is 1, we have the first
+      single character in the class, and there have been no prior ranges, or
+      XCLASS items generated by escapes. If this is the final character in the
+      class, we can optimize by turning the item into a 1-character OP_CHAR[I]
+      if it's positive, or OP_NOT[I] if it's negative. In the positive case, it
+      can cause firstchar to be set. Otherwise, there can be no first char if
+      this item is first, whatever repeat count may follow. In the case of
+      reqchar, save the previous value for reinstating. */

-      if (!inescq && class_one_char == 1 && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
+      if (!inescq &&
+#ifdef SUPPORT_UCP
+          !xclass_has_prop &&
+#endif
+          class_one_char == 1 && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
        {
        ptr++;
        zeroreqchar = reqchar;
@ -5492,9 +5533,10 @@ for (;; ptr++)
    actual compiled code. */

 #ifdef SUPPORT_UTF
-    if (xclass && (!should_flip_negation || (options & PCRE_UCP) != 0))
+    if (xclass && (xclass_has_prop || !should_flip_negation ||
+        (options & PCRE_UCP) != 0))
 #elif !defined COMPILE_PCRE8
-    if (xclass && !should_flip_negation)
+    if (xclass && (xclass_has_prop || !should_flip_negation))
 #endif
 #if defined SUPPORT_UTF || !defined COMPILE_PCRE8
      {
@ -5930,7 +5972,7 @@ for (;; ptr++)
      {
      register int i;
      int len = (int)(code - previous);
-      size_t base_hwm_offset = save_hwm_offset;
+      size_t base_hwm_offset = item_hwm_offset;
      pcre_uchar *bralink = NULL;
      pcre_uchar *brazeroptr = NULL;

@ -5985,7 +6027,7 @@ for (;; ptr++)
        if (repeat_max <= 1)    /* Covers 0, 1, and unlimited */
          {
          *code = OP_END;
-          adjust_recurse(previous, 1, utf, cd, save_hwm_offset);
+          adjust_recurse(previous, 1, utf, cd, item_hwm_offset);
          memmove(previous + 1, previous, IN_UCHARS(len));
          code++;
          if (repeat_max == 0)
@ -6009,7 +6051,7 @@ for (;; ptr++)
          {
          int offset;
          *code = OP_END;
-          adjust_recurse(previous, 2 + LINK_SIZE, utf, cd, save_hwm_offset);
+          adjust_recurse(previous, 2 + LINK_SIZE, utf, cd, item_hwm_offset);
          memmove(previous + 2 + LINK_SIZE, previous, IN_UCHARS(len));
          code += 2 + LINK_SIZE;
          *previous++ = OP_BRAZERO + repeat_type;
@ -6254,6 +6296,12 @@ for (;; ptr++)
            while (*scode == OP_ALT);
            }

+          /* A conditional group with only one branch has an implicit empty
+          alternative branch. */
+
+          if (*bracode == OP_COND && bracode[GET(bracode,1)] != OP_ALT)
+            *bracode = OP_SCOND;
+
          /* Handle possessive quantifiers. */

          if (possessive_quantifier)
@ -6267,11 +6315,11 @@ for (;; ptr++)
              {
              int nlen = (int)(code - bracode);
              *code = OP_END;
-              adjust_recurse(bracode, 1 + LINK_SIZE, utf, cd, save_hwm_offset);
+              adjust_recurse(bracode, 1 + LINK_SIZE, utf, cd, item_hwm_offset);
              memmove(bracode + 1 + LINK_SIZE, bracode, IN_UCHARS(nlen));
              code += 1 + LINK_SIZE;
              nlen += 1 + LINK_SIZE;
-              *bracode = OP_BRAPOS;
+              *bracode = (*bracode == OP_COND)? OP_BRAPOS : OP_SBRAPOS;
              *code++ = OP_KETRPOS;
              PUTINC(code, 0, nlen);
              PUT(bracode, 1, nlen);
@ -6401,7 +6449,7 @@ for (;; ptr++)
        else
          {
          *code = OP_END;
-          adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, save_hwm_offset);
+          adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, item_hwm_offset);
          memmove(tempcode + 1 + LINK_SIZE, tempcode, IN_UCHARS(len));
          code += 1 + LINK_SIZE;
          len += 1 + LINK_SIZE;
@ -6450,7 +6498,7 @@ for (;; ptr++)

        default:
        *code = OP_END;
-        adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, save_hwm_offset);
+        adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, item_hwm_offset);
        memmove(tempcode + 1 + LINK_SIZE, tempcode, IN_UCHARS(len));
        code += 1 + LINK_SIZE;
        len += 1 + LINK_SIZE;
@ -6586,9 +6634,17 @@ for (;; ptr++)
              goto FAILED;
              }
            setverb = *code++ = verbs[i].op_arg;
+            if (lengthptr != NULL)    /* In pass 1 just add in the length */
+              {                       /* to avoid potential workspace */
+              *lengthptr += arglen;   /* overflow. */
+              *code++ = 0;
+              }
+            else
+              {
              *code++ = arglen;
              memcpy(code, arg, IN_UCHARS(arglen));
              code += arglen;
+              }
            *code++ = 0;
            }

@ -6623,7 +6679,7 @@ for (;; ptr++)
    newoptions = options;
    skipbytes = 0;
    bravalue = OP_CBRA;
-    save_hwm_offset = cd->hwm - cd->start_workspace;
+    item_hwm_offset = cd->hwm - cd->start_workspace;
    reset_bracount = FALSE;

    /* Deal with the extended parentheses; all are introduced by '?', and the
@ -6641,6 +6697,7 @@ for (;; ptr++)
        /* ------------------------------------------------------------ */
        case CHAR_VERTICAL_LINE:  /* Reset capture count for each branch */
        reset_bracount = TRUE;
+        cd->dupgroups = TRUE;     /* Record (?| encountered */
        /* Fall through */

        /* ------------------------------------------------------------ */
@ -6741,6 +6798,12 @@ for (;; ptr++)
          {
          while (IS_DIGIT(*ptr))
            {
+            if (recno > INT_MAX / 10 - 1)  /* Integer overflow */
+              {
+              while (IS_DIGIT(*ptr)) ptr++;
+              *errorcodeptr = ERR61;
+              goto FAILED;
+              }
            recno = recno * 10 + (int)(*ptr - CHAR_0);
            ptr++;
            }
@ -6769,7 +6832,7 @@ for (;; ptr++)
            ptr++;
            }
          namelen = (int)(ptr - name);
-          if (lengthptr != NULL) *lengthptr += IMM2_SIZE;
+          if (lengthptr != NULL) skipbytes += IMM2_SIZE;
          }

        /* Check the terminator */
@ -6875,6 +6938,11 @@ for (;; ptr++)
              *errorcodeptr = ERR15;
              goto FAILED;
              }
+            if (recno > INT_MAX / 10 - 1)   /* Integer overflow */
+              {
+              *errorcodeptr = ERR61;
+              goto FAILED;
+              }
            recno = recno * 10 + name[i] - CHAR_0;
            }
          if (recno == 0) recno = RREF_ANY;
@ -7151,6 +7219,7 @@ for (;; ptr++)
        if (lengthptr != NULL)
          {
          named_group *ng;
+          recno = 0;

          if (namelen == 0)
            {
@ -7168,20 +7237,6 @@ for (;; ptr++)
            goto FAILED;
            }

-          /* The name table does not exist in the first pass; instead we must
-          scan the list of names encountered so far in order to get the
-          number. If the name is not found, set the value to 0 for a forward
-          reference. */
-
-          ng = cd->named_groups;
-          for (i = 0; i < cd->names_found; i++, ng++)
-            {
-            if (namelen == ng->length &&
-                STRNCMP_UC_UC(name, ng->name, namelen) == 0)
-              break;
-            }
-          recno = (i < cd->names_found)? ng->number : 0;
-
          /* Count named back references. */

          if (!is_recurse) cd->namedrefcount++;
@ -7191,6 +7246,56 @@ for (;; ptr++)
          16-bit data item. */

          *lengthptr += IMM2_SIZE;
+
+          /* If this is a forward reference and we are within a (?|...) group,
+          the reference may end up as the number of a group which we are
+          currently inside, that is, it could be a recursive reference. In the
+          real compile this will be picked up and the reference wrapped with
+          OP_ONCE to make it atomic, so we must space in case this occurs. */
+
+          /* In fact, this can happen for a non-forward reference because
+          another group with the same number might be created later. This
+          issue is fixed "properly" in PCRE2. As PCRE1 is now in maintenance
+          only mode, we finesse the bug by allowing more memory always. */
+
+          *lengthptr += 2 + 2*LINK_SIZE;
+
+          /* It is even worse than that. The current reference may be to an
+          existing named group with a different number (so apparently not
+          recursive) but which later on is also attached to a group with the
+          current number. This can only happen if $(| has been previous
+          encountered. In that case, we allow yet more memory, just in case.
+          (Again, this is fixed "properly" in PCRE2. */
+
+          if (cd->dupgroups) *lengthptr += 4 + 4*LINK_SIZE;
+
+          /* Otherwise, check for recursion here. The name table does not exist
+          in the first pass; instead we must scan the list of names encountered
+          so far in order to get the number. If the name is not found, leave
+          the value of recno as 0 for a forward reference. */
+
+          else
+            {
+            ng = cd->named_groups;
+            for (i = 0; i < cd->names_found; i++, ng++)
+              {
+              if (namelen == ng->length &&
+                  STRNCMP_UC_UC(name, ng->name, namelen) == 0)
+                {
+                open_capitem *oc;
+                recno = ng->number;
+                if (is_recurse) break;
+                for (oc = cd->open_caps; oc != NULL; oc = oc->next)
+                  {
+                  if (oc->number == recno)
+                    {
+                    oc->flag = TRUE;
+                    break;
+                    }
+                  }
+                }
+              }
+            }
          }

        /* In the real compile, search the name table. We check the name
@ -7237,8 +7342,6 @@ for (;; ptr++)
          for (i++; i < cd->names_found; i++)
            {
            if (STRCMP_UC_UC(slot + IMM2_SIZE, cslot + IMM2_SIZE) != 0) break;
-
-
            count++;
            cslot += cd->name_entry_size;
            }
@ -7247,6 +7350,7 @@ for (;; ptr++)
            {
            if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE;
            previous = code;
+            item_hwm_offset = cd->hwm - cd->start_workspace;
            *code++ = ((options & PCRE_CASELESS) != 0)? OP_DNREFI : OP_DNREF;
            PUT2INC(code, 0, index);
            PUT2INC(code, 0, count);
@ -7284,9 +7388,14 @@ for (;; ptr++)


        /* ------------------------------------------------------------ */
-        case CHAR_R:              /* Recursion */
-        ptr++;                    /* Same as (?0)      */
-        /* Fall through */
+        case CHAR_R:              /* Recursion, same as (?0) */
+        recno = 0;
+        if (*(++ptr) != CHAR_RIGHT_PARENTHESIS)
+          {
+          *errorcodeptr = ERR29;
+          goto FAILED;
+          }
+        goto HANDLE_RECURSION;


        /* ------------------------------------------------------------ */
@ -7323,7 +7432,15 @@ for (;; ptr++)

          recno = 0;
          while(IS_DIGIT(*ptr))
+            {
+            if (recno > INT_MAX / 10 - 1) /* Integer overflow */
+              {
+              while (IS_DIGIT(*ptr)) ptr++;
+              *errorcodeptr = ERR61;
+              goto FAILED;
+              }
            recno = recno * 10 + *ptr++ - CHAR_0;
+            }

          if (*ptr != (pcre_uchar)terminator)
            {
@ -7360,6 +7477,7 @@ for (;; ptr++)
          HANDLE_RECURSION:

          previous = code;
+          item_hwm_offset = cd->hwm - cd->start_workspace;
          called = cd->start_code;

          /* When we are actually compiling, find the bracket that is being
@ -7561,7 +7679,11 @@ for (;; ptr++)
      previous = NULL;
      cd->iscondassert = FALSE;
      }
-    else previous = code;
+    else
+      {
+      previous = code;
+      item_hwm_offset = cd->hwm - cd->start_workspace;
+      }

    *code = bravalue;
    tempcode = code;
@ -7809,7 +7931,7 @@ for (;; ptr++)
        const pcre_uchar *p;
        pcre_uint32 cf;

-        save_hwm_offset = cd->hwm - cd->start_workspace;   /* Normally this is set when '(' is read */
+        item_hwm_offset = cd->hwm - cd->start_workspace;   /* Normally this is set when '(' is read */
        terminator = (*(++ptr) == CHAR_LESS_THAN_SIGN)?
          CHAR_GREATER_THAN_SIGN : CHAR_APOSTROPHE;

@ -7838,7 +7960,7 @@ for (;; ptr++)
        if (*p != (pcre_uchar)terminator)
          {
          *errorcodeptr = ERR57;
-          break;
+          goto FAILED;
          }
        ptr++;
        goto HANDLE_NUMERICAL_RECURSION;
@ -7853,7 +7975,7 @@ for (;; ptr++)
          ptr[1] != CHAR_APOSTROPHE && ptr[1] != CHAR_LEFT_CURLY_BRACKET))
          {
          *errorcodeptr = ERR69;
-          break;
+          goto FAILED;
          }
        is_recurse = FALSE;
        terminator = (*(++ptr) == CHAR_LESS_THAN_SIGN)?
@ -7877,6 +7999,7 @@ for (;; ptr++)
        HANDLE_REFERENCE:
        if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE;
        previous = code;
+        item_hwm_offset = cd->hwm - cd->start_workspace;
        *code++ = ((options & PCRE_CASELESS) != 0)? OP_REFI : OP_REF;
        PUT2INC(code, 0, recno);
        cd->backref_map |= (recno < 32)? (1 << recno) : 1;
@ -7906,6 +8029,7 @@ for (;; ptr++)
        if (!get_ucp(&ptr, &negated, &ptype, &pdata, errorcodeptr))
          goto FAILED;
        previous = code;
+        item_hwm_offset = cd->hwm - cd->start_workspace;
        *code++ = ((escape == ESC_p) != negated)? OP_PROP : OP_NOTPROP;
        *code++ = ptype;
        *code++ = pdata;
@ -7946,6 +8070,7 @@ for (;; ptr++)

          {
          previous = (escape > ESC_b && escape < ESC_Z)? code : NULL;
+          item_hwm_offset = cd->hwm - cd->start_workspace;
          *code++ = (!utf && escape == ESC_C)? OP_ALLANY : escape;
          }
        }
@ -7989,6 +8114,7 @@ for (;; ptr++)

    ONE_CHAR:
    previous = code;
+    item_hwm_offset = cd->hwm - cd->start_workspace;

    /* For caseless UTF-8 mode when UCP support is available, check whether
    this character has more than one other case. If so, generate a special
@ -9164,6 +9290,7 @@ cd->names_found = 0;
 cd->name_entry_size = 0;
 cd->name_table = NULL;
 cd->dupnames = FALSE;
+cd->dupgroups = FALSE;
 cd->namedrefcount = 0;
 cd->start_code = cworkspace;
 cd->hwm = cworkspace;
@ -9336,6 +9463,16 @@ if (cd->hwm > cd->start_workspace)
    int offset, recno;
    cd->hwm -= LINK_SIZE;
    offset = GET(cd->hwm, 0);
+
+    /* Check that the hwm handling hasn't gone wrong. This whole area is
+    rewritten in PCRE2 because there are some obscure cases. */
+
+    if (offset == 0 || codestart[offset-1] != OP_RECURSE)
+      {
+      errorcode = ERR10;
+      break;
+      }
+
    recno = GET(codestart, offset);
    if (recno != prev_recno)
      {
@ -9366,7 +9503,7 @@ used in this code because at least one compiler gives a warning about loss of
 "const" attribute if the cast (pcre_uchar *)codestart is used directly in the
 function call. */

-if ((options & PCRE_NO_AUTO_POSSESS) == 0)
+if (errorcode == 0 && (options & PCRE_NO_AUTO_POSSESS) == 0)
  {
  pcre_uchar *temp = (pcre_uchar *)codestart;
  auto_possessify(temp, utf, cd);
@ -9380,7 +9517,7 @@ OP_RECURSE that are not fixed length get a diagnosic with a useful offset. The
 exceptional ones forgo this. We scan the pattern to check that they are fixed
 length, and set their lengths. */

-if (cd->check_lookbehind)
+if (errorcode == 0 && cd->check_lookbehind)
  {
  pcre_uchar *cc = (pcre_uchar *)codestart;

@ -9593,4 +9730,3 @@ return (pcre32 *)re;
 }

 /* End of pcre_compile.c */
-
--- a/pcre/pcre_exec.c
+++ b/pcre/pcre_exec.c
@ -6685,7 +6685,8 @@ if (md->offset_vector != NULL)
  register int *iend = iptr - re->top_bracket;
  if (iend < md->offset_vector + 2) iend = md->offset_vector + 2;
  while (--iptr >= iend) *iptr = -1;
-  md->offset_vector[0] = md->offset_vector[1] = -1;
+  if (offsetcount > 0) md->offset_vector[0] = -1;
+  if (offsetcount > 1) md->offset_vector[1] = -1;
  }

 /* Set up the first character to match, if available. The first_char value is
--- a/pcre/pcre_internal.h
+++ b/pcre/pcre_internal.h
@ -984,7 +984,7 @@ other. NOTE: The values also appear in pcre_jit_compile.c. */
 #ifndef EBCDIC

 #define HSPACE_LIST \
-  CHAR_HT, CHAR_SPACE, 0xa0, \
+  CHAR_HT, CHAR_SPACE, CHAR_NBSP, \
  0x1680, 0x180e, 0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, \
  0x2006, 0x2007, 0x2008, 0x2009, 0x200A, 0x202f, 0x205f, 0x3000, \
  NOTACHAR
@ -1010,7 +1010,7 @@ other. NOTE: The values also appear in pcre_jit_compile.c. */
 #define HSPACE_BYTE_CASES \
  case CHAR_HT: \
  case CHAR_SPACE: \
-  case 0xa0     /* NBSP */
+  case CHAR_NBSP

 #define HSPACE_CASES \
  HSPACE_BYTE_CASES: \
@ -1037,11 +1037,12 @@ other. NOTE: The values also appear in pcre_jit_compile.c. */
 /* ------ EBCDIC environments ------ */

 #else
-#define HSPACE_LIST CHAR_HT, CHAR_SPACE
+#define HSPACE_LIST CHAR_HT, CHAR_SPACE, CHAR_NBSP, NOTACHAR

 #define HSPACE_BYTE_CASES \
  case CHAR_HT: \
-  case CHAR_SPACE
+  case CHAR_SPACE: \
+  case CHAR_NBSP

 #define HSPACE_CASES HSPACE_BYTE_CASES

@ -1215,6 +1216,7 @@ same code point. */

 #define CHAR_ESC                    '\047'
 #define CHAR_DEL                    '\007'
+#define CHAR_NBSP                   '\x41'
 #define STR_ESC                     "\047"
 #define STR_DEL                     "\007"

@ -1229,6 +1231,7 @@ a positive value. */
 #define CHAR_NEL                    ((unsigned char)'\x85')
 #define CHAR_ESC                    '\033'
 #define CHAR_DEL                    '\177'
+#define CHAR_NBSP                   ((unsigned char)'\xa0')

 #define STR_LF                      "\n"
 #define STR_NL                      STR_LF
@ -1606,6 +1609,7 @@ only. */
 #define CHAR_VERTICAL_LINE          '\174'
 #define CHAR_RIGHT_CURLY_BRACKET    '\175'
 #define CHAR_TILDE                  '\176'
+#define CHAR_NBSP                   ((unsigned char)'\xa0')

 #define STR_HT                      "\011"
 #define STR_VT                      "\013"
@ -1762,6 +1766,10 @@ only. */

 /* Escape items that are just an encoding of a particular data value. */

+#ifndef ESC_a
+#define ESC_a CHAR_BEL
+#endif
+
 #ifndef ESC_e
 #define ESC_e CHAR_ESC
 #endif
@ -2446,6 +2454,7 @@ typedef struct compile_data {
  BOOL had_pruneorskip;             /* (*PRUNE) or (*SKIP) encountered */
  BOOL check_lookbehind;            /* Lookbehinds need later checking */
  BOOL dupnames;                    /* Duplicate names exist */
+  BOOL dupgroups;                   /* Duplicate groups exist: (?| found */
  BOOL iscondassert;                /* Next assert is a condition */
  int  nltype;                      /* Newline type */
  int  nllen;                       /* Newline string length */
--- a/pcre/pcre_jit_compile.c
+++ b/pcre/pcre_jit_compile.c
@ -1064,6 +1064,7 @@ pcre_uchar *alternative;
 pcre_uchar *end = NULL;
 int private_data_ptr = *private_data_start;
 int space, size, bracketlen;
+BOOL repeat_check = TRUE;

 while (cc < ccend)
  {
@ -1071,9 +1072,10 @@ while (cc < ccend)
  size = 0;
  bracketlen = 0;
  if (private_data_ptr > SLJIT_MAX_LOCAL_SIZE)
-    return;
+    break;

-  if (*cc == OP_ONCE || *cc == OP_ONCE_NC || *cc == OP_BRA || *cc == OP_CBRA || *cc == OP_COND)
+  if (repeat_check && (*cc == OP_ONCE || *cc == OP_ONCE_NC || *cc == OP_BRA || *cc == OP_CBRA || *cc == OP_COND))
+    {
    if (detect_repeat(common, cc))
      {
      /* These brackets are converted to repeats, so no global
@ -1081,6 +1083,8 @@ while (cc < ccend)
      if (cc >= end)
        end = bracketend(cc);
      }
+    }
+  repeat_check = TRUE;

  switch(*cc)
    {
@ -1136,6 +1140,13 @@ while (cc < ccend)
    bracketlen = 1 + LINK_SIZE + IMM2_SIZE;
    break;

+    case OP_BRAZERO:
+    case OP_BRAMINZERO:
+    case OP_BRAPOSZERO:
+    repeat_check = FALSE;
+    size = 1;
+    break;
+
    CASE_ITERATOR_PRIVATE_DATA_1
    space = 1;
    size = -2;
@ -1162,12 +1173,17 @@ while (cc < ccend)
    size = 1;
    break;

-    CASE_ITERATOR_TYPE_PRIVATE_DATA_2B
+    case OP_TYPEUPTO:
    if (cc[1 + IMM2_SIZE] != OP_ANYNL && cc[1 + IMM2_SIZE] != OP_EXTUNI)
      space = 2;
    size = 1 + IMM2_SIZE;
    break;

+    case OP_TYPEMINUPTO:
+    space = 2;
+    size = 1 + IMM2_SIZE;
+    break;
+
    case OP_CLASS:
    case OP_NCLASS:
    size += 1 + 32 / sizeof(pcre_uchar);
@ -1316,6 +1332,13 @@ while (cc < ccend)
    cc += 1 + LINK_SIZE + IMM2_SIZE;
    break;

+    case OP_THEN:
+    stack_restore = TRUE;
+    if (common->control_head_ptr != 0)
+      *needs_control_head = TRUE;
+    cc ++;
+    break;
+
    default:
    stack_restore = TRUE;
    /* Fall through. */
@ -2220,6 +2243,7 @@ while (current != NULL)
    SLJIT_ASSERT_STOP();
    break;
    }
+  SLJIT_ASSERT(current > (sljit_sw*)current[-1]);
  current = (sljit_sw*)current[-1];
  }
 return -1;
@ -3209,7 +3233,7 @@ bytes[len] = byte;
 bytes[0] = len;
 }

-static int scan_prefix(compiler_common *common, pcre_uchar *cc, pcre_uint32 *chars, pcre_uint8 *bytes, int max_chars)
+static int scan_prefix(compiler_common *common, pcre_uchar *cc, pcre_uint32 *chars, pcre_uint8 *bytes, int max_chars, pcre_uint32 *rec_count)
 {
 /* Recursive function, which scans prefix literals. */
 BOOL last, any, caseless;
@ -3227,9 +3251,14 @@ pcre_uchar othercase[1];
 repeat = 1;
 while (TRUE)
  {
+  if (*rec_count == 0)
+    return 0;
+  (*rec_count)--;
+
  last = TRUE;
  any = FALSE;
  caseless = FALSE;
+
  switch (*cc)
    {
    case OP_CHARI:
@ -3291,7 +3320,7 @@ while (TRUE)
 #ifdef SUPPORT_UTF
    if (common->utf && HAS_EXTRALEN(*cc)) len += GET_EXTRALEN(*cc);
 #endif
-    max_chars = scan_prefix(common, cc + len, chars, bytes, max_chars);
+    max_chars = scan_prefix(common, cc + len, chars, bytes, max_chars, rec_count);
    if (max_chars == 0)
      return consumed;
    last = FALSE;
@ -3314,7 +3343,7 @@ while (TRUE)
    alternative = cc + GET(cc, 1);
    while (*alternative == OP_ALT)
      {
-      max_chars = scan_prefix(common, alternative + 1 + LINK_SIZE, chars, bytes, max_chars);
+      max_chars = scan_prefix(common, alternative + 1 + LINK_SIZE, chars, bytes, max_chars, rec_count);
      if (max_chars == 0)
        return consumed;
      alternative += GET(alternative, 1);
@ -3556,6 +3585,7 @@ int i, max, from;
 int range_right = -1, range_len = 3 - 1;
 sljit_ub *update_table = NULL;
 BOOL in_range;
+pcre_uint32 rec_count;

 for (i = 0; i < MAX_N_CHARS; i++)
  {
@ -3564,7 +3594,8 @@ for (i = 0; i < MAX_N_CHARS; i++)
  bytes[i * MAX_N_BYTES] = 0;
  }

-max = scan_prefix(common, common->start, chars, bytes, MAX_N_CHARS);
+rec_count = 10000;
+max = scan_prefix(common, common->start, chars, bytes, MAX_N_CHARS, &rec_count);

 if (max <= 1)
  return FALSE;
@ -4311,8 +4342,10 @@ switch(length)
  case 4:
  if ((ranges[1] - ranges[0]) == (ranges[3] - ranges[2])
      && (ranges[0] | (ranges[2] - ranges[0])) == ranges[2]
+      && (ranges[1] & (ranges[2] - ranges[0])) == 0
      && is_powerof2(ranges[2] - ranges[0]))
    {
+    SLJIT_ASSERT((ranges[0] & (ranges[2] - ranges[0])) == 0 && (ranges[2] & ranges[3] & (ranges[2] - ranges[0])) != 0);
    OP2(SLJIT_OR, TMP1, 0, TMP1, 0, SLJIT_IMM, ranges[2] - ranges[0]);
    if (ranges[2] + 1 != ranges[3])
      {
@ -4900,7 +4933,8 @@ else if ((cc[-1] & XCL_MAP) != 0)
  if (!check_class_ranges(common, (const pcre_uint8 *)cc, FALSE, TRUE, list))
    {
 #ifdef COMPILE_PCRE8
-    SLJIT_ASSERT(common->utf);
+    jump = NULL;
+    if (common->utf)
 #endif
      jump = CMP(SLJIT_GREATER, TMP1, 0, SLJIT_IMM, 255);

@ -4911,6 +4945,9 @@ else if ((cc[-1] & XCL_MAP) != 0)
    OP2(SLJIT_AND | SLJIT_SET_E, SLJIT_UNUSED, 0, TMP1, 0, TMP2, 0);
    add_jump(compiler, list, JUMP(SLJIT_NOT_ZERO));

+#ifdef COMPILE_PCRE8
+    if (common->utf)
+#endif
      JUMPHERE(jump);
    }

@ -5219,7 +5256,7 @@ while (*cc != XCL_END)
      OP_FLAGS(SLJIT_MOV, TMP2, 0, SLJIT_UNUSED, 0, SLJIT_LESS_EQUAL);

      SET_CHAR_OFFSET(0);
-      OP2(SLJIT_SUB | SLJIT_SET_U, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0xff);
+      OP2(SLJIT_SUB | SLJIT_SET_U, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x7f);
      OP_FLAGS(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_LESS_EQUAL);

      SET_TYPE_OFFSET(ucp_Pc);
@ -7665,6 +7702,10 @@ while (*cc != OP_KETRPOS)
      OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(0), STR_PTR, 0);
      }

+    /* Even if the match is empty, we need to reset the control head. */
+    if (needs_control_head)
+      OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, SLJIT_MEM1(STACK_TOP), STACK(stack));
+
    if (opcode == OP_SBRAPOS || opcode == OP_SCBRAPOS)
      add_jump(compiler, &emptymatch, CMP(SLJIT_EQUAL, TMP1, 0, STR_PTR, 0));

@ -7692,6 +7733,10 @@ while (*cc != OP_KETRPOS)
      OP1(SLJIT_MOV, SLJIT_MEM1(TMP2), (framesize + 1) * sizeof(sljit_sw), STR_PTR, 0);
      }

+    /* Even if the match is empty, we need to reset the control head. */
+    if (needs_control_head)
+      OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, SLJIT_MEM1(STACK_TOP), STACK(stack));
+
    if (opcode == OP_SBRAPOS || opcode == OP_SCBRAPOS)
      add_jump(compiler, &emptymatch, CMP(SLJIT_EQUAL, TMP1, 0, STR_PTR, 0));

@ -7704,9 +7749,6 @@ while (*cc != OP_KETRPOS)
      }
    }

-  if (needs_control_head)
-    OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, SLJIT_MEM1(STACK_TOP), STACK(stack));
-
  JUMPTO(SLJIT_JUMP, loop);
  flush_stubs(common);

@ -8441,7 +8483,6 @@ while (cc < ccend)
      OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(1), STR_PTR, 0);
      }
    BACKTRACK_AS(braminzero_backtrack)->matchingpath = LABEL();
-    if (cc[1] > OP_ASSERTBACK_NOT)
    count_match(common);
    break;

@ -9624,7 +9665,7 @@ static SLJIT_INLINE void compile_recurse(compiler_common *common)
 DEFINE_COMPILER;
 pcre_uchar *cc = common->start + common->currententry->start;
 pcre_uchar *ccbegin = cc + 1 + LINK_SIZE + (*cc == OP_BRA ? 0 : IMM2_SIZE);
-pcre_uchar *ccend = bracketend(cc);
+pcre_uchar *ccend = bracketend(cc) - (1 + LINK_SIZE);
 BOOL needs_control_head;
 int framesize = get_framesize(common, cc, NULL, TRUE, &needs_control_head);
 int private_data_size = get_private_data_copy_length(common, ccbegin, ccend, needs_control_head);
@ -9648,6 +9689,7 @@ set_jumps(common->currententry->calls, common->currententry->entry);

 sljit_emit_fast_enter(compiler, TMP2, 0);
 allocate_stack(common, private_data_size + framesize + alternativesize);
+count_match(common);
 OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(private_data_size + framesize + alternativesize - 1), TMP2, 0);
 copy_private_data(common, ccbegin, ccend, TRUE, private_data_size + framesize + alternativesize, framesize + alternativesize, needs_control_head);
 if (needs_control_head)
@ -9992,6 +10034,7 @@ OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, stack));
 OP1(SLJIT_MOV_UI, TMP1, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, limit_match));
 OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(struct sljit_stack, base));
 OP1(SLJIT_MOV, STACK_LIMIT, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(struct sljit_stack, limit));
+OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, 1);
 OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LIMIT_MATCH, TMP1, 0);

 if (mode == JIT_PARTIAL_SOFT_COMPILE)
--- a/pcre/pcre_jit_test.c
+++ b/pcre/pcre_jit_test.c
@ -182,6 +182,7 @@ static struct regression_test_case regression_test_cases[] = {
 	{ CMUAP, 0, "\xf0\x90\x90\x80{2}", "\xf0\x90\x90\x80#\xf0\x90\x90\xa8\xf0\x90\x90\x80" },
 	{ CMUAP, 0, "\xf0\x90\x90\xa8{2}", "\xf0\x90\x90\x80#\xf0\x90\x90\xa8\xf0\x90\x90\x80" },
 	{ CMUAP, 0, "\xe1\xbd\xb8\xe1\xbf\xb8", "\xe1\xbf\xb8\xe1\xbd\xb8" },
+	{ MA, 0, "[3-57-9]", "5" },

 	/* Assertions. */
 	{ MUA, 0, "\\b[^A]", "A_B#" },
--- a/pcre/pcre_study.c
+++ b/pcre/pcre_study.c
@ -71,6 +71,7 @@ Arguments:
  startcode       pointer to start of the whole pattern's code
  options         the compiling options
  recurses        chain of recurse_check to catch mutual recursion
+  countptr        pointer to call count (to catch over complexity)

 Returns:   the minimum length
           -1 if \C in UTF-8 mode or (*ACCEPT) was encountered
@ -80,7 +81,8 @@ Returns:   the minimum length

 static int
 find_minlength(const REAL_PCRE *re, const pcre_uchar *code,
-  const pcre_uchar *startcode, int options, recurse_check *recurses)
+  const pcre_uchar *startcode, int options, recurse_check *recurses,
+  int *countptr)
 {
 int length = -1;
 /* PCRE_UTF16 has the same value as PCRE_UTF8. */
@ -90,6 +92,8 @@ recurse_check this_recurse;
 register int branchlength = 0;
 register pcre_uchar *cc = (pcre_uchar *)code + 1 + LINK_SIZE;

+if ((*countptr)++ > 1000) return -1;   /* too complex */
+
 if (*code == OP_CBRA || *code == OP_SCBRA ||
    *code == OP_CBRAPOS || *code == OP_SCBRAPOS) cc += IMM2_SIZE;

@ -131,7 +135,7 @@ for (;;)
    case OP_SBRAPOS:
    case OP_ONCE:
    case OP_ONCE_NC:
-    d = find_minlength(re, cc, startcode, options, recurses);
+    d = find_minlength(re, cc, startcode, options, recurses, countptr);
    if (d < 0) return d;
    branchlength += d;
    do cc += GET(cc, 1); while (*cc == OP_ALT);
@ -415,7 +419,8 @@ for (;;)
            int dd;
            this_recurse.prev = recurses;
            this_recurse.group = cs;
-            dd = find_minlength(re, cs, startcode, options, &this_recurse);
+            dd = find_minlength(re, cs, startcode, options, &this_recurse,
+              countptr);
            if (dd < d) d = dd;
            }
          }
@ -451,7 +456,8 @@ for (;;)
          {
          this_recurse.prev = recurses;
          this_recurse.group = cs;
-          d = find_minlength(re, cs, startcode, options, &this_recurse);
+          d = find_minlength(re, cs, startcode, options, &this_recurse,
+            countptr);
          }
        }
      }
@ -514,7 +520,7 @@ for (;;)
        this_recurse.prev = recurses;
        this_recurse.group = cs;
        branchlength += find_minlength(re, cs, startcode, options,
-          &this_recurse);
+          &this_recurse, countptr);
        }
      }
    cc += 1 + LINK_SIZE;
@ -1453,6 +1459,7 @@ pcre32_study(const pcre32 *external_re, int options, const char **errorptr)
 #endif
 {
 int min;
+int count = 0;
 BOOL bits_set = FALSE;
 pcre_uint8 start_bits[32];
 PUBL(extra) *extra = NULL;
@ -1539,7 +1546,7 @@ if ((re->options & PCRE_ANCHORED) == 0 &&

 /* Find the minimum length of subject string. */

-switch(min = find_minlength(re, code, code, re->options, NULL))
+switch(min = find_minlength(re, code, code, re->options, NULL, &count))
  {
  case -2: *errorptr = "internal error: missing capturing bracket"; return NULL;
  case -3: *errorptr = "internal error: opcode not recognized"; return NULL;
--- a/pcre/pcre_xclass.c
+++ b/pcre/pcre_xclass.c
@ -246,7 +246,7 @@ while ((t = *data++) != XCL_END)

      case PT_PXPUNCT:
      if ((PRIV(ucp_gentype)[prop->chartype] == ucp_P ||
-            (c < 256 && PRIV(ucp_gentype)[prop->chartype] == ucp_S)) == isprop)
+            (c < 128 && PRIV(ucp_gentype)[prop->chartype] == ucp_S)) == isprop)
        return !negated;
      break;

--- a/pcre/pcregrep.c
+++ b/pcre/pcregrep.c
@ -1692,9 +1692,13 @@ while (ptr < endptr)

    if (filenames == FN_NOMATCH_ONLY) return 1;

+    /* If all we want is a yes/no answer, stop now. */
+
+    if (quiet) return 0;
+
    /* Just count if just counting is wanted. */

-    if (count_only) count++;
+    else if (count_only) count++;

    /* When handling a binary file and binary-files==binary, the "binary"
    variable will be set true (it's false in all other cases). In this
@ -1715,10 +1719,6 @@ while (ptr < endptr)
      return 0;
      }

-    /* Likewise, if all we want is a yes/no answer. */
-
-    else if (quiet) return 0;
-
    /* The --only-matching option prints just the substring that matched,
    and/or one or more captured portions of it, as long as these strings are
    not empty. The --file-offsets and --line-offsets options output offsets for
@ -2089,7 +2089,7 @@ if (filenames == FN_NOMATCH_ONLY)

 /* Print the match count if wanted */

-if (count_only)
+if (count_only && !quiet)
  {
  if (count > 0 || !omit_zero_count)
    {
--- a/pcre/pcretest.c
+++ b/pcre/pcretest.c
@ -4621,9 +4621,9 @@ while (!done)

      else switch ((c = *p++))
        {
-        case 'a': c =    7; break;
+        case 'a': c =  CHAR_BEL; break;
        case 'b': c = '\b'; break;
-        case 'e': c =   27; break;
+        case 'e': c =  CHAR_ESC; break;
        case 'f': c = '\f'; break;
        case 'n': c = '\n'; break;
        case 'r': c = '\r'; break;
--- a/pcre/testdata/grepoutput
+++ b/pcre/testdata/grepoutput
@ -751,3 +751,7 @@ RC=0
 2:3,1
 2:4,1
 RC=0
+---------------------------- Test 108 ------------------------------
+RC=0
+---------------------------- Test 109 -----------------------------
+RC=0
--- a/pcre/testdata/testinput1
+++ b/pcre/testdata/testinput1
@ -5730,4 +5730,7 @@ AbcdCBefgBhiBqz
 "(?1)(?#?'){8}(a)"
    baaaaaaaaac

+"(?|(\k'Pm')|(?'Pm'))"
+    abcd
+
 /-- End of testinput1 --/
--- a/pcre/testdata/testinput11
+++ b/pcre/testdata/testinput11
@ -136,4 +136,6 @@ is required for these tests. --/

 /((?+1)(\1))/B

+/.((?2)(?R)\1)()/B
+
 /-- End of testinput11 --/
--- a/pcre/testdata/testinput12
+++ b/pcre/testdata/testinput12
--- a/pcre/testdata/testinput14
+++ b/pcre/testdata/testinput14
@ -340,4 +340,6 @@ not matter. --/

 /[^\s]*\s* [^\W]+\W+ [^\d]*?\d0 [^\d\w]{4,6}?\w*A/BZ

+/(?'ABC'[bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar](*THEN:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/
+
 /-- End of testinput14 --/
--- a/pcre/testdata/testinput17
+++ b/pcre/testdata/testinput17
--- a/pcre/testdata/testinput2
+++ b/pcre/testdata/testinput2
--- a/pcre/testdata/testinput6
+++ b/pcre/testdata/testinput6
@ -1502,4 +1502,55 @@
 /\C\X*QT/8
    Ӆ\x0aT

+/[\pS#moq]/
+    =
+
+/[[:punct:]]/8W
+    \xc2\xb4
+    \x{b4} 
+
+/[[:^ascii:]]/8W
+    \x{100}
+    \x{200}
+    \x{300}
+    \x{37e}
+    a
+    9
+    g
+
+/[[:^ascii:]\w]/8W
+    a
+    9
+    g
+    \x{100}
+    \x{200}
+    \x{300}
+    \x{37e}
+
+/[\w[:^ascii:]]/8W
+    a
+    9
+    g
+    \x{100}
+    \x{200}
+    \x{300}
+    \x{37e}
+
+/[^[:ascii:]\W]/8W
+    a
+    9
+    g
+    \x{100}
+    \x{200}
+    \x{300}
+    \x{37e}
+
+/[[:^ascii:]a]/8W
+    a
+    9
+    g
+    \x{100}
+    \x{200}
+    \x{37e}
+
 /-- End of testinput6 --/
--- a/pcre/testdata/testinput7
+++ b/pcre/testdata/testinput7
@ -838,4 +838,19 @@ of case for anything other than the ASCII letters. --/
 /^s?c/mi8I
    scat

+/[\W\p{Any}]/BZ
+    abc
+    123 
+
+/[\W\pL]/BZ
+    abc
+    ** Failers 
+    123     
+
+/a[[:punct:]b]/WBZ
+
+/a[[:punct:]b]/8WBZ
+
+/a[b[:punct:]]/8WBZ
+
 /-- End of testinput7 --/
--- a/pcre/testdata/testinputEBC
+++ b/pcre/testdata/testinputEBC
@ -29,13 +29,16 @@ in EBCDIC, but can be specified as escapes. --/

 /^A\ˆ/
    A B
+    A\x41B

 /-- Test \H --/

 /^A\È/
    AB
+    A\x42B
    ** Fail
    A B
+    A\x41B

 /-- Test \R --/

--- a/pcre/testdata/testoutput1
+++ b/pcre/testdata/testoutput1
@ -9429,4 +9429,9 @@ No match
 0: aaaaaaaaa
 1: a

+"(?|(\k'Pm')|(?'Pm'))"
+    abcd
+ 0: 
+ 1: 
+
 /-- End of testinput1 --/
--- a/pcre/testdata/testoutput11-16
+++ b/pcre/testdata/testoutput11-16
@ -231,7 +231,7 @@ Memory allocation (code space): 73
 ------------------------------------------------------------------

 /(?P<a>a)...(?P=a)bbb(?P>a)d/BM
-Memory allocation (code space): 61
+Memory allocation (code space): 77
 ------------------------------------------------------------------
  0  24 Bra
  2   5 CBra 1
@ -650,18 +650,18 @@ Memory allocation (code space): 14

 /[[:^alpha:][:^cntrl:]]+/8WB
 ------------------------------------------------------------------
-  0  26 Bra
-  2     [ -~\x80-\xff\P{L}]++
- 26  26 Ket
- 28     End
+  0  30 Bra
+  2     [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
+ 30  30 Ket
+ 32     End
 ------------------------------------------------------------------

 /[[:^cntrl:][:^alpha:]]+/8WB
 ------------------------------------------------------------------
-  0  26 Bra
-  2     [ -~\x80-\xff\P{L}]++
- 26  26 Ket
- 28     End
+  0  30 Bra
+  2     [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
+ 30  30 Ket
+ 32     End
 ------------------------------------------------------------------

 /[[:alpha:]]+/8WB
@ -748,4 +748,21 @@ Memory allocation (code space): 14
 22     End
 ------------------------------------------------------------------

+/.((?2)(?R)\1)()/B
+------------------------------------------------------------------
+  0  23 Bra
+  2     Any
+  3  13 Once
+  5   9 CBra 1
+  8  18 Recurse
+ 10   0 Recurse
+ 12     \1
+ 14   9 Ket
+ 16  13 Ket
+ 18   3 CBra 2
+ 21   3 Ket
+ 23  23 Ket
+ 25     End
+------------------------------------------------------------------
+
 /-- End of testinput11 --/
--- a/pcre/testdata/testoutput11-32
+++ b/pcre/testdata/testoutput11-32
@ -231,7 +231,7 @@ Memory allocation (code space): 155
 ------------------------------------------------------------------

 /(?P<a>a)...(?P=a)bbb(?P>a)d/BM
-Memory allocation (code space): 125
+Memory allocation (code space): 157
 ------------------------------------------------------------------
  0  24 Bra
  2   5 CBra 1
@ -650,18 +650,18 @@ Memory allocation (code space): 28

 /[[:^alpha:][:^cntrl:]]+/8WB
 ------------------------------------------------------------------
-  0  18 Bra
-  2     [ -~\x80-\xff\P{L}]++
- 18  18 Ket
- 20     End
+  0  21 Bra
+  2     [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
+ 21  21 Ket
+ 23     End
 ------------------------------------------------------------------

 /[[:^cntrl:][:^alpha:]]+/8WB
 ------------------------------------------------------------------
-  0  18 Bra
-  2     [ -~\x80-\xff\P{L}]++
- 18  18 Ket
- 20     End
+  0  21 Bra
+  2     [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
+ 21  21 Ket
+ 23     End
 ------------------------------------------------------------------

 /[[:alpha:]]+/8WB
@ -748,4 +748,21 @@ Memory allocation (code space): 28
 22     End
 ------------------------------------------------------------------

+/.((?2)(?R)\1)()/B
+------------------------------------------------------------------
+  0  23 Bra
+  2     Any
+  3  13 Once
+  5   9 CBra 1
+  8  18 Recurse
+ 10   0 Recurse
+ 12     \1
+ 14   9 Ket
+ 16  13 Ket
+ 18   3 CBra 2
+ 21   3 Ket
+ 23  23 Ket
+ 25     End
+------------------------------------------------------------------
+
 /-- End of testinput11 --/
--- a/pcre/testdata/testoutput11-8
+++ b/pcre/testdata/testoutput11-8
@ -231,7 +231,7 @@ Memory allocation (code space): 45
 ------------------------------------------------------------------

 /(?P<a>a)...(?P=a)bbb(?P>a)d/BM
-Memory allocation (code space): 38
+Memory allocation (code space): 50
 ------------------------------------------------------------------
  0  30 Bra
  3   7 CBra 1
@ -650,18 +650,18 @@ Memory allocation (code space): 10

 /[[:^alpha:][:^cntrl:]]+/8WB
 ------------------------------------------------------------------
-  0  44 Bra
-  3     [ -~\x80-\xff\P{L}]++
- 44  44 Ket
- 47     End
+  0  51 Bra
+  3     [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
+ 51  51 Ket
+ 54     End
 ------------------------------------------------------------------

 /[[:^cntrl:][:^alpha:]]+/8WB
 ------------------------------------------------------------------
-  0  44 Bra
-  3     [ -~\x80-\xff\P{L}]++
- 44  44 Ket
- 47     End
+  0  51 Bra
+  3     [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
+ 51  51 Ket
+ 54     End
 ------------------------------------------------------------------

 /[[:alpha:]]+/8WB
@ -748,4 +748,21 @@ Memory allocation (code space): 10
 34     End
 ------------------------------------------------------------------

+/.((?2)(?R)\1)()/B
+------------------------------------------------------------------
+  0  35 Bra
+  3     Any
+  4  20 Once
+  7  14 CBra 1
+ 12  27 Recurse
+ 15   0 Recurse
+ 18     \1
+ 21  14 Ket
+ 24  20 Ket
+ 27   5 CBra 2
+ 32   5 Ket
+ 35  35 Ket
+ 38     End
+------------------------------------------------------------------
+
 /-- End of testinput11 --/
--- a/pcre/testdata/testoutput12
+++ b/pcre/testdata/testoutput12
--- a/pcre/testdata/testoutput14
+++ b/pcre/testdata/testoutput14
@ -527,4 +527,6 @@ Failed: character value in \u.... sequence is too large at offset 6
        End
 ------------------------------------------------------------------

+/(?'ABC'[bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar](*THEN:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/
+
 /-- End of testinput14 --/
--- a/pcre/testdata/testoutput17
+++ b/pcre/testdata/testoutput17
--- a/pcre/testdata/testoutput2
+++ b/pcre/testdata/testoutput2
--- a/pcre/testdata/testoutput6
+++ b/pcre/testdata/testoutput6
@ -2469,4 +2469,92 @@ No match
    Ӆ\x0aT
 No match

+/[\pS#moq]/
+    =
+ 0: =
+
+/[[:punct:]]/8W
+    \xc2\xb4
+No match
+    \x{b4} 
+No match
+
+/[[:^ascii:]]/8W
+    \x{100}
+ 0: \x{100}
+    \x{200}
+ 0: \x{200}
+    \x{300}
+ 0: \x{300}
+    \x{37e}
+ 0: \x{37e}
+    a
+No match
+    9
+No match
+    g
+No match
+
+/[[:^ascii:]\w]/8W
+    a
+ 0: a
+    9
+ 0: 9
+    g
+ 0: g
+    \x{100}
+ 0: \x{100}
+    \x{200}
+ 0: \x{200}
+    \x{300}
+ 0: \x{300}
+    \x{37e}
+ 0: \x{37e}
+
+/[\w[:^ascii:]]/8W
+    a
+ 0: a
+    9
+ 0: 9
+    g
+ 0: g
+    \x{100}
+ 0: \x{100}
+    \x{200}
+ 0: \x{200}
+    \x{300}
+ 0: \x{300}
+    \x{37e}
+ 0: \x{37e}
+
+/[^[:ascii:]\W]/8W
+    a
+No match
+    9
+No match
+    g
+No match
+    \x{100}
+ 0: \x{100}
+    \x{200}
+ 0: \x{200}
+    \x{300}
+No match
+    \x{37e}
+No match
+
+/[[:^ascii:]a]/8W
+    a
+ 0: a
+    9
+No match
+    g
+No match
+    \x{100}
+ 0: \x{100}
+    \x{200}
+ 0: \x{200}
+    \x{37e}
+ 0: \x{37e}
+
 /-- End of testinput6 --/
--- a/pcre/testdata/testoutput7
+++ b/pcre/testdata/testoutput7
@ -949,7 +949,7 @@ No match
 /[[:^alpha:][:^cntrl:]]+/8WBZ
 ------------------------------------------------------------------
        Bra
-        [ -~\x80-\xff\P{L}]++
+        [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
        Ket
        End
 ------------------------------------------------------------------
@ -961,7 +961,7 @@ No match
 /[[:^cntrl:][:^alpha:]]+/8WBZ
 ------------------------------------------------------------------
        Bra
-        [ -~\x80-\xff\P{L}]++
+        [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
        Ket
        End
 ------------------------------------------------------------------
@ -2295,4 +2295,57 @@ Need char = 'c' (caseless)
    scat
 0: sc

+/[\W\p{Any}]/BZ
+------------------------------------------------------------------
+        Bra
+        [\x00-/:-@[-^`{-\xff\p{Any}]
+        Ket
+        End
+------------------------------------------------------------------
+    abc
+ 0: a
+    123 
+ 0: 1
+
+/[\W\pL]/BZ
+------------------------------------------------------------------
+        Bra
+        [\x00-/:-@[-^`{-\xff\p{L}]
+        Ket
+        End
+------------------------------------------------------------------
+    abc
+ 0: a
+    ** Failers 
+ 0: *
+    123     
+No match
+
+/a[[:punct:]b]/WBZ
+------------------------------------------------------------------
+        Bra
+        a
+        [b[:punct:]]
+        Ket
+        End
+------------------------------------------------------------------
+
+/a[[:punct:]b]/8WBZ
+------------------------------------------------------------------
+        Bra
+        a
+        [b[:punct:]]
+        Ket
+        End
+------------------------------------------------------------------
+
+/a[b[:punct:]]/8WBZ
+------------------------------------------------------------------
+        Bra
+        a
+        [b[:punct:]]
+        Ket
+        End
+------------------------------------------------------------------
+
 /-- End of testinput7 --/
--- a/pcre/testdata/testoutputEBC
+++ b/pcre/testdata/testoutputEBC
@ -41,16 +41,22 @@ No match
 /^A\ˆ/
    A B
 0: A\x20
+    A\x41B
+ 0: AA

 /-- Test \H --/

 /^A\È/
    AB
+ 0: AB
+    A\x42B
 0: AB
    ** Fail
 No match
    A B
 No match
+    A\x41B
+No match

 /-- Test \R --/