Merge branch 'merge/merge-pcre' into 10.0

This commit is contained in:
Sergei Golubchik 2015-12-13 16:25:57 +01:00
commit 095b7b92d1
39 changed files with 2224 additions and 1263 deletions

View File

@ -1,6 +1,182 @@
ChangeLog for PCRE
------------------
Note that the PCRE 8.xx series (PCRE1) is now in a bugfix-only state. All
development is happening in the PCRE2 10.xx series.
Version 8.38 23-November-2015
-----------------------------
1. If a group that contained a recursive back reference also contained a
forward reference subroutine call followed by a non-forward-reference
subroutine call, for example /.((?2)(?R)\1)()/, pcre2_compile() failed to
compile correct code, leading to undefined behaviour or an internally
detected error. This bug was discovered by the LLVM fuzzer.
2. Quantification of certain items (e.g. atomic back references) could cause
incorrect code to be compiled when recursive forward references were
involved. For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/.
This bug was discovered by the LLVM fuzzer.
3. A repeated conditional group whose condition was a reference by name caused
a buffer overflow if there was more than one group with the given name.
This bug was discovered by the LLVM fuzzer.
4. A recursive back reference by name within a group that had the same name as
another group caused a buffer overflow. For example:
/(?J)(?'d'(?'d'\g{d}))/. This bug was discovered by the LLVM fuzzer.
5. A forward reference by name to a group whose number is the same as the
current group, for example in this pattern: /(?|(\k'Pm')|(?'Pm'))/, caused
a buffer overflow at compile time. This bug was discovered by the LLVM
fuzzer.
6. A lookbehind assertion within a set of mutually recursive subpatterns could
provoke a buffer overflow. This bug was discovered by the LLVM fuzzer.
7. Another buffer overflow bug involved duplicate named groups with a
reference between their definition, with a group that reset capture
numbers, for example: /(?J:(?|(?'R')(\k'R')|((?'R'))))/. This has been
fixed by always allowing for more memory, even if not needed. (A proper fix
is implemented in PCRE2, but it involves more refactoring.)
8. There was no check for integer overflow in subroutine calls such as (?123).
9. The table entry for \l in EBCDIC environments was incorrect, leading to its
being treated as a literal 'l' instead of causing an error.
10. There was a buffer overflow if pcre_exec() was called with an ovector of
size 1. This bug was found by american fuzzy lop.
11. If a non-capturing group containing a conditional group that could match
an empty string was repeated, it was not identified as matching an empty
string itself. For example: /^(?:(?(1)x|)+)+$()/.
12. In an EBCDIC environment, pcretest was mishandling the escape sequences
\a and \e in test subject lines.
13. In an EBCDIC environment, \a in a pattern was converted to the ASCII
instead of the EBCDIC value.
14. The handling of \c in an EBCDIC environment has been revised so that it is
now compatible with the specification in Perl's perlebcdic page.
15. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in
ASCII/Unicode. This has now been added to the list of characters that are
recognized as white space in EBCDIC.
16. When PCRE was compiled without UCP support, the use of \p and \P gave an
error (correctly) when used outside a class, but did not give an error
within a class.
17. \h within a class was incorrectly compiled in EBCDIC environments.
18. A pattern with an unmatched closing parenthesis that contained a backward
assertion which itself contained a forward reference caused buffer
overflow. And example pattern is: /(?=di(?<=(?1))|(?=(.))))/.
19. JIT should return with error when the compiled pattern requires more stack
space than the maximum.
20. A possessively repeated conditional group that could match an empty string,
for example, /(?(R))*+/, was incorrectly compiled.
21. Fix infinite recursion in the JIT compiler when certain patterns such as
/(?:|a|){100}x/ are analysed.
22. Some patterns with character classes involving [: and \\ were incorrectly
compiled and could cause reading from uninitialized memory or an incorrect
error diagnosis.
23. Pathological patterns containing many nested occurrences of [: caused
pcre_compile() to run for a very long time.
24. A conditional group with only one branch has an implicit empty alternative
branch and must therefore be treated as potentially matching an empty
string.
25. If (?R was followed by - or + incorrect behaviour happened instead of a
diagnostic.
26. Arrange to give up on finding the minimum matching length for overly
complex patterns.
27. Similar to (4) above: in a pattern with duplicated named groups and an
occurrence of (?| it is possible for an apparently non-recursive back
reference to become recursive if a later named group with the relevant
number is encountered. This could lead to a buffer overflow. Wen Guanxing
from Venustech ADLAB discovered this bug.
28. If pcregrep was given the -q option with -c or -l, or when handling a
binary file, it incorrectly wrote output to stdout.
29. The JIT compiler did not restore the control verb head in case of *THEN
control verbs. This issue was found by Karl Skomski with a custom LLVM
fuzzer.
30. Error messages for syntax errors following \g and \k were giving inaccurate
offsets in the pattern.
31. Added a check for integer overflow in conditions (?(<digits>) and
(?(R<digits>). This omission was discovered by Karl Skomski with the LLVM
fuzzer.
32. Handling recursive references such as (?2) when the reference is to a group
later in the pattern uses code that is very hacked about and error-prone.
It has been re-written for PCRE2. Here in PCRE1, a check has been added to
give an internal error if it is obvious that compiling has gone wrong.
33. The JIT compiler should not check repeats after a {0,1} repeat byte code.
This issue was found by Karl Skomski with a custom LLVM fuzzer.
34. The JIT compiler should restore the control chain for empty possessive
repeats. This issue was found by Karl Skomski with a custom LLVM fuzzer.
35. Match limit check added to JIT recursion. This issue was found by Karl
Skomski with a custom LLVM fuzzer.
36. Yet another case similar to 27 above has been circumvented by an
unconditional allocation of extra memory. This issue is fixed "properly" in
PCRE2 by refactoring the way references are handled. Wen Guanxing
from Venustech ADLAB discovered this bug.
37. Fix two assertion fails in JIT. These issues were found by Karl Skomski
with a custom LLVM fuzzer.
38. Fixed a corner case of range optimization in JIT.
39. An incorrect error "overran compiling workspace" was given if there were
exactly enough group forward references such that the last one extended
into the workspace safety margin. The next one would have expanded the
workspace. The test for overflow was not including the safety margin.
40. A match limit issue is fixed in JIT which was found by Karl Skomski
with a custom LLVM fuzzer.
41. Remove the use of /dev/null in testdata/testinput2, because it doesn't
work under Windows. (Why has it taken so long for anyone to notice?)
42. In a character class such as [\W\p{Any}] where both a negative-type escape
("not a word character") and a property escape were present, the property
escape was being ignored.
43. Fix crash caused by very long (*MARK) or (*THEN) names.
44. A sequence such as [[:punct:]b] that is, a POSIX character class followed
by a single ASCII character in a class item, was incorrectly compiled in
UCP mode. The POSIX class got lost, but only if the single character
followed it.
45. [:punct:] in UCP mode was matching some characters in the range 128-255
that should not have been matched.
46. If [:^ascii:] or [:^xdigit:] or [:^cntrl:] are present in a non-negated
class, all characters with code points greater than 255 are in the class.
When a Unicode property was also in the class (if PCRE_UCP is set, escapes
such as \w are turned into Unicode properties), wide characters were not
correctly handled, and could fail to match.
Version 8.37 28-April-2015
--------------------------

View File

@ -1,6 +1,14 @@
News about PCRE releases
------------------------
Release 8.38 23-November-2015
-----------------------------
This is bug-fix release. Note that this library (now called PCRE1) is now being
maintained for bug fixes only. New projects are advised to use the new PCRE2
libraries.
Release 8.37 28-April-2015
--------------------------

View File

@ -764,9 +764,9 @@ required. For details, please see this web site:
http://www.zaconsultants.net
There is also a mirror here:
http://www.vsoft-software.com/downloads.html
You may download PCRE from WWW.CBTTAPE.ORG, file 882.  Everything, source and
executable, is in EBCDIC and native z/OS file formats and this is the
recommended download site.
==========================
Last Updated: 10 February 2015
Last Updated: 25 June 2015

View File

@ -512,6 +512,14 @@ echo "aaaaa" >>testtemp1grep
(cd $srcdir; $valgrind $pcregrep --line-offsets '(?<=\Ka)' $builddir/testtemp1grep) >>testtrygrep 2>&1
echo "RC=$?" >>testtrygrep
echo "---------------------------- Test 108 ------------------------------" >>testtrygrep
(cd $srcdir; $valgrind $pcregrep -lq PATTERN ./testdata/grepinput ./testdata/grepinputx) >>testtrygrep
echo "RC=$?" >>testtrygrep
echo "---------------------------- Test 109 -----------------------------" >>testtrygrep
(cd $srcdir; $valgrind $pcregrep -cq lazy ./testdata/grepinput*) >>testtrygrep
echo "RC=$?" >>testtrygrep
# Now compare the results.
$cf $srcdir/testdata/grepoutput testtrygrep

View File

@ -9,17 +9,17 @@ dnl The PCRE_PRERELEASE feature is for identifying release candidates. It might
dnl be defined as -RC2, for example. For real releases, it should be empty.
m4_define(pcre_major, [8])
m4_define(pcre_minor, [37])
m4_define(pcre_minor, [38])
m4_define(pcre_prerelease, [])
m4_define(pcre_date, [2015-04-28])
m4_define(pcre_date, [2015-11-23])
# NOTE: The CMakeLists.txt file searches for the above variables in the first
# 50 lines of this file. Please update that if the variables above are moved.
# Libtool shared library interface versions (current:revision:age)
m4_define(libpcre_version, [3:5:2])
m4_define(libpcre16_version, [2:5:2])
m4_define(libpcre32_version, [0:5:0])
m4_define(libpcre_version, [3:6:2])
m4_define(libpcre16_version, [2:6:2])
m4_define(libpcre32_version, [0:6:0])
m4_define(libpcreposix_version, [0:3:0])
m4_define(libpcrecpp_version, [0:1:0])

View File

@ -764,9 +764,9 @@ required. For details, please see this web site:
http://www.zaconsultants.net
There is also a mirror here:
http://www.vsoft-software.com/downloads.html
You may download PCRE from WWW.CBTTAPE.ORG, file 882.  Everything, source and
executable, is in EBCDIC and native z/OS file formats and this is the
recommended download site.
==========================
Last Updated: 10 February 2015
Last Updated: 25 June 2015

View File

@ -329,7 +329,8 @@ A second use of backslash provides a way of encoding non-printing characters
in patterns in a visible manner. There is no restriction on the appearance of
non-printing characters, apart from the binary zero that terminates a pattern,
but when a pattern is being prepared by text editing, it is often easier to use
one of the following escape sequences than the binary character it represents:
one of the following escape sequences than the binary character it represents.
In an ASCII or Unicode environment, these escapes are as follows:
<pre>
\a alarm, that is, the BEL character (hex 07)
\cx "control-x", where x is any ASCII character
@ -353,19 +354,33 @@ data item (byte or 16-bit value) following \c has a value greater than 127, a
compile-time error occurs. This locks out non-ASCII characters in all modes.
</P>
<P>
The \c facility was designed for use with ASCII characters, but with the
extension to Unicode it is even less useful than it once was. It is, however,
recognized when PCRE is compiled in EBCDIC mode, where data items are always
bytes. In this mode, all values are valid after \c. If the next character is a
lower case letter, it is converted to upper case. Then the 0xc0 bits of the
byte are inverted. Thus \cA becomes hex 01, as in ASCII (A is C1), but because
the EBCDIC letters are disjoint, \cZ becomes hex 29 (Z is E9), and other
characters also generate different values.
When PCRE is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t
generate the appropriate EBCDIC code values. The \c escape is processed
as specified for Perl in the <b>perlebcdic</b> document. The only characters
that are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. Any
other character provokes a compile-time error. The sequence \@ encodes
character code 0; the letters (in either case) encode characters 1-26 (hex 01
to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
\? becomes either 255 (hex FF) or 95 (hex 5F).
</P>
<P>
Thus, apart from \?, these escapes generate the same character code values as
they do in an ASCII environment, though the meanings of the values mostly
differ. For example, \G always generates code value 7, which is BEL in ASCII
but DEL in EBCDIC.
</P>
<P>
The sequence \? generates DEL (127, hex 7F) in an ASCII environment, but
because 127 is not a control character in EBCDIC, Perl makes it generate the
APC character. Unfortunately, there are several variants of EBCDIC. In most of
them the APC character has the value 255 (hex FF), but in the one Perl calls
POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
values, PCRE makes \? generate 95; otherwise it generates 255.
</P>
<P>
After \0 up to two further octal digits are read. If there are fewer than two
digits, just those that are present are used. Thus the sequence \0\x\07
specifies two binary zeros followed by a BEL character (code value 7). Make
digits, just those that are present are used. Thus the sequence \0\x\015
specifies two binary zeros followed by a CR character (code value 13). Make
sure you supply two digits after the initial zero if the pattern character that
follows is itself an octal digit.
</P>
@ -3249,9 +3264,9 @@ Cambridge CB2 3QH, England.
</P>
<br><a name="SEC30" href="#TOC1">REVISION</a><br>
<P>
Last updated: 08 January 2014
Last updated: 14 June 2015
<br>
Copyright &copy; 1997-2014 University of Cambridge.
Copyright &copy; 1997-2015 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE index page</a>.

File diff suppressed because it is too large Load Diff

View File

@ -1,4 +1,4 @@
.TH PCREPATTERN 3 "08 January 2014" "PCRE 8.35"
.TH PCREPATTERN 3 "14 June 2015" "PCRE 8.38"
.SH NAME
PCRE - Perl-compatible regular expressions
.SH "PCRE REGULAR EXPRESSION DETAILS"
@ -308,7 +308,8 @@ A second use of backslash provides a way of encoding non-printing characters
in patterns in a visible manner. There is no restriction on the appearance of
non-printing characters, apart from the binary zero that terminates a pattern,
but when a pattern is being prepared by text editing, it is often easier to use
one of the following escape sequences than the binary character it represents:
one of the following escape sequences than the binary character it represents.
In an ASCII or Unicode environment, these escapes are as follows:
.sp
\ea alarm, that is, the BEL character (hex 07)
\ecx "control-x", where x is any ASCII character
@ -331,18 +332,30 @@ but \ec{ becomes hex 3B ({ is 7B), and \ec; becomes hex 7B (; is 3B). If the
data item (byte or 16-bit value) following \ec has a value greater than 127, a
compile-time error occurs. This locks out non-ASCII characters in all modes.
.P
The \ec facility was designed for use with ASCII characters, but with the
extension to Unicode it is even less useful than it once was. It is, however,
recognized when PCRE is compiled in EBCDIC mode, where data items are always
bytes. In this mode, all values are valid after \ec. If the next character is a
lower case letter, it is converted to upper case. Then the 0xc0 bits of the
byte are inverted. Thus \ecA becomes hex 01, as in ASCII (A is C1), but because
the EBCDIC letters are disjoint, \ecZ becomes hex 29 (Z is E9), and other
characters also generate different values.
When PCRE is compiled in EBCDIC mode, \ea, \ee, \ef, \en, \er, and \et
generate the appropriate EBCDIC code values. The \ec escape is processed
as specified for Perl in the \fBperlebcdic\fP document. The only characters
that are allowed after \ec are A-Z, a-z, or one of @, [, \e, ], ^, _, or ?. Any
other character provokes a compile-time error. The sequence \e@ encodes
character code 0; the letters (in either case) encode characters 1-26 (hex 01
to hex 1A); [, \e, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
\e? becomes either 255 (hex FF) or 95 (hex 5F).
.P
Thus, apart from \e?, these escapes generate the same character code values as
they do in an ASCII environment, though the meanings of the values mostly
differ. For example, \eG always generates code value 7, which is BEL in ASCII
but DEL in EBCDIC.
.P
The sequence \e? generates DEL (127, hex 7F) in an ASCII environment, but
because 127 is not a control character in EBCDIC, Perl makes it generate the
APC character. Unfortunately, there are several variants of EBCDIC. In most of
them the APC character has the value 255 (hex FF), but in the one Perl calls
POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
values, PCRE makes \e? generate 95; otherwise it generates 255.
.P
After \e0 up to two further octal digits are read. If there are fewer than two
digits, just those that are present are used. Thus the sequence \e0\ex\e07
specifies two binary zeros followed by a BEL character (code value 7). Make
digits, just those that are present are used. Thus the sequence \e0\ex\e015
specifies two binary zeros followed by a CR character (code value 13). Make
sure you supply two digits after the initial zero if the pattern character that
follows is itself an octal digit.
.P
@ -3283,6 +3296,6 @@ Cambridge CB2 3QH, England.
.rs
.sp
.nf
Last updated: 08 January 2014
Copyright (c) 1997-2014 University of Cambridge.
Last updated: 14 June 2015
Copyright (c) 1997-2015 University of Cambridge.
.fi

View File

@ -174,7 +174,7 @@ static const short int escapes[] = {
-ESC_Z, CHAR_LEFT_SQUARE_BRACKET,
CHAR_BACKSLASH, CHAR_RIGHT_SQUARE_BRACKET,
CHAR_CIRCUMFLEX_ACCENT, CHAR_UNDERSCORE,
CHAR_GRAVE_ACCENT, 7,
CHAR_GRAVE_ACCENT, ESC_a,
-ESC_b, 0,
-ESC_d, ESC_e,
ESC_f, 0,
@ -202,9 +202,9 @@ static const short int escapes[] = {
/* 68 */ 0, 0, '|', ',', '%', '_', '>', '?',
/* 70 */ 0, 0, 0, 0, 0, 0, 0, 0,
/* 78 */ 0, '`', ':', '#', '@', '\'', '=', '"',
/* 80 */ 0, 7, -ESC_b, 0, -ESC_d, ESC_e, ESC_f, 0,
/* 80 */ 0, ESC_a, -ESC_b, 0, -ESC_d, ESC_e, ESC_f, 0,
/* 88 */-ESC_h, 0, 0, '{', 0, 0, 0, 0,
/* 90 */ 0, 0, -ESC_k, 'l', 0, ESC_n, 0, -ESC_p,
/* 90 */ 0, 0, -ESC_k, 0, 0, ESC_n, 0, -ESC_p,
/* 98 */ 0, ESC_r, 0, '}', 0, 0, 0, 0,
/* A0 */ 0, '~', -ESC_s, ESC_tee, 0,-ESC_v, -ESC_w, 0,
/* A8 */ 0,-ESC_z, 0, 0, 0, '[', 0, 0,
@ -219,6 +219,12 @@ static const short int escapes[] = {
/* F0 */ 0, 0, 0, 0, 0, 0, 0, 0,
/* F8 */ 0, 0, 0, 0, 0, 0, 0, 0
};
/* We also need a table of characters that may follow \c in an EBCDIC
environment for characters 0-31. */
static unsigned char ebcdic_escape_c[] = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_";
#endif
@ -458,7 +464,7 @@ static const char error_texts[] =
"range out of order in character class\0"
"nothing to repeat\0"
/* 10 */
"operand of unlimited repeat could match the empty string\0" /** DEAD **/
"internal error: invalid forward reference offset\0"
"internal error: unexpected repeat\0"
"unrecognized character after (? or (?-\0"
"POSIX named classes are supported only within a class\0"
@ -527,7 +533,11 @@ static const char error_texts[] =
"different names for subpatterns of the same number are not allowed\0"
"(*MARK) must have an argument\0"
"this version of PCRE is not compiled with Unicode property support\0"
#ifndef EBCDIC
"\\c must be followed by an ASCII character\0"
#else
"\\c must be followed by a letter or one of [\\]^_?\0"
#endif
"\\k is not followed by a braced, angle-bracketed, or quoted name\0"
/* 70 */
"internal error: unknown opcode in find_fixedlength()\0"
@ -1425,7 +1435,16 @@ else
c ^= 0x40;
#else /* EBCDIC coding */
if (c >= CHAR_a && c <= CHAR_z) c += 64;
c ^= 0xC0;
if (c == CHAR_QUESTION_MARK)
c = ('\\' == 188 && '`' == 74)? 0x5f : 0xff;
else
{
for (i = 0; i < 32; i++)
{
if (c == ebcdic_escape_c[i]) break;
}
if (i < 32) c = i; else *errorcodeptr = ERR68;
}
#endif
break;
@ -1799,7 +1818,7 @@ for (;;)
case OP_ASSERTBACK:
case OP_ASSERTBACK_NOT:
do cc += GET(cc, 1); while (*cc == OP_ALT);
cc += PRIV(OP_lengths)[*cc];
cc += 1 + LINK_SIZE;
break;
/* Skip over things that don't match chars */
@ -2487,7 +2506,7 @@ for (code = first_significant_code(code + PRIV(OP_lengths)[*code], TRUE);
if (c == OP_BRA || c == OP_BRAPOS ||
c == OP_CBRA || c == OP_CBRAPOS ||
c == OP_ONCE || c == OP_ONCE_NC ||
c == OP_COND)
c == OP_COND || c == OP_SCOND)
{
BOOL empty_branch;
if (GET(code, 1) == 0) return TRUE; /* Hit unclosed bracket */
@ -3886,11 +3905,11 @@ didn't consider this to be a POSIX class. Likewise for [:1234:].
The problem in trying to be exactly like Perl is in the handling of escapes. We
have to be sure that [abc[:x\]pqr] is *not* treated as containing a POSIX
class, but [abc[:x\]pqr:]] is (so that an error can be generated). The code
below handles the special case of \], but does not try to do any other escape
processing. This makes it different from Perl for cases such as [:l\ower:]
where Perl recognizes it as the POSIX class "lower" but PCRE does not recognize
"l\ower". This is a lesser evil than not diagnosing bad classes when Perl does,
I think.
below handles the special cases \\ and \], but does not try to do any other
escape processing. This makes it different from Perl for cases such as
[:l\ower:] where Perl recognizes it as the POSIX class "lower" but PCRE does
not recognize "l\ower". This is a lesser evil than not diagnosing bad classes
when Perl does, I think.
A user pointed out that PCRE was rejecting [:a[:digit:]] whereas Perl was not.
It seems that the appearance of a nested POSIX class supersedes an apparent
@ -3917,21 +3936,16 @@ pcre_uchar terminator; /* Don't combine these lines; the Solaris cc */
terminator = *(++ptr); /* compiler warns about "non-constant" initializer. */
for (++ptr; *ptr != CHAR_NULL; ptr++)
{
if (*ptr == CHAR_BACKSLASH && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
if (*ptr == CHAR_BACKSLASH &&
(ptr[1] == CHAR_RIGHT_SQUARE_BRACKET ||
ptr[1] == CHAR_BACKSLASH))
ptr++;
else if (*ptr == CHAR_RIGHT_SQUARE_BRACKET) return FALSE;
else
else if ((*ptr == CHAR_LEFT_SQUARE_BRACKET && ptr[1] == terminator) ||
*ptr == CHAR_RIGHT_SQUARE_BRACKET) return FALSE;
else if (*ptr == terminator && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
{
if (*ptr == terminator && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
{
*endptr = ptr;
return TRUE;
}
if (*ptr == CHAR_LEFT_SQUARE_BRACKET &&
(ptr[1] == CHAR_COLON || ptr[1] == CHAR_DOT ||
ptr[1] == CHAR_EQUALS_SIGN) &&
check_posix_syntax(ptr, endptr))
return FALSE;
*endptr = ptr;
return TRUE;
}
}
return FALSE;
@ -3985,11 +3999,12 @@ have their offsets adjusted. That one of the jobs of this function. Before it
is called, the partially compiled regex must be temporarily terminated with
OP_END.
This function has been extended with the possibility of forward references for
recursions and subroutine calls. It must also check the list of such references
for the group we are dealing with. If it finds that one of the recursions in
the current group is on this list, it adjusts the offset in the list, not the
value in the reference (which is a group number).
This function has been extended to cope with forward references for recursions
and subroutine calls. It must check the list of such references for the
group we are dealing with. If it finds that one of the recursions in the
current group is on this list, it does not adjust the value in the reference
(which is a group number). After the group has been scanned, all the offsets in
the forward reference list for the group are adjusted.
Arguments:
group points to the start of the group
@ -4005,29 +4020,21 @@ static void
adjust_recurse(pcre_uchar *group, int adjust, BOOL utf, compile_data *cd,
size_t save_hwm_offset)
{
int offset;
pcre_uchar *hc;
pcre_uchar *ptr = group;
while ((ptr = (pcre_uchar *)find_recurse(ptr, utf)) != NULL)
{
int offset;
pcre_uchar *hc;
/* See if this recursion is on the forward reference list. If so, adjust the
reference. */
for (hc = (pcre_uchar *)cd->start_workspace + save_hwm_offset; hc < cd->hwm;
hc += LINK_SIZE)
{
offset = (int)GET(hc, 0);
if (cd->start_code + offset == ptr + 1)
{
PUT(hc, 0, offset + adjust);
break;
}
if (cd->start_code + offset == ptr + 1) break;
}
/* Otherwise, adjust the recursion offset if it's after the start of this
group. */
/* If we have not found this recursion on the forward reference list, adjust
the recursion's offset if it's after the start of this group. */
if (hc >= cd->hwm)
{
@ -4037,6 +4044,15 @@ while ((ptr = (pcre_uchar *)find_recurse(ptr, utf)) != NULL)
ptr += 1 + LINK_SIZE;
}
/* Now adjust all forward reference offsets for the group. */
for (hc = (pcre_uchar *)cd->start_workspace + save_hwm_offset; hc < cd->hwm;
hc += LINK_SIZE)
{
offset = (int)GET(hc, 0);
PUT(hc, 0, offset + adjust);
}
}
@ -4465,7 +4481,7 @@ const pcre_uchar *tempptr;
const pcre_uchar *nestptr = NULL;
pcre_uchar *previous = NULL;
pcre_uchar *previous_callout = NULL;
size_t save_hwm_offset = 0;
size_t item_hwm_offset = 0;
pcre_uint8 classbits[32];
/* We can fish out the UTF-8 setting once and for all into a BOOL, but we
@ -4623,8 +4639,7 @@ for (;; ptr++)
/* In the real compile phase, just check the workspace used by the forward
reference list. */
else if (cd->hwm > cd->start_workspace + cd->workspace_size -
WORK_SIZE_SAFETY_MARGIN)
else if (cd->hwm > cd->start_workspace + cd->workspace_size)
{
*errorcodeptr = ERR52;
goto FAILED;
@ -4767,6 +4782,7 @@ for (;; ptr++)
zeroreqchar = reqchar;
zeroreqcharflags = reqcharflags;
previous = code;
item_hwm_offset = cd->hwm - cd->start_workspace;
*code++ = ((options & PCRE_DOTALL) != 0)? OP_ALLANY: OP_ANY;
break;
@ -4818,6 +4834,7 @@ for (;; ptr++)
/* Handle a real character class. */
previous = code;
item_hwm_offset = cd->hwm - cd->start_workspace;
/* PCRE supports POSIX class stuff inside a class. Perl gives an error if
they are encountered at the top level, so we'll do that too. */
@ -4923,9 +4940,10 @@ for (;; ptr++)
(which is on the stack). We have to remember that there was XCLASS data,
however. */
if (class_uchardata > class_uchardata_base) xclass = TRUE;
if (lengthptr != NULL && class_uchardata > class_uchardata_base)
{
xclass = TRUE;
*lengthptr += (int)(class_uchardata - class_uchardata_base);
class_uchardata = class_uchardata_base;
}
@ -5028,10 +5046,26 @@ for (;; ptr++)
ptr = tempptr + 1;
continue;
/* For all other POSIX classes, no special action is taken in UCP
mode. Fall through to the non_UCP case. */
/* For the other POSIX classes (ascii, xdigit) we are going to fall
through to the non-UCP case and build a bit map for characters with
code points less than 256. If we are in a negated POSIX class
within a non-negated overall class, characters with code points
greater than 255 must all match. In the special case where we have
not yet generated any xclass data, and this is the final item in
the overall class, we need do nothing: later on, the opcode
OP_NCLASS will be used to indicate that characters greater than 255
are acceptable. If we have already seen an xclass item or one may
follow (we have to assume that it might if this is not the end of
the class), explicitly match all wide codepoints. */
default:
if (!negate_class && local_negate &&
(xclass || tempptr[2] != CHAR_RIGHT_SQUARE_BRACKET))
{
*class_uchardata++ = XCL_RANGE;
class_uchardata += PRIV(ord2utf)(0x100, class_uchardata);
class_uchardata += PRIV(ord2utf)(0x10ffff, class_uchardata);
}
break;
}
}
@ -5195,9 +5229,9 @@ for (;; ptr++)
cd, PRIV(vspace_list));
continue;
#ifdef SUPPORT_UCP
case ESC_p:
case ESC_P:
#ifdef SUPPORT_UCP
{
BOOL negated;
unsigned int ptype = 0, pdata = 0;
@ -5211,6 +5245,9 @@ for (;; ptr++)
class_has_8bitchar--; /* Undo! */
continue;
}
#else
*errorcodeptr = ERR45;
goto FAILED;
#endif
/* Unrecognized escapes are faulted if PCRE is running in its
strict mode. By default, for compatibility with Perl, they are
@ -5367,16 +5404,20 @@ for (;; ptr++)
CLASS_SINGLE_CHARACTER:
if (class_one_char < 2) class_one_char++;
/* If class_one_char is 1, we have the first single character in the
class, and there have been no prior ranges, or XCLASS items generated by
escapes. If this is the final character in the class, we can optimize by
turning the item into a 1-character OP_CHAR[I] if it's positive, or
OP_NOT[I] if it's negative. In the positive case, it can cause firstchar
to be set. Otherwise, there can be no first char if this item is first,
whatever repeat count may follow. In the case of reqchar, save the
previous value for reinstating. */
/* If xclass_has_prop is false and class_one_char is 1, we have the first
single character in the class, and there have been no prior ranges, or
XCLASS items generated by escapes. If this is the final character in the
class, we can optimize by turning the item into a 1-character OP_CHAR[I]
if it's positive, or OP_NOT[I] if it's negative. In the positive case, it
can cause firstchar to be set. Otherwise, there can be no first char if
this item is first, whatever repeat count may follow. In the case of
reqchar, save the previous value for reinstating. */
if (!inescq && class_one_char == 1 && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
if (!inescq &&
#ifdef SUPPORT_UCP
!xclass_has_prop &&
#endif
class_one_char == 1 && ptr[1] == CHAR_RIGHT_SQUARE_BRACKET)
{
ptr++;
zeroreqchar = reqchar;
@ -5492,9 +5533,10 @@ for (;; ptr++)
actual compiled code. */
#ifdef SUPPORT_UTF
if (xclass && (!should_flip_negation || (options & PCRE_UCP) != 0))
if (xclass && (xclass_has_prop || !should_flip_negation ||
(options & PCRE_UCP) != 0))
#elif !defined COMPILE_PCRE8
if (xclass && !should_flip_negation)
if (xclass && (xclass_has_prop || !should_flip_negation))
#endif
#if defined SUPPORT_UTF || !defined COMPILE_PCRE8
{
@ -5930,7 +5972,7 @@ for (;; ptr++)
{
register int i;
int len = (int)(code - previous);
size_t base_hwm_offset = save_hwm_offset;
size_t base_hwm_offset = item_hwm_offset;
pcre_uchar *bralink = NULL;
pcre_uchar *brazeroptr = NULL;
@ -5985,7 +6027,7 @@ for (;; ptr++)
if (repeat_max <= 1) /* Covers 0, 1, and unlimited */
{
*code = OP_END;
adjust_recurse(previous, 1, utf, cd, save_hwm_offset);
adjust_recurse(previous, 1, utf, cd, item_hwm_offset);
memmove(previous + 1, previous, IN_UCHARS(len));
code++;
if (repeat_max == 0)
@ -6009,7 +6051,7 @@ for (;; ptr++)
{
int offset;
*code = OP_END;
adjust_recurse(previous, 2 + LINK_SIZE, utf, cd, save_hwm_offset);
adjust_recurse(previous, 2 + LINK_SIZE, utf, cd, item_hwm_offset);
memmove(previous + 2 + LINK_SIZE, previous, IN_UCHARS(len));
code += 2 + LINK_SIZE;
*previous++ = OP_BRAZERO + repeat_type;
@ -6254,6 +6296,12 @@ for (;; ptr++)
while (*scode == OP_ALT);
}
/* A conditional group with only one branch has an implicit empty
alternative branch. */
if (*bracode == OP_COND && bracode[GET(bracode,1)] != OP_ALT)
*bracode = OP_SCOND;
/* Handle possessive quantifiers. */
if (possessive_quantifier)
@ -6267,11 +6315,11 @@ for (;; ptr++)
{
int nlen = (int)(code - bracode);
*code = OP_END;
adjust_recurse(bracode, 1 + LINK_SIZE, utf, cd, save_hwm_offset);
adjust_recurse(bracode, 1 + LINK_SIZE, utf, cd, item_hwm_offset);
memmove(bracode + 1 + LINK_SIZE, bracode, IN_UCHARS(nlen));
code += 1 + LINK_SIZE;
nlen += 1 + LINK_SIZE;
*bracode = OP_BRAPOS;
*bracode = (*bracode == OP_COND)? OP_BRAPOS : OP_SBRAPOS;
*code++ = OP_KETRPOS;
PUTINC(code, 0, nlen);
PUT(bracode, 1, nlen);
@ -6401,7 +6449,7 @@ for (;; ptr++)
else
{
*code = OP_END;
adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, save_hwm_offset);
adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, item_hwm_offset);
memmove(tempcode + 1 + LINK_SIZE, tempcode, IN_UCHARS(len));
code += 1 + LINK_SIZE;
len += 1 + LINK_SIZE;
@ -6450,7 +6498,7 @@ for (;; ptr++)
default:
*code = OP_END;
adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, save_hwm_offset);
adjust_recurse(tempcode, 1 + LINK_SIZE, utf, cd, item_hwm_offset);
memmove(tempcode + 1 + LINK_SIZE, tempcode, IN_UCHARS(len));
code += 1 + LINK_SIZE;
len += 1 + LINK_SIZE;
@ -6586,9 +6634,17 @@ for (;; ptr++)
goto FAILED;
}
setverb = *code++ = verbs[i].op_arg;
*code++ = arglen;
memcpy(code, arg, IN_UCHARS(arglen));
code += arglen;
if (lengthptr != NULL) /* In pass 1 just add in the length */
{ /* to avoid potential workspace */
*lengthptr += arglen; /* overflow. */
*code++ = 0;
}
else
{
*code++ = arglen;
memcpy(code, arg, IN_UCHARS(arglen));
code += arglen;
}
*code++ = 0;
}
@ -6623,7 +6679,7 @@ for (;; ptr++)
newoptions = options;
skipbytes = 0;
bravalue = OP_CBRA;
save_hwm_offset = cd->hwm - cd->start_workspace;
item_hwm_offset = cd->hwm - cd->start_workspace;
reset_bracount = FALSE;
/* Deal with the extended parentheses; all are introduced by '?', and the
@ -6641,6 +6697,7 @@ for (;; ptr++)
/* ------------------------------------------------------------ */
case CHAR_VERTICAL_LINE: /* Reset capture count for each branch */
reset_bracount = TRUE;
cd->dupgroups = TRUE; /* Record (?| encountered */
/* Fall through */
/* ------------------------------------------------------------ */
@ -6741,6 +6798,12 @@ for (;; ptr++)
{
while (IS_DIGIT(*ptr))
{
if (recno > INT_MAX / 10 - 1) /* Integer overflow */
{
while (IS_DIGIT(*ptr)) ptr++;
*errorcodeptr = ERR61;
goto FAILED;
}
recno = recno * 10 + (int)(*ptr - CHAR_0);
ptr++;
}
@ -6769,7 +6832,7 @@ for (;; ptr++)
ptr++;
}
namelen = (int)(ptr - name);
if (lengthptr != NULL) *lengthptr += IMM2_SIZE;
if (lengthptr != NULL) skipbytes += IMM2_SIZE;
}
/* Check the terminator */
@ -6875,6 +6938,11 @@ for (;; ptr++)
*errorcodeptr = ERR15;
goto FAILED;
}
if (recno > INT_MAX / 10 - 1) /* Integer overflow */
{
*errorcodeptr = ERR61;
goto FAILED;
}
recno = recno * 10 + name[i] - CHAR_0;
}
if (recno == 0) recno = RREF_ANY;
@ -7151,6 +7219,7 @@ for (;; ptr++)
if (lengthptr != NULL)
{
named_group *ng;
recno = 0;
if (namelen == 0)
{
@ -7168,20 +7237,6 @@ for (;; ptr++)
goto FAILED;
}
/* The name table does not exist in the first pass; instead we must
scan the list of names encountered so far in order to get the
number. If the name is not found, set the value to 0 for a forward
reference. */
ng = cd->named_groups;
for (i = 0; i < cd->names_found; i++, ng++)
{
if (namelen == ng->length &&
STRNCMP_UC_UC(name, ng->name, namelen) == 0)
break;
}
recno = (i < cd->names_found)? ng->number : 0;
/* Count named back references. */
if (!is_recurse) cd->namedrefcount++;
@ -7191,6 +7246,56 @@ for (;; ptr++)
16-bit data item. */
*lengthptr += IMM2_SIZE;
/* If this is a forward reference and we are within a (?|...) group,
the reference may end up as the number of a group which we are
currently inside, that is, it could be a recursive reference. In the
real compile this will be picked up and the reference wrapped with
OP_ONCE to make it atomic, so we must space in case this occurs. */
/* In fact, this can happen for a non-forward reference because
another group with the same number might be created later. This
issue is fixed "properly" in PCRE2. As PCRE1 is now in maintenance
only mode, we finesse the bug by allowing more memory always. */
*lengthptr += 2 + 2*LINK_SIZE;
/* It is even worse than that. The current reference may be to an
existing named group with a different number (so apparently not
recursive) but which later on is also attached to a group with the
current number. This can only happen if $(| has been previous
encountered. In that case, we allow yet more memory, just in case.
(Again, this is fixed "properly" in PCRE2. */
if (cd->dupgroups) *lengthptr += 4 + 4*LINK_SIZE;
/* Otherwise, check for recursion here. The name table does not exist
in the first pass; instead we must scan the list of names encountered
so far in order to get the number. If the name is not found, leave
the value of recno as 0 for a forward reference. */
else
{
ng = cd->named_groups;
for (i = 0; i < cd->names_found; i++, ng++)
{
if (namelen == ng->length &&
STRNCMP_UC_UC(name, ng->name, namelen) == 0)
{
open_capitem *oc;
recno = ng->number;
if (is_recurse) break;
for (oc = cd->open_caps; oc != NULL; oc = oc->next)
{
if (oc->number == recno)
{
oc->flag = TRUE;
break;
}
}
}
}
}
}
/* In the real compile, search the name table. We check the name
@ -7237,8 +7342,6 @@ for (;; ptr++)
for (i++; i < cd->names_found; i++)
{
if (STRCMP_UC_UC(slot + IMM2_SIZE, cslot + IMM2_SIZE) != 0) break;
count++;
cslot += cd->name_entry_size;
}
@ -7247,6 +7350,7 @@ for (;; ptr++)
{
if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE;
previous = code;
item_hwm_offset = cd->hwm - cd->start_workspace;
*code++ = ((options & PCRE_CASELESS) != 0)? OP_DNREFI : OP_DNREF;
PUT2INC(code, 0, index);
PUT2INC(code, 0, count);
@ -7284,9 +7388,14 @@ for (;; ptr++)
/* ------------------------------------------------------------ */
case CHAR_R: /* Recursion */
ptr++; /* Same as (?0) */
/* Fall through */
case CHAR_R: /* Recursion, same as (?0) */
recno = 0;
if (*(++ptr) != CHAR_RIGHT_PARENTHESIS)
{
*errorcodeptr = ERR29;
goto FAILED;
}
goto HANDLE_RECURSION;
/* ------------------------------------------------------------ */
@ -7323,7 +7432,15 @@ for (;; ptr++)
recno = 0;
while(IS_DIGIT(*ptr))
{
if (recno > INT_MAX / 10 - 1) /* Integer overflow */
{
while (IS_DIGIT(*ptr)) ptr++;
*errorcodeptr = ERR61;
goto FAILED;
}
recno = recno * 10 + *ptr++ - CHAR_0;
}
if (*ptr != (pcre_uchar)terminator)
{
@ -7360,6 +7477,7 @@ for (;; ptr++)
HANDLE_RECURSION:
previous = code;
item_hwm_offset = cd->hwm - cd->start_workspace;
called = cd->start_code;
/* When we are actually compiling, find the bracket that is being
@ -7561,7 +7679,11 @@ for (;; ptr++)
previous = NULL;
cd->iscondassert = FALSE;
}
else previous = code;
else
{
previous = code;
item_hwm_offset = cd->hwm - cd->start_workspace;
}
*code = bravalue;
tempcode = code;
@ -7809,7 +7931,7 @@ for (;; ptr++)
const pcre_uchar *p;
pcre_uint32 cf;
save_hwm_offset = cd->hwm - cd->start_workspace; /* Normally this is set when '(' is read */
item_hwm_offset = cd->hwm - cd->start_workspace; /* Normally this is set when '(' is read */
terminator = (*(++ptr) == CHAR_LESS_THAN_SIGN)?
CHAR_GREATER_THAN_SIGN : CHAR_APOSTROPHE;
@ -7838,7 +7960,7 @@ for (;; ptr++)
if (*p != (pcre_uchar)terminator)
{
*errorcodeptr = ERR57;
break;
goto FAILED;
}
ptr++;
goto HANDLE_NUMERICAL_RECURSION;
@ -7853,7 +7975,7 @@ for (;; ptr++)
ptr[1] != CHAR_APOSTROPHE && ptr[1] != CHAR_LEFT_CURLY_BRACKET))
{
*errorcodeptr = ERR69;
break;
goto FAILED;
}
is_recurse = FALSE;
terminator = (*(++ptr) == CHAR_LESS_THAN_SIGN)?
@ -7877,6 +7999,7 @@ for (;; ptr++)
HANDLE_REFERENCE:
if (firstcharflags == REQ_UNSET) firstcharflags = REQ_NONE;
previous = code;
item_hwm_offset = cd->hwm - cd->start_workspace;
*code++ = ((options & PCRE_CASELESS) != 0)? OP_REFI : OP_REF;
PUT2INC(code, 0, recno);
cd->backref_map |= (recno < 32)? (1 << recno) : 1;
@ -7906,6 +8029,7 @@ for (;; ptr++)
if (!get_ucp(&ptr, &negated, &ptype, &pdata, errorcodeptr))
goto FAILED;
previous = code;
item_hwm_offset = cd->hwm - cd->start_workspace;
*code++ = ((escape == ESC_p) != negated)? OP_PROP : OP_NOTPROP;
*code++ = ptype;
*code++ = pdata;
@ -7946,6 +8070,7 @@ for (;; ptr++)
{
previous = (escape > ESC_b && escape < ESC_Z)? code : NULL;
item_hwm_offset = cd->hwm - cd->start_workspace;
*code++ = (!utf && escape == ESC_C)? OP_ALLANY : escape;
}
}
@ -7989,6 +8114,7 @@ for (;; ptr++)
ONE_CHAR:
previous = code;
item_hwm_offset = cd->hwm - cd->start_workspace;
/* For caseless UTF-8 mode when UCP support is available, check whether
this character has more than one other case. If so, generate a special
@ -9164,6 +9290,7 @@ cd->names_found = 0;
cd->name_entry_size = 0;
cd->name_table = NULL;
cd->dupnames = FALSE;
cd->dupgroups = FALSE;
cd->namedrefcount = 0;
cd->start_code = cworkspace;
cd->hwm = cworkspace;
@ -9336,6 +9463,16 @@ if (cd->hwm > cd->start_workspace)
int offset, recno;
cd->hwm -= LINK_SIZE;
offset = GET(cd->hwm, 0);
/* Check that the hwm handling hasn't gone wrong. This whole area is
rewritten in PCRE2 because there are some obscure cases. */
if (offset == 0 || codestart[offset-1] != OP_RECURSE)
{
errorcode = ERR10;
break;
}
recno = GET(codestart, offset);
if (recno != prev_recno)
{
@ -9366,7 +9503,7 @@ used in this code because at least one compiler gives a warning about loss of
"const" attribute if the cast (pcre_uchar *)codestart is used directly in the
function call. */
if ((options & PCRE_NO_AUTO_POSSESS) == 0)
if (errorcode == 0 && (options & PCRE_NO_AUTO_POSSESS) == 0)
{
pcre_uchar *temp = (pcre_uchar *)codestart;
auto_possessify(temp, utf, cd);
@ -9380,7 +9517,7 @@ OP_RECURSE that are not fixed length get a diagnosic with a useful offset. The
exceptional ones forgo this. We scan the pattern to check that they are fixed
length, and set their lengths. */
if (cd->check_lookbehind)
if (errorcode == 0 && cd->check_lookbehind)
{
pcre_uchar *cc = (pcre_uchar *)codestart;
@ -9593,4 +9730,3 @@ return (pcre32 *)re;
}
/* End of pcre_compile.c */

View File

@ -6685,7 +6685,8 @@ if (md->offset_vector != NULL)
register int *iend = iptr - re->top_bracket;
if (iend < md->offset_vector + 2) iend = md->offset_vector + 2;
while (--iptr >= iend) *iptr = -1;
md->offset_vector[0] = md->offset_vector[1] = -1;
if (offsetcount > 0) md->offset_vector[0] = -1;
if (offsetcount > 1) md->offset_vector[1] = -1;
}
/* Set up the first character to match, if available. The first_char value is

View File

@ -984,7 +984,7 @@ other. NOTE: The values also appear in pcre_jit_compile.c. */
#ifndef EBCDIC
#define HSPACE_LIST \
CHAR_HT, CHAR_SPACE, 0xa0, \
CHAR_HT, CHAR_SPACE, CHAR_NBSP, \
0x1680, 0x180e, 0x2000, 0x2001, 0x2002, 0x2003, 0x2004, 0x2005, \
0x2006, 0x2007, 0x2008, 0x2009, 0x200A, 0x202f, 0x205f, 0x3000, \
NOTACHAR
@ -1010,7 +1010,7 @@ other. NOTE: The values also appear in pcre_jit_compile.c. */
#define HSPACE_BYTE_CASES \
case CHAR_HT: \
case CHAR_SPACE: \
case 0xa0 /* NBSP */
case CHAR_NBSP
#define HSPACE_CASES \
HSPACE_BYTE_CASES: \
@ -1037,11 +1037,12 @@ other. NOTE: The values also appear in pcre_jit_compile.c. */
/* ------ EBCDIC environments ------ */
#else
#define HSPACE_LIST CHAR_HT, CHAR_SPACE
#define HSPACE_LIST CHAR_HT, CHAR_SPACE, CHAR_NBSP, NOTACHAR
#define HSPACE_BYTE_CASES \
case CHAR_HT: \
case CHAR_SPACE
case CHAR_SPACE: \
case CHAR_NBSP
#define HSPACE_CASES HSPACE_BYTE_CASES
@ -1215,6 +1216,7 @@ same code point. */
#define CHAR_ESC '\047'
#define CHAR_DEL '\007'
#define CHAR_NBSP '\x41'
#define STR_ESC "\047"
#define STR_DEL "\007"
@ -1229,6 +1231,7 @@ a positive value. */
#define CHAR_NEL ((unsigned char)'\x85')
#define CHAR_ESC '\033'
#define CHAR_DEL '\177'
#define CHAR_NBSP ((unsigned char)'\xa0')
#define STR_LF "\n"
#define STR_NL STR_LF
@ -1606,6 +1609,7 @@ only. */
#define CHAR_VERTICAL_LINE '\174'
#define CHAR_RIGHT_CURLY_BRACKET '\175'
#define CHAR_TILDE '\176'
#define CHAR_NBSP ((unsigned char)'\xa0')
#define STR_HT "\011"
#define STR_VT "\013"
@ -1762,6 +1766,10 @@ only. */
/* Escape items that are just an encoding of a particular data value. */
#ifndef ESC_a
#define ESC_a CHAR_BEL
#endif
#ifndef ESC_e
#define ESC_e CHAR_ESC
#endif
@ -2446,6 +2454,7 @@ typedef struct compile_data {
BOOL had_pruneorskip; /* (*PRUNE) or (*SKIP) encountered */
BOOL check_lookbehind; /* Lookbehinds need later checking */
BOOL dupnames; /* Duplicate names exist */
BOOL dupgroups; /* Duplicate groups exist: (?| found */
BOOL iscondassert; /* Next assert is a condition */
int nltype; /* Newline type */
int nllen; /* Newline string length */

View File

@ -1064,6 +1064,7 @@ pcre_uchar *alternative;
pcre_uchar *end = NULL;
int private_data_ptr = *private_data_start;
int space, size, bracketlen;
BOOL repeat_check = TRUE;
while (cc < ccend)
{
@ -1071,9 +1072,10 @@ while (cc < ccend)
size = 0;
bracketlen = 0;
if (private_data_ptr > SLJIT_MAX_LOCAL_SIZE)
return;
break;
if (*cc == OP_ONCE || *cc == OP_ONCE_NC || *cc == OP_BRA || *cc == OP_CBRA || *cc == OP_COND)
if (repeat_check && (*cc == OP_ONCE || *cc == OP_ONCE_NC || *cc == OP_BRA || *cc == OP_CBRA || *cc == OP_COND))
{
if (detect_repeat(common, cc))
{
/* These brackets are converted to repeats, so no global
@ -1081,6 +1083,8 @@ while (cc < ccend)
if (cc >= end)
end = bracketend(cc);
}
}
repeat_check = TRUE;
switch(*cc)
{
@ -1136,6 +1140,13 @@ while (cc < ccend)
bracketlen = 1 + LINK_SIZE + IMM2_SIZE;
break;
case OP_BRAZERO:
case OP_BRAMINZERO:
case OP_BRAPOSZERO:
repeat_check = FALSE;
size = 1;
break;
CASE_ITERATOR_PRIVATE_DATA_1
space = 1;
size = -2;
@ -1162,12 +1173,17 @@ while (cc < ccend)
size = 1;
break;
CASE_ITERATOR_TYPE_PRIVATE_DATA_2B
case OP_TYPEUPTO:
if (cc[1 + IMM2_SIZE] != OP_ANYNL && cc[1 + IMM2_SIZE] != OP_EXTUNI)
space = 2;
size = 1 + IMM2_SIZE;
break;
case OP_TYPEMINUPTO:
space = 2;
size = 1 + IMM2_SIZE;
break;
case OP_CLASS:
case OP_NCLASS:
size += 1 + 32 / sizeof(pcre_uchar);
@ -1316,6 +1332,13 @@ while (cc < ccend)
cc += 1 + LINK_SIZE + IMM2_SIZE;
break;
case OP_THEN:
stack_restore = TRUE;
if (common->control_head_ptr != 0)
*needs_control_head = TRUE;
cc ++;
break;
default:
stack_restore = TRUE;
/* Fall through. */
@ -2220,6 +2243,7 @@ while (current != NULL)
SLJIT_ASSERT_STOP();
break;
}
SLJIT_ASSERT(current > (sljit_sw*)current[-1]);
current = (sljit_sw*)current[-1];
}
return -1;
@ -3209,7 +3233,7 @@ bytes[len] = byte;
bytes[0] = len;
}
static int scan_prefix(compiler_common *common, pcre_uchar *cc, pcre_uint32 *chars, pcre_uint8 *bytes, int max_chars)
static int scan_prefix(compiler_common *common, pcre_uchar *cc, pcre_uint32 *chars, pcre_uint8 *bytes, int max_chars, pcre_uint32 *rec_count)
{
/* Recursive function, which scans prefix literals. */
BOOL last, any, caseless;
@ -3227,9 +3251,14 @@ pcre_uchar othercase[1];
repeat = 1;
while (TRUE)
{
if (*rec_count == 0)
return 0;
(*rec_count)--;
last = TRUE;
any = FALSE;
caseless = FALSE;
switch (*cc)
{
case OP_CHARI:
@ -3291,7 +3320,7 @@ while (TRUE)
#ifdef SUPPORT_UTF
if (common->utf && HAS_EXTRALEN(*cc)) len += GET_EXTRALEN(*cc);
#endif
max_chars = scan_prefix(common, cc + len, chars, bytes, max_chars);
max_chars = scan_prefix(common, cc + len, chars, bytes, max_chars, rec_count);
if (max_chars == 0)
return consumed;
last = FALSE;
@ -3314,7 +3343,7 @@ while (TRUE)
alternative = cc + GET(cc, 1);
while (*alternative == OP_ALT)
{
max_chars = scan_prefix(common, alternative + 1 + LINK_SIZE, chars, bytes, max_chars);
max_chars = scan_prefix(common, alternative + 1 + LINK_SIZE, chars, bytes, max_chars, rec_count);
if (max_chars == 0)
return consumed;
alternative += GET(alternative, 1);
@ -3556,6 +3585,7 @@ int i, max, from;
int range_right = -1, range_len = 3 - 1;
sljit_ub *update_table = NULL;
BOOL in_range;
pcre_uint32 rec_count;
for (i = 0; i < MAX_N_CHARS; i++)
{
@ -3564,7 +3594,8 @@ for (i = 0; i < MAX_N_CHARS; i++)
bytes[i * MAX_N_BYTES] = 0;
}
max = scan_prefix(common, common->start, chars, bytes, MAX_N_CHARS);
rec_count = 10000;
max = scan_prefix(common, common->start, chars, bytes, MAX_N_CHARS, &rec_count);
if (max <= 1)
return FALSE;
@ -4311,8 +4342,10 @@ switch(length)
case 4:
if ((ranges[1] - ranges[0]) == (ranges[3] - ranges[2])
&& (ranges[0] | (ranges[2] - ranges[0])) == ranges[2]
&& (ranges[1] & (ranges[2] - ranges[0])) == 0
&& is_powerof2(ranges[2] - ranges[0]))
{
SLJIT_ASSERT((ranges[0] & (ranges[2] - ranges[0])) == 0 && (ranges[2] & ranges[3] & (ranges[2] - ranges[0])) != 0);
OP2(SLJIT_OR, TMP1, 0, TMP1, 0, SLJIT_IMM, ranges[2] - ranges[0]);
if (ranges[2] + 1 != ranges[3])
{
@ -4900,9 +4933,10 @@ else if ((cc[-1] & XCL_MAP) != 0)
if (!check_class_ranges(common, (const pcre_uint8 *)cc, FALSE, TRUE, list))
{
#ifdef COMPILE_PCRE8
SLJIT_ASSERT(common->utf);
jump = NULL;
if (common->utf)
#endif
jump = CMP(SLJIT_GREATER, TMP1, 0, SLJIT_IMM, 255);
jump = CMP(SLJIT_GREATER, TMP1, 0, SLJIT_IMM, 255);
OP2(SLJIT_AND, TMP2, 0, TMP1, 0, SLJIT_IMM, 0x7);
OP2(SLJIT_LSHR, TMP1, 0, TMP1, 0, SLJIT_IMM, 3);
@ -4911,7 +4945,10 @@ else if ((cc[-1] & XCL_MAP) != 0)
OP2(SLJIT_AND | SLJIT_SET_E, SLJIT_UNUSED, 0, TMP1, 0, TMP2, 0);
add_jump(compiler, list, JUMP(SLJIT_NOT_ZERO));
JUMPHERE(jump);
#ifdef COMPILE_PCRE8
if (common->utf)
#endif
JUMPHERE(jump);
}
OP1(SLJIT_MOV, TMP1, 0, TMP3, 0);
@ -5219,7 +5256,7 @@ while (*cc != XCL_END)
OP_FLAGS(SLJIT_MOV, TMP2, 0, SLJIT_UNUSED, 0, SLJIT_LESS_EQUAL);
SET_CHAR_OFFSET(0);
OP2(SLJIT_SUB | SLJIT_SET_U, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0xff);
OP2(SLJIT_SUB | SLJIT_SET_U, SLJIT_UNUSED, 0, TMP1, 0, SLJIT_IMM, 0x7f);
OP_FLAGS(SLJIT_AND, TMP2, 0, TMP2, 0, SLJIT_LESS_EQUAL);
SET_TYPE_OFFSET(ucp_Pc);
@ -7665,6 +7702,10 @@ while (*cc != OP_KETRPOS)
OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(0), STR_PTR, 0);
}
/* Even if the match is empty, we need to reset the control head. */
if (needs_control_head)
OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, SLJIT_MEM1(STACK_TOP), STACK(stack));
if (opcode == OP_SBRAPOS || opcode == OP_SCBRAPOS)
add_jump(compiler, &emptymatch, CMP(SLJIT_EQUAL, TMP1, 0, STR_PTR, 0));
@ -7692,6 +7733,10 @@ while (*cc != OP_KETRPOS)
OP1(SLJIT_MOV, SLJIT_MEM1(TMP2), (framesize + 1) * sizeof(sljit_sw), STR_PTR, 0);
}
/* Even if the match is empty, we need to reset the control head. */
if (needs_control_head)
OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, SLJIT_MEM1(STACK_TOP), STACK(stack));
if (opcode == OP_SBRAPOS || opcode == OP_SCBRAPOS)
add_jump(compiler, &emptymatch, CMP(SLJIT_EQUAL, TMP1, 0, STR_PTR, 0));
@ -7704,9 +7749,6 @@ while (*cc != OP_KETRPOS)
}
}
if (needs_control_head)
OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), common->control_head_ptr, SLJIT_MEM1(STACK_TOP), STACK(stack));
JUMPTO(SLJIT_JUMP, loop);
flush_stubs(common);
@ -8441,8 +8483,7 @@ while (cc < ccend)
OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(1), STR_PTR, 0);
}
BACKTRACK_AS(braminzero_backtrack)->matchingpath = LABEL();
if (cc[1] > OP_ASSERTBACK_NOT)
count_match(common);
count_match(common);
break;
case OP_ONCE:
@ -9624,7 +9665,7 @@ static SLJIT_INLINE void compile_recurse(compiler_common *common)
DEFINE_COMPILER;
pcre_uchar *cc = common->start + common->currententry->start;
pcre_uchar *ccbegin = cc + 1 + LINK_SIZE + (*cc == OP_BRA ? 0 : IMM2_SIZE);
pcre_uchar *ccend = bracketend(cc);
pcre_uchar *ccend = bracketend(cc) - (1 + LINK_SIZE);
BOOL needs_control_head;
int framesize = get_framesize(common, cc, NULL, TRUE, &needs_control_head);
int private_data_size = get_private_data_copy_length(common, ccbegin, ccend, needs_control_head);
@ -9648,6 +9689,7 @@ set_jumps(common->currententry->calls, common->currententry->entry);
sljit_emit_fast_enter(compiler, TMP2, 0);
allocate_stack(common, private_data_size + framesize + alternativesize);
count_match(common);
OP1(SLJIT_MOV, SLJIT_MEM1(STACK_TOP), STACK(private_data_size + framesize + alternativesize - 1), TMP2, 0);
copy_private_data(common, ccbegin, ccend, TRUE, private_data_size + framesize + alternativesize, framesize + alternativesize, needs_control_head);
if (needs_control_head)
@ -9992,6 +10034,7 @@ OP1(SLJIT_MOV, TMP2, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, stack));
OP1(SLJIT_MOV_UI, TMP1, 0, SLJIT_MEM1(TMP1), SLJIT_OFFSETOF(jit_arguments, limit_match));
OP1(SLJIT_MOV, STACK_TOP, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(struct sljit_stack, base));
OP1(SLJIT_MOV, STACK_LIMIT, 0, SLJIT_MEM1(TMP2), SLJIT_OFFSETOF(struct sljit_stack, limit));
OP2(SLJIT_ADD, TMP1, 0, TMP1, 0, SLJIT_IMM, 1);
OP1(SLJIT_MOV, SLJIT_MEM1(SLJIT_SP), LIMIT_MATCH, TMP1, 0);
if (mode == JIT_PARTIAL_SOFT_COMPILE)

View File

@ -182,6 +182,7 @@ static struct regression_test_case regression_test_cases[] = {
{ CMUAP, 0, "\xf0\x90\x90\x80{2}", "\xf0\x90\x90\x80#\xf0\x90\x90\xa8\xf0\x90\x90\x80" },
{ CMUAP, 0, "\xf0\x90\x90\xa8{2}", "\xf0\x90\x90\x80#\xf0\x90\x90\xa8\xf0\x90\x90\x80" },
{ CMUAP, 0, "\xe1\xbd\xb8\xe1\xbf\xb8", "\xe1\xbf\xb8\xe1\xbd\xb8" },
{ MA, 0, "[3-57-9]", "5" },
/* Assertions. */
{ MUA, 0, "\\b[^A]", "A_B#" },

View File

@ -71,6 +71,7 @@ Arguments:
startcode pointer to start of the whole pattern's code
options the compiling options
recurses chain of recurse_check to catch mutual recursion
countptr pointer to call count (to catch over complexity)
Returns: the minimum length
-1 if \C in UTF-8 mode or (*ACCEPT) was encountered
@ -80,7 +81,8 @@ Returns: the minimum length
static int
find_minlength(const REAL_PCRE *re, const pcre_uchar *code,
const pcre_uchar *startcode, int options, recurse_check *recurses)
const pcre_uchar *startcode, int options, recurse_check *recurses,
int *countptr)
{
int length = -1;
/* PCRE_UTF16 has the same value as PCRE_UTF8. */
@ -90,6 +92,8 @@ recurse_check this_recurse;
register int branchlength = 0;
register pcre_uchar *cc = (pcre_uchar *)code + 1 + LINK_SIZE;
if ((*countptr)++ > 1000) return -1; /* too complex */
if (*code == OP_CBRA || *code == OP_SCBRA ||
*code == OP_CBRAPOS || *code == OP_SCBRAPOS) cc += IMM2_SIZE;
@ -131,7 +135,7 @@ for (;;)
case OP_SBRAPOS:
case OP_ONCE:
case OP_ONCE_NC:
d = find_minlength(re, cc, startcode, options, recurses);
d = find_minlength(re, cc, startcode, options, recurses, countptr);
if (d < 0) return d;
branchlength += d;
do cc += GET(cc, 1); while (*cc == OP_ALT);
@ -415,7 +419,8 @@ for (;;)
int dd;
this_recurse.prev = recurses;
this_recurse.group = cs;
dd = find_minlength(re, cs, startcode, options, &this_recurse);
dd = find_minlength(re, cs, startcode, options, &this_recurse,
countptr);
if (dd < d) d = dd;
}
}
@ -451,7 +456,8 @@ for (;;)
{
this_recurse.prev = recurses;
this_recurse.group = cs;
d = find_minlength(re, cs, startcode, options, &this_recurse);
d = find_minlength(re, cs, startcode, options, &this_recurse,
countptr);
}
}
}
@ -514,7 +520,7 @@ for (;;)
this_recurse.prev = recurses;
this_recurse.group = cs;
branchlength += find_minlength(re, cs, startcode, options,
&this_recurse);
&this_recurse, countptr);
}
}
cc += 1 + LINK_SIZE;
@ -1453,6 +1459,7 @@ pcre32_study(const pcre32 *external_re, int options, const char **errorptr)
#endif
{
int min;
int count = 0;
BOOL bits_set = FALSE;
pcre_uint8 start_bits[32];
PUBL(extra) *extra = NULL;
@ -1539,7 +1546,7 @@ if ((re->options & PCRE_ANCHORED) == 0 &&
/* Find the minimum length of subject string. */
switch(min = find_minlength(re, code, code, re->options, NULL))
switch(min = find_minlength(re, code, code, re->options, NULL, &count))
{
case -2: *errorptr = "internal error: missing capturing bracket"; return NULL;
case -3: *errorptr = "internal error: opcode not recognized"; return NULL;

View File

@ -246,7 +246,7 @@ while ((t = *data++) != XCL_END)
case PT_PXPUNCT:
if ((PRIV(ucp_gentype)[prop->chartype] == ucp_P ||
(c < 256 && PRIV(ucp_gentype)[prop->chartype] == ucp_S)) == isprop)
(c < 128 && PRIV(ucp_gentype)[prop->chartype] == ucp_S)) == isprop)
return !negated;
break;

View File

@ -1692,9 +1692,13 @@ while (ptr < endptr)
if (filenames == FN_NOMATCH_ONLY) return 1;
/* If all we want is a yes/no answer, stop now. */
if (quiet) return 0;
/* Just count if just counting is wanted. */
if (count_only) count++;
else if (count_only) count++;
/* When handling a binary file and binary-files==binary, the "binary"
variable will be set true (it's false in all other cases). In this
@ -1715,10 +1719,6 @@ while (ptr < endptr)
return 0;
}
/* Likewise, if all we want is a yes/no answer. */
else if (quiet) return 0;
/* The --only-matching option prints just the substring that matched,
and/or one or more captured portions of it, as long as these strings are
not empty. The --file-offsets and --line-offsets options output offsets for
@ -2089,7 +2089,7 @@ if (filenames == FN_NOMATCH_ONLY)
/* Print the match count if wanted */
if (count_only)
if (count_only && !quiet)
{
if (count > 0 || !omit_zero_count)
{

View File

@ -4621,9 +4621,9 @@ while (!done)
else switch ((c = *p++))
{
case 'a': c = 7; break;
case 'a': c = CHAR_BEL; break;
case 'b': c = '\b'; break;
case 'e': c = 27; break;
case 'e': c = CHAR_ESC; break;
case 'f': c = '\f'; break;
case 'n': c = '\n'; break;
case 'r': c = '\r'; break;

View File

@ -751,3 +751,7 @@ RC=0
2:3,1
2:4,1
RC=0
---------------------------- Test 108 ------------------------------
RC=0
---------------------------- Test 109 -----------------------------
RC=0

View File

@ -5730,4 +5730,7 @@ AbcdCBefgBhiBqz
"(?1)(?#?'){8}(a)"
baaaaaaaaac
"(?|(\k'Pm')|(?'Pm'))"
abcd
/-- End of testinput1 --/

View File

@ -136,4 +136,6 @@ is required for these tests. --/
/((?+1)(\1))/B
/.((?2)(?R)\1)()/B
/-- End of testinput11 --/

File diff suppressed because one or more lines are too long

View File

@ -340,4 +340,6 @@ not matter. --/
/[^\s]*\s* [^\W]+\W+ [^\d]*?\d0 [^\d\w]{4,6}?\w*A/BZ
/(?'ABC'[bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar](*THEN:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/
/-- End of testinput14 --/

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@ -1502,4 +1502,55 @@
/\C\X*QT/8
Ӆ\x0aT
/[\pS#moq]/
=
/[[:punct:]]/8W
\xc2\xb4
\x{b4}
/[[:^ascii:]]/8W
\x{100}
\x{200}
\x{300}
\x{37e}
a
9
g
/[[:^ascii:]\w]/8W
a
9
g
\x{100}
\x{200}
\x{300}
\x{37e}
/[\w[:^ascii:]]/8W
a
9
g
\x{100}
\x{200}
\x{300}
\x{37e}
/[^[:ascii:]\W]/8W
a
9
g
\x{100}
\x{200}
\x{300}
\x{37e}
/[[:^ascii:]a]/8W
a
9
g
\x{100}
\x{200}
\x{37e}
/-- End of testinput6 --/

View File

@ -838,4 +838,19 @@ of case for anything other than the ASCII letters. --/
/^s?c/mi8I
scat
/[\W\p{Any}]/BZ
abc
123
/[\W\pL]/BZ
abc
** Failers
123
/a[[:punct:]b]/WBZ
/a[[:punct:]b]/8WBZ
/a[b[:punct:]]/8WBZ
/-- End of testinput7 --/

View File

@ -29,13 +29,16 @@ in EBCDIC, but can be specified as escapes. --/
/^A\ˆ/
A B
A\x41B
/-- Test \H --/
/^A\È/
AB
A\x42B
** Fail
A B
A\x41B
/-- Test \R --/

View File

@ -9429,4 +9429,9 @@ No match
0: aaaaaaaaa
1: a
"(?|(\k'Pm')|(?'Pm'))"
abcd
0:
1:
/-- End of testinput1 --/

View File

@ -231,7 +231,7 @@ Memory allocation (code space): 73
------------------------------------------------------------------
/(?P<a>a)...(?P=a)bbb(?P>a)d/BM
Memory allocation (code space): 61
Memory allocation (code space): 77
------------------------------------------------------------------
0 24 Bra
2 5 CBra 1
@ -650,18 +650,18 @@ Memory allocation (code space): 14
/[[:^alpha:][:^cntrl:]]+/8WB
------------------------------------------------------------------
0 26 Bra
2 [ -~\x80-\xff\P{L}]++
26 26 Ket
28 End
0 30 Bra
2 [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
30 30 Ket
32 End
------------------------------------------------------------------
/[[:^cntrl:][:^alpha:]]+/8WB
------------------------------------------------------------------
0 26 Bra
2 [ -~\x80-\xff\P{L}]++
26 26 Ket
28 End
0 30 Bra
2 [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
30 30 Ket
32 End
------------------------------------------------------------------
/[[:alpha:]]+/8WB
@ -748,4 +748,21 @@ Memory allocation (code space): 14
22 End
------------------------------------------------------------------
/.((?2)(?R)\1)()/B
------------------------------------------------------------------
0 23 Bra
2 Any
3 13 Once
5 9 CBra 1
8 18 Recurse
10 0 Recurse
12 \1
14 9 Ket
16 13 Ket
18 3 CBra 2
21 3 Ket
23 23 Ket
25 End
------------------------------------------------------------------
/-- End of testinput11 --/

View File

@ -231,7 +231,7 @@ Memory allocation (code space): 155
------------------------------------------------------------------
/(?P<a>a)...(?P=a)bbb(?P>a)d/BM
Memory allocation (code space): 125
Memory allocation (code space): 157
------------------------------------------------------------------
0 24 Bra
2 5 CBra 1
@ -650,18 +650,18 @@ Memory allocation (code space): 28
/[[:^alpha:][:^cntrl:]]+/8WB
------------------------------------------------------------------
0 18 Bra
2 [ -~\x80-\xff\P{L}]++
18 18 Ket
20 End
0 21 Bra
2 [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
21 21 Ket
23 End
------------------------------------------------------------------
/[[:^cntrl:][:^alpha:]]+/8WB
------------------------------------------------------------------
0 18 Bra
2 [ -~\x80-\xff\P{L}]++
18 18 Ket
20 End
0 21 Bra
2 [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
21 21 Ket
23 End
------------------------------------------------------------------
/[[:alpha:]]+/8WB
@ -748,4 +748,21 @@ Memory allocation (code space): 28
22 End
------------------------------------------------------------------
/.((?2)(?R)\1)()/B
------------------------------------------------------------------
0 23 Bra
2 Any
3 13 Once
5 9 CBra 1
8 18 Recurse
10 0 Recurse
12 \1
14 9 Ket
16 13 Ket
18 3 CBra 2
21 3 Ket
23 23 Ket
25 End
------------------------------------------------------------------
/-- End of testinput11 --/

View File

@ -231,7 +231,7 @@ Memory allocation (code space): 45
------------------------------------------------------------------
/(?P<a>a)...(?P=a)bbb(?P>a)d/BM
Memory allocation (code space): 38
Memory allocation (code space): 50
------------------------------------------------------------------
0 30 Bra
3 7 CBra 1
@ -650,18 +650,18 @@ Memory allocation (code space): 10
/[[:^alpha:][:^cntrl:]]+/8WB
------------------------------------------------------------------
0 44 Bra
3 [ -~\x80-\xff\P{L}]++
44 44 Ket
47 End
0 51 Bra
3 [ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
51 51 Ket
54 End
------------------------------------------------------------------
/[[:^cntrl:][:^alpha:]]+/8WB
------------------------------------------------------------------
0 44 Bra
3 [ -~\x80-\xff\P{L}]++
44 44 Ket
47 End
0 51 Bra
3 [ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
51 51 Ket
54 End
------------------------------------------------------------------
/[[:alpha:]]+/8WB
@ -748,4 +748,21 @@ Memory allocation (code space): 10
34 End
------------------------------------------------------------------
/.((?2)(?R)\1)()/B
------------------------------------------------------------------
0 35 Bra
3 Any
4 20 Once
7 14 CBra 1
12 27 Recurse
15 0 Recurse
18 \1
21 14 Ket
24 20 Ket
27 5 CBra 2
32 5 Ket
35 35 Ket
38 End
------------------------------------------------------------------
/-- End of testinput11 --/

File diff suppressed because one or more lines are too long

View File

@ -527,4 +527,6 @@ Failed: character value in \u.... sequence is too large at offset 6
End
------------------------------------------------------------------
/(?'ABC'[bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar]([bar](*THEN:AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/
/-- End of testinput14 --/

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View File

@ -2469,4 +2469,92 @@ No match
Ӆ\x0aT
No match
/[\pS#moq]/
=
0: =
/[[:punct:]]/8W
\xc2\xb4
No match
\x{b4}
No match
/[[:^ascii:]]/8W
\x{100}
0: \x{100}
\x{200}
0: \x{200}
\x{300}
0: \x{300}
\x{37e}
0: \x{37e}
a
No match
9
No match
g
No match
/[[:^ascii:]\w]/8W
a
0: a
9
0: 9
g
0: g
\x{100}
0: \x{100}
\x{200}
0: \x{200}
\x{300}
0: \x{300}
\x{37e}
0: \x{37e}
/[\w[:^ascii:]]/8W
a
0: a
9
0: 9
g
0: g
\x{100}
0: \x{100}
\x{200}
0: \x{200}
\x{300}
0: \x{300}
\x{37e}
0: \x{37e}
/[^[:ascii:]\W]/8W
a
No match
9
No match
g
No match
\x{100}
0: \x{100}
\x{200}
0: \x{200}
\x{300}
No match
\x{37e}
No match
/[[:^ascii:]a]/8W
a
0: a
9
No match
g
No match
\x{100}
0: \x{100}
\x{200}
0: \x{200}
\x{37e}
0: \x{37e}
/-- End of testinput6 --/

View File

@ -949,7 +949,7 @@ No match
/[[:^alpha:][:^cntrl:]]+/8WBZ
------------------------------------------------------------------
Bra
[ -~\x80-\xff\P{L}]++
[ -~\x80-\xff\P{L}\x{100}-\x{10ffff}]++
Ket
End
------------------------------------------------------------------
@ -961,7 +961,7 @@ No match
/[[:^cntrl:][:^alpha:]]+/8WBZ
------------------------------------------------------------------
Bra
[ -~\x80-\xff\P{L}]++
[ -~\x80-\xff\x{100}-\x{10ffff}\P{L}]++
Ket
End
------------------------------------------------------------------
@ -2295,4 +2295,57 @@ Need char = 'c' (caseless)
scat
0: sc
/[\W\p{Any}]/BZ
------------------------------------------------------------------
Bra
[\x00-/:-@[-^`{-\xff\p{Any}]
Ket
End
------------------------------------------------------------------
abc
0: a
123
0: 1
/[\W\pL]/BZ
------------------------------------------------------------------
Bra
[\x00-/:-@[-^`{-\xff\p{L}]
Ket
End
------------------------------------------------------------------
abc
0: a
** Failers
0: *
123
No match
/a[[:punct:]b]/WBZ
------------------------------------------------------------------
Bra
a
[b[:punct:]]
Ket
End
------------------------------------------------------------------
/a[[:punct:]b]/8WBZ
------------------------------------------------------------------
Bra
a
[b[:punct:]]
Ket
End
------------------------------------------------------------------
/a[b[:punct:]]/8WBZ
------------------------------------------------------------------
Bra
a
[b[:punct:]]
Ket
End
------------------------------------------------------------------
/-- End of testinput7 --/

View File

@ -41,16 +41,22 @@ No match
/^A\ˆ/
A B
0: A\x20
A\x41B
0: AA
/-- Test \H --/
/^A\È/
AB
0: AB
A\x42B
0: AB
** Fail
No match
A B
No match
A\x41B
No match
/-- Test \R --/