QRegularExpression: allow users to skip the UTF-16 check of the subject string
PCRE does not handle invalid UTF-16 sequences. For this reason we always
check a subject string's UTF-16 validity before attempting any match over
it (actually, we let PCRE do that).

The only exception so far has been global matching: once the first match
was done, we skipped redoing the check over and over again on the same
string (PCRE actually checks the /entire/ string, not only the part it
uses for matching).

Still, users had no way to skip this check, even if they were 100% sure
that the string was valid UTF-16. This commit introduces a way for them
to skip it.

Change-Id: Iea352c06f531aa2153863b3a1681acaab7ac375c
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
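For readers unsure what "valid UTF-16" means here, the following is a minimal sketch of a surrogate-pairing check in plain C++. It is an illustration only: isValidUtf16 is a hypothetical helper, not part of Qt or PCRE, and it is not claimed to match PCRE's own check exactly. It only rejects unpaired surrogates.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Qt or PCRE API): returns true if every
// surrogate code unit in `s` belongs to a well-formed high/low pair.
bool isValidUtf16(const std::u16string &s)
{
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (c >= 0xD800 && c <= 0xDBFF) {
            // High surrogate: the next unit must be a low surrogate.
            if (i + 1 >= s.size())
                return false;
            const char16_t next = s[i + 1];
            if (next < 0xDC00 || next > 0xDFFF)
                return false;
            ++i; // skip the low surrogate we just consumed
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            // Low surrogate without a preceding high surrogate.
            return false;
        }
    }
    return true;
}
```

A caller who validates a string once with something like this (or who obtained it from a source known to produce valid UTF-16) is the kind of user the new option is aimed at.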
parent 4532669285
commit fd80cad07e
@@ -763,6 +763,13 @@ QT_BEGIN_NAMESPACE
         The match is constrained to start exactly at the offset passed to
         match() in order to be successful, even if the pattern string does not
         contain any metacharacter that anchors the match at that point.
+
+    \value DontCheckSubjectStringMatchOption
+        The subject string is not checked for UTF-16 validity before
+        attempting a match. Use this option with extreme caution, as
+        attempting to match an invalid string may crash the program and/or
+        constitute a security issue. This enum value has been introduced in
+        Qt 5.4.
 */
 
 // after how many usages we optimize the regexp
@@ -1221,7 +1228,8 @@ static int pcre16SafeExec(const pcre16 *code, const pcre16_extra *extra,
     options \a matchOptions and returns the QRegularExpressionMatchPrivate of
     the result. It also advances a match if a previous result is given as \a
     previous. The \a subject string goes a Unicode validity check if
-    \a checkSubjectString is CheckSubjectString (PCRE doesn't like illegal
+    \a checkSubjectString is CheckSubjectString and the match options don't
+    include DontCheckSubjectStringMatchOption (PCRE doesn't like illegal
     UTF-16 sequences).
 
     Advancing a match is a tricky algorithm. If the previous match matched a
@@ -1290,8 +1298,10 @@ QRegularExpressionMatchPrivate *QRegularExpressionPrivate::doMatch(const QString
     else if (matchType == QRegularExpression::PartialPreferFirstMatch)
         pcreOptions |= PCRE_PARTIAL_HARD;
 
-    if (checkSubjectStringOption == DontCheckSubjectString)
+    if (checkSubjectStringOption == DontCheckSubjectString
+            || matchOptions & QRegularExpression::DontCheckSubjectStringMatchOption) {
         pcreOptions |= PCRE_NO_UTF16_CHECK;
+    }
 
     bool previousMatchWasEmpty = false;
     if (previous && previous->hasMatch &&
@@ -110,7 +110,8 @@ public:
 
     enum MatchOption {
         NoMatchOption = 0x0000,
-        AnchoredMatchOption = 0x0001
+        AnchoredMatchOption = 0x0001,
+        DontCheckSubjectStringMatchOption = 0x0002
     };
     Q_DECLARE_FLAGS(MatchOptions, MatchOption)
 
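The decision made in the doMatch() hunk can be sketched in isolation as follows. This is a standalone illustration, not the Qt code: the enums are simplified stand-ins that reuse the names and bit values from the diff, and skipsUtf16Check is a hypothetical function standing in for the condition that sets PCRE_NO_UTF16_CHECK.

```cpp
// Simplified stand-ins mirroring the names and values in the diff;
// these are not the actual Qt declarations.
enum MatchOption : unsigned {
    NoMatchOption = 0x0000,
    AnchoredMatchOption = 0x0001,
    DontCheckSubjectStringMatchOption = 0x0002
};

// Stand-in for the internal option already used for global matching.
enum CheckSubjectStringOption { CheckSubjectString, DontCheckSubjectString };

// Hypothetical helper: true when the UTF-16 check would be skipped,
// i.e. when the patched condition would set PCRE_NO_UTF16_CHECK.
bool skipsUtf16Check(CheckSubjectStringOption internalOption, unsigned matchOptions)
{
    return internalOption == DontCheckSubjectString
        || (matchOptions & DontCheckSubjectStringMatchOption) != 0;
}
```

Because the new enumerator occupies its own bit (0x0002), it can be combined with AnchoredMatchOption without either flag affecting the other. Either the internal option or the user-supplied flag is enough to skip the check.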