Include the QRegularExpression porting docs in Qt 6 porting guide
The instructions for porting away from QRegExp to QRegularExpression in
the Qt 6 porting guide were mostly copied from the similar docs for
QRegExp, which have been moved to
doc/global/includes/corelib/port-from-qregexp.qdocinc. The latter now
covers everything that the docs in the porting guide did and doesn't
have the issues listed in QTBUG-89702. Remove the old docs and include
the docs from doc/global/includes instead.

Task-number: QTBUG-89702
Change-Id: Ifdb79d5775bc0cadd02c21299d58adb27ae13337
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
(cherry picked from commit 93f7291387c03367e828b16299ddcbaf1f804e25)
Reviewed-by: Qt Cherry-pick Bot <cherrypick_bot@qt-project.org>
parent 26c7a1b52e
commit a7cb167a7d
@@ -528,252 +528,13 @@

 \section2 The QRegularExpression class

-In Qt6, all methods taking the \c QRegExp got removed from our code-base.
-Therefore it is very likely that you will have to port your application or
-library to \l QRegularExpression.
+In Qt 6, the \c QRegExp type has been retired to the Qt5Compat module
+and all Qt APIs using it have been removed from other modules.
+Client code which used it can be ported to use \l QRegularExpression
+in its place. As \l QRegularExpression is present already in Qt 5,
+this can be done and tested before migration to Qt 6.

-\l QRegularExpression implements Perl-compatible regular expressions. It
-fully supports Unicode. For an overview of the regular expression syntax
-supported by \l QRegularExpression, please refer to the aforementioned
-pcrepattern(3) man page. A regular expression is made up of two things: a
-pattern string and a set of pattern options that change the meaning of the
-pattern string.
-
-There are some subtle differences between \l QRegularExpression and \c
-QRegExp that will be explained by this document to ease the porting effort.
-
-\l QRegularExpression is more strict when it comes to the syntax of the
-regular expression. Therefore it is always good to check the expression
-for \l {QRegularExpression::isValid}{validity}.
-
-\l QRegularExpression can almost always be declared const (except when the
-pattern changes), while \c QRegExp almost never could be.
-
-There is no replacement for the \l {QRegExp::CaretMode}{CaretMode}
-enumeration. The \l {QRegularExpression::AnchoredMatchOption} match option
-can be used to emulate the QRegExp::CaretAtOffset behavior. There is no
-equivalent for the other QRegExp::CaretMode modes.
-
-\l QRegularExpression supports only Perl-compatible regular expressions.
-Still, it does not support all the features available in Perl-compatible
-regular expressions. The most notable one is the fact that duplicated names
-for capturing groups are not supported, and using them can lead to
-undefined behavior. This may change in a future version of Qt.
-
-\section3 Wildcard matching
-
-There is no direct way to do wildcard matching in \l QRegularExpression.
-However, the \l {QRegularExpression::wildcardToRegularExpression} method
-is provided to translate glob patterns into a Perl-compatible regular
-expression that can be used for that purpose.
-
-For example, if you have code like
-
-\code
-QRegExp wildcard("*.txt");
-wildcard.setPatternSyntax(QRegExp::Wildcard);
-\endcode
-
-you can rewrite it as
-
-\code
-auto wildcard = QRegularExpression(QRegularExpression::wildcardToRegularExpression("*.txt"));
-\endcode
-
-Please note though that not all shell like wildcard pattern might be
-translated in a way you would expect it. The following example code will
-silently break if simply converted using the above mentioned function:
-
-\code *
-const QString fp1("C:/Users/dummy/files/content.txt");
-const QString fp2("/home/dummy/files/content.txt");
-
-QRegExp re1("\1/files/*");
-re1.setPatternSyntax(QRegExp::Wildcard);
-... = re1.exactMatch(fp1); // returns true
-... = re1.exactMatch(fp2); // returns true
-
-// but converted with QRegularExpression::wildcardToRegularExpression()
-
-QRegularExpression re2(QRegularExpression::wildcardToRegularExpression("\1/files/*"));
-... = re2.match(fp1).hasMatch(); // returns false
-... = re2.match(fp2).hasMatch(); // returns false
-\endcode
-
-\section3 Searching forward
-
-Forward searching inside a string was usually implemented with a loop using
-\c {QRegExp::indexIn} and a growing offset, but can now be easily implemented
-with \l QRegularExpressionMatchIterator or \l {QString::indexOf}.
-
-For example, if you have code like
-
-\code
-QString subject("the quick fox");
-
-int offset = 0;
-QRegExp re("(\\w+)");
-while ((offset = re.indexIn(subject, offset)) != -1) {
-    offset += re.matchedLength();
-    // ...
-}
-\endcode
-
-you can rewrite it as
-
-\code
-QRegularExpression re("(\\w+)");
-QString subject("the quick fox");
-
-QRegularExpressionMatchIterator i = re.globalMatch(subject);
-while (i.hasNext()) {
-    QRegularExpressionMatch match = i.next();
-    // ...
-}
-
-// or alternatively using QString::indexOf
-
-qsizetype from = 0;
-QRegularExpressionMatch match;
-while ((from = subject.indexOf(re, from, &match)) != -1) {
-    from += match.capturedLength();
-    // ...
-}
-\endcode
-
-\section3 Searching backwards
-
-Backwards searching inside a string was usually often implemented as a loop
-over \c {QRegExp::lastIndexIn}, but can now be easily implemented using
-\l {QString::lastIndexOf} and \l {QRegularExpressionMatch}.
-
-\note \l QRegularExpressionMatchIterator is not capable of performing a
-backwards search.
-
-For example, if you have code like
-
-\code
-int offset = -1;
-QString subject("Lorem ipsum dolor sit amet, consetetur sadipscing.");
-
-QRegExp re("\\s+([ids]\\w+)");
-while ((offset = re.lastIndexIn(subject, offset)) != -1) {
-    --offset;
-    // ...
-}
-\endcode
-
-you can rewrite it as
-
-\code
-qsizetype from = -1;
-QString subject("Lorem ipsum dolor sit amet, consetetur sadipscing.");
-
-QRegularExpressionMatch match;
-QRegularExpression re("\\s+([ids]\\w+)");
-while ((from = subject.lastIndexOf(re, from, &match)) != -1) {
-    --from;
-    // ...
-}
-\endcode
-
-\section3 exactMatch vs. match.hasMatch
-
-\c {QRegExp::exactMatch} served two purposes: it exactly matched a regular
-expression against a subject string, and it implemented partial matching.
-Exact matching indicates whether the regular expression matches the entire
-subject string. For example:
-
-\code
-QString source("abc123");
-
-QRegExp("\\d+").exactMatch(source); // returns false
-QRegExp("[a-z]+\\d+").exactMatch(source); // returns true
-
-QRegularExpression("\\d+").match(source).hasMatch(); // returns true
-QRegularExpression("[a-z]+\\d+").match(source).hasMatch(); // returns true
-\endcode
-
-Exact matching is not reflected in \l QRegularExpression. If you want to be
-sure that the subject string matches the regular expression exactly, you
-can wrap the pattern using the \l {QRegularExpression::anchoredPattern}
-function:
-
-\code
-QString source("abc123");
-
-QString pattern("\\d+");
-QRegularExpression(pattern).match(source).hasMatch(); // returns true
-
-pattern = QRegularExpression::anchoredPattern(pattern);
-QRegularExpression(pattern).match(source).hasMatch(); // returns false
-\endcode
-
-\section3 Minimal matching
-
-\c QRegExp::setMinimal() implemented minimal matching by simply reversing
-the greediness of the quantifiers (\c QRegExp did not support lazy
-quantifiers, like *?, +?, etc.). QRegularExpression instead does support
-greedy, lazy and possessive quantifiers. The \l
-{QRegularExpression::InvertedGreedinessOption} pattern option can be useful
-to emulate the effects of \c QRegExp::setMinimal(): if enabled, it inverts
-the greediness of quantifiers (greedy ones become lazy and vice versa).
-
-\section3 Different pattern syntax
-
-Porting a regular expression from \c QRegExp to \l QRegularExpression may
-require changes to the pattern itself. Therefore it is recommended to check
-the pattern used with the \l {QRegularExpression::isValid} method. This is
-especially important for user provided pattern or pattern not controlled by
-the developer.
-
-In other cases, a pattern ported from \c QRegExp to \l QRegularExpression may
-silently change semantics. Therefore, it is necessary to review the patterns
-used. The most notable cases of silent incompatibility are:
-
-\list
-\li Curly braces are needed in order to use a hexadecimal escape like \c
-{\xHHHH} with more than 2 digits. A pattern like \c {\x2022} needs
-to be ported to \c {\x{2022}}, or it will match a space \c {(0x20)}
-followed by the string \c {"22"}. In general, it is highly recommended
-to always use curly braces with the \c {\x} escape, no matter the
-amount of digits specified.
-
-\li A \c{0-to-n} quantification like \c {{,n}} needs to be ported to
-\c {{0,n}} to preserve semantics. Otherwise, a pattern such as
-\c {\d{,3}} would actually match a digit followed by the exact
-string \c {"{,3}"}.
-\endlist
-
-\section3 Partial Matching
-
-When using \c QRegExp::exactMatch(), if an exact match was not found, one
-could still find out how much of the subject string was matched by the
-regular expression by calling \c QRegExp::matchedLength(). If the returned
-length was equal to the subject string's length, then one could conclude
-that a partial match was found.
-\l QRegularExpression supports partial matching explicitly by means of the
-appropriate \l {QRegularExpression::MatchType}.
-
-\section3 Global matching
-
-Due to limitations of the \c QRegExp API it was impossible to implement
-global matching correctly (that is, like Perl does). In particular, patterns
-that can match zero characters (like "a*") are problematic. \l
-{QRegularExpression::wildcardToRegularExpression} implements Perl global
-match correctly, and the returned iterator can be used to examine each
-result.
-
-\section3 Unicode properties support
-
-When using \c QRegExp, character classes such as \c{\w}, \c{\d}, etc. match
-characters with the corresponding Unicode property: for instance, \c{\d}
-matches any character with the Unicode Nd (decimal digit) property. Those
-character classes only match ASCII characters by default. When using \l
-QRegularExpression: for instance, \c{\d} matches exactly a character in the
-0-9 ASCII range. It is possible to change this behavior by using the \l
-{QRegularExpression::UseUnicodePropertiesOption}
-pattern option.
+\include corelib/port-from-qregexp.qdocinc porting-to-qregularexpression

 \section2 The QRegExp class

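Note on the removed "Minimal matching" text: it describes inverting the greediness of quantifiers but gives no example. The sketch below shows the difference between a greedy and a lazy quantifier on the same subject. It uses std::regex from the C++ standard library as a stand-in so that it builds without Qt; this is an assumption made for illustration, not the Qt API, and the ECMAScript grammar used by std::regex is not PCRE. The helper names firstMatch, greedyTag and lazyTag are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first substring of `subject` matched by
// `pattern`, or an empty string when nothing matches.
std::string firstMatch(const std::string &subject, const std::string &pattern)
{
    const std::regex re(pattern);
    std::smatch match;
    return std::regex_search(subject, match, re) ? match.str(0) : std::string();
}

// Greedy quantifier: ".+" consumes as much as it can, so the match
// extends to the last '>' in the subject.
std::string greedyTag(const std::string &subject)
{
    return firstMatch(subject, "<.+>");
}

// Lazy quantifier: ".+?" consumes as little as it can, so the match
// stops at the first '>' in the subject.
std::string lazyTag(const std::string &subject)
{
    return firstMatch(subject, "<.+?>");
}
```

Writing the lazy form explicitly in the pattern is the per-quantifier equivalent of what a global greediness switch does to every quantifier at once.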
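Note on the removed "Different pattern syntax" list: the first item warns that an unbraced hexadecimal escape only consumes two digits, so \x2022 is read as \x20 followed by the literal text "22". The sketch below reproduces that pitfall with std::regex, whose ECMAScript grammar also reads exactly two hex digits after \x. Using std::regex instead of QRegularExpression is an assumption made to keep the example free of Qt; the braced form \x{2022} recommended in the text is PCRE syntax and is deliberately not used here. The helper name matchesWhole is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole of `subject` matches `pattern`.
bool matchesWhole(const std::string &subject, const std::string &pattern)
{
    return std::regex_match(subject, std::regex(pattern));
}

// The pattern \x2022 is parsed as the escape \x20 (a space) followed by
// the two ordinary characters '2' and '2'.
bool hexEscapeMatchesSpaceAndDigits()
{
    return matchesWhole(" 22", "\\x2022");
}
```

A pattern that silently matches a space and two digits instead of a single code point is exactly the kind of change that an isValid() check does not catch, which is why the removed text asks for a manual review of ported patterns.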