Include the QRegularExpression porting docs in Qt 6 porting guide

The instructions for porting away from QRegExp to QRegularExpression in
the Qt 6 porting guide were mostly copied from the similar docs for
QRegExp, which are moved to
doc/global/includes/corelib/port-from-qregexp.qdocinc. The latter now
covers everything that the docs from the porting guide did and doesn't have
the issues listed in QTBUG-89702.

Remove the old docs and include the docs from doc/global/includes
instead.

Task-number: QTBUG-89702
Change-Id: Ifdb79d5775bc0cadd02c21299d58adb27ae13337
Reviewed-by: Edward Welbourne <edward.welbourne@qt.io>
(cherry picked from commit 93f7291387c03367e828b16299ddcbaf1f804e25)
Reviewed-by: Qt Cherry-pick Bot <cherrypick_bot@qt-project.org>
Sona Kurazyan 2022-07-05 16:21:41 +02:00 committed by Qt Cherry-pick Bot
parent 26c7a1b52e
commit a7cb167a7d

@ -528,252 +528,13 @@
\section2 The QRegularExpression class
In Qt 6, all methods taking a \c QRegExp were removed from our code base.
Therefore it is very likely that you will have to port your application or
library to \l QRegularExpression.
In Qt 6, the \c QRegExp type has been retired to the Qt5Compat module
and all Qt APIs using it have been removed from other modules.
Client code which used it can be ported to use \l QRegularExpression
in its place. As \l QRegularExpression is present already in Qt 5,
this can be done and tested before migration to Qt 6.
\l QRegularExpression implements Perl-compatible regular expressions. It
fully supports Unicode. For an overview of the regular expression syntax
supported by \l QRegularExpression, please refer to the
pcrepattern(3) man page. A regular expression is made up of two things: a
pattern string and a set of pattern options that change the meaning of the
pattern string.
There are some subtle differences between \l QRegularExpression and \c
QRegExp that will be explained by this document to ease the porting effort.
\l QRegularExpression is more strict when it comes to the syntax of the
regular expression. Therefore it is always good to check the expression
for \l {QRegularExpression::isValid}{validity}.
\l QRegularExpression can almost always be declared const (except when the
pattern changes), while \c QRegExp almost never could be.
There is no replacement for the \l {QRegExp::CaretMode}{CaretMode}
enumeration. The \l {QRegularExpression::AnchoredMatchOption} match option
can be used to emulate the QRegExp::CaretAtOffset behavior. There is no
equivalent for the other QRegExp::CaretMode modes.
\l QRegularExpression supports only Perl-compatible regular expressions.
Still, it does not support all the features available in Perl-compatible
regular expressions. The most notable one is the fact that duplicated names
for capturing groups are not supported, and using them can lead to
undefined behavior. This may change in a future version of Qt.
\section3 Wildcard matching
There is no direct way to do wildcard matching in \l QRegularExpression.
However, the \l {QRegularExpression::wildcardToRegularExpression} method
is provided to translate glob patterns into a Perl-compatible regular
expression that can be used for that purpose.
For example, if you have code like
\code
QRegExp wildcard("*.txt");
wildcard.setPatternSyntax(QRegExp::Wildcard);
\endcode
you can rewrite it as
\code
auto wildcard = QRegularExpression(QRegularExpression::wildcardToRegularExpression("*.txt"));
\endcode
Note, however, that not every shell-like wildcard pattern is translated
the way you might expect. The following example code silently breaks if
it is simply converted using the function mentioned above:
\code
const QString fp1("C:/Users/dummy/files/content.txt");
const QString fp2("/home/dummy/files/content.txt");

QRegExp re1("*/files/*");
re1.setPatternSyntax(QRegExp::Wildcard);
... = re1.exactMatch(fp1); // returns true
... = re1.exactMatch(fp2); // returns true

// but converted with QRegularExpression::wildcardToRegularExpression(),
// the translated '*' no longer matches path separators
QRegularExpression re2(QRegularExpression::wildcardToRegularExpression("*/files/*"));
... = re2.match(fp1).hasMatch(); // returns false
... = re2.match(fp2).hasMatch(); // returns false
\endcode
\section3 Searching forward
Forward searching inside a string was usually implemented with a loop using
\c {QRegExp::indexIn} and a growing offset, but can now be easily implemented
with \l QRegularExpressionMatchIterator or \l {QString::indexOf}.
For example, if you have code like
\code
QString subject("the quick fox");
int offset = 0;
QRegExp re("(\\w+)");
while ((offset = re.indexIn(subject, offset)) != -1) {
    offset += re.matchedLength();
    // ...
}
\endcode
you can rewrite it as
\code
QRegularExpression re("(\\w+)");
QString subject("the quick fox");
QRegularExpressionMatchIterator i = re.globalMatch(subject);
while (i.hasNext()) {
    QRegularExpressionMatch match = i.next();
    // ...
}

// or alternatively using QString::indexOf
qsizetype from = 0;
QRegularExpressionMatch match;
while ((from = subject.indexOf(re, from, &match)) != -1) {
    from += match.capturedLength();
    // ...
}
\endcode
\section3 Searching backwards
Backwards searching inside a string was often implemented as a loop
over \c {QRegExp::lastIndexIn}, but can now be easily implemented using
\l {QString::lastIndexOf} and \l {QRegularExpressionMatch}.
\note \l QRegularExpressionMatchIterator is not capable of performing a
backwards search.
For example, if you have code like
\code
int offset = -1;
QString subject("Lorem ipsum dolor sit amet, consetetur sadipscing.");
QRegExp re("\\s+([ids]\\w+)");
while ((offset = re.lastIndexIn(subject, offset)) != -1) {
    --offset;
    // ...
}
\endcode
you can rewrite it as
\code
qsizetype from = -1;
QString subject("Lorem ipsum dolor sit amet, consetetur sadipscing.");
QRegularExpressionMatch match;
QRegularExpression re("\\s+([ids]\\w+)");
while ((from = subject.lastIndexOf(re, from, &match)) != -1) {
    --from;
    // ...
}
\endcode
\section3 exactMatch vs. match.hasMatch
\c {QRegExp::exactMatch} served two purposes: it exactly matched a regular
expression against a subject string, and it implemented partial matching.
Exact matching indicates whether the regular expression matches the entire
subject string. For example:
\code
QString source("abc123");
QRegExp("\\d+").exactMatch(source); // returns false
QRegExp("[a-z]+\\d+").exactMatch(source); // returns true
QRegularExpression("\\d+").match(source).hasMatch(); // returns true
QRegularExpression("[a-z]+\\d+").match(source).hasMatch(); // returns true
\endcode
Exact matching is not reflected in \l QRegularExpression. If you want to be
sure that the subject string matches the regular expression exactly, you
can wrap the pattern using the \l {QRegularExpression::anchoredPattern}
function:
\code
QString source("abc123");
QString pattern("\\d+");
QRegularExpression(pattern).match(source).hasMatch(); // returns true
pattern = QRegularExpression::anchoredPattern(pattern);
QRegularExpression(pattern).match(source).hasMatch(); // returns false
\endcode
\section3 Minimal matching
\c QRegExp::setMinimal() implemented minimal matching by simply reversing
the greediness of the quantifiers (\c QRegExp did not support lazy
quantifiers, like *?, +?, etc.). QRegularExpression instead does support
greedy, lazy and possessive quantifiers. The \l
{QRegularExpression::InvertedGreedinessOption} pattern option can be useful
to emulate the effects of \c QRegExp::setMinimal(): if enabled, it inverts
the greediness of quantifiers (greedy ones become lazy and vice versa).
\section3 Different pattern syntax
Porting a regular expression from \c QRegExp to \l QRegularExpression may
require changes to the pattern itself. Therefore it is recommended to check
the pattern used with the \l {QRegularExpression::isValid} method. This is
especially important for user-provided patterns or patterns not controlled by
the developer.
In other cases, a pattern ported from \c QRegExp to \l QRegularExpression may
silently change semantics. Therefore, it is necessary to review the patterns
used. The most notable cases of silent incompatibility are:
\list
\li Curly braces are needed in order to use a hexadecimal escape like \c
{\xHHHH} with more than 2 digits. A pattern like \c {\x2022} needs
to be ported to \c {\x{2022}}, or it will match a space \c {(0x20)}
followed by the string \c {"22"}. In general, it is highly recommended
to always use curly braces with the \c {\x} escape, no matter the
amount of digits specified.
\li A \c{0-to-n} quantification like \c {{,n}} needs to be ported to
\c {{0,n}} to preserve semantics. Otherwise, a pattern such as
\c {\d{,3}} would actually match a digit followed by the exact
string \c {"{,3}"}.
\endlist
\section3 Partial Matching
When using \c QRegExp::exactMatch(), if an exact match was not found, one
could still find out how much of the subject string was matched by the
regular expression by calling \c QRegExp::matchedLength(). If the returned
length was equal to the subject string's length, then one could conclude
that a partial match was found.
\l QRegularExpression supports partial matching explicitly by means of the
appropriate \l {QRegularExpression::MatchType}.
\section3 Global matching
Due to limitations of the \c QRegExp API it was impossible to implement
global matching correctly (that is, like Perl does). In particular, patterns
that can match zero characters (like "a*") are problematic. \l
{QRegularExpression::globalMatch} implements Perl global
match correctly, and the returned iterator can be used to examine each
result.
\section3 Unicode properties support
When using \c QRegExp, character classes such as \c{\w}, \c{\d}, etc. match
characters with the corresponding Unicode property: for instance, \c{\d}
matches any character with the Unicode Nd (decimal digit) property. When
using \l QRegularExpression, those character classes only match ASCII
characters by default: for instance, \c{\d} matches exactly a character in
the 0-9 ASCII range. It is possible to change this behavior by using the \l
{QRegularExpression::UseUnicodePropertiesOption} pattern option.
\include corelib/port-from-qregexp.qdocinc porting-to-qregularexpression
\section2 The QRegExp class