Handle trailing cruft consistently in Qt::DateFormat parsing

Previously the ISO time format tolerated trailing cruft in various
cases, even though an offset specifier might follow the time. That
specifier should *not* be separated from the time by anything (not
even the spaces we originally planned to keep tolerating).

The RFC date format is forgiving about space, as is suitable for
parsing of RFC-822 headers, but the other formats should match the
handling in QDateTimeParser, which rejects any dangling cruft.
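
To illustrate what rejecting dangling cruft means at the level of a
single numeric field, here is a minimal standalone sketch. It uses
only the standard library rather than Qt types, and the name
readWholeNumber is hypothetical; it mirrors the intent of the readInt()
helper in this change, not its exact code.

```cpp
#include <limits>
#include <optional>
#include <string_view>

// Hypothetical stand-in for the strict field reader: the text must
// consist of decimal digits only. Spaces, signs and any other
// characters, leading or trailing, make the whole field invalid.
std::optional<unsigned long long> readWholeNumber(std::string_view text)
{
    if (text.empty())
        return std::nullopt;
    constexpr auto maxValue = std::numeric_limits<unsigned long long>::max();
    unsigned long long value = 0;
    for (char ch : text) {
        if (ch < '0' || ch > '9')
            return std::nullopt;        // cruft: reject, do not skip
        const unsigned digit = ch - '0';
        if (value > (maxValue - digit) / 10)
            return std::nullopt;        // would overflow
        value = value * 10 + digit;
    }
    return value;
}
```

With this sketch, "05" reads as 5, while " 5", "+5" and "5 " are all
rejected.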

At the same time, since this required a rewrite of
fromIsoTimeString() in any case, add support for the ISO format that
gives the hour a fractional part and omits minutes and seconds.
Previously we only supported fractional minutes (with no seconds).
The hour on its own, with no fractional part, is also valid.
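
The way a fractional part on the last field spreads into the missing
lower fields can be sketched as follows. This is a standalone
approximation of the arithmetic in the rewritten function, using
standard C++ only; TimeFields, spreadFraction and the fieldsPresent
parameter are hypothetical names introduced for the illustration.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical holder for the parsed fields; not a Qt type.
struct TimeFields { int hour, minute, second, msec; };

// fraction (0 <= fraction < 1) is the fractional part of the last
// field present in the text: the hour when fieldsPresent == 1, the
// minute when it is 2, the second when it is 3.
TimeFields spreadFraction(int hour, int minute, int second,
                          int fieldsPresent, double fraction)
{
    if (fieldsPresent < 2) {        // "HH.zzz": fraction of an hour
        fraction *= 60;
        minute = int(fraction);     // minutes are rounded down
        fraction -= minute;
    }
    if (fieldsPresent < 3) {        // "HH:mm.zzz", or carried from above
        fraction *= 60;
        second = int(fraction);     // seconds are rounded down
        fraction -= second;
    }
    // Milliseconds round to nearest but are clipped to 999, so that
    // rounding never spills over into the seconds field.
    const int msec = std::min(int(std::lround(1000 * fraction)), 999);
    return { hour, minute, second, msec };
}
```

Under these assumptions, an hour of 12 with fraction 0.5 comes out as
12:30:00.000, and 12:34 with fraction 0.25 as 12:34:15.000.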

Reworked the documentation of Qt::DateFormat as it was wrong in
places, inconsistent in its formatting and incomplete. Adjusted some
tests to match the new behavior. A fraction separator with no
following digits should have been recognized as an error previously
and now is.

[ChangeLog][QtCore][QDateTime] The ISODate and ISODateWithMs formats
now reject trailing cruft (including spaces) at the end of a time
string. They also gain support for parsing hour-only formats,
including the hour-with-fractional-part format.

Task-number: QTBUG-86133
Change-Id: I38ad1479ae033407f7df97ffbeb7c4bcd463d04a
Reviewed-by: Andrei Golubev <andrei.golubev@qt.io>
Reviewed-by: Paul Wicking <paul.wicking@qt.io>
Reviewed-by: Thiago Macieira <thiago.macieira@intel.com>
Edward Welbourne 2020-09-18 17:26:37 +02:00
parent c3cd760303
commit 4e675cb85e
3 changed files with 126 additions and 101 deletions


@@ -664,40 +664,66 @@
/*!
\enum Qt::DateFormat
\value TextDate The default Qt format, which includes the day and month name,
the day number in the month, and the year in full. The day and month names will
be short names in English. This is basically equivalent to using the date format
string, "ddd MMM d yyyy". See QDate::toString() for more information.
\value TextDate The default Qt format, which includes the day and month
name, the day number in the month, and the year in full. The day and month
names will be short names in English (C locale). This effectively uses, for
a date, format \c{ddd MMM d yyyy}, for a time \c{HH:mm:ss} and combines
these as \c{ddd MMM d HH:mm:ss yyyy} for a date-time, with an optional
suffix indicating time-zone or offset from UTC, where relevant. A fractional
part is also recognized on the seconds of a time part, as \c{HH:mm:ss.zzz},
when reading from a string.
\value ISODate \l{ISO 8601} extended format: either \c{yyyy-MM-dd} for dates or
\c{yyyy-MM-ddTHH:mm:ss} (e.g. 2017-07-24T15:46:29), or with a time-zone
suffix (Z for UTC otherwise an offset as [+|-]HH:mm) where appropriate
for combined dates and times.
\value ISODateWithMs \l{ISO 8601} extended format: uses \c{yyyy-MM-dd} for
dates, \c{HH:mm:ss.zzz} for times or \c{yyyy-MM-ddTHH:mm:ss.zzz}
(e.g. 2017-07-24T15:46:29.739) for combined dates and times, optionally with
a time-zone suffix (Z for UTC otherwise an offset as ±HH:mm) where
appropriate. When parsed, a single space, \c{' '}, may be used in place of
the \c{'T'} separator between date and time; no other spacing characters are
permitted. This format also accepts \c{HH:mm} and plain \c{HH} formats for
the time part, either of which may include a fractional part, \c{HH:mm.zzz}
or \c{HH.zzz}, applied to the last field present (hour or minute).
\value ISODateWithMs \l{ISO 8601} extended format, including milliseconds if applicable.
\value ISODate \l{ISO 8601} extended format, as for \c ISODateWithMs, but
omitting the milliseconds (\c{.zzz}) part when converting to a string. There
is no difference when reading from a string: if a fractional part is present
on the last time field, either format will accept it.
\value RFC2822Date \l{RFC 2822}, \l{RFC 850} and \l{RFC 1036} format:
either \c{[ddd,] dd MMM yyyy [HH:mm[:ss]][ ±tzoff]}
or \c{ddd MMM dd[ HH:mm:ss] yyyy[ ±tzoff]} are recognized for combined dates
and times, where \c{tzoff} is a timezone offset in \c{HHmm} format. For
dates and times separately, the same formats are matched and the unwanted
parts are ignored. In particular, note that a time is not recognized without
an accompanying date. When converting dates to string form,
format \c{dd MMM yyyy} is used, for times the format is \c{HH:mm:ss}. For
combined date and time, these are combined
\value RFC2822Date \l{RFC 2822}, \l{RFC 850} and \l{RFC 1036} format: when
converting dates to string form, format \c{dd MMM yyyy} is used, for times
the format is \c{HH:mm:ss}. For combined date and time, these are combined
as \c{dd MMM yyyy HH:mm:ss ±tzoff} (omitting the optional leading day of the
week from the first format recognized).
week from the first format recognized). When reading from a string either
\c{[ddd,] dd MMM yyyy [HH:mm[:ss]][ ±tzoff]} or \c{ddd MMM dd[ HH:mm:ss]
yyyy[ ±tzoff]} will be recognized for combined dates and times, where
\c{tzoff} is a timezone offset in \c{HHmm} format. Arbitrary spacing may
appear before or after the text and any non-empty spacing may replace the
spaces in this format. For dates and times separately, the same formats are
matched and the unwanted parts are ignored. In particular, note that a time
is not recognized without an accompanying date.
\note For \c ISODate formats, each \c y, \c M and \c d represents a single
digit of the year, month, and day used to specify the date. Each \c H, \c m,
and \c s represents a single digit of the hour (up to 24), minute and second
used to specify the time. The presence of a literal \c T character is used
to separate the date and time when both are specified. For the \c
RFC2822Date format, MMM stands for the first three letters of the month name
in English, the other format characters have the same meaning as for the
ISODate format.
*/
used to specify the time. A \c{.zzz} stands for a fractional part suffix on
the preceding field, which may be separated from that field either by a
comma \c{','} or the dot \c{'.'} shown. Precision beyond milliseconds is
accepted but discarded, rounding to the nearest millisecond or, when
rounding fractional seconds up would change the second field, rounded
down. The presence of a literal \c T character is used to separate the date
and time when both are specified. For the \c TextDate and \c RFC2822Date
formats, \c{ddd} stands for the first three letters of the name of the day
of the week and \c{MMM} stands for the first three letters of the month
name. The names of days and months are always in English (C locale)
regardless of user preferences or system settings. The other format
characters have the same meaning as for the ISODate format. Parts of the
format enclosed in square brackets \c{[...]} are optional; the square
brackets do not form part of the format. The plus-or-minus character \c{'±'}
here stands for either sign character, \c{'-'} for minus or \c{'+'} for
plus.
\sa QDate::toString(), QTime::toString(), QDateTime::toString(),
QDate::fromString(), QTime::fromString(), QDateTime::fromString()
*/
/*!
\enum Qt::TimeSpec


@@ -55,6 +55,7 @@
#include "private/qcore_mac_p.h"
#endif
#include "private/qgregoriancalendar_p.h"
#include "private/qstringiterator_p.h"
#if QT_CONFIG(timezone)
#include "private/qtimezoneprivate_p.h"
#endif
@@ -1019,7 +1020,8 @@ static QString toStringTextDate(QDate date)
const QLatin1Char sp(' ');
return QLocale::c().dayName(cal.dayOfWeek(date), QLocale::ShortFormat) + sp
+ cal.monthName(QLocale::c(), parts.month, parts.year, QLocale::ShortFormat)
+ sp + QString::number(parts.day) + sp + QString::number(parts.year);
// Documented to use 4-digit year
+ sp + QString::asprintf("%d %04d", parts.day, parts.year);
}
}
return QString();
@@ -1428,22 +1430,23 @@ qint64 QDate::daysTo(QDate d) const
#if QT_CONFIG(datestring) // depends on, so implies, textdate
namespace {
struct ParsedInt { int value = 0; bool ok = false; };
struct ParsedInt { qulonglong value = 0; bool ok = false; };
/*
/internal
Read an int that must be the whole text. QStringView ::toInt() will ignore
spaces happily; but ISO date format should not.
Read a whole number that must be the whole text. QStringView::toULongLong()
will happily ignore spaces and accept signs; but various date formats'
fields (e.g. all in ISO) should not.
*/
ParsedInt readInt(QStringView text)
{
ParsedInt result;
for (const auto &ch : text) {
if (ch.isSpace())
for (QStringIterator it(text); it.hasNext();) {
if (!QChar::isDigit(it.next()))
return result;
}
result.value = QLocale::c().toInt(text, &result.ok);
result.value = text.toULongLong(&result.ok);
return result;
}
@@ -2097,86 +2100,83 @@ static QTime fromIsoTimeString(QStringView string, Qt::DateFormat format, bool *
{
if (isMidnight24)
*isMidnight24 = false;
// Match /\d\d(:\d\d(:\d\d)?)?([,.]\d+)?/ as "HH[:mm[:ss]][.zzz]"
// The fractional part, if present, is in the same units as the field it follows.
// TextDate restricts fractional parts to the seconds field.
QStringView tail;
const int dot = string.indexOf(u'.'), comma = string.indexOf(u',');
if (dot != -1) {
tail = string.sliced(dot + 1);
if (tail.indexOf(u'.') != -1) // Forbid second dot:
return QTime();
string = string.first(dot);
} else if (comma != -1) {
tail = string.sliced(comma + 1);
string = string.first(comma);
}
if (tail.indexOf(u',') != -1) // Forbid comma after first dot-or-comma:
return QTime();
const ParsedInt frac = readInt(tail);
// There must be *some* digits in a fractional part; and it must be all digits:
if (tail.isEmpty() ? dot != -1 || comma != -1 : !frac.ok)
return QTime();
Q_ASSERT(frac.ok ^ tail.isEmpty());
double fraction = frac.ok ? frac.value * std::pow(0.1, tail.size()) : 0.0;
const int size = string.size();
if (size < 5 || string.at(2) != QLatin1Char(':'))
if (size < 2 || size > 8)
return QTime();
ParsedInt hour = readInt(string.mid(0, 2));
ParsedInt minute = readInt(string.mid(3, 2));
if (!hour.ok || !minute.ok)
ParsedInt hour = readInt(string.first(2));
if (!hour.ok)
return QTime();
// FIXME: ISO 8601 allows [,.]\d+ after hour, just as it does after minute
int second = 0;
int msec = 0;
if (size == 5) {
// HH:mm format
second = 0;
msec = 0;
} else if (string.at(5) == QLatin1Char(',') || string.at(5) == QLatin1Char('.')) {
if (format == Qt::TextDate)
ParsedInt minute;
if (string.size() > 2) {
if (string[2] == u':' && string.size() > 4)
minute = readInt(string.sliced(3, 2));
if (!minute.ok)
return QTime();
// ISODate HH:mm.ssssss format
// We only want 5 digits worth of fraction of minute. This follows the existing
// behavior that determines how milliseconds are read; 4 millisecond digits are
// read and then rounded to 3. If we read at most 5 digits for fraction of minute,
// the maximum amount of millisecond digits it will expand to once converted to
// seconds is 4. E.g. 12:34,99999 will expand to 12:34:59.9994. The milliseconds
// will then be rounded up AND clamped to 999.
const QStringView minuteFractionStr = string.mid(6, qMin(qsizetype(5), string.size() - 6));
const ParsedInt parsed = readInt(minuteFractionStr);
if (!parsed.ok)
return QTime();
const float secondWithMs
= double(parsed.value) * 60 / (std::pow(double(10), minuteFractionStr.size()));
second = std::floor(secondWithMs);
const float secondFraction = secondWithMs - second;
msec = qMin(qRound(secondFraction * 1000.0), 999);
} else if (string.at(5) == QLatin1Char(':')) {
// HH:mm:ss or HH:mm:ss.zzz
const ParsedInt parsed = readInt(string.mid(6, qMin(qsizetype(2), string.size() - 6)));
if (!parsed.ok)
return QTime();
second = parsed.value;
if (size <= 8) {
// No fractional part to read
} else if (string.at(8) == QLatin1Char(',') || string.at(8) == QLatin1Char('.')) {
QStringView msecStr(string.mid(9, qMin(qsizetype(4), string.size() - 9)));
bool ok = true;
// Can't use readInt() here, as we *do* allow trailing space - but not leading:
if (!msecStr.isEmpty() && !msecStr.at(0).isDigit())
return QTime();
msecStr = msecStr.trimmed();
int msecInt = msecStr.isEmpty() ? 0 : QLocale::c().toInt(msecStr, &ok);
if (!ok)
return QTime();
const double secondFraction(msecInt / (std::pow(double(10), msecStr.size())));
msec = qMin(qRound(secondFraction * 1000.0), 999);
} else {
#if QT_VERSION >= QT_VERSION_CHECK(6,0,0) // behavior change
// Stray cruft after date-time: tolerate trailing space, but nothing else.
for (const auto &ch : string.mid(8)) {
if (!ch.isSpace())
return QTime();
}
#endif
}
} else {
} else if (format == Qt::TextDate) { // Requires minutes
return QTime();
} else if (frac.ok) {
Q_ASSERT(!(fraction < 0.0) && fraction < 1.0);
fraction *= 60;
minute.value = qulonglong(fraction);
fraction -= minute.value;
}
const bool isISODate = format == Qt::ISODate || format == Qt::ISODateWithMs;
if (isISODate && hour.value == 24 && minute.value == 0 && second == 0 && msec == 0) {
ParsedInt second;
if (string.size() > 5) {
if (string[5] == u':' && string.size() == 8)
second = readInt(string.sliced(6, 2));
if (!second.ok)
return QTime();
} else if (frac.ok) {
if (format == Qt::TextDate) // Doesn't allow fraction of minutes
return QTime();
Q_ASSERT(!(fraction < 0.0) && fraction < 1.0);
fraction *= 60;
second.value = qulonglong(fraction);
fraction -= second.value;
}
Q_ASSERT(!(fraction < 0.0) && fraction < 1.0);
// Round millis to nearest (unlike minutes and seconds, rounded down),
// but clip to 999 (historical behavior):
const int msec = frac.ok ? qMin(qRound(1000 * fraction), 999) : 0;
// For ISO date format, 24:0:0 means 0:0:0 on the next day:
if ((format == Qt::ISODate || format == Qt::ISODateWithMs)
&& hour.value == 24 && minute.value == 0 && second.value == 0 && msec == 0) {
if (isMidnight24)
*isMidnight24 = true;
hour.value = 0;
}
return QTime(hour.value, minute.value, second, msec);
return QTime(hour.value, minute.value, second.value, msec);
}
/*!


@@ -2211,8 +2211,7 @@ void tst_QDateTime::fromStringDateFormat_data()
// Test Qt::ISODate format.
QTest::newRow("trailing space") // QTBUG-80445
<< QString("2000-01-02 03:04:05.678 ")
<< Qt::ISODate << QDateTime(QDate(2000, 1, 2), QTime(3, 4, 5, 678));
<< QString("2000-01-02 03:04:05.678 ") << Qt::ISODate << QDateTime();
// Invalid spaces (but keeping field widths correct):
QTest::newRow("space before millis")
@@ -2327,8 +2326,8 @@ void tst_QDateTime::fromStringDateFormat_data()
<< Qt::ISODate << QDateTime(QDate(2012, 1, 1), QTime(8, 0, 0, 333), Qt::LocalTime);
QTest::newRow("ISO .00009 of a second (period)") << QString::fromLatin1("2012-01-01T08:00:00.00009")
<< Qt::ISODate << QDateTime(QDate(2012, 1, 1), QTime(8, 0, 0, 0), Qt::LocalTime);
QTest::newRow("ISO no fract specified") << QString::fromLatin1("2012-01-01T08:00:00.")
<< Qt::ISODate << QDateTime(QDate(2012, 1, 1), QTime(8, 0, 0, 0), Qt::LocalTime);
QTest::newRow("ISO no fraction specified")
<< QString::fromLatin1("2012-01-01T08:00:00.") << Qt::ISODate << QDateTime();
// Test invalid characters (should ignore invalid characters at end of string).
QTest::newRow("ISO invalid character at end") << QString::fromLatin1("2012-01-01T08:00:00!")
<< Qt::ISODate << QDateTime();