Incorporate the sub-class-of info when deriving mimetypes

The tika mimetype database contains multiple rules that can match the
same valid file (e.g. an SVG file, which is both SVG and XML) with the
same priority (50 for both SVG and XML). The choice between them is thus
ambiguous, which leads to regressions in the reported mimetype.

To break the ambiguity, we now also look at the sub-class-of element
and prefer sub-classes, as they are narrower and more detailed than
their super-class.
The checking order recommended by freedesktop.org suggests that this is
the correct thing to do: "If any of the mimetypes resulting from a glob
match is equal to or a subclass of the result from the magic sniffing,
use this as the result." However, this rule does not exactly cover the
case in the bug report, because there both results come from magic
sniffing.

If two rules match and have the same priority, without one being a
sub-class of the other, there is still an ambiguity. In that case we
now print a warning about the ambiguity.
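
The tie-breaking rule described above can be sketched outside of Qt as
follows. This is a minimal illustration, not the code of the patch:
ParentMap, Match, Result, inherits() and pickBest() are hypothetical
names, the parent map stands in for the mimetype database, and the
initial accuracy of -1 is an assumption made so that any non-negative
priority is accepted as the first candidate.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the mimetype database: type -> direct parents
// (the targets of its sub-class-of elements).
using ParentMap = std::map<std::string, std::vector<std::string>>;

// True if `child` equals `ancestor` or reaches it via sub-class-of links.
// Assumes the parent graph is acyclic.
bool inherits(const ParentMap &parents, const std::string &child,
              const std::string &ancestor)
{
    if (child == ancestor)
        return true;
    const auto it = parents.find(child);
    if (it == parents.end())
        return false;
    for (const std::string &parent : it->second) {
        if (inherits(parents, parent, ancestor))
            return true;
    }
    return false;
}

struct Match { std::string mimetype; int priority; };

struct Result {
    std::string candidate;
    int accuracy = -1; // assumption: real priorities are never negative
    int warnings = 0;  // counts ambiguities instead of printing a warning
};

// Picks the best magic match: higher priority wins; on equal priority the
// sub-class wins; unrelated types keep the earlier candidate and are counted.
Result pickBest(const ParentMap &parents, const std::vector<Match> &matches)
{
    Result result;
    for (const Match &m : matches) {
        assert(m.priority >= 0);
        if (m.priority < result.accuracy)
            continue;
        if (m.priority == result.accuracy) {
            // The current candidate is already the same type or a sub-class.
            if (inherits(parents, result.candidate, m.mimetype))
                continue;
            // Neither type is a sub-class of the other: ambiguous.
            if (!inherits(parents, m.mimetype, result.candidate)) {
                ++result.warnings;
                continue;
            }
        }
        result.accuracy = m.priority;
        result.candidate = m.mimetype;
    }
    return result;
}
```

With a parent map in which image/svg+xml is a sub-class of
application/xml, two matches of priority 50 resolve to image/svg+xml
regardless of the order in which the rules are visited. Two unrelated
types of equal priority keep the first candidate and are counted as one
ambiguity.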

The patch adds a test for the previously ambiguous case. There is no
test for the warning on ambiguity, because such a test file would be
difficult to generate and is probably not worth the effort.

Fixes: QTBUG-133221
Pick-to: 6.9 6.8
Change-Id: I1817ec4da947cd91729d0ce35defc9f63cd784d9
Reviewed-by: Mate Barany <mate.barany@qt.io>
(cherry picked from commit 25df8042a4ac23e32037dfc1c20c599992febd66)
Reviewed-by: Qt Cherry-pick Bot <cherrypick_bot@qt-project.org>
Matthias Rauter 2025-06-03 13:57:35 +02:00 committed by Qt Cherry-pick Bot
parent c9d597f5db
commit 3e0bcaeea2
2 changed files with 20 additions and 3 deletions

@@ -718,10 +718,24 @@ void QMimeXMLProvider::findByMagic(const QByteArray &data, QMimeMagicResult &res
     for (const QMimeMagicRuleMatcher &matcher : std::as_const(m_magicMatchers)) {
         if (matcher.matches(data)) {
             const int priority = matcher.priority();
-            if (priority > result.accuracy) {
-                result.accuracy = priority;
-                result.candidate = matcher.mimetype();
+            if (priority < result.accuracy)
+                continue;
+            if (priority == result.accuracy) {
+                if (m_db->inherits(result.candidate, matcher.mimetype()))
+                    continue;
+                if (!m_db->inherits(matcher.mimetype(), result.candidate)) {
+                    // Two or more magic rules matching, both with the same priority but not
+                    // connected with one another should not happen:
+                    qWarning("QMimeXMLProvider: MimeType is ambiguous between %ls and %ls",
+                             qUtf16Printable(result.candidate),
+                             qUtf16Printable(matcher.mimetype()));
+                    continue;
+                }
             }
+            result.accuracy = priority;
+            result.candidate = matcher.mimetype();
         }
     }
 }

@@ -646,6 +646,9 @@ void tst_QMimeDatabase::mimeTypeForData_data()
     else
         QTest::newRow("diff_space") << QByteArray("diff ") << "text/x-diff";
     QTest::newRow("unknown") << QByteArray("\001abc?}") << "application/octet-stream";
+    QTest::newRow("ambigous svg/xml") << QByteArray(R"(<?xml version="1.0"?>
+<svg version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
+</svg>)") << "image/svg+xml";
 }

 void tst_QMimeDatabase::mimeTypeForData()