rhi: d3d11: Make the "pipeline" cache save/load functional

There are no pipelines here of course. That's only for Vulkan.
But the QRhi APIs provide a common interface for retrieving the
serialized blob and pre-seeding the cache with a blob. The OpenGL
backend already implements that interface via GL program binaries.

We can do something similar with D3D, but it's a lot simpler: we
just need to include the bytecode from HLSL->DXBC compilation (i.e.
the result of D3DCompile() calls) and pick up the already present
bytecode and skip the D3DCompile() call when applicable.

Thus the mechanism is now available for Vulkan, OpenGL, and D3D11
as well.

Has no effect whatsoever if EnablePipelineCacheDataSave is not set at
QRhi create() time.

Also update the related docs.

Task-number: QTBUG-103802
Change-Id: I91f1fb1f471bc7c654e26886a37c283066e842a8
Reviewed-by: Andy Nichols <andy.nichols@qt.io>
Laszlo Agocs 2022-05-24 20:49:23 +02:00
parent 7908b0cea6
commit 663b375373
3 changed files with 272 additions and 34 deletions


@ -433,16 +433,24 @@ Q_LOGGING_CATEGORY(QRHI_LOG_INFO, "qt.rhi.general")
\value EnablePipelineCacheDataSave Enables retrieving the pipeline cache
contents, where applicable. When not set, pipelineCacheData() will return
an empty blob always. Opting in is relevant in particular with backends
where additional, potentially time consuming work is needed to maintain the
data structures with the serialized, binary versions of shader programs. An
example is OpenGL, where the "pipeline cache" is simulated by retrieving
and loading shader program binaries. With backends where retrieving and
restoring the pipeline cache contents is not supported, the flag has no
effect. With some backends (such as, OpenGL) there are additional,
disk-based caching mechanisms for shader binaries. Writing to those may get
disabled whenever this flag is set since storing program binaries (OpenGL)
to multiple caches is not sensible.
an empty blob always. With backends where retrieving and restoring the
pipeline cache contents is not supported, the flag has no effect and the
serialized cache data is always empty. The flag provides an opt-in
mechanism because the cost of maintaining the related data structures is
not insignificant with some backends. With Vulkan this feature maps
directly to VkPipelineCache, vkGetPipelineCacheData and
VkPipelineCacheCreateInfo::pInitialData. With D3D11 there is no real
pipeline cache, but the results of HLSL->DXBC compilations are stored and
can be serialized/deserialized via this mechanism. This allows skipping the
time consuming D3DCompile() in future runs of the application for shaders
that come with HLSL source instead of offline pre-compiled bytecode. This
can provide a huge boost in startup and load times, if there is a lot of
HLSL source compilation happening. With OpenGL the "pipeline cache" is
simulated by retrieving and loading shader program binaries (if supported
by the driver). With OpenGL there are additional, disk-based caching
mechanisms for shader/program binaries provided by Qt. Writing to those may
get disabled whenever this flag is set since storing program binaries to
multiple caches is not sensible.
*/
/*!
@ -6860,7 +6868,9 @@ bool QRhi::isDeviceLost() const
By saving and then, in subsequent runs of the same application, reloading
the cache data, pipeline and shader creation times can potentially be
accelerated.
reduced. What exactly the cache and its serialized version includes is not
specified, is always specific to the backend used, and in some cases also
dependent on the particular implementation of the graphics API.
When the PipelineCacheDataLoadSave is reported as unsupported, the returned
QByteArray is empty.
@ -6869,15 +6879,20 @@ bool QRhi::isDeviceLost() const
create(), the returned QByteArray may be empty, even when the
PipelineCacheDataLoadSave feature is supported.
When the returned data is non-empty, it is always specific to the QRhi
backend, the graphics device, and the driver implementation in use. QRhi
When the returned data is non-empty, it is always specific to the Qt
version and QRhi backend. In addition, in some cases there is a strong
dependency on the graphics device and the exact driver version used. QRhi
takes care of adding the appropriate header and safeguards that ensure that
the data can always be passed safely to setPipelineCacheData().
the data can always be passed safely to setPipelineCacheData(), therefore
attempting to load data from a run on another version of a driver will be
handled safely and gracefully.
\note Calling releaseCachedResources() may, depending on the backend, clear
the pipeline data collected. A subsequent call to this function may then
not return any data.
See EnablePipelineCacheDataSave for further details about this feature.
\sa setPipelineCacheData(), create(), isFeatureSupported()
*/
QByteArray QRhi::pipelineCacheData()
@ -6891,13 +6906,14 @@ QByteArray QRhi::pipelineCacheData()
When the PipelineCacheDataLoadSave is reported as unsupported, the function
is safe to call, but has no effect.
The blob returned by pipelineCacheData() is always specific to a QRhi
backend, a graphics device, and a given version of the graphics driver.
QRhi takes care of adding the appropriate header and safeguards that ensure
that the data can always be passed safely to this function. If there is a
mismatch, e.g. because the driver has been upgraded to a newer version, or
because the data was generated from a different QRhi backend, a warning is
printed and \a data is safely ignored.
The blob returned by pipelineCacheData() is always specific to the Qt
version, the QRhi backend, and, in some cases, also to the graphics device
and a given version of the graphics driver. QRhi takes care of adding the
appropriate header and safeguards that ensure that the data can always be
passed safely to this function. If there is a mismatch, e.g. because the
driver has been upgraded to a newer version, or because the data was
generated from a different QRhi backend, a warning is printed and \a data
is safely ignored.
With Vulkan, this maps directly to VkPipelineCache. Calling this function
creates a new Vulkan pipeline cache object, with its initial data sourced
@ -6905,11 +6921,27 @@ QByteArray QRhi::pipelineCacheData()
created QRhiGraphicsPipeline and QRhiComputePipeline objects, thus
accelerating, potentially, the pipeline creation.
With other APIs there is no real pipeline cache, but they may provide a
cache with bytecode from shader compilations (D3D) or program binaries
(OpenGL). In applications that perform a lot of shader compilation from
source at run time this can provide a significant boost in subsequent runs
if the "pipeline cache" is pre-seeded from an earlier run using this
function.
\note QRhi cannot give any guarantees that \a data has an effect on the
pipeline and shader creation performance. With APIs like Vulkan, it is up
to the driver to decide if \a data is used for some purpose, or if it is
ignored.
See EnablePipelineCacheDataSave for further details about this feature.
\note This mechanism offered by QRhi is independent of the drivers' own
internal caching mechanism, if any. This means that, depending on the
graphics API and its implementation, the exact effects of retrieving and
then reloading \a data are not predictable. Improved performance may not be
visible at all in case other caching mechanisms outside of Qt's control are
already active.
\sa pipelineCacheData(), isFeatureSupported()
*/
void QRhi::setPipelineCacheData(const QByteArray &data)
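
The save-then-reload flow described in the documentation above can be sketched on the application side roughly as follows. This is a minimal illustration that uses only the standard library: `Blob`, `saveBlob()` and `loadBlob()` are hypothetical names, not part of Qt. In a real application the blob would be the QByteArray returned by QRhi::pipelineCacheData() at shutdown, and the loaded bytes would be passed to QRhi::setPipelineCacheData() right after QRhi::create().

```cpp
#include <cstdio>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for the QByteArray holding the serialized cache.
using Blob = std::vector<char>;

// Writes the blob to 'path'. An empty blob (nothing was collected, or the
// save flag was not set) is not an error and leaves the file system untouched.
bool saveBlob(const std::string &path, const Blob &blob)
{
    if (blob.empty())
        return true;
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out)
        return false;
    out.write(blob.data(), static_cast<std::streamsize>(blob.size()));
    return static_cast<bool>(out);
}

// Returns the file contents, or an empty blob when the file is missing or
// unreadable. An empty blob is safe to hand over: the setter ignores it.
Blob loadBlob(const std::string &path)
{
    std::ifstream in(path, std::ios::binary);
    if (!in)
        return Blob();
    return Blob(std::istreambuf_iterator<char>(in),
                std::istreambuf_iterator<char>());
}
```

Since a mismatching blob is rejected with a warning, the application does not need to version the file itself; a stale file from an older Qt version or a different backend is simply ignored.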


@ -161,7 +161,7 @@ static IDXGIFactory1 *createDXGIFactory1()
bool QRhiD3D11::create(QRhi::Flags flags)
{
Q_UNUSED(flags);
rhiFlags = flags;
uint devFlags = 0;
if (debugLayer)
@ -538,7 +538,7 @@ bool QRhiD3D11::isFeatureSupported(QRhi::Feature feature) const
case QRhi::ReadBackAnyTextureFormat:
return true;
case QRhi::PipelineCacheDataLoadSave:
return false;
return true;
case QRhi::ImageDataStride:
return true;
case QRhi::RenderBufferImport:
@ -628,6 +628,7 @@ bool QRhiD3D11::makeThreadLocalNativeContextCurrent()
void QRhiD3D11::releaseCachedResources()
{
clearShaderCache();
m_bytecodeCache.clear();
}
bool QRhiD3D11::isDeviceLost() const
@ -635,14 +636,159 @@ bool QRhiD3D11::isDeviceLost() const
return deviceLost;
}
struct QD3D11PipelineCacheDataHeader
{
quint32 rhiId;
quint32 arch;
// no need for driver specifics
quint32 count;
quint32 dataSize;
};
QByteArray QRhiD3D11::pipelineCacheData()
{
return QByteArray();
QByteArray data;
if (m_bytecodeCache.isEmpty())
return data;
QD3D11PipelineCacheDataHeader header;
memset(&header, 0, sizeof(header));
header.rhiId = pipelineCacheRhiId();
header.arch = quint32(sizeof(void*));
header.count = m_bytecodeCache.count();
const size_t dataOffset = sizeof(header);
size_t dataSize = 0;
for (auto it = m_bytecodeCache.cbegin(), end = m_bytecodeCache.cend(); it != end; ++it) {
BytecodeCacheKey key = it.key();
QByteArray bytecode = it.value();
dataSize +=
sizeof(quint32) + key.sourceHash.size()
+ sizeof(quint32) + key.target.size()
+ sizeof(quint32) + key.entryPoint.size()
+ sizeof(quint32) // compileFlags
+ sizeof(quint32) + bytecode.size();
}
QByteArray buf(dataOffset + dataSize, Qt::Uninitialized);
char *p = buf.data() + dataOffset;
for (auto it = m_bytecodeCache.cbegin(), end = m_bytecodeCache.cend(); it != end; ++it) {
BytecodeCacheKey key = it.key();
QByteArray bytecode = it.value();
quint32 i = key.sourceHash.size();
memcpy(p, &i, 4);
p += 4;
memcpy(p, key.sourceHash.constData(), key.sourceHash.size());
p += key.sourceHash.size();
i = key.target.size();
memcpy(p, &i, 4);
p += 4;
memcpy(p, key.target.constData(), key.target.size());
p += key.target.size();
i = key.entryPoint.size();
memcpy(p, &i, 4);
p += 4;
memcpy(p, key.entryPoint.constData(), key.entryPoint.size());
p += key.entryPoint.size();
quint32 f = key.compileFlags;
memcpy(p, &f, 4);
p += 4;
i = bytecode.size();
memcpy(p, &i, 4);
p += 4;
memcpy(p, bytecode.constData(), bytecode.size());
p += bytecode.size();
}
Q_ASSERT(p == buf.data() + dataOffset + dataSize);
header.dataSize = quint32(dataSize);
memcpy(buf.data(), &header, sizeof(header));
return buf;
}
void QRhiD3D11::setPipelineCacheData(const QByteArray &data)
{
Q_UNUSED(data);
if (data.isEmpty())
return;
const size_t headerSize = sizeof(QD3D11PipelineCacheDataHeader);
if (data.size() < qsizetype(headerSize)) {
qWarning("setPipelineCacheData: Invalid blob size (header incomplete)");
return;
}
const size_t dataOffset = headerSize;
QD3D11PipelineCacheDataHeader header;
memcpy(&header, data.constData(), headerSize);
const quint32 rhiId = pipelineCacheRhiId();
if (header.rhiId != rhiId) {
qWarning("setPipelineCacheData: The data is for a different QRhi version or backend (%u, %u)",
rhiId, header.rhiId);
return;
}
const quint32 arch = quint32(sizeof(void*));
if (header.arch != arch) {
qWarning("setPipelineCacheData: Architecture does not match (%u, %u)",
arch, header.arch);
return;
}
if (header.count == 0)
return;
if (data.size() < qsizetype(dataOffset + header.dataSize)) {
qWarning("setPipelineCacheData: Invalid blob size (data incomplete)");
return;
}
m_bytecodeCache.clear();
const char *p = data.constData() + dataOffset;
for (quint32 i = 0; i < header.count; ++i) {
quint32 len = 0;
memcpy(&len, p, 4);
p += 4;
QByteArray sourceHash(len, Qt::Uninitialized);
memcpy(sourceHash.data(), p, len);
p += len;
memcpy(&len, p, 4);
p += 4;
QByteArray target(len, Qt::Uninitialized);
memcpy(target.data(), p, len);
p += len;
memcpy(&len, p, 4);
p += 4;
QByteArray entryPoint(len, Qt::Uninitialized);
memcpy(entryPoint.data(), p, len);
p += len;
quint32 flags;
memcpy(&flags, p, 4);
p += 4;
memcpy(&len, p, 4);
p += 4;
QByteArray bytecode(len, Qt::Uninitialized);
memcpy(bytecode.data(), p, len);
p += len;
BytecodeCacheKey cacheKey;
cacheKey.sourceHash = sourceHash;
cacheKey.target = target;
cacheKey.entryPoint = entryPoint;
cacheKey.compileFlags = flags;
m_bytecodeCache.insert(cacheKey, bytecode);
}
qCDebug(QRHI_LOG_INFO, "Seeded bytecode cache with %d shaders", int(m_bytecodeCache.count()));
}
QRhiRenderBuffer *QRhiD3D11::createRenderBuffer(QRhiRenderBuffer::Type type, const QSize &pixelSize,
@ -4002,8 +4148,16 @@ static pD3DCompile resolveD3DCompile()
return nullptr;
}
static QByteArray compileHlslShaderSource(const QShader &shader, QShader::Variant shaderVariant, UINT flags,
QString *error, QShaderKey *usedShaderKey)
static inline QByteArray sourceHash(const QByteArray &source)
{
// taken from the GL backend, use the same mechanism to get a key
QCryptographicHash keyBuilder(QCryptographicHash::Sha1);
keyBuilder.addData(source);
return keyBuilder.result().toHex();
}
QByteArray QRhiD3D11::compileHlslShaderSource(const QShader &shader, QShader::Variant shaderVariant, uint flags,
QString *error, QShaderKey *usedShaderKey)
{
QShaderKey key = { QShader::DxbcShader, 50, shaderVariant };
QShaderCode dxbc = shader.shader(key);
@ -4020,6 +4174,9 @@ static QByteArray compileHlslShaderSource(const QShader &shader, QShader::Varian
return QByteArray();
}
if (usedShaderKey)
*usedShaderKey = key;
const char *target;
switch (shader.stage()) {
case QShader::VertexStage:
@ -4045,6 +4202,17 @@ static QByteArray compileHlslShaderSource(const QShader &shader, QShader::Varian
return QByteArray();
}
BytecodeCacheKey cacheKey;
if (rhiFlags.testFlag(QRhi::EnablePipelineCacheDataSave)) {
cacheKey.sourceHash = sourceHash(hlslSource.shader());
cacheKey.target = target;
cacheKey.entryPoint = hlslSource.entryPoint();
cacheKey.compileFlags = flags;
auto cacheIt = m_bytecodeCache.constFind(cacheKey);
if (cacheIt != m_bytecodeCache.constEnd())
return cacheIt.value();
}
static const pD3DCompile d3dCompile = resolveD3DCompile();
if (d3dCompile == nullptr) {
qWarning("Unable to resolve function D3DCompile()");
@ -4066,13 +4234,14 @@ static QByteArray compileHlslShaderSource(const QShader &shader, QShader::Varian
return QByteArray();
}
if (usedShaderKey)
*usedShaderKey = key;
QByteArray result;
result.resize(int(bytecode->GetBufferSize()));
memcpy(result.data(), bytecode->GetBufferPointer(), size_t(result.size()));
bytecode->Release();
if (rhiFlags.testFlag(QRhi::EnablePipelineCacheDataSave))
m_bytecodeCache.insert(cacheKey, result);
return result;
}
@ -4180,8 +4349,8 @@ bool QD3D11GraphicsPipeline::create()
if (m_flags.testFlag(CompileShadersWithDebugInfo))
compileFlags |= D3DCOMPILE_DEBUG;
const QByteArray bytecode = compileHlslShaderSource(shaderStage.shader(), shaderStage.shaderVariant(), compileFlags,
&error, &shaderKey);
const QByteArray bytecode = rhiD->compileHlslShaderSource(shaderStage.shader(), shaderStage.shaderVariant(), compileFlags,
&error, &shaderKey);
if (bytecode.isEmpty()) {
qWarning("HLSL shader compilation failed: %s", qPrintable(error));
return false;
@ -4315,8 +4484,8 @@ bool QD3D11ComputePipeline::create()
if (m_flags.testFlag(CompileShadersWithDebugInfo))
compileFlags |= D3DCOMPILE_DEBUG;
const QByteArray bytecode = compileHlslShaderSource(m_shaderStage.shader(), m_shaderStage.shaderVariant(), compileFlags,
&error, &shaderKey);
const QByteArray bytecode = rhiD->compileHlslShaderSource(m_shaderStage.shader(), m_shaderStage.shaderVariant(), compileFlags,
&error, &shaderKey);
if (bytecode.isEmpty()) {
qWarning("HLSL compute shader compilation failed: %s", qPrintable(error));
return false;
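
The blob layout used by pipelineCacheData() above is a fixed header followed by length-prefixed fields (source hash, target, entry point, flags, bytecode). As shown, the read loop in setPipelineCacheData() checks the total size against header.dataSize but appears to trust each per-field length. A more defensive reader could verify every length against the bytes that remain. The following is only a sketch of that idea with std::string in place of QByteArray; `writeField()` and `readField()` are hypothetical helpers, not functions from the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <string>

// Appends a 32-bit length prefix followed by the bytes of 'field',
// mirroring the "quint32 size + payload" layout used in the patch.
void writeField(std::string &out, const std::string &field)
{
    assert(field.size() <= std::numeric_limits<std::uint32_t>::max());
    const std::uint32_t len = static_cast<std::uint32_t>(field.size());
    char prefix[sizeof(len)];
    std::memcpy(prefix, &len, sizeof(len));
    out.append(prefix, sizeof(len));
    out.append(field);
}

// Reads one length-prefixed field starting at 'pos'. On success stores the
// payload in 'field', advances 'pos' past it and returns true. Returns false
// and leaves 'pos' unchanged when the prefix or the payload is truncated.
bool readField(const std::string &in, std::size_t &pos, std::string &field)
{
    std::uint32_t len = 0;
    if (pos > in.size() || in.size() - pos < sizeof(len))
        return false;
    std::memcpy(&len, in.data() + pos, sizeof(len));
    const std::size_t start = pos + sizeof(len);
    if (in.size() - start < len)
        return false;
    field.assign(in, start, len);
    pos = start + len;
    return true;
}
```

With such a helper, a corrupted or truncated entry would make the loader stop and discard the blob instead of reading past the end of the buffer.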


@ -679,7 +679,10 @@ public:
void finishActiveReadbacks();
void reportLiveObjects(ID3D11Device *device);
void clearShaderCache();
QByteArray compileHlslShaderSource(const QShader &shader, QShader::Variant shaderVariant, uint flags,
QString *error, QShaderKey *usedShaderKey);
QRhi::Flags rhiFlags;
bool debugLayer = false;
bool importedDeviceAndContext = false;
ID3D11Device *dev = nullptr;
@ -751,11 +754,45 @@ public:
void releaseResources();
void activate();
} deviceCurse;
// This is what gets exposed as the "pipeline cache", not that that concept
// applies anyway. Here we are just storing the DX bytecode for a shader so
// we can skip the HLSL->DXBC compilation when the QShader has HLSL source
// code and the same shader source has already been compiled before.
// m_shaderCache seemingly does the same, but this here does not care about
// the ID3D11*Shader, this is just about the bytecode and about allowing
// the data to be serialized to persistent storage and then reloaded in
// future runs of the app, or when creating another QRhi, etc.
struct BytecodeCacheKey {
QByteArray sourceHash;
QByteArray target;
QByteArray entryPoint;
uint compileFlags;
};
QHash<BytecodeCacheKey, QByteArray> m_bytecodeCache;
};
Q_DECLARE_TYPEINFO(QRhiD3D11::TextureReadback, Q_RELOCATABLE_TYPE);
Q_DECLARE_TYPEINFO(QRhiD3D11::BufferReadback, Q_RELOCATABLE_TYPE);
inline bool operator==(const QRhiD3D11::BytecodeCacheKey &a, const QRhiD3D11::BytecodeCacheKey &b) noexcept
{
return a.sourceHash == b.sourceHash
&& a.target == b.target
&& a.entryPoint == b.entryPoint
&& a.compileFlags == b.compileFlags;
}
inline bool operator!=(const QRhiD3D11::BytecodeCacheKey &a, const QRhiD3D11::BytecodeCacheKey &b) noexcept
{
return !(a == b);
}
inline size_t qHash(const QRhiD3D11::BytecodeCacheKey &k, size_t seed = 0) noexcept
{
return qHash(k.sourceHash, seed) ^ qHash(k.target) ^ qHash(k.entryPoint) ^ k.compileFlags;
}
QT_END_NAMESPACE
#endif
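
A note on the qHash() overload above: it XORs the per-field hashes, and target and entryPoint are both hashed without the seed, so swapping those two values yields the same hash value. Lookups stay correct because operator== still distinguishes the keys; only the bucket distribution could be affected. An order-sensitive combination is one alternative. The sketch below shows it with standard-library types only; `CacheKey`, `hashCombine()` and `CacheKeyHash` are hypothetical names, with std::string standing in for QByteArray.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Stand-in for BytecodeCacheKey from the patch.
struct CacheKey {
    std::string sourceHash;
    std::string target;
    std::string entryPoint;
    unsigned compileFlags;

    bool operator==(const CacheKey &o) const
    {
        return sourceHash == o.sourceHash && target == o.target
            && entryPoint == o.entryPoint && compileFlags == o.compileFlags;
    }
};

// Mixes 'value' into 'seed'. Each step depends on the running seed, so the
// order in which fields are fed in influences the result.
inline void hashCombine(std::size_t &seed, std::size_t value)
{
    seed ^= value + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

struct CacheKeyHash {
    std::size_t operator()(const CacheKey &k) const
    {
        std::size_t seed = 0;
        hashCombine(seed, std::hash<std::string>()(k.sourceHash));
        hashCombine(seed, std::hash<std::string>()(k.target));
        hashCombine(seed, std::hash<std::string>()(k.entryPoint));
        hashCombine(seed, std::hash<unsigned>()(k.compileFlags));
        return seed;
    }
};

// Maps a key to the compiled bytecode, like m_bytecodeCache in the patch.
using BytecodeCache = std::unordered_map<CacheKey, std::string, CacheKeyHash>;
```

As in the patch, the compile flags are part of the key, so the same source compiled with and without D3DCOMPILE_DEBUG ends up in separate entries.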