rhi: d3d11: Make the "pipeline" cache save/load functional

There are no pipelines here of course. That's only for Vulkan. But the QRhi APIs provide a common interface for retrieving the serialized blob and pre-seeding the cache with a blob. The OpenGL backend already implements that interface via GL program binaries.

We can do something similar with D3D, but it's a lot simpler: we just need to include the bytecode from HLSL->DXBC compilation (i.e. the result of D3DCompile() calls) in the blob, then pick up the already present bytecode and skip the D3DCompile() call when applicable.

Thus the mechanism is now available for Vulkan, OpenGL, and D3D11.

Has no effect whatsoever if EnablePipelineCacheDataSave is not set at QRhi create() time.

Also update the related docs.

Task-number: QTBUG-103802
Change-Id: I91f1fb1f471bc7c654e26886a37c283066e842a8
Reviewed-by: Andy Nichols <andy.nichols@qt.io>
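The lookup-before-compile idea described above can be sketched in standard C++ without Qt or D3D. This is only an illustration under stated assumptions: CacheKey, BytecodeCache and getOrCompile() are hypothetical names, std::map stands in for the QHash used in the patch, and the compile callback stands in for the D3DCompile() call.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the patch's BytecodeCacheKey: everything that
// influences the compiled bytecode is part of the key.
struct CacheKey {
    std::string sourceHash;      // digest of the HLSL source
    std::string target;          // e.g. "vs_5_0"
    std::string entryPoint;      // e.g. "main"
    std::uint32_t compileFlags;  // flags passed to the compiler

    bool operator<(const CacheKey &o) const {
        return std::tie(sourceHash, target, entryPoint, compileFlags)
             < std::tie(o.sourceHash, o.target, o.entryPoint, o.compileFlags);
    }
};

using Bytecode = std::vector<unsigned char>;

class BytecodeCache {
public:
    // Returns the cached bytecode if present. Otherwise invokes compile(),
    // stores a non-empty result, and returns it.
    Bytecode getOrCompile(const CacheKey &key, const std::function<Bytecode()> &compile) {
        auto it = m_entries.find(key);
        if (it != m_entries.end())
            return it->second;           // hit: the expensive compile is skipped
        Bytecode result = compile();     // miss: compile from source
        if (!result.empty())             // failed compilations are not cached
            m_entries.emplace(key, result);
        return result;
    }

    std::size_t size() const { return m_entries.size(); }

private:
    std::map<CacheKey, Bytecode> m_entries;
};
```

Pre-seeding such a cache from a blob saved in an earlier run turns the first lookup for a known shader into a hit, which is where the startup gain is expected to come from.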
2022-05-24 20:49:23 +02:00
commit 663b375373
parent 7908b0cea6
3 changed files with 272 additions and 34 deletions
--- a/src/gui/rhi/qrhi.cpp
+++ b/src/gui/rhi/qrhi.cpp
@@ -433,16 +433,24 @@ Q_LOGGING_CATEGORY(QRHI_LOG_INFO, "qt.rhi.general")

    \value EnablePipelineCacheDataSave Enables retrieving the pipeline cache
    contents, where applicable. When not set, pipelineCacheData() will return
-    an empty blob always. Opting in is relevant in particular with backends
-    where additional, potentially time consuming work is needed to maintain the
-    data structures with the serialized, binary versions of shader programs. An
-    example is OpenGL, where the "pipeline cache" is simulated by retrieving
-    and loading shader program binaries. With backends where retrieving and
-    restoring the pipeline cache contents is not supported, the flag has no
-    effect. With some backends (such as, OpenGL) there are additional,
-    disk-based caching mechanisms for shader binaries. Writing to those may get
-    disabled whenever this flag is set since storing program binaries (OpenGL)
-    to multiple caches is not sensible.
+    an empty blob always. With backends where retrieving and restoring the
+    pipeline cache contents is not supported, the flag has no effect and the
+    serialized cache data is always empty. The flag provides an opt-in
+    mechanism because the cost of maintaining the related data structures is
+    not insignificant with some backends. With Vulkan this feature maps
+    directly to VkPipelineCache, vkGetPipelineCacheData and
+    VkPipelineCacheCreateInfo::pInitialData. With D3D11 there is no real
+    pipeline cache, but the results of HLSL->DXBC compilations are stored and
+    can be serialized/deserialized via this mechanism. This allows skipping the
+    time consuming D3DCompile() in future runs of the applications for shaders
+    that come with HLSL source instead of offline pre-compiled bytecode. This
+    can provide a huge boost in startup and load times, if there is a lot of
+    HLSL source compilation happening. With OpenGL the "pipeline cache" is
+    simulated by retrieving and loading shader program binaries (if supported
+    by the driver). With OpenGL there are additional, disk-based caching
+    mechanisms for shader/program binaries provided by Qt. Writing to those may
+    get disabled whenever this flag is set since storing program binaries to
+    multiple caches is not sensible.
 */

 /*!
@@ -6860,7 +6868,9 @@ bool QRhi::isDeviceLost() const

    By saving and then, in subsequent runs of the same application, reloading
    the cache data, pipeline and shader creation times can potentially be
-    accelerated.
+    reduced. What exactly the cache and its serialized version includes is not
+    specified, is always specific to the backend used, and in some cases also
+    dependent on the particular implementation of the graphics API.

    When the PipelineCacheDataLoadSave is reported as unsupported, the returned
    QByteArray is empty.
@@ -6869,15 +6879,20 @@ bool QRhi::isDeviceLost() const
    create(), the returned QByteArray may be empty, even when the
    PipelineCacheDataLoadSave feature is supported.

-    When the returned data is non-empty, it is always specific to the QRhi
-    backend, the graphics device, and the driver implementation in use. QRhi
+    When the returned data is non-empty, it is always specific to the Qt
+    version and QRhi backend. In addition, in some cases there is a strong
+    dependency to the graphics device and the exact driver version used. QRhi
    takes care of adding the appropriate header and safeguards that ensure that
-    the data can always be passed safely to setPipelineCacheData().
+    the data can always be passed safely to setPipelineCacheData(), therefore
+    attempting to load data from a run on another version of a driver will be
+    handled safely and gracefully.

    \note Calling releaseCachedResources() may, depending on the backend, clear
    the pipeline data collected. A subsequent call to this function may then
    not return any data.

+    See EnablePipelineCacheDataSave for further details about this feature.
+
    \sa setPipelineCacheData(), create(), isFeatureSupported()
 */
 QByteArray QRhi::pipelineCacheData()
@@ -6891,13 +6906,14 @@ QByteArray QRhi::pipelineCacheData()
    When the PipelineCacheDataLoadSave is reported as unsupported, the function
    is safe to call, but has no effect.

-    The blob returned by pipelineCacheData() is always specific to a QRhi
-    backend, a graphics device, and a given version of the graphics driver.
-    QRhi takes care of adding the appropriate header and safeguards that ensure
-    that the data can always be passed safely to this function. If there is a
-    mismatch, e.g. because the driver has been upgraded to a newer version, or
-    because the data was generated from a different QRhi backend, a warning is
-    printed and \a data is safely ignored.
+    The blob returned by pipelineCacheData() is always specific to the Qt
+    version, the QRhi backend, and, in some cases, also to the graphics device,
+    and a given version of the graphics driver. QRhi takes care of adding the
+    appropriate header and safeguards that ensure that the data can always be
+    passed safely to this function. If there is a mismatch, e.g. because the
+    driver has been upgraded to a newer version, or because the data was
+    generated from a different QRhi backend, a warning is printed and \a data
+    is safely ignored.

    With Vulkan, this maps directly to VkPipelineCache. Calling this function
    creates a new Vulkan pipeline cache object, with its initial data sourced
@@ -6905,11 +6921,27 @@ QByteArray QRhi::pipelineCacheData()
    created QRhiGraphicsPipeline and QRhiComputePipeline objects, thus
    accelerating, potentially, the pipeline creation.

+    With other APIs there is no real pipeline cache, but they may provide a
+    cache with bytecode from shader compilations (D3D) or program binaries
+    (OpenGL). In applications that perform a lot of shader compilation from
+    source at run time this can provide a significant boost in subsequent runs
+    if the "pipeline cache" is pre-seeded from an earlier run using this
+    function.
+
    \note QRhi cannot give any guarantees that \a data has an effect on the
    pipeline and shader creation performance. With APIs like Vulkan, it is up
    to the driver to decide if \a data is used for some purpose, or if it is
    ignored.

+    See EnablePipelineCacheDataSave for further details about this feature.
+
+    \note This mechanism offered by QRhi is independent of the drivers' own
+    internal caching mechanism, if any. This means that, depending on the
+    graphics API and its implementation, the exact effects of retrieving and
+    then reloading \a data are not predictable. Improved performance may not be
+    visible at all in case other caching mechanisms outside of Qt's control are
+    already active.
+
    \sa pipelineCacheData(), isFeatureSupported()
 */
 void QRhi::setPipelineCacheData(const QByteArray &data)
--- a/src/gui/rhi/qrhid3d11.cpp
+++ b/src/gui/rhi/qrhid3d11.cpp
@@ -161,7 +161,7 @@ static IDXGIFactory1 *createDXGIFactory1()

 bool QRhiD3D11::create(QRhi::Flags flags)
 {
-    Q_UNUSED(flags);
+    rhiFlags = flags;

    uint devFlags = 0;
    if (debugLayer)
@@ -538,7 +538,7 @@ bool QRhiD3D11::isFeatureSupported(QRhi::Feature feature) const
    case QRhi::ReadBackAnyTextureFormat:
        return true;
    case QRhi::PipelineCacheDataLoadSave:
-        return false;
+        return true;
    case QRhi::ImageDataStride:
        return true;
    case QRhi::RenderBufferImport:
@@ -628,6 +628,7 @@ bool QRhiD3D11::makeThreadLocalNativeContextCurrent()
 void QRhiD3D11::releaseCachedResources()
 {
    clearShaderCache();
+    m_bytecodeCache.clear();
 }

 bool QRhiD3D11::isDeviceLost() const
@@ -635,14 +636,159 @@ bool QRhiD3D11::isDeviceLost() const
    return deviceLost;
 }

+struct QD3D11PipelineCacheDataHeader
+{
+    quint32 rhiId;
+    quint32 arch;
+    // no need for driver specifics
+    quint32 count;
+    quint32 dataSize;
+};
+
 QByteArray QRhiD3D11::pipelineCacheData()
 {
-    return QByteArray();
+    QByteArray data;
+    if (m_bytecodeCache.isEmpty())
+        return data;
+
+    QD3D11PipelineCacheDataHeader header;
+    memset(&header, 0, sizeof(header));
+    header.rhiId = pipelineCacheRhiId();
+    header.arch = quint32(sizeof(void*));
+    header.count = m_bytecodeCache.count();
+
+    const size_t dataOffset = sizeof(header);
+    size_t dataSize = 0;
+    for (auto it = m_bytecodeCache.cbegin(), end = m_bytecodeCache.cend(); it != end; ++it) {
+        BytecodeCacheKey key = it.key();
+        QByteArray bytecode = it.value();
+        dataSize +=
+                  sizeof(quint32) + key.sourceHash.size()
+                + sizeof(quint32) + key.target.size()
+                + sizeof(quint32) + key.entryPoint.size()
+                + sizeof(quint32) // compileFlags
+                + sizeof(quint32) + bytecode.size();
+    }
+
+    QByteArray buf(dataOffset + dataSize, Qt::Uninitialized);
+    char *p = buf.data() + dataOffset;
+    for (auto it = m_bytecodeCache.cbegin(), end = m_bytecodeCache.cend(); it != end; ++it) {
+        BytecodeCacheKey key = it.key();
+        QByteArray bytecode = it.value();
+
+        quint32 i = key.sourceHash.size();
+        memcpy(p, &i, 4);
+        p += 4;
+        memcpy(p, key.sourceHash.constData(), key.sourceHash.size());
+        p += key.sourceHash.size();
+
+        i = key.target.size();
+        memcpy(p, &i, 4);
+        p += 4;
+        memcpy(p, key.target.constData(), key.target.size());
+        p += key.target.size();
+
+        i = key.entryPoint.size();
+        memcpy(p, &i, 4);
+        p += 4;
+        memcpy(p, key.entryPoint.constData(), key.entryPoint.size());
+        p += key.entryPoint.size();
+
+        quint32 f = key.compileFlags;
+        memcpy(p, &f, 4);
+        p += 4;
+
+        i = bytecode.size();
+        memcpy(p, &i, 4);
+        p += 4;
+        memcpy(p, bytecode.constData(), bytecode.size());
+        p += bytecode.size();
+    }
+    Q_ASSERT(p == buf.data() + dataOffset + dataSize);
+
+    header.dataSize = quint32(dataSize);
+    memcpy(buf.data(), &header, sizeof(header));
+
+    return buf;
 }

 void QRhiD3D11::setPipelineCacheData(const QByteArray &data)
 {
-    Q_UNUSED(data);
+    if (data.isEmpty())
+        return;
+
+    const size_t headerSize = sizeof(QD3D11PipelineCacheDataHeader);
+    if (data.size() < qsizetype(headerSize)) {
+        qWarning("setPipelineCacheData: Invalid blob size (header incomplete)");
+        return;
+    }
+    const size_t dataOffset = headerSize;
+    QD3D11PipelineCacheDataHeader header;
+    memcpy(&header, data.constData(), headerSize);
+
+    const quint32 rhiId = pipelineCacheRhiId();
+    if (header.rhiId != rhiId) {
+        qWarning("setPipelineCacheData: The data is for a different QRhi version or backend (%u, %u)",
+                 rhiId, header.rhiId);
+        return;
+    }
+    const quint32 arch = quint32(sizeof(void*));
+    if (header.arch != arch) {
+        qWarning("setPipelineCacheData: Architecture does not match (%u, %u)",
+                 arch, header.arch);
+        return;
+    }
+    if (header.count == 0)
+        return;
+
+    if (data.size() < qsizetype(dataOffset + header.dataSize)) {
+        qWarning("setPipelineCacheData: Invalid blob size (data incomplete)");
+        return;
+    }
+
+    m_bytecodeCache.clear();
+
+    const char *p = data.constData() + dataOffset;
+    for (quint32 i = 0; i < header.count; ++i) {
+        quint32 len = 0;
+        memcpy(&len, p, 4);
+        p += 4;
+        QByteArray sourceHash(len, Qt::Uninitialized);
+        memcpy(sourceHash.data(), p, len);
+        p += len;
+
+        memcpy(&len, p, 4);
+        p += 4;
+        QByteArray target(len, Qt::Uninitialized);
+        memcpy(target.data(), p, len);
+        p += len;
+
+        memcpy(&len, p, 4);
+        p += 4;
+        QByteArray entryPoint(len, Qt::Uninitialized);
+        memcpy(entryPoint.data(), p, len);
+        p += len;
+
+        quint32 flags;
+        memcpy(&flags, p, 4);
+        p += 4;
+
+        memcpy(&len, p, 4);
+        p += 4;
+        QByteArray bytecode(len, Qt::Uninitialized);
+        memcpy(bytecode.data(), p, len);
+        p += len;
+
+        BytecodeCacheKey cacheKey;
+        cacheKey.sourceHash = sourceHash;
+        cacheKey.target = target;
+        cacheKey.entryPoint = entryPoint;
+        cacheKey.compileFlags = flags;
+
+        m_bytecodeCache.insert(cacheKey, bytecode);
+    }
+
+    qCDebug(QRHI_LOG_INFO, "Seeded bytecode cache with %d shaders", int(m_bytecodeCache.count()));
 }

 QRhiRenderBuffer *QRhiD3D11::createRenderBuffer(QRhiRenderBuffer::Type type, const QSize &pixelSize,
@@ -4002,8 +4148,16 @@ static pD3DCompile resolveD3DCompile()
    return nullptr;
 }

-static QByteArray compileHlslShaderSource(const QShader &shader, QShader::Variant shaderVariant, UINT flags,
-                                          QString *error, QShaderKey *usedShaderKey)
+static inline QByteArray sourceHash(const QByteArray &source)
+{
+    // taken from the GL backend, use the same mechanism to get a key
+    QCryptographicHash keyBuilder(QCryptographicHash::Sha1);
+    keyBuilder.addData(source);
+    return keyBuilder.result().toHex();
+}
+
+QByteArray QRhiD3D11::compileHlslShaderSource(const QShader &shader, QShader::Variant shaderVariant, uint flags,
+                                              QString *error, QShaderKey *usedShaderKey)
 {
    QShaderKey key = { QShader::DxbcShader, 50, shaderVariant };
    QShaderCode dxbc = shader.shader(key);
@@ -4020,6 +4174,9 @@ static QByteArray compileHlslShaderSource(const QShader &shader, QShader::Varian
        return QByteArray();
    }

+    if (usedShaderKey)
+        *usedShaderKey = key;
+
    const char *target;
    switch (shader.stage()) {
    case QShader::VertexStage:
@@ -4045,6 +4202,17 @@ static QByteArray compileHlslShaderSource(const QShader &shader, QShader::Varian
        return QByteArray();
    }

+    BytecodeCacheKey cacheKey;
+    if (rhiFlags.testFlag(QRhi::EnablePipelineCacheDataSave)) {
+        cacheKey.sourceHash = sourceHash(hlslSource.shader());
+        cacheKey.target = target;
+        cacheKey.entryPoint = hlslSource.entryPoint();
+        cacheKey.compileFlags = flags;
+        auto cacheIt = m_bytecodeCache.constFind(cacheKey);
+        if (cacheIt != m_bytecodeCache.constEnd())
+            return cacheIt.value();
+    }
+
    static const pD3DCompile d3dCompile = resolveD3DCompile();
    if (d3dCompile == nullptr) {
        qWarning("Unable to resolve function D3DCompile()");
@@ -4066,13 +4234,14 @@ static QByteArray compileHlslShaderSource(const QShader &shader, QShader::Varian
        return QByteArray();
    }

-    if (usedShaderKey)
-        *usedShaderKey = key;
-
    QByteArray result;
    result.resize(int(bytecode->GetBufferSize()));
    memcpy(result.data(), bytecode->GetBufferPointer(), size_t(result.size()));
    bytecode->Release();
+
+    if (rhiFlags.testFlag(QRhi::EnablePipelineCacheDataSave))
+        m_bytecodeCache.insert(cacheKey, result);
+
    return result;
 }

@@ -4180,8 +4349,8 @@ bool QD3D11GraphicsPipeline::create()
            if (m_flags.testFlag(CompileShadersWithDebugInfo))
                compileFlags |= D3DCOMPILE_DEBUG;

-            const QByteArray bytecode = compileHlslShaderSource(shaderStage.shader(), shaderStage.shaderVariant(), compileFlags,
-                                                                &error, &shaderKey);
+            const QByteArray bytecode = rhiD->compileHlslShaderSource(shaderStage.shader(), shaderStage.shaderVariant(), compileFlags,
+                                                                      &error, &shaderKey);
            if (bytecode.isEmpty()) {
                qWarning("HLSL shader compilation failed: %s", qPrintable(error));
                return false;
@@ -4315,8 +4484,8 @@ bool QD3D11ComputePipeline::create()
        if (m_flags.testFlag(CompileShadersWithDebugInfo))
            compileFlags |= D3DCOMPILE_DEBUG;

-        const QByteArray bytecode = compileHlslShaderSource(m_shaderStage.shader(), m_shaderStage.shaderVariant(), compileFlags,
-                                                            &error, &shaderKey);
+        const QByteArray bytecode = rhiD->compileHlslShaderSource(m_shaderStage.shader(), m_shaderStage.shaderVariant(), compileFlags,
+                                                                  &error, &shaderKey);
        if (bytecode.isEmpty()) {
            qWarning("HLSL compute shader compilation failed: %s", qPrintable(error));
            return false;
--- a/src/gui/rhi/qrhid3d11_p_p.h
+++ b/src/gui/rhi/qrhid3d11_p_p.h
@@ -679,7 +679,10 @@ public:
    void finishActiveReadbacks();
    void reportLiveObjects(ID3D11Device *device);
    void clearShaderCache();
+    QByteArray compileHlslShaderSource(const QShader &shader, QShader::Variant shaderVariant, uint flags,
+                                       QString *error, QShaderKey *usedShaderKey);

+    QRhi::Flags rhiFlags;
    bool debugLayer = false;
    bool importedDeviceAndContext = false;
    ID3D11Device *dev = nullptr;
@@ -751,11 +754,45 @@ public:
        void releaseResources();
        void activate();
    } deviceCurse;
+
+    // This is what gets exposed as the "pipeline cache", not that that concept
+    // applies anyway. Here we are just storing the DX bytecode for a shader so
+    // we can skip the HLSL->DXBC compilation when the QShader has HLSL source
+    // code and the same shader source has already been compiled before.
+    // m_shaderCache seemingly does the same, but this here does not care about
+    // the ID3D11*Shader, this is just about the bytecode and about allowing
+    // the data to be serialized to persistent storage and then reloaded in
+    // future runs of the app, or when creating another QRhi, etc.
+    struct BytecodeCacheKey {
+        QByteArray sourceHash;
+        QByteArray target;
+        QByteArray entryPoint;
+        uint compileFlags;
+    };
+    QHash<BytecodeCacheKey, QByteArray> m_bytecodeCache;
 };

 Q_DECLARE_TYPEINFO(QRhiD3D11::TextureReadback, Q_RELOCATABLE_TYPE);
 Q_DECLARE_TYPEINFO(QRhiD3D11::BufferReadback, Q_RELOCATABLE_TYPE);

+inline bool operator==(const QRhiD3D11::BytecodeCacheKey &a, const QRhiD3D11::BytecodeCacheKey &b) noexcept
+{
+    return a.sourceHash == b.sourceHash
+            && a.target == b.target
+            && a.entryPoint == b.entryPoint
+            && a.compileFlags == b.compileFlags;
+}
+
+inline bool operator!=(const QRhiD3D11::BytecodeCacheKey &a, const QRhiD3D11::BytecodeCacheKey &b) noexcept
+{
+    return !(a == b);
+}
+
+inline size_t qHash(const QRhiD3D11::BytecodeCacheKey &k, size_t seed = 0) noexcept
+{
+    return qHash(k.sourceHash, seed) ^ qHash(k.target) ^ qHash(k.entryPoint) ^ k.compileFlags;
+}
+
 QT_END_NAMESPACE

 #endif
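
The blob written by pipelineCacheData() in the diff above is a sequence of length-prefixed fields per shader: source hash, target, entry point, compile flags, bytecode. The following is a minimal sketch of that layout in standard C++, not the code from the patch. Entry, Reader, serialize() and deserialize() are hypothetical names, the entry count is stored inline instead of in the patch's QD3D11PipelineCacheDataHeader (so there is no rhiId or arch check), and the per-field bounds checks in the reader are an addition that the patch does not contain.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical record mirroring the fields the patch serializes per shader.
struct Entry {
    std::string sourceHash;
    std::string target;
    std::string entryPoint;
    std::uint32_t compileFlags;
    std::string bytecode;
};

static void putU32(std::string &out, std::uint32_t v) {
    char raw[4];
    std::memcpy(raw, &v, 4);   // native byte order, as in the patch
    out.append(raw, 4);
}

static void putBlob(std::string &out, const std::string &s) {
    putU32(out, static_cast<std::uint32_t>(s.size()));  // 32-bit length prefix
    out.append(s);
}

std::string serialize(const std::vector<Entry> &entries) {
    std::string out;
    putU32(out, static_cast<std::uint32_t>(entries.size()));
    for (const Entry &e : entries) {
        putBlob(out, e.sourceHash);
        putBlob(out, e.target);
        putBlob(out, e.entryPoint);
        putU32(out, e.compileFlags);
        putBlob(out, e.bytecode);
    }
    return out;
}

// Cursor that refuses to read past the end of the buffer.
struct Reader {
    const std::string &buf;
    std::size_t pos;

    bool getU32(std::uint32_t *v) {
        if (buf.size() - pos < 4)
            return false;
        std::memcpy(v, buf.data() + pos, 4);
        pos += 4;
        return true;
    }
    bool getBlob(std::string *s) {
        std::uint32_t len = 0;
        if (!getU32(&len) || buf.size() - pos < len)
            return false;
        s->assign(buf, pos, len);
        pos += len;
        return true;
    }
};

// Returns false and leaves *entries empty if the blob is truncated.
bool deserialize(const std::string &blob, std::vector<Entry> *entries) {
    entries->clear();
    Reader r{blob, 0};
    std::uint32_t count = 0;
    if (!r.getU32(&count))
        return false;
    std::vector<Entry> result;
    for (std::uint32_t i = 0; i < count; ++i) {
        Entry e;
        if (!r.getBlob(&e.sourceHash) || !r.getBlob(&e.target) || !r.getBlob(&e.entryPoint)
            || !r.getU32(&e.compileFlags) || !r.getBlob(&e.bytecode))
            return false;
        result.push_back(e);
    }
    entries->swap(result);
    return true;
}
```

Because the lengths are copied in native byte order, a blob in this format is only meaningful on a machine with the same endianness as the one that wrote it. The patch itself compares only the QRhi id and the pointer size before trusting the data.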