Argdown Document

Shader Compilation

[Have `createShaderModule`]: WebGPU should have an API to create a shader module independently of pipelines.

<vkCreateShaderModule>: matches Vulkan's `vkCreateShaderModule`

<Parsing and validation>: allows us to parse WGSL, build IR, and validate it. This guarantees that the only errors that could happen later on with a shader are about the shader interface.

<Raw non-Vulkan>: on non-Vulkan, there are no low-level objects to create at this stage.

[`GPUShaderModule` should contain `MTLLibrary`]: WebGPU, Metal, Vulkan, and D3D12 all have 2-stage shader compilation pipelines. Vulkan matches WebGPU in this API. `MTLLibrary` is the product of the first stage in Metal. However, WebGPU implementations cannot produce it in `createShaderModule`, because it requires more information about the pipeline. The same applies to HLSL/DXIL.

<newLibraryWithSource>: matches Metal's `newLibraryWithSource`. When Metal developers write shaders, they have to define the structure of argument buffers inside the shader code. This roughly matches WebGPU's `GPUPipelineLayout`.

<D3DCompile>: matches D3D's `D3DCompile`, which would produce HLSL/DXIL in the first stage.

[Have `createShaderModule`]

[Pipeline layouts in `createShaderModule`]

<Metal's `createShaderModule` is slow>: `createShaderModule` is slow, so it makes sense to do it early, before the pipelines are created. Roughly speaking, it takes as much time to generate an `MTLLibrary` as it does to create an `MTLRenderPipelineState`.

@litherum showed the numbers from the Metal Performance Shaders running matrix multiplication.

@kvark showed the numbers from Dota2 shaders transpiling to Metal in a Vulkan Portability implementation.

@Kangz showed the numbers with forced driver cache invalidation.

[`GPUShaderModule` should contain `MTLLibrary`]

<Full pipeline descriptor>: In order to create `MTLLibrary` or DXIL from WGSL code, we'd need most of the `GPURenderPipelineDescriptor` data.

<Need layout>: knowing the pipeline layout is required to assign the proper resource binding indices, as well as define Metal argument buffers.
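A minimal sketch of why the layout matters for index assignment. `LayoutSketch` and `assignFlatSlots` are hypothetical names, not part of WebGPU or Metal, and the sequential numbering scheme is an assumption made for illustration only. It shows that the same shader binding can land in a different flat slot depending on which layout the pipeline uses.

```typescript
// Hypothetical, simplified stand-in for a pipeline layout: for each bind
// group (outer index), the list of binding numbers that group declares.
type LayoutSketch = number[][];

// Assigns every (group, binding) pair a flat, sequential slot index, the way
// a backend without bind groups might number its resource slots.
// The numbering scheme is an assumption, not a real implementation's.
function assignFlatSlots(layout: LayoutSketch): { [key: string]: number } {
  const slots: { [key: string]: number } = {};
  let next = 0;
  layout.forEach((bindings, group) => {
    bindings.forEach((binding) => {
      slots[group + ":" + binding] = next;
      next += 1;
    });
  });
  return slots;
}

// The same shader binding (group 1, binding 0) under two different layouts.
const slotsWithTwoBindingsInGroup0 = assignFlatSlots([[0, 1], [0]]);
const slotsWithOneBindingInGroup0 = assignFlatSlots([[0], [0]]);
```

Here group 1, binding 0 maps to slot 2 under the first layout and slot 1 under the second, so the generated code cannot be finalized before the layout is known.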

[Pipeline layouts in `createShaderModule`]

<Need sample mask>: On Metal, we have to implement the per-pipeline sample mask by modifying the `[[sample_mask]]` fragment shader output in code. Doing this unconditionally would disable early-Z tests, at a high performance cost. So we should only inject this code into shaders that rely on the pipeline sample mask, and only when the mask value is non-trivial.
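A rough sketch of the "non-trivial mask" check, assuming that a mask is trivial when it enables every sample of the render target. `sampleMaskIsNonTrivial` is a hypothetical helper name. It covers only the mask-value half of the condition, not the question of whether a given shader relies on the mask.

```typescript
// Returns true when the pipeline sample mask disables at least one of the
// `sampleCount` samples, i.e. when injecting mask code would change output.
function sampleMaskIsNonTrivial(mask: number, sampleCount: number): boolean {
  // Bit pattern with one bit set per sample. `1 << 32` wraps around in
  // JavaScript, so 32 or more samples are handled separately.
  const allSamples = sampleCount >= 32 ? 0xffffffff : (1 << sampleCount) - 1;
  // `>>> 0` converts the signed 32-bit result of `&` back to unsigned.
  return ((mask & allSamples) >>> 0) !== allSamples;
}

const defaultMaskNeedsInjection = sampleMaskIsNonTrivial(0xffffffff, 4);
const partialMaskNeedsInjection = sampleMaskIsNonTrivial(0b0101, 4);
```

With a mask that enables all four samples the shader can be left untouched, which keeps early-Z available. Only the partial mask would require the injected code.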

<Need depth bias>: On drivers/platforms that don't have a proper depth bias state, we could emulate it by injecting code into vertex shaders.

<Need vertex input>: Dawn and wgpu want the ability to implement out-of-bounds checks for vertex attributes via programmable pulling on Metal. This requires injecting code into the vertex shader.

<Need unknown>: We don't know what other pipeline states we'd need in the future in order to modify/inject shader code, for example to work around HW and driver bugs.

[Pipeline layouts in `createShaderModule`]

[Have `precompile` API]

[Pipeline layouts in `createShaderModule`]: Provide the ability to pass pipeline layouts to `createShaderModule`, per entry point.

[Keep `createShaderModule`]

[Keep `createShaderModule`]: Keep the API for module creation as is, mostly matching Vulkan.

[Pipeline layouts in `createShaderModule`]

[Have `createShaderModule`]

<Caches>: The combination of driver cache and browser cache can drastically reduce the number of cases where we'd have to create a new `MTLLibrary`.
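A minimal sketch of the caching idea, assuming a cache keyed by everything that affects the generated code. `LibraryCacheSketch`, its `pipelineKey` parameter, and the string placeholders standing in for `MTLLibrary` objects are all hypothetical. Real driver and browser caches are more involved.

```typescript
// Hypothetical in-memory cache. Strings stand in for real backend libraries.
class LibraryCacheSketch {
  private entries: { [key: string]: string } = {};
  // Counts how often the expensive compilation step actually ran.
  compileCount = 0;

  get(source: string, entryPoint: string, pipelineKey: string): string {
    // JSON encoding keeps the three key parts unambiguous.
    const key = JSON.stringify([source, entryPoint, pipelineKey]);
    if (!Object.prototype.hasOwnProperty.call(this.entries, key)) {
      this.compileCount += 1; // cache miss: "compile" a new library
      this.entries[key] = "library#" + this.compileCount;
    }
    return this.entries[key];
  }
}

const cacheSketch = new LibraryCacheSketch();
const firstLibrary = cacheSketch.get("shader source", "vs_main", "layoutA");
const repeatedLibrary = cacheSketch.get("shader source", "vs_main", "layoutA");
const otherLibrary = cacheSketch.get("shader source", "vs_main", "layoutB");
```

The repeated request is served from the cache, so only two compilations happen for three requests. A changed pipeline key still forces a new library.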

[Keep `createShaderModule`]

<Target LLVM IR>: The implementations could target AIR and DXIL directly, in which case the cost of producing them would be reduced compared to the current model, which uses text-format MSL or HLSL as an intermediate.

[Keep `createShaderModule`]

<Multiple entry points>: An ideal Metal application would re-use one `MTLLibrary` for multiple different entry points in the pipelines it creates. That is unlike WebGPU, where an implementation can only create an `MTLLibrary` once it has a specific entry point selected, and doesn't know about any other entry points.

[`GPUShaderModule` should contain `MTLLibrary`]

<All at once>: Given a list of entry points with all the associated pipeline information, an implementation can create fewer `MTLLibrary` and `DXIL` objects.
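One way to picture this, as a sketch only: it assumes that entry points whose pipeline-derived state is identical can be compiled into one shared library. `EntryPointRequest`, `pipelineKey`, and `planLibraries` are hypothetical names, and a real implementation may have different sharing rules.

```typescript
// Hypothetical request: an entry point plus an opaque key summarizing the
// pipeline information that affects code generation for it.
interface EntryPointRequest {
  entryPoint: string;
  pipelineKey: string;
}

// Groups entry points by pipeline key. Each resulting group could become one
// library, under the assumption stated above.
function planLibraries(requests: EntryPointRequest[]): string[][] {
  const byKey: { [key: string]: string[] } = {};
  const order: string[] = [];
  for (const request of requests) {
    if (!Object.prototype.hasOwnProperty.call(byKey, request.pipelineKey)) {
      byKey[request.pipelineKey] = [];
      order.push(request.pipelineKey);
    }
    const group = byKey[request.pipelineKey];
    if (group.indexOf(request.entryPoint) < 0) {
      group.push(request.entryPoint);
    }
  }
  return order.map((key) => byKey[key]);
}

const libraryPlan = planLibraries([
  { entryPoint: "vs_main", pipelineKey: "layoutA" },
  { entryPoint: "fs_main", pipelineKey: "layoutA" },
  { entryPoint: "fs_shadow", pipelineKey: "layoutB" },
]);
```

Three entry points end up in two planned libraries. Handling each entry point in isolation would have produced three.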

[Have `precompile` API]

[Have `createReadyPipelines`]

<Combinations>: Users may not know ahead of time which vertex shaders are used with which fragment shaders. In Metal, they can still mix and match `MTLLibrary` use for pipeline creation.

[Have `createReadyPipelines`]

[Have `precompile` API]: Add some API on the `GPUShaderModule` to tell it ahead of time the bits of the pipeline that the contained entry points will be created with.

<Precompile spec>: it's difficult to specify this and to set proper expectations for users, namely what difference hinting versus not hinting would make on each target platform.

[Have `createReadyPipelines`]: WebGPU should have a plural `createReadyPipelines` asynchronous method.