Shader Compilation
[Have `createShaderModule`]: WebGPU should have an API to create a shader module independently of pipelines.+
<Parsing and validation>: allows us to parse WGSL, build IR, and validate it. This guarantees that the only errors that could happen later on with a shader are about the shader interface. -
<Raw non-Vulkan>: on non-Vulkan backends, there are no low-level objects to create at this stage.
[`GPUShaderModule` should contain `MTLLibrary`]: WebGPU, Metal, Vulkan, and D3D12 all have 2-stage shader compilation pipelines. Vulkan matches WebGPU in this API. `MTLLibrary` is the product of the first stage in Metal; however, WebGPU implementations cannot produce it in `createShaderModule`, because it requires more information about the pipeline. The same applies to HLSL/DXIL.+
<newLibraryWithSource>: matches Metal's `newLibraryWithSource`. When Metal developers write shaders, they have to define the structure of argument buffers inside the shader code. This roughly matches WebGPU's `GPUPipelineLayout`. +
<D3DCompile>: matches D3D's `D3DCompile`, which would produce HLSL/DXIL in the first stage.
<Metal's `createShaderModule` is slow>: `createShaderModule` is slow, so it makes sense to do it early, before the pipelines are created. Roughly speaking, it takes as much time to generate an `MTLLibrary` as it does to create an `MTLRenderPipelineState`.+
@litherum showed the numbers from the Metal Performance Shaders running matrix multiplication. +
@kvark showed the numbers from Dota2 shaders transpiling to Metal in a Vulkan Portability implementation.
<Full pipeline descriptor>: In order to create `MTLLibrary` or DXIL from WGSL code, we'd need most of the `GPURenderPipelineDescriptor` data.+
<Need layout>: knowing the pipeline layout is required to assign the proper resource binding indices, as well as define Metal argument buffers. +
<Need sample mask>: On Metal, we have to implement the per-pipeline sample mask by modifying the `[[sample_mask]]` fragment shader output in code. Doing this unconditionally would disable early-Z tests, at a high performance cost. So we should only inject this code into shaders that rely on the pipeline sample mask, and only when its value is non-trivial. +
<Need depth bias>: On drivers/platforms that don't have a proper depth bias state, we could emulate it by injecting code into vertex shaders. +
<Need vertex input>: Dawn and wgpu want to be able to implement out-of-bounds checks for vertex attributes via programmable pulling on Metal. This requires injecting code into the vertex shader. +
<Need unknown>: We don't know what other pipeline states we'd need in the future in order to modify/inject shader code, for example to work around possible HW and driver bugs.
[Pipeline layouts in `createShaderModule`]: Provide the ability to pass pipeline layouts into `createShaderModule`, per entry point.
[Keep `createShaderModule`]: Keep the API for module creation as is, mostly matching Vulkan.
<Caches>: The combination of driver cache and browser cache can drastically reduce the number of cases where we'd have to create a new `MTLLibrary`.
<Target LLVM IR>: The implementations could target AIR and DXIL directly, in which case the cost of producing them would be reduced compared to the current model, which uses text-format MSL or HLSL as an intermediate.
<Multiple entry points>: An ideal Metal application would re-use an `MTLLibrary` for multiple different entry points in the pipelines it creates. That is unlike WebGPU, where an implementation can only create an `MTLLibrary` once it has a specific entry point selected, and doesn't know about any other entry points.
<All at once>: Given a list of entry points with all the associated pipeline information, an implementation can create fewer `MTLLibrary` and `DXIL` objects.
<Combinations>: Users may not know ahead of time which vertex shaders are used with which fragment shaders. In Metal, they can still mix and match `MTLLibrary` use for pipeline creation.
[Have `precompile` API]: Add some API on the `GPUShaderModule` to tell it ahead of time the bits of the pipeline that the contained entry points will be created with.-
<Precompile spec>: it's difficult to specify this and to set the proper expectations for users, for example what the difference between hinting and not hinting would be on each target platform.
[Have `createReadyPipelines`]: WebGPU should have a plural `createReadyPipelines` asynchronous method.
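
To make the last proposal concrete, here is a minimal sketch of how a plural `createReadyPipelines` could be approximated in user code today. It assumes only the existing `createRenderPipelineAsync` method; the helper itself, its signature, and the `AsyncPipelineDevice` interface are hypothetical and not part of WebGPU. A userland helper like this cannot give the implementation the main benefit argued for above (seeing all entry points before compiling anything), so it only illustrates a possible shape for the API.

```typescript
// Hypothetical structural type: just the one existing WebGPU method this sketch uses.
// In a browser, GPUDevice would fit with D = GPURenderPipelineDescriptor
// and P = GPURenderPipeline.
interface AsyncPipelineDevice<D, P> {
  createRenderPipelineAsync(descriptor: D): Promise<P>;
}

// Hypothetical helper (an assumption, not a WebGPU API): resolves once
// every requested pipeline is ready, or rejects if any creation fails.
async function createReadyPipelines<D, P>(
  device: AsyncPipelineDevice<D, P>,
  descriptors: readonly D[],
): Promise<P[]> {
  // Every request is issued before any result is awaited, so the whole batch
  // is in flight at once; results are returned in input order.
  return Promise.all(
    descriptors.map((descriptor) => device.createRenderPipelineAsync(descriptor)),
  );
}
```

A caller would pass every descriptor it knows about up front and await the single returned promise, instead of awaiting pipelines one at a time.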