The very first entry in this blog was about a prototype I made for a next-generation Web 3D API: WebMetal. It was my first experimental take on the problem of building a successor to WebGL and my first dive into Web APIs in general, and it was all implemented at night during my first few months at Mozilla. This prototype needed to be done before the Vancouver Khronos face-to-face (F2F, December 2016), where the new working groups would gather: one called “3D Portability” and another called “WebGL Next”. They shared a lot of members: people who would later move to W3C and start the WebGPU community group there. I think the Metal-based Web API was quite reasonable, and I even showed it to a few members, without making a big deal out of it. As it turned out, Apple also had a very similar prototype ready, which was later made into a formal proposal.
In the meantime, I was getting to know my colleagues at Mozilla, who cautiously tried to warn me about the dangers of either going with a weakly specified API (such as Metal) or designing one from scratch. The idea of taking an existing Khronos-made API and adapting it for the Web (just like with WebGL) was then voiced again and amplified during our gatherings with Mozilla members at GDC 2017. It felt like a healthy consensus, even though I wasn’t sure it was the right solution. I didn’t know Vulkan at the time, after all. Besides, I didn’t want to throw the prototype in the trash; it had cost me quite a few nights. But I was talking to people who were very convincing, who had brought us WebGL, and whom I wanted to play along with. So I decided to compromise, trying to convince myself and the world that WebVulkan was the way to go.
Age of Vulkan
My WebMetal prototype got scrapped. I made an official proposal called Obsidian and wrote about the Web Platform here. It was hard work: gathering feedback both from within Mozilla and from all the external contacts I had. We even argued with @zeux, who also thought that Metal was a better target for the Web, and I effectively opposed the very idea my first prototype was built around.
At the same time, the gfx-rs library, which I co-authored, was being rewritten by @msiglreith to target the low-level APIs efficiently. The problem was that building APIs is hard, and we didn’t know what it should look like. I figured that it needed to be very explicit anyway, and we might as well just align with Vulkan. This would allow us to piggy-back on years of research by Khronos working groups. We could also get away with minimal documentation, since we’d refer people to the Vulkan specification and tutorials. And so it was decided: gfx-rs would become Vulkan-like, but in Rust, and called gfx-hal.
The aforementioned “WebGL Next” transformed into “WebGPU” at W3C, but “3D Portability” turned into something entirely different. Driven by the same feedback (“don’t create a new API!”), it refocused on Vulkan and became the Vulkan Portability Technical Subgroup. The goals of this group were exactly the same as our new goals for gfx-hal, so I was happy to be a part of it, bringing a lot of feedback from our experiments and, to some extent, driving the subgroup’s investigations. It was only a matter of time before we wrapped gfx-hal into a C API layer that looked like a Vulkan driver to the outside world. Thus, gfx-portability was born, and it later joined the race to produce a Vulkan Portability implementation.
gfx-portability had quite a few wins. We pioneered running the CTS on Metal, and we demonstrated it to be faster than MoltenVK on Dota 2 and, later, on the Dolphin emulator. We analyzed the overhead of this path. But the overall momentum behind the gfx-rs project was slowing down. Building any usable API on top of it was just too difficult, and without such an API the Vulkan-like power of gfx-hal was out of reach for Rust developers.
While all of this was happening, I was learning more nasty details about Vulkan. First, the swapchain was a huge pain for gfx-hal. Vulkan’s swapchain model is just too permissive, and it’s extremely hard to layer on top of other APIs. In the end, we ditched it and came up with something much more restrictive, similar to Metal’s API. A lot of other small details were also not layering well, such as VkDescriptorPool. It turned out that DX12 was more restricted: only one ID3D12DescriptorHeap of a kind could be active at any time. Digging into the reasons for this led me to a horrifying discovery: Vulkan doesn’t map to hardware as well as I expected. Some IHVs (independent hardware vendors) had to implement non-trivial descriptor management in their drivers to support the Vulkan model. But… it’s low level, well specified, and designed by the best, right? How can it be sub-optimal?
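The descriptor mismatch can be illustrated with a toy model. The Rust sketch below assumes a portability layer that keeps one shader-visible D3D12-style heap bound and emulates any number of VkDescriptorPool objects by handing each pool a sub-range of that heap. All names (`GlobalHeap`, `DescriptorPool`, `create_pool`, `allocate_set`) are hypothetical; this is not how gfx-hal is actually written, and a real layer would also have to free and compact ranges.

```rust
// Hypothetical toy model, not gfx-hal code: many Vulkan-style descriptor
// pools emulated on top of a single D3D12-style shader-visible heap.

/// Stand-in for the one shader-visible descriptor heap of a given type
/// that the layer keeps bound.
struct GlobalHeap {
    capacity: u32,
    next_free: u32,
}

/// A Vulkan-style pool, modeled as a fixed sub-range of the global heap.
struct DescriptorPool {
    offset: u32,
    size: u32,
    used: u32,
}

impl GlobalHeap {
    fn new(capacity: u32) -> Self {
        GlobalHeap { capacity, next_free: 0 }
    }

    /// Carves a pool out of the heap with a bump allocator.
    /// Returns None once the heap is exhausted (ranges are never reclaimed here).
    fn create_pool(&mut self, size: u32) -> Option<DescriptorPool> {
        if self.capacity - self.next_free < size {
            return None;
        }
        let pool = DescriptorPool { offset: self.next_free, size, used: 0 };
        self.next_free += size;
        Some(pool)
    }
}

impl DescriptorPool {
    /// Reserves `count` consecutive descriptors for one set and returns the
    /// absolute heap index of the first one, i.e. where a descriptor table would point.
    fn allocate_set(&mut self, count: u32) -> Option<u32> {
        if self.size - self.used < count {
            return None;
        }
        let start = self.offset + self.used;
        self.used += count;
        Some(start)
    }
}

fn main() {
    let mut heap = GlobalHeap::new(1024);
    let mut pool_a = heap.create_pool(256).expect("heap has room");
    let mut pool_b = heap.create_pool(512).expect("heap has room");

    // Sets from two different pools can be used in the same command buffer
    // only because both pools live inside the one heap the layer keeps bound.
    let set_a = pool_a.allocate_set(8).unwrap();
    let set_b = pool_b.allocate_set(16).unwrap();
    println!("set_a starts at {}, set_b starts at {}", set_a, set_b);
    // prints "set_a starts at 0, set_b starts at 256"

    // Only 256 slots remain, so another 512-slot pool cannot be created.
    assert!(heap.create_pool(512).is_none());
}
```

In this model a pool is nothing more than an offset and a size, which lets sets from different pools coexist; the price is that every pool competes for space in one fixed-size heap.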
Rise of WebGPU
On the Web side, the “Obsidian” proposal didn’t get any traction. The newly formed W3C group decided that they had enough power to design a new API instead, one that would be made for the Web, but with native use in mind. The design process was very slow, especially because half of the members, including myself, were trying to get low-level explicitness into it, such as pipeline barriers. But in the end, that didn’t make sense. Blindly trusting a Khronos API would lead us nowhere. So eventually, the new WebGPU API started to take shape. It looked like a mix of Metal with the binding model of Vulkan. I then started the wgpu project to implement it in Rust on top of gfx-hal. The idea was that it would be used in Gecko and Servo for implementing WebGPU, and even if gfx-hal underneath turned out to be problematic, we could just switch it to pure Vulkan in a matter of days.
Interestingly, wgpu-rs turned out to be a perfect high-level wrapper over gfx-hal. It was actually usable to write code for, and it was backed by a real specification and a ton of R&D. It spawned a large ecosystem of libraries and applications that is still growing today. For many users, the Web was not interesting at all, but WebGPU on native made total sense.
gfx-hal was still playing an important role in the WebGPU discussions. It steered our position towards Vulkan-like behavior. Getting anything matching Vulkan was effectively free for us (e.g. vkCmdBlitImage, which WebGPU doesn’t have). Getting anything different, like GPURenderBundle, was a pain, since it required us to punch holes and plant non-Vulkan logic into gfx-hal.
wgpu became the dominant user of gfx-hal, guiding its development. Still, having it in a separate repository, with a standalone API, test suite, issues, and users, was not helping development.
gfx-hal grew a large code base that was only partially used by wgpu. When the question came up about moving gfx-hal into the wgpu repository, I figured that the old API boundary was not worth saving. That motivated me to write a new hardware abstraction layer for wgpu (wgpu-hal for short), based on the ideas and code from gfx-hal. This new layer is clearly driven by wgpu; it’s very small and efficient. It doesn’t have any of the workarounds that gfx-hal needs because of its Vulkan roots.
Similar battles were fought on the shading language side, but support for SPIR-V (i.e. the Vulkan path) was much stronger at that time. I wanted to free gfx-hal from its SPIRV-Cross dependency, so I attempted to adopt rspirv as the basis for our intermediate representation. This would effectively be a SPIR-V IR, translating into everything we need. But it got very hard to do, both logistically (building on top of auto-generated code) and conceptually (SPIR-V doesn’t make the best IR; read more about the horrors).
After years of discussions, Google made the Tint proposal, and the group embarked on an adventure to build a new shading language, WGSL, accepting the proposal as a starting point. Google dropped the attempt to bring SPIRV-Cross into good shape and wrote the shader translator from scratch, which is still called Tint. At the same time, I stopped messing with rspirv and started the naga project.
naga’s intermediate representation (IR) largely follows WGSL itself. Writing the SPIR-V frontend is still one of the most complex tasks in the project.
Once naga matured, I integrated it into gfx-hal to replace SPIRV-Cross. Testing by running Dota 2 once again (via gfx-portability on Metal) revealed a lot of issues. It took a few months to polish the implementation with help from the growing community. Soon, naga was ready to show its teeth in the new shader portability benchmark.
For the last 5 years, my work was largely aligned with Khronos tech, such as Vulkan and SPIR-V. This alignment didn’t work well in the end. gfx-hal and gfx-portability are not widely used today. wgpu got its own hardware abstraction layer. And naga treats SPIR-V as just a weird frontend and backend target.
As for the group of Mozillians who reached a consensus to try bringing Vulkan to the Web, most of them no longer work at Mozilla. Voicing an idealistic opinion is one thing, but actually investing to support it is much more difficult.
My advice to my future self is to avoid blindly putting trust elsewhere, be it in Khronos groups or in the people I work with. I wanted to believe that open standards are the best, and I still do, but that wish impaired my judgement. Knowing when to trust yourself is a good skill.