1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
/*!
Backend for [HLSL][hlsl] (High-Level Shading Language).

# Supported shader model versions:
- 5.0
- 5.1
- 6.0

# Layout of values in `uniform` buffers

WGSL's ["Internal Layout of Values"][ilov] rules specify how each WGSL
type should be stored in `uniform` and `storage` buffers. The HLSL we
generate must access values in that form, even when it is not what
HLSL would use normally.

The rules described here only apply to WGSL `uniform` variables. WGSL
`storage` buffers are translated as HLSL `ByteAddressBuffers`, for
which we generate `Load` and `Store` method calls with explicit byte
offsets. WGSL pipeline inputs must be scalars or vectors; they cannot
be matrices, which is where the interesting problems arise.

## Row- and column-major ordering for matrices

WGSL specifies that matrices in uniform buffers are stored in
column-major order. This matches HLSL's default, so one might expect
things to be straightforward. Unfortunately, WGSL and HLSL disagree on
what indexing a matrix means: in WGSL, `m[i]` retrieves the `i`'th
*column* of `m`, whereas in HLSL it retrieves the `i`'th *row*. We
want to avoid translating `m[i]` into some complicated reassembly of a
vector from individually fetched components, so this is a problem.

However, with a bit of trickery, it is possible to use HLSL's `m[i]`
as the translation of WGSL's `m[i]`:

- We declare all matrices in uniform buffers in HLSL with the
  `row_major` qualifier, and transpose the row and column counts: a
  WGSL `mat3x4<f32>`, say, becomes an HLSL `row_major float3x4`. (Note
  that WGSL and HLSL type names put the row and column in reverse
  order.) Since the HLSL type is the transpose of how WebGPU directs
  the user to store the data, HLSL will load all matrices transposed.

- Since matrices are transposed, an HLSL indexing expression retrieves
  the "columns" of the intended WGSL value, as desired.

- For vector-matrix multiplication, since `mul(transpose(m), v)` is
  equivalent to `mul(v, m)` (note the reversal of the arguments), and
  `mul(v, transpose(m))` is equivalent to `mul(m, v)`, we can
  translate WGSL `m * v` and `v * m` to HLSL by simply reversing the
  arguments to `mul`.

## Padding in two-row matrices

An HLSL `row_major floatKx2` matrix has padding between its rows that
the WGSL `matKx2<f32>` matrix it represents does not. HLSL stores all
matrix rows [aligned on 16-byte boundaries][16bb], whereas WGSL says
that the columns of a `matKx2<f32>` need only be [aligned as required
for `vec2<f32>`][ilov], which is [eight-byte alignment][8bb].

To compensate for this, any time a `matKx2<f32>` appears in a WGSL
`uniform` variable, whether directly as the variable's type or as part
of a struct/array, we actually emit `K` separate `float2` members, and
assemble/disassemble the matrix from its columns (in WGSL; rows in
HLSL) upon load and store.

For example, the following WGSL struct type:

```ignore
struct Baz {
        m: mat3x2<f32>,
}
```

is rendered as the HLSL struct type:

```ignore
struct Baz {
    float2 m_0; float2 m_1; float2 m_2;
};
```

The `wrapped_struct_matrix` functions in `help.rs` generate HLSL
helper functions to access such members, converting between the stored
form and the HLSL matrix types appropriately. For example, for reading
the member `m` of the `Baz` struct above, we emit:

```ignore
float3x2 GetMatmOnBaz(Baz obj) {
    return float3x2(obj.m_0, obj.m_1, obj.m_2);
}
```

We also emit an analogous `Set` function, as well as functions for
accessing individual columns by dynamic index.

[hlsl]: https://docs.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl
[ilov]: https://gpuweb.github.io/gpuweb/wgsl/#internal-value-layout
[16bb]: https://github.com/microsoft/DirectXShaderCompiler/wiki/Buffer-Packing#constant-buffer-packing
[8bb]: https://gpuweb.github.io/gpuweb/wgsl/#alignment-and-size
*/

mod conv;
mod help;
mod keywords;
mod storage;
mod writer;

use std::fmt::Error as FmtError;
use thiserror::Error;

use crate::{back, proc};

#[derive(Clone, Debug, Default, PartialEq, Eq, Hash)]
#[cfg_attr(feature = "serialize", derive(serde::Serialize))]
#[cfg_attr(feature = "deserialize", derive(serde::Deserialize))]
pub struct BindTarget {
    pub space: u8,
    pub register: u32,
    /// If the binding is an unsized binding array, this overrides the size.
    pub binding_array_size: Option<u32>,
}

// Using `BTreeMap` instead of `HashMap` so that we can hash itself.
pub type BindingMap = std::collections::BTreeMap<crate::ResourceBinding, BindTarget>;

/// A HLSL shader model version.
#[allow(non_snake_case, non_camel_case_types)]
#[derive(Copy, Clone, Debug, Hash, Eq, PartialEq, PartialOrd)]
#[cfg_attr(feature = "serialize", derive(serde::Serialize))]
#[cfg_attr(feature = "deserialize", derive(serde::Deserialize))]
pub enum ShaderModel {
    V5_0,
    V5_1,
    V6_0,
}

impl ShaderModel {
    pub const fn to_str(self) -> &'static str {
        match self {
            Self::V5_0 => "5_0",
            Self::V5_1 => "5_1",
            Self::V6_0 => "6_0",
        }
    }
}

impl crate::ShaderStage {
    pub const fn to_hlsl_str(self) -> &'static str {
        match self {
            Self::Vertex => "vs",
            Self::Fragment => "ps",
            Self::Compute => "cs",
        }
    }
}

impl crate::ImageDimension {
    const fn to_hlsl_str(self) -> &'static str {
        match self {
            Self::D1 => "1D",
            Self::D2 => "2D",
            Self::D3 => "3D",
            Self::Cube => "Cube",
        }
    }
}

/// Shorthand result used internally by the backend
type BackendResult = Result<(), Error>;

#[derive(Clone, Debug, PartialEq, thiserror::Error)]
#[cfg_attr(feature = "serialize", derive(serde::Serialize))]
#[cfg_attr(feature = "deserialize", derive(serde::Deserialize))]
pub enum EntryPointError {
    #[error("mapping of {0:?} is missing")]
    MissingBinding(crate::ResourceBinding),
}

/// Configuration used in the [`Writer`].
#[derive(Clone, Debug, Hash, PartialEq, Eq)]
#[cfg_attr(feature = "serialize", derive(serde::Serialize))]
#[cfg_attr(feature = "deserialize", derive(serde::Deserialize))]
pub struct Options {
    /// The hlsl shader model to be used
    pub shader_model: ShaderModel,
    /// Map of resources association to binding locations.
    pub binding_map: BindingMap,
    /// Don't panic on missing bindings, instead generate any HLSL.
    pub fake_missing_bindings: bool,
    /// Add special constants to `SV_VertexIndex` and `SV_InstanceIndex`,
    /// to make them work like in Vulkan/Metal, with help of the host.
    pub special_constants_binding: Option<BindTarget>,
    /// Bind target of the push constant buffer
    pub push_constants_target: Option<BindTarget>,
    /// Should workgroup variables be zero initialized (by polyfilling)?
    pub zero_initialize_workgroup_memory: bool,
}

impl Default for Options {
    fn default() -> Self {
        Options {
            shader_model: ShaderModel::V5_1,
            binding_map: BindingMap::default(),
            fake_missing_bindings: true,
            special_constants_binding: None,
            push_constants_target: None,
            zero_initialize_workgroup_memory: true,
        }
    }
}

impl Options {
    fn resolve_resource_binding(
        &self,
        res_binding: &crate::ResourceBinding,
    ) -> Result<BindTarget, EntryPointError> {
        match self.binding_map.get(res_binding) {
            Some(target) => Ok(target.clone()),
            None if self.fake_missing_bindings => Ok(BindTarget {
                space: res_binding.group as u8,
                register: res_binding.binding,
                binding_array_size: None,
            }),
            None => Err(EntryPointError::MissingBinding(res_binding.clone())),
        }
    }
}

/// Reflection info for entry point names.
#[derive(Default)]
pub struct ReflectionInfo {
    /// Mapping of the entry point names.
    ///
    /// Each item in the array corresponds to an entry point index. The real entry point name may be different if one of the
    /// reserved words are used.
    ///
    /// Note: Some entry points may fail translation because of missing bindings.
    pub entry_point_names: Vec<Result<String, EntryPointError>>,
}

#[derive(Error, Debug)]
pub enum Error {
    #[error(transparent)]
    IoError(#[from] FmtError),
    #[error("A scalar with an unsupported width was requested: {0:?}")]
    UnsupportedScalar(crate::Scalar),
    #[error("{0}")]
    Unimplemented(String), // TODO: Error used only during development
    #[error("{0}")]
    Custom(String),
}

#[derive(Default)]
struct Wrapped {
    array_lengths: crate::FastHashSet<help::WrappedArrayLength>,
    image_queries: crate::FastHashSet<help::WrappedImageQuery>,
    constructors: crate::FastHashSet<help::WrappedConstructor>,
    struct_matrix_access: crate::FastHashSet<help::WrappedStructMatrixAccess>,
    mat_cx2s: crate::FastHashSet<help::WrappedMatCx2>,
}

impl Wrapped {
    fn clear(&mut self) {
        self.array_lengths.clear();
        self.image_queries.clear();
        self.constructors.clear();
        self.struct_matrix_access.clear();
        self.mat_cx2s.clear();
    }
}

pub struct Writer<'a, W> {
    out: W,
    names: crate::FastHashMap<proc::NameKey, String>,
    namer: proc::Namer,
    /// HLSL backend options
    options: &'a Options,
    /// Information about entry point arguments and result types.
    entry_point_io: Vec<writer::EntryPointInterface>,
    /// Set of expressions that have associated temporary variables
    named_expressions: crate::NamedExpressions,
    wrapped: Wrapped,

    /// A reference to some part of a global variable, lowered to a series of
    /// byte offset calculations.
    ///
    /// See the [`storage`] module for background on why we need this.
    ///
    /// Each [`SubAccess`] in the vector is a lowering of some [`Access`] or
    /// [`AccessIndex`] expression to the level of byte strides and offsets. See
    /// [`SubAccess`] for details.
    ///
    /// This field is a member of [`Writer`] solely to allow re-use of
    /// the `Vec`'s dynamic allocation. The value is no longer needed
    /// once HLSL for the access has been generated.
    ///
    /// [`Storage`]: crate::AddressSpace::Storage
    /// [`SubAccess`]: storage::SubAccess
    /// [`Access`]: crate::Expression::Access
    /// [`AccessIndex`]: crate::Expression::AccessIndex
    temp_access_chain: Vec<storage::SubAccess>,
    need_bake_expressions: back::NeedBakeExpressions,
}