Given the following shader:
struct S1 {
  int Indices[6];
};

struct S2 {
  int A;
  int B[2];
  int C;
};

StructuredBuffer<S1> In : register(t1);
RWStructuredBuffer<S1> Out : register(u0);

[numthreads(1,1,1)]
void main(uint GI : SV_GroupIndex) {
  S2 Tmp = {1,2,3,4};
  S1 Input = In[0];
  S1 O = { Tmp.B[Input.Indices[0]],
           Tmp.B[Input.Indices[1]],
           Tmp.B[Input.Indices[2]],
           Tmp.B[Input.Indices[3]],
           Tmp.B[Input.Indices[4]],
           Tmp.B[Input.Indices[5]] };
  Out[0] = O;
}
If the values at In[0] are [-2, -1, 0, 1, 2, 3], what gets written to Out[0]?
A. [0, 1, 2, 3, 4, 0]
B. [0, 0, 2, 3, 0, 0]
C. [undefined, 1, 2, 3, 4, undefined]
D. [2, 2, 2, 3, 2, 2]
E. [0, 0, 2, 3, 2, 2]
F. [undefined, undefined, 2, 3, undefined, undefined]
Answer!
F!
I really admire anyone who thinks this should be “C” because it means you both understand C and embrace its true horror… That said, in HLSL, out-of-bounds array accesses are always undefined behavior.
Understanding why this is undefined, and what the possible behaviors are, requires a little understanding of how GPUs manage thread-local memory. Basically, every thread-local variable defined in a shader program is stored in registers rather than memory. One side effect is that the memory layout the structure would otherwise have is not guaranteed to be preserved.
DXIL specifically requires scalarization, so the members of a structure are pulled apart into separate scalar values. Arrays are only preserved if they are dynamically indexed.
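To picture what that means for the shader above, here is a conceptual sketch of how Tmp might look once its members are pulled apart. This is hand-written HLSL, not real DXIL or compiler output, and the names Tmp_A, Tmp_B, Tmp_C, Idx, and Result are made up for illustration.

```hlsl
// Conceptual sketch only: hand-written HLSL, not real DXIL or compiler output.
// Tmp_A, Tmp_B, Tmp_C, Idx, and Result are hypothetical names.
StructuredBuffer<int> Idx : register(t0);
RWStructuredBuffer<int> Result : register(u0);

[numthreads(1,1,1)]
void main() {
  // Originally: S2 Tmp = {1,2,3,4};
  int Tmp_A = 1;          // plain scalar, no fixed position relative to Tmp_B
  int Tmp_B[2] = {2, 3};  // stays an array because it is dynamically indexed
  int Tmp_C = 4;          // plain scalar, not necessarily "after" Tmp_B

  // An index outside [0, 1] has no defined neighbor to land on.
  Result[0] = Tmp_B[Idx[0]];
}
```

Once A and C are independent scalars, there is nothing that says an index of -1 or 2 into B should find them.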
While option C seems reasonable because it matches C++, the lack of guarantees about structure layout changes the likely outcomes for out-of-bounds indexing. In practice I’ve observed three different behaviors.
WARP produces the output “E”, which seems to return 0 for negative indices and the array’s first element for positive OOB indices.
Vulkan produces “B”, where all OOB accesses return 0. IMO this is the most sane option given the constraints.
Metal (via the Metal Shader Converter) produces “D”, where every OOB index reads the array’s first element. IMO this is the second-best option because rather than returning 0 for OOB accesses, it just returns the first element. Totally safe, and not unintuitive.
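The three observed results can be written out as explicit, well-defined reads. The helpers below are a sketch that models the outputs only; the names are hypothetical, and nothing here claims to be how WARP, a Vulkan driver, or the Metal Shader Converter actually implements the access. Each helper picks a safe index first, so the array is never indexed out of bounds even if both sides of a `?:` are evaluated.

```hlsl
// Hypothetical helpers; they model the observed outputs, not any real driver.
static const int BSize = 2;

bool InBounds(int I) { return I >= 0 && I < BSize; }

// Option B (observed on Vulkan): every OOB read yields 0.
int ReadZeroOnOOB(int B[2], int I) {
  int Safe = clamp(I, 0, BSize - 1); // always a valid index into B
  return InBounds(I) ? B[Safe] : 0;
}

// Option D (observed on Metal): every OOB read yields the first element.
int ReadFirstOnOOB(int B[2], int I) {
  int Safe = InBounds(I) ? I : 0;
  return B[Safe];
}

// Option E (observed on WARP): 0 for negative indices,
// the first element for indices past the end.
int ReadWarpLike(int B[2], int I) {
  int Safe = InBounds(I) ? I : 0;
  return (I < 0) ? 0 : B[Safe];
}
```

With B = {2, 3} and the indices [-2, -1, 0, 1, 2, 3], these return [0, 0, 2, 3, 0, 0], [2, 2, 2, 3, 2, 2], and [0, 0, 2, 3, 2, 2] respectively, matching options B, D, and E.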