Shader Optimization
This page gives you an overview of how to optimize shaders.
General Speed of Built-in Functions
Shader hardware will do a multiplication, an addition, or a MAD (multiply-add, a * b + c) in one cycle.
Generally, only exp2, log2, inversesqrt, sin, cos and rcp can be assumed to be implemented in hardware; every other function is made up of those parts. These are called "special functions" and they are slower than arithmetic. They are generally assumed to take 4 cycles.
SFU
On Nvidia hardware, special functions run on a separate lane (the special function unit, SFU), so they cost nothing extra when mixed in with arithmetic. They still run at 1/4 rate (1/8 on old Fermi cards), so using multiple special functions in a row costs the same as on more classical cards.
Implementation of Built-in Functions
a / b == a * rcp(b)
1.0 / a == rcp(a)
sqrt(a) == 1.0 / inversesqrt(a)
pow(a, b) == exp2(log2(a) * b)
exp(a) == exp2(a * constant) // constant == log2(M_E)
normalize(a) == a * inversesqrt(dot(a,a))
mix(a, b, c) == (b-a) * c + a
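
Reading these expansions in reverse suggests a few rewrites. The GLSL sketch below is illustrative only: the function names are made up for this example, and it assumes the compiler does not already perform these rewrites on its own, which the list above does not say either way.

```glsl
// Hypothetical helpers illustrating the expansions listed above.

// pow(x, 2.0) expands to exp2(log2(x) * 2.0), i.e. two special functions.
// A plain multiplication avoids both, and unlike pow() it is also
// defined for negative x.
float square(float x) {
    return x * x;
}

// a / length(v) is a / sqrt(dot(v, v)): a division (rcp) on top of a sqrt.
// Multiplying by inversesqrt needs only one special function.
float divideByLength(float a, vec3 v) {
    return a * inversesqrt(dot(v, v));
}

// a / b expands to a * rcp(b). When b is a constant, write the
// reciprocal directly so no rcp is needed at run time.
float halve(float x) {
    return x * 0.5; // instead of x / 2.0
}
```

Note that divideByLength, like normalize, yields an undefined result for a zero-length vector.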
Vectors and Matrices
Vectors are a collection of multiple scalars, so the cost of every operation on them is multiplied by the number of components of the vector. This means that vec3 * vec3 is 3x more expensive than float * float, and vec3 * float is as expensive as vec3 * vec3.
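
One consequence of this cost model is that scalar factors are worth combining before they touch a vector. The sketch below uses hypothetical function and parameter names and assumes the compiler does not reorder the floating-point multiplications itself.

```glsl
// Hypothetical example: scale a color by two scalar factors.

vec3 shadeSlow(vec3 color, float intensity, float attenuation) {
    // Evaluated left to right: (vec3 * float) * float
    // = 3 + 3 = 6 scalar multiplications.
    return color * intensity * attenuation;
}

vec3 shadeFast(vec3 color, float intensity, float attenuation) {
    // float * float first, then a single vec3 * float
    // = 1 + 3 = 4 scalar multiplications.
    return color * (intensity * attenuation);
}
```

Both functions compute the same value up to floating-point rounding; only the grouping differs.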
Matrix multiplications are not the same as simple vector or scalar multiplications; they are far more expensive:
vec2 * mat2 is 4 cycles
vec3 * mat3 is 9 cycles
vec4 * mat4 is 16 cycles
mat2 * mat2 is 8 cycles
mat3 * mat3 is 27 cycles
mat4 * mat4 is 64 cycles!
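
Because matrix-matrix products are so much more expensive than matrix-vector products, the grouping of a transform chain matters. The sketch below uses hypothetical names and assumes that mat4 * vec4 costs the same 16 cycles listed for vec4 * mat4, and that the compiler evaluates the expression as written.

```glsl
// Hypothetical example: transform a position by three matrices.

vec4 transformSlow(mat4 proj, mat4 view, mat4 model, vec4 pos) {
    // Left to right: two mat4 * mat4 (64 + 64) and one mat4 * vec4 (16)
    // = 144 cycles under the cost model above.
    return proj * view * model * pos;
}

vec4 transformFast(mat4 proj, mat4 view, mat4 model, vec4 pos) {
    // Right to left: three mat4 * vec4 = 16 + 16 + 16 = 48 cycles.
    return proj * (view * (model * pos));
}
```

The two versions are mathematically equivalent because matrix multiplication is associative; results may differ slightly in floating-point rounding.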
Identities
exp(a+b) == exp(a) * exp(b)
pow(pow(a,b),c) == pow(a, b*c)
a / pow(b, c) == a * pow(b,-c)
log(a) + log(b) == log(a*b)
log(a/b) == log(a) - log(b)
log(pow(a,b)) == b * log(a)
log(sqrt(a)) == log(a) * 0.5
cross(a, cross(b, c)) == b * dot(a,c) - c * dot(a,b)
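
As a rough illustration of how these identities can be applied, the GLSL sketch below rewrites three expressions into their equivalent forms. The function names are invented for this example, and the cost comments follow the cycle assumptions from earlier on this page rather than measurements on any particular GPU.

```glsl
// Hypothetical helpers applying the identities above.

// exp(a) * exp(b) needs two exp2 evaluations;
// exp(a + b) needs one, plus an addition.
float combinedExp(float a, float b) {
    return exp(a + b);
}

// a / pow(b, c) == a * pow(b, -c): negating the exponent
// removes the rcp that the division would require.
float divideByPower(float a, float b, float c) {
    return a * pow(b, -c);
}

// cross(a, cross(b, c)) == b * dot(a, c) - c * dot(a, b)
// Equivalent form built from two dot products; measure before
// assuming it is faster on a given target.
vec3 tripleProduct(vec3 a, vec3 b, vec3 c) {
    return b * dot(a, c) - c * dot(a, b);
}
```

The pow and log identities hold only where both sides are defined, for example for a positive base; the rewritten forms can also round differently from the originals.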