Disclaimer:
The explanations here are very math heavy and assume some linear algebra knowledge, but they aren't required for using their respective concepts and can usually be skipped.
TBN/Tangent-Bitangent-Normal matrix
(Bitangent is also known as “binormal”, although that is a misnomer)
// Creates a TBN matrix from a normal and a tangent
mat3 tbnNormalTangent(vec3 normal, vec3 tangent) {
// For DirectX normal mapping you want to switch the order of these
vec3 bitangent = cross(normal, tangent);
return mat3(tangent, bitangent, normal);
}
// Creates a TBN matrix from just a normal
// The tangent version is needed for normal mapping because
// of face rotation
mat3 tbnNormal(vec3 normal) {
// This could be
// normalize(vec3(normal.y - normal.z, -normal.x, normal.x))
vec3 tangent = normalize(cross(normal, vec3(0, 1, 1)));
return tbnNormalTangent(normal, tangent);
}
Explanation
The TBN matrix transforms vectors from the space where the normal vector points backwards, the tangent vector points to the right and the bitangent vector points upwards (tangent space) into the space the normal and tangent vectors are given in.
For this explanation, it’s important to know that the normal vector points in the +Z direction (in OpenGL this is out of the screen), the tangent vector points in the +X direction and the bitangent vector points in the +Y direction.
Since OpenGL matrices are column-major, the matrix above comes out as

$$TBN = \begin{pmatrix} T_x & B_x & N_x \\ T_y & B_y & N_y \\ T_z & B_z & N_z \end{pmatrix}$$

Where $T$ is the tangent, $B$ is the bitangent and $N$ is the normal. If we multiply the standard basis vectors by the TBN matrix we get the following results:

$$TBN \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} = T \qquad TBN \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = B \qquad TBN \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} = N$$

Every vector can be written as

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} = x \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + y \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + z \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$

(You commonly see this as $x\hat{i} + y\hat{j} + z\hat{k}$)

And since matrix multiplication is distributive, multiplying the $(x, y, z)$ vector by the TBN matrix will result in

$$TBN \begin{pmatrix} x \\ y \\ z \end{pmatrix} = xT + yB + zN$$
Since the tangent, bitangent and normal vectors form the orthonormal basis of tangent space, this takes the (x, y, z) vector given in tangent space and expresses it in the space those basis vectors are defined in.
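As a quick usage sketch (the function and parameter names here are made up for illustration), this is how the TBN matrix is typically used for normal mapping:

// Turns a tangent space normal map sample into a view space normal
// viewNormal/viewTangent are the interpolated vertex normal and tangent (view space)
vec3 getMappedNormal(vec3 viewNormal, vec3 viewTangent, sampler2D normalTex, vec2 texCoord) {
    mat3 tbn = tbnNormalTangent(normalize(viewNormal), normalize(viewTangent));
    // Normal maps store the vector remapped to [0, 1], undo that first
    vec3 tangentNormal = texture(normalTex, texCoord).rgb * 2.0 - 1.0;
    // Multiplying by the TBN matrix takes the vector from tangent space
    // into the space the normal and tangent were given in (view space here)
    return normalize(tbn * tangentNormal);
}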
Inverse of a rotation matrix
The `inverse` function is very slow, so it's best to avoid it (there's a reason Optifine gives you inverse matrices for almost everything). Sadly it's not possible to avoid it for situations when matrices are created in the shader program. Luckily for rotation matrices inverting is not necessary. Instead you can use the following identity:

$$R^{-1} = R^T$$

Furthermore, since in GLSL when a vector is multiplied by a matrix from the left (a.k.a. `matrix * vector`) the vector is treated as a column vector and when multiplied from the right (a.k.a. `vector * matrix`) the vector is treated as a row vector, we can add a further identity:

$$R^T \vec{v} = \vec{v} R$$

(in GLSL terms: `transpose(mat) * vec == vec * mat`)
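For example (just a sketch, assuming rotMat is a pure rotation matrix), undoing a rotation without ever calling inverse:

// Same result as inverse(rotMat) * v and transpose(rotMat) * v,
// but without computing a new matrix at all
vec3 unrotate(mat3 rotMat, vec3 v) {
    return v * rotMat;
}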
Explanation
The explanation will be based on 2x2 rotation matrices. Extending the explanation to 3x3 is left as an exercise to the reader.
Since rotation matrices transform from one space to another, the columns of a rotation matrix must form an orthonormal basis. The important part from this is that if we take the dot product of any two vectors from the set, we either get 1 if the two vectors are the same or 0 if they aren't (since they are orthogonal). Let's mark these orthonormal vectors in our rotation matrix as $\vec{a}$ and $\vec{b}$:

$$R = \begin{pmatrix} a_x & b_x \\ a_y & b_y \end{pmatrix}$$

Keep in mind: $\vec{a} \cdot \vec{a} = \vec{b} \cdot \vec{b} = 1$ and $\vec{a} \cdot \vec{b} = \vec{b} \cdot \vec{a} = 0$

We'll multiply this matrix by its transpose (marked here, and everywhere else, as $R^T$) from the left:

$$R^T R = \begin{pmatrix} a_x & a_y \\ b_x & b_y \end{pmatrix} \begin{pmatrix} a_x & b_x \\ a_y & b_y \end{pmatrix}$$

By definition matrix multiplication works as if we took the rows of the left matrix and the columns of the right matrix as vectors and took their dot products. In other words the resulting value at position $(i, j)$ is the dot product of row $i$ from the left matrix and column $j$ of the right matrix. Knowing this we can write the result of our matrix multiplication the following way:

$$R^T R = \begin{pmatrix} \vec{a} \cdot \vec{a} & \vec{a} \cdot \vec{b} \\ \vec{b} \cdot \vec{a} & \vec{b} \cdot \vec{b} \end{pmatrix}$$

And since by definition

$$\begin{pmatrix} \vec{a} \cdot \vec{a} & \vec{a} \cdot \vec{b} \\ \vec{b} \cdot \vec{a} & \vec{b} \cdot \vec{b} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I$$

We can say that

$$R^T R = I \implies R^T = R^{-1}$$
Calculating normal vectors from various stuff
A heightmap means that for any $(x, z)$ coordinate you define a corresponding height ($y$) value. This means that you can't have overhangs (e.g. POM uses a heightmap). I'll also assume (for simplicity's sake) that the area is flat, so this is perfect for water normals.

A 2D->3D mapping means that for any $(u, v)$ value you define an $(x, y, z)$ position. $u$ and $v$ can be anything, including horizontal positions, but also stuff like latitude and longitude on a sphere. This can have overhangs and supports non-flat geometry.
Numerical solutions
Numerical solutions in this case means these are approximations, but they work quite reliably (and most importantly: always). The functions take in an extra stepSize parameter; when the functions read from a texture, this should be the texel size, otherwise make it something small (e.g. 0.001).
For heightmaps I’ll assume you have a height function that takes in a vec2 and returns the corresponding height. For 2D->3D mappings I assume you have a map function that takes in a vec2 and returns the corresponding vec3 position (not offset!).
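For illustration, here is one possible pair of such functions (these exact definitions are just examples, the snippets below work with any height/map implementation): a simple wave heightmap and a longitude/latitude to unit sphere mapping.

// Example heightmap: two overlapping sine waves
float height(vec2 pos) {
    return sin(pos.x) + sin(pos.y);
}

// Example 2D->3D mapping: longitude/latitude (in radians) mapped onto a unit sphere
vec3 map(vec2 lonLat) {
    return vec3(
        cos(lonLat.y) * cos(lonLat.x),
        sin(lonLat.y),
        cos(lonLat.y) * sin(lonLat.x)
    );
}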
From heightmaps
vec3 normalFromHeight(vec2 pos, float stepSize) {
    vec2 e = vec2(stepSize, 0);
    // Two sample points along X and two along Z, lifted into 3D
    vec3 px1 = vec3(pos.x - e.x, height(pos - e.xy), pos.y - e.y);
    vec3 px2 = vec3(pos.x + e.x, height(pos + e.xy), pos.y + e.y);
    vec3 py1 = vec3(pos.x - e.y, height(pos - e.yx), pos.y - e.x);
    vec3 py2 = vec3(pos.x + e.y, height(pos + e.yx), pos.y + e.x);
    // The cross product order is chosen so that a flat heightmap
    // produces an upwards (+Y) facing normal
    return normalize(cross(py2 - py1, px2 - px1));
}
From 2D->3D mappings
vec3 normalFromMapping(vec2 pos, float stepSize) {
    vec2 e = vec2(stepSize, 0);
    // The orientation of the result depends on the mapping;
    // swap the two cross product arguments if the normal points the wrong way
    return normalize(cross(
        map(pos + e.xy) - map(pos - e.xy),
        map(pos + e.yx) - map(pos - e.yx)
    ));
}
Analytical solutions
Analytical solutions rely on analytical partial derivatives. For this, we first need to make our separate definitions for the heightmaps and 2D->3D mappings the same. We can convert any heightmap into a 2D->3D mapping by doing the following:
vec3 heightmapToMapping(vec2 pos) {
return vec3(pos.x, height(pos), pos.y);
}
To get the normal at any point we need to get the partial derivatives of this function with respect to both of the inputs.

When we take the partial derivative of a function with respect to any of its parameters, we are essentially asking "How much does the output change if I change one of the input parameters by a very small amount?". How this is done exactly is beyond the scope of this document, but WolframAlpha is great at doing this. For instance, if we want to calculate the normals for the surface of this shape:

$$f(x, z) = (x,\ \sin(x) + \sin(z),\ z)$$

We can just ask WolframAlpha to differentiate it for us by saying partial derivative <mapping>. In our case this would be partial derivative f(x,z)=(x, sin(x) + sin(z), z). We get

$$\frac{\partial f}{\partial x} = (1,\ \cos(x),\ 0) \qquad \frac{\partial f}{\partial z} = (0,\ \cos(z),\ 1)$$

Reading this out is easier than it seems, we only need the part after the last "=". One extra thing we need to know is that $\frac{\partial x}{\partial z}$ and $\frac{\partial z}{\partial x}$ are 0, since these are how much $x$ changes based on $z$ and how much $z$ changes based on $x$ respectively. So our two derivatives will be $(1, \cos(x), 0)$ and $(0, \cos(z), 1)$. To get the normal vector from this we can take their cross product in the order $\frac{\partial f}{\partial z} \times \frac{\partial f}{\partial x}$ (so the resulting normal points upwards, away from the surface) and normalize it:

$$\frac{\partial f}{\partial z} \times \frac{\partial f}{\partial x} = (-\cos(x),\ 1,\ -\cos(z))$$

Proof here: moving point A around will move the point A' around, which is just A projected onto the surface, and the n vector shows the calculated normal vector. Additionally, in mathematics xy is the horizontal plane and z is the vertical axis, so the values there will be in a strange order.
If you do this, you should also pre-compute the cross product for the two vectors to make it faster. I would implement this as
vec3 getNormal(vec2 pos) {
    // Pre-computed cross product of the two partial derivatives
    // (pos.y corresponds to the z coordinate in the derivation above)
    return normalize(vec3(-cos(pos.x), 1, -cos(pos.y)));
}
This is much faster than the numeric approach, since for that we would’ve had to evaluate a sine function 8 times (2 times for each sampling position), but in this case we only need 2 cosines.
Explanation
The partial derivatives are essentially telling us what direction the surface is pointing in at the current position in both axes, therefore they describe the tangent plane of the surface at that position and the two partial derivatives will be the basis vectors of it. The cross product by definition creates a vector which is perpendicular to both of the input vectors, so it is also perpendicular to the tangent plane.
The numerical solutions are just approximations of the partial derivatives.
View position from depth
// projInv is the inverse of the projection matrix
vec3 depthToView(vec2 texCoord, float depth, mat4 projInv) {
vec4 ndc = vec4(texCoord, depth, 1) * 2 - 1;
vec4 viewPos = projInv * ndc;
return viewPos.xyz / viewPos.w;
}
You can plug in 1 for depth and normalize the result if all you need is a ray going from the camera towards the fragment positioned at texCoord.
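For instance, a minimal helper for that (the function name is made up):

// View space direction pointing from the camera towards the fragment at texCoord
vec3 viewDirFromTexCoord(vec2 texCoord, mat4 projInv) {
    return normalize(depthToView(texCoord, 1.0, projInv));
}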
Explanation
To draw something that's positioned in view space, we need to transform it into normalized device coordinates. This is done in 2 steps. First we transform our vertices into clip space using a projection matrix. This is where, as the name implies, clipping happens (cutting off geometry outside of the visible area). The next step is done by OpenGL: since clip space is a homogeneous coordinate system, we can divide by w to apply the perspective projection and put everything into normalized device coordinates, which then get rendered.
So in short:

$$\text{ndcPos} = \frac{P \cdot \text{viewPos}}{(P \cdot \text{viewPos})_w}$$

The $z$ coordinate from $\text{ndcPos}$ gets put into the depth buffer. To reverse this we just need to do the operations backwards. Sadly we don't have the original value of $(P \cdot \text{viewPos})_w$, but we can use the fact that the $w$ coordinate of $\text{viewPos}$ is always 1. If we multiply $\text{ndcPos}$ by the inverse of the projection matrix, we get

$$P^{-1} \cdot \text{ndcPos} = P^{-1} \cdot \frac{P \cdot \text{viewPos}}{(P \cdot \text{viewPos})_w}$$

We can extract the division:

$$P^{-1} \cdot \text{ndcPos} = \frac{P^{-1} \cdot P \cdot \text{viewPos}}{(P \cdot \text{viewPos})_w}$$

$P^{-1} \cdot P$ is just the identity matrix, so we can substitute that. Our current formula is

$$P^{-1} \cdot \text{ndcPos} = \frac{\text{viewPos}}{(P \cdot \text{viewPos})_w}$$

And since we know that the $w$ component of $\text{viewPos}$ is 1, we know that the $w$ component of the resulting vector is $\frac{1}{(P \cdot \text{viewPos})_w}$, therefore if we divide by the $w$ component, it's exactly like multiplying by $(P \cdot \text{viewPos})_w$, which gives us back $\text{viewPos}$.
Linearizing depth
This takes the depth value you read from a depth buffer and transforms it into a more usable, linear range
// depth is the value you read from the depth buffer
// near is the near plane distance
// far is the far plane distance
// Ideally the last two should come from the projection matrix
float linearizeDepth(float depth, float near, float far) {
// Convert depth back to NDC depth
depth = depth * 2 - 1;
return 2 * far * near / (far + near - depth * (far - near));
}
// Same algorithm, but faster, thanks to Kneemund/Niemand
float linearizeDepthFast(float depth, float near, float far) {
return (near * far) / (depth * (near - far) + far);
}
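A quick usage sketch (assuming the usual Optifine/Iris near, far and depthtex0 uniforms; the function name is made up):

uniform sampler2D depthtex0;
uniform float near;
uniform float far;

// Distance of the fragment from the camera plane, in blocks
float distanceFromCamera(vec2 texCoord) {
    return linearizeDepthFast(texture(depthtex0, texCoord).r, near, far);
}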
Explanation
The usual perspective projection matrix looks like this:

$$P = \begin{pmatrix} \frac{1}{a \tan\left(\frac{fov}{2}\right)} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan\left(\frac{fov}{2}\right)} & 0 & 0 \\ 0 & 0 & -\frac{far + near}{far - near} & -\frac{2 \cdot far \cdot near}{far - near} \\ 0 & 0 & -1 & 0 \end{pmatrix}$$

$a$ is the aspect ratio of the screen (width in pixels divided by height in pixels)

$fov$ is the vertical field of view

$far$ is the far clipping plane, the distance where things stop rendering

$near$ is the near clipping plane, the distance where things start rendering at. Should be strictly larger than 0.

The purpose of the matrix is to take the position of the vertices in view space and convert them to clip space. As noted in the last "chapter", the full process is

$$\text{ndcPos} = \frac{P \cdot \text{viewPos}}{(P \cdot \text{viewPos})_w}$$

Knowing this we can plug in an arbitrary $(x, y, z, 1)$ vector and see what the result is:

$$P \cdot \begin{pmatrix} x \\ y \\ z \\ 1 \end{pmatrix} = \begin{pmatrix} \frac{x}{a \tan\left(\frac{fov}{2}\right)} \\ \frac{y}{\tan\left(\frac{fov}{2}\right)} \\ -\frac{(far + near) z + 2 \cdot far \cdot near}{far - near} \\ -z \end{pmatrix}$$

After this the division by $w = -z$ happens and we get the following vector:

$$\begin{pmatrix} -\frac{x}{a z \tan\left(\frac{fov}{2}\right)} \\ -\frac{y}{z \tan\left(\frac{fov}{2}\right)} \\ \frac{(far + near) z + 2 \cdot far \cdot near}{(far - near) z} \\ 1 \end{pmatrix}$$

From this we only care about the z coordinate for now, since that's what gets written to the depth buffer (OpenGL compresses it to the $[0, 1]$ range too, but we won't care about this now, since the first line of the linearization is essentially just reversing this).

$$d = \frac{(far + near) z + 2 \cdot far \cdot near}{(far - near) z}$$

$d$ is the non-linear depth and we want to get $z$ back from it. We can do this by using some basic algebra:

$$d (far - near) z = (far + near) z + 2 \cdot far \cdot near$$

$$z \left(d (far - near) - (far + near)\right) = 2 \cdot far \cdot near$$

$$z = \frac{2 \cdot far \cdot near}{d (far - near) - (far + near)} = -\frac{2 \cdot far \cdot near}{far + near - d (far - near)}$$
One additional difference is that the equation in the code seems to be negated. The reason for this is that in OpenGL the z axis points out of the screen, so anything in front of the player would have a negative z coordinate. This would just be an extra thing for people to keep in mind, so the usual linearization function also removes this.
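As a small sanity check tying this together with the previous chapter (just a sketch, the function name is made up), the linearized depth of a fragment should match the negated z coordinate of its reconstructed view space position:

// Should give (approximately) the same value as
// linearizeDepthFast(depth, near, far), just much slower
float linearDepthFromView(vec2 texCoord, float depth, mat4 projInv) {
    return -depthToView(texCoord, depth, projInv).z;
}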
Constructing perspective projection matrices
This is only really useful if you need to either modify the FOV of the already existing projection matrix or you want to generate them for realtime cubemap rendering.
// fov is the vertical field of view angle of the camera in radians
// aspect is the width of the resulting image divided by height
// near is the position of the near clipping plane, must be
// strictly larger than 0
// far is the position of the far clipping plane
mat4 perspectiveProjection(float fov, float aspect, float near,
float far) {
float inverseTanFovHalf = 1.0 / tan(fov/ 2);
return mat4(
inverseTanFovHalf / aspect, 0, 0, 0,
0, inverseTanFovHalf, 0, 0,
0, 0, -(far + near) / (far - near), -1,
0, 0, -2 * far * near / (far - near), 0
);
}
Additional disclaimers for the parameters: near shouldn't be too small and far shouldn't be too large; try to keep moderation when setting those values. The reason for this is the nonlinear nature of the depth buffer: with a moderate near plane, two points that are close to each other but sit near the far end of the view distance still get depth values that only differ somewhere around the 5th decimal place. The difference seems small, but a good rule of thumb is that floats can store values up to around 8 significant decimal places, so we are fine. However, if we want to avoid the near clipping as much as we can and set near to a not so unreasonable looking 0.0001, the difference becomes too small to make it reasonably precise and we get Z fighting (WolframAlpha actually gave up on me and just spit out 1 for the depths).
Additionally, $fov$ can't be greater than or equal to 180 degrees ($\pi$ in radians), because $\tan\left(\frac{180°}{2}\right) = \tan(90°)$ is undefined and so we'd get a division by 0 error, and with values larger than 180 degrees we'd just flip the screen.
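As an example of the cubemap use case mentioned above, a single cube face needs a 90 degree FOV and a 1:1 aspect ratio (the clipping plane values here are arbitrary placeholders):

// Projection matrix for rendering one face of a cubemap
mat4 getCubeFaceProjection() {
    return perspectiveProjection(radians(90.0), 1.0, 0.05, 512.0);
}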
Explanation
To understand perspective matrices, we first need to know how you’d calculate perspective projection mathematically. Let’s first create an example:
We have a camera looking at our scene. It has a field of view (FOV), and a near plane, which in this case is just our screen. The size of the screen will by definition be 1 unit and it will be 1 unit away from our camera (this means our camera has a FOV of 53.13° by the way). We have 2 lines in our scene, the height of each is also 1 unit. The red line is 2 units from the screen and the blue line is at a distance of 3 units.
We want to know what the size of the lines will be when we project them onto the screen. Let’s visualize the red line first:
I marked the points with names to make it easier to refer to them. We have 2 triangles: one formed by the camera and the endpoints of the original line, and one formed by the camera and the endpoints of its projection on the screen. Since these triangles share a point (the camera), an angle (the angle at the camera) and one of their sides is parallel (the original line with its projection), they are similar. If two triangles are similar, the ratios of the lengths of their matching sides are the same. We can cut the triangles in half and get the same results:

Now we can say that

$$\frac{\text{projected height}}{1} = \frac{\text{original height}}{\text{distance from the camera}}$$

Which in English means that the height of the projected line on the screen is the height of the original line divided by the distance of the line to the camera. In this case the distance to the camera is 3 units (2 units from the screen + 1 unit to the camera), so the height of the projected line is $\frac{1}{3}$. For the blue line we can do the same calculation and get that the projected height is $\frac{1}{4}$.
So all we need to do to achieve perspective projection is divide by z. Doing this inside a shader is not recommended because it destroys texture mapping very fast, since we don't give OpenGL any actual information about Z to do perspective correction (it would look like a PS1 game, since that console had this problem on a hardware level). To address this, OpenGL lets us define the position of a vertex in a vec4 and it will divide the xyz components by the w component after we give it the result.
Now we can start constructing our projection matrix. Since we'll need to add and subtract from our coordinates, we'll start out with the assumption that the w coordinate is 1. The first thing we need to do is put the z coordinate in the w coordinate and negate it (we need to negate it, because in OpenGL forward is the -Z direction):

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 0 \end{pmatrix}$$

Then we need to convert our z coordinate such that when dividing by w it will be in the range of $[-1, 1]$. We'll use the near and far clipping planes. When $-z$ equals the near clipping plane, $w$ will also be the near clipping plane and we need to get -1, and similarly, when $-z$ equals the far clipping plane, $w$ will be the far clipping plane and the result has to be 1. This means that we need to convert the current $[near, far]$ range to $[-near, far]$. To do this we first negate the z coordinate, then we subtract $near$ and divide by $far - near$. This brings z into the $[0, 1]$ range. Next we can multiply by $far + near$ and subtract $near$ to bring it into $[-near, far]$. So the equation becomes

$$z' = \frac{(-z - near)(far + near)}{far - near} - near$$

We can't use this in a matrix in this state, so with algebra we'll separate the multiplications and divisions from the additions and subtractions:

$$z' = -\frac{far + near}{far - near}\, z - \frac{2 \cdot far \cdot near}{far - near}$$

Now we can put this into our matrix:

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -\frac{far + near}{far - near} & -\frac{2 \cdot far \cdot near}{far - near} \\ 0 & 0 & -1 & 0 \end{pmatrix}$$

The next step we should take is to correct the aspect ratio, otherwise this matrix can only work with square-shaped cameras. We have two options for this: either we can multiply the y coordinate by the aspect ratio or divide x by it. Usually the latter is preferred, because the former would cut a bit off of the screen, whereas the latter only adds to it, so I'll do the same here:

$$\begin{pmatrix} \frac{1}{a} & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -\frac{far + near}{far - near} & -\frac{2 \cdot far \cdot near}{far - near} \\ 0 & 0 & -1 & 0 \end{pmatrix}$$
Now technically this is already a usable projection matrix. We can define the near and far clipping planes and it does the perspective transformation too. One downside to it however is that we can’t tell it to use a specific field of view, it will always be limited to 90 degrees vertically. Since we are projecting everything onto a screen 1 unit away from the camera (this can be seen from the fact that at that distance the sizes of objects don’t change), we need to take into account how large that screen is. We can do a little trigonometry to find this size:
From this we can see that half of the screen's height is $\tan\left(\frac{fov}{2}\right)$. One additional thing is that OpenGL wants the resulting image to be in the $[-1, 1]$ range, which has a width and height of 2, so we won't be multiplying this by 2 (otherwise we'd have to divide the x and y coordinates by 2 later). We can now scale our coordinates and get our final projection matrix. We need to divide the x and y coordinates by the screen size, because we want to take them from the $\left[-a \tan\left(\frac{fov}{2}\right),\ a \tan\left(\frac{fov}{2}\right)\right]$ and $\left[-\tan\left(\frac{fov}{2}\right),\ \tan\left(\frac{fov}{2}\right)\right]$ ranges to $[-1, 1]$:

$$\begin{pmatrix} \frac{1}{a \tan\left(\frac{fov}{2}\right)} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan\left(\frac{fov}{2}\right)} & 0 & 0 \\ 0 & 0 & -\frac{far + near}{far - near} & -\frac{2 \cdot far \cdot near}{far - near} \\ 0 & 0 & -1 & 0 \end{pmatrix}$$
When we recreate this in GLSL, we also need to keep in mind that OpenGL uses column-major matrices, so the order of the elements will start from the top-left element, go down in the column, then the second column, then third and fourth.
Rotating with quaternions
// a is the left side of the multiplication
// b is the right side of the multiplication
vec4 quaternionMultiply(vec4 a, vec4 b) {
return vec4(
a.x * b.w + a.y * b.z - a.z * b.y + a.w * b.x,
-a.x * b.z + a.y * b.w + a.z * b.x + a.w * b.y,
a.x * b.y - a.y * b.x + a.z * b.w + a.w * b.z,
-a.x * b.x - a.y * b.y - a.z * b.z + a.w * b.w
);
}
// pos is the position you want to rotate
// axis is the unit length axis you want to rotate it around
// angle is the angle you want to rotate the object by
vec3 quaternionRotate(vec3 pos, vec3 axis, float angle) {
vec4 q = vec4(sin(angle / 2.0) * axis, cos(angle / 2.0));
vec4 qInv = vec4(-q.xyz, q.w);
return quaternionMultiply(quaternionMultiply(q, vec4(pos, 0)), qInv).xyz;
}
// Fast versions, but less intuitive
vec4 fastQuaternionMultiply(vec4 a, vec4 b) {
return vec4(
a.w * b.xyz + b.w * a.xyz + cross(a.xyz, b.xyz),
a.w * b.w - dot(a.xyz, b.xyz)
);
}
vec3 fastQuaternionRotate(vec3 pos, vec3 axis, float angle) {
vec4 q = vec4(sin(angle / 2.0) * axis, cos(angle / 2.0));
vec4 partial = fastQuaternionMultiply(q, vec4(pos, 0));
// Skip calculating the real part, since it's always 0
return -partial.w * q.xyz + q.w * partial.xyz + cross(q.xyz, partial.xyz);
}
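As a quick usage sketch (the axis has to be unit length, the angle is in radians, and the function name is made up):

// Rotates a position by 90 degrees around the vertical (Y) axis
vec3 rotateAroundY(vec3 pos) {
    return quaternionRotate(pos, vec3(0.0, 1.0, 0.0), radians(90.0));
}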
Explanation
Quaternions are essentially an extension of complex numbers, but instead of using one imaginary component, they use three: $i$, $j$ and $k$, with the following fundamental formula:

$$i^2 = j^2 = k^2 = ijk = -1$$

In graphics programming we use them to describe 3D rotation in a way where we don't run into common issues, such as gimbal lock. We can do this by taking our axis of rotation $\vec{a}$ and our angle $\theta$ and constructing the following quaternion from them:

$$q = \cos\left(\tfrac{\theta}{2}\right) + \sin\left(\tfrac{\theta}{2}\right)(a_x i + a_y j + a_z k)$$

Then inverting it by negating the imaginary components (this only works because $q$ is already unit length):

$$q^{-1} = \cos\left(\tfrac{\theta}{2}\right) - \sin\left(\tfrac{\theta}{2}\right)(a_x i + a_y j + a_z k)$$

And applying them using quaternion multiplication in the following way:

$$p' = q p q^{-1}$$

Where $p$ is the position we want to rotate, with the x, y and z components being multiplied by $i$, $j$ and $k$ respectively, and $p'$ is the rotated position in a similar manner.
Since quaternions are far too complex for the scope of this page, instead of providing an explanation built up from scratch, I will use a formal proof to verify these claims. To do this all we need to check is that when rotating a given position around a given axis by a certain angle, the resulting position...
- ... has the same angle with the axis of rotation as the original position
- ... forms the given angle with the original position when projected onto the plane perpendicular to the axis
To answer either of these questions, we need to create a generalized formula for $q p q^{-1}$. For this we will use the scalar+vector representation of a quaternion, where $q = (s,\ \vec{v})$. Quaternion multiplication in this version can be written as:

$$(s_1,\ \vec{v}_1)(s_2,\ \vec{v}_2) = \left(s_1 s_2 - \vec{v}_1 \cdot \vec{v}_2,\ \ s_1 \vec{v}_2 + s_2 \vec{v}_1 + \vec{v}_1 \times \vec{v}_2\right)$$
With this we can expand the original formula, writing $q = \left(\cos\frac{\theta}{2},\ \sin\frac{\theta}{2}\,\vec{a}\right)$, $p = (0,\ \vec{p})$ and $q^{-1} = \left(\cos\frac{\theta}{2},\ -\sin\frac{\theta}{2}\,\vec{a}\right)$:

$$q p = \left(-\sin\tfrac{\theta}{2}\,(\vec{a} \cdot \vec{p}),\ \ \cos\tfrac{\theta}{2}\,\vec{p} + \sin\tfrac{\theta}{2}\,(\vec{a} \times \vec{p})\right)$$

Multiplying this by $q^{-1}$, then simplifying with the double angle identities ($2\sin\frac{\theta}{2}\cos\frac{\theta}{2} = \sin\theta$, $\cos^2\frac{\theta}{2} - \sin^2\frac{\theta}{2} = \cos\theta$, $2\sin^2\frac{\theta}{2} = 1 - \cos\theta$) and the triple product identity $(\vec{a} \times \vec{p}) \times \vec{a} = \vec{p} - (\vec{a} \cdot \vec{p})\,\vec{a}$:

$$q p q^{-1} = \left(0,\ \ (1 - \cos\theta)(\vec{a} \cdot \vec{p})\,\vec{a} + \cos\theta\,\vec{p} + \sin\theta\,(\vec{a} \times \vec{p})\right)$$

From this we can see that our new, rotated position is

$$\vec{p}' = (1 - \cos\theta)(\vec{a} \cdot \vec{p})\,\vec{a} + \cos\theta\,\vec{p} + \sin\theta\,(\vec{a} \times \vec{p})$$

Now we can look at the first requirement. Instead of finding the angle, we can just check the dot product between the rotated vector and the rotation axis, and it should be equal to the dot product between the original vector and the rotation axis. For this we need 2 important identities, $(\vec{a} \times \vec{p}) \cdot \vec{a} = 0$ and $\vec{a} \cdot \vec{a} = 1$:

$$\vec{p}' \cdot \vec{a} = (1 - \cos\theta)(\vec{a} \cdot \vec{p})(\vec{a} \cdot \vec{a}) + \cos\theta\,(\vec{p} \cdot \vec{a}) + \sin\theta\,(\vec{a} \times \vec{p}) \cdot \vec{a} = (1 - \cos\theta)(\vec{a} \cdot \vec{p}) + \cos\theta\,(\vec{a} \cdot \vec{p}) = \vec{a} \cdot \vec{p}$$
Therefore the first requirement is fulfilled. For the second requirement we will project the two points onto the plane perpendicular to the rotation axis. This can be done by creating tangent and bitangent vectors using the vector pointing to $\vec{p}$ and the rotation axis (if they are parallel, then the rotation will leave it in the same spot, so we can ignore this option):

$$\vec{t} = \vec{p} - (\vec{a} \cdot \vec{p})\,\vec{a} \qquad \vec{b} = \vec{a} \times \vec{p}$$

Now we can project the two points onto the plane defined by the two vectors:

For $\vec{p}$:

$$\vec{p} \cdot \vec{t} = \vec{p} \cdot \vec{p} - (\vec{a} \cdot \vec{p})^2 \qquad \vec{p} \cdot \vec{b} = \vec{p} \cdot (\vec{a} \times \vec{p}) = 0$$

For $\vec{p}'$ (the actual algebra was left out as a space and time saving measure):

$$\vec{p}' \cdot \vec{t} = \cos\theta\,\left(\vec{p} \cdot \vec{p} - (\vec{a} \cdot \vec{p})^2\right) \qquad \vec{p}' \cdot \vec{b} = \sin\theta\,\left|\vec{a} \times \vec{p}\right|^2$$
The second row from this is very hard to convert. Ideally we would like the two components to have the same multiplier, but the multipliers on the squared functions at first seem to be different: one is $\vec{p} \cdot \vec{p} - (\vec{a} \cdot \vec{p})^2$, the other is $\left|\vec{a} \times \vec{p}\right|^2$. They are actually the same, the reason for it is the following:

$$\vec{a} \cdot \vec{p} = |\vec{a}|\,|\vec{p}|\cos\varphi = |\vec{p}|\cos\varphi$$

$\varphi$ is the angle between $\vec{a}$ and $\vec{p}$. The length of $\vec{a}$ is 1, so it can be omitted. Similarly, for the cross product:

$$\left|\vec{a} \times \vec{p}\right| = |\vec{a}|\,|\vec{p}|\sin\varphi = |\vec{p}|\sin\varphi$$

Finally, we have arrived at our final equation in this set:

$$\vec{p} \cdot \vec{p} - (\vec{a} \cdot \vec{p})^2 = |\vec{p}|^2 - |\vec{p}|^2\cos^2\varphi = |\vec{p}|^2\sin^2\varphi = \left|\vec{a} \times \vec{p}\right|^2$$

These two are equal because of how their angles are set up. Now we can get back to the original problem. We'll extract $\vec{p} \cdot \vec{p} - (\vec{a} \cdot \vec{p})^2$ into $k$ for simplicity's sake:

$$\vec{p} \text{ projected: } (k,\ 0) \qquad \vec{p}' \text{ projected: } (k\cos\theta,\ k\sin\theta)$$
This by definition means that when we project $\vec{p}$ and $\vec{p}'$ onto the plane perpendicular to the rotation axis, we get two vectors which are exactly $\theta$ apart counter-clockwise, which proves the second requirement and with it the rotation with quaternions.