The inspiration for this project came from reading an online book published by Nvidia, which indicated that it is possible to improve graphics performance for skinned models by loading the bone data into textures and storing it directly on the GPU, rather than uploading it on every frame. Though the article was specifically directed at DirectX, I figured something similar was likely possible in OpenGL. The material I referenced can be found here.
I found remarkably little helpful documentation, or even confirmation that this was possible, while researching it, but I was eventually able to pull up enough separate sources to piece it all together. The end result was deceptively simple and required little code; the difficulty lay in reading all of the documentation and code fragments and combining them in a manner that would work with our code base.
Following the discussion of my results is a tutorial, so that anyone who wishes to implement this as well may do so with no additional overhead.
Benefits of Using Texture Buffers Instead of Uniforms
One primary benefit is a general performance gain. Though performance varies with many factors and from system to system, this removes a bottleneck that uniforms can cause. While uniforms are fairly efficient if the data they send rarely changes, they become slower as the data varies more from frame to frame. As an example, on one machine both approaches could handle roughly twenty-five Chebs when all of them were on the same animation frame at the same time. However, when each Cheb had a random starting frame, so that no two consecutive Chebs were likely to be on the same frame, the uniform version fell to roughly twenty Chebs before it visibly began to lag, while the texture buffer version was unaffected. This gap widened on faster machines: on a superior machine, the uniform version lagged at one hundred Chebs with varied frames, while the texture buffer version was not visibly lagging at one hundred and twenty-one Chebs.
A secondary benefit is that it frees space in the Uniform space. Though not typically a problem in small projects, Uniform space is extremely finite, and it is fairly simple to run out of it if it is used recklessly to send large pieces of data such as bones or light positions. Texture buffers are far less limited: the amount of data they can hold is typically constrained by the amount of VRAM that the card has.
A third benefit is related to the second. Since data is no longer tied to the Uniform space, it is possible to send very large datasets, such as extremely large numbers of bones. A user can send as many bones as the graphics card has memory to store, while Uniforms require that the number of bones be kept relatively low. Additionally, the data sent does not need a size defined in the shader, allowing the user to change it programmatically rather than hard-coding it into the shader.
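To put rough numbers on the difference in capacity, here is a minimal sketch. The helper names are hypothetical. The budgets in the usage note (1024 uniform components and 65536 texels) are what I believe to be the minimums that OpenGL 3.x guarantees for GL_MAX_VERTEX_UNIFORM_COMPONENTS and GL_MAX_TEXTURE_BUFFER_SIZE. Treat them as assumptions and query the real limits with glGetIntegerv on your own hardware.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many 4x4 float matrices fit in a uniform budget
// measured in float components (each mat4 uses 16 components).
constexpr std::size_t mat4sInUniformComponents(std::size_t components) {
    return components / 16;
}

// Hypothetical helper: how many mat4s fit in a texture buffer that exposes
// `texels` RGBA32F texels (each texel is one vec4, so a mat4 needs 4 texels).
constexpr std::size_t mat4sInRgbaTexels(std::size_t texels) {
    return texels / 4;
}
```

With the assumed minimums, mat4sInUniformComponents(1024) gives 64 matrices, and that budget is shared with every other uniform in the shader, while mat4sInRgbaTexels(65536) gives 16384 matrices. Real cards usually report much larger texture buffer limits.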
Explanation of Performance Gains
The differences in performance are largely determined by how OpenGL and the graphics card handle uniforms and texture buffers. Because uniforms are set from code executing on the CPU and their values can change at any time, it is necessary to resend the uniform data on every frame. There are some hidden optimizations that reduce this penalty when the data being uploaded has not changed, but they cannot help if the data changes frame by frame. By forcing the Chebs to be on different frames, the loss of that optimization has a significant negative effect on performance. Texture buffers instead use a "send once" method, uploading all of the necessary data once in a single buffer bound to a texture unit on the graphics card. This operation would typically be done in an Init function, rather than in every draw call. In place of all the data that would normally be sent using uniforms, only the texture unit number and an index to perform a lookup on are necessary. Since the buffer is stored as if it were a texture in VRAM, all lookups are incredibly fast because the data is now local.
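As a back-of-the-envelope illustration of "send once" versus "send every frame", the sketch below compares the bytes sent per frame under each approach. The function names and the bone count in the usage note are hypothetical, and it assumes 4-byte floats and ints. It only counts payload bytes, so it says nothing about driver overhead or the hidden optimizations mentioned above.

```cpp
#include <cassert>
#include <cstddef>

// Bytes sent per frame if every instance re-uploads all of its bone
// matrices as uniforms (16 floats per matrix).
constexpr std::size_t uniformBytesPerFrame(std::size_t instances,
                                           std::size_t bonesPerInstance) {
    return instances * bonesPerInstance * 16 * sizeof(float);
}

// Bytes sent per frame if each instance only sends two int uniforms:
// the lookup index and the texture unit number.
constexpr std::size_t textureBufferBytesPerFrame(std::size_t instances) {
    return instances * 2 * sizeof(int);
}
```

For one hundred models with an assumed twenty bones each, this works out to 128000 bytes per frame with uniforms against 800 bytes per frame with a texture buffer, on top of the one-time upload of the buffer itself.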
GL_TEXTURE_BUFFER Tutorial
This tutorial is a generalized version of what I used to store and retrieve the bone data of the Cheb models. The idea of this more generalized tutorial is to make it easy for someone to follow what I learned without having to work backwards from my implementation or waste time scouring the internet for clues on how to actually set all of this up. This setup is capable of sending far more than just bone matrices, and could be used to store lights, materials, actual texture data, or any other useful data not efficiently sent using a Uniform.
Every individual texture buffer will require the two GLuints shown below. These need to be accessible, as they are necessary to bind the buffer and texture.
GLuint tex;
GLuint buffer;
Then, prepare the memory that you are going to use for the buffer. This data can be defined globally or only in an Init function.
float *bufferData = (float*) malloc( sizeof( float ) * <numElements> );  // malloc requires <stdlib.h>
Next, load all the data you require into the memory sequentially. This can be done easily for a matrix using the method shown below. Eigen stores matrices in column-major order by default, so data() already returns the sixteen floats as a column-major array, which can be copied position by position. A careful memcpy would also be appropriate here.
int position = 0;
Matrix4f resultEigen = <targetMatrix> ;
float *result = resultEigen.data();
for(int i = 0; i < 4; i++){
    bufferData[position++] = result[i * 4 + 0];
    bufferData[position++] = result[i * 4 + 1];
    bufferData[position++] = result[i * 4 + 2];
    bufferData[position++] = result[i * 4 + 3];
}
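The same copy extends naturally to many matrices. The following is a minimal sketch that packs a list of matrices into one flat array. To keep it self-contained it does not use Eigen: Mat4Data is a hypothetical stand-in for the sixteen column-major floats that Matrix4f::data() would provide, and packMatrices is an illustrative name, not part of my implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for one 4x4 float matrix already laid out column-major.
using Mat4Data = std::array<float, 16>;

// Append every matrix to one flat float array, 16 floats per matrix,
// in the order the matrices are given.
std::vector<float> packMatrices(const std::vector<Mat4Data>& matrices) {
    std::vector<float> packed;
    packed.reserve(matrices.size() * 16);
    for (const Mat4Data& m : matrices) {
        packed.insert(packed.end(), m.begin(), m.end());
    }
    assert(packed.size() == matrices.size() * 16);
    return packed;
}
```

Matrix k then starts at float k * 16 in the packed array. The result of packed.data() and packed.size() * sizeof(float) could be handed to glBufferData in place of the malloc'd pointer and size.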
The next step is to initialize the texture buffer, as it is not possible to bind the memory that was just prepared directly to a texture unit. The following calls are very sensitive and must be made in exactly this order. Mistakes here may compile, but will result in an empty buffer in the shader. Use the GLuint buffer that was defined previously.
glGenBuffers(1, &buffer);
glBindBuffer(GL_TEXTURE_BUFFER, buffer);
glBufferData(GL_TEXTURE_BUFFER, (sizeof( float ) * <numElements>), bufferData, GL_STATIC_DRAW);
Next, it is necessary to attach the buffer to a texture and set the active texture unit. This will use the GLuint tex that was defined previously. The number that follows GL_TEXTURE determines the value you should send in the uniform to access that texture unit (GL_TEXTURE0 uses a 0 in the uniform). GL_RGBA32F is a sized internal format used by OpenGL, and determines how the data is returned. In this case, GL_RGBA32F means that each fetch from the texture returns a float vec4 with components R, G, B, and A, which can be accessed individually using .r, .g, .b, and .a. Alternatively, there are many other formats, such as GL_R32F, which retrieves only a single float, or types that return integers, shorts, etc. If using RGBA, be aware that each index moves through four floats; this will be reiterated later. The full list of types can be found here.
glGenTextures(1, &tex);
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_BUFFER, tex);
glTexBuffer(GL_TEXTURE_BUFFER, GL_RGBA32F, buffer);
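Because the internal format decides how many floats each index covers, it can help to compute the texel count on the CPU and check it against your data. This is a minimal sketch with hypothetical names (TexelFormat, texelCount, bufferBytes). It covers only the two formats mentioned above and is not an OpenGL API.

```cpp
#include <cassert>
#include <cstddef>

// Number of float components read per fetch for two common formats.
enum class TexelFormat : std::size_t { R32F = 1, RGBA32F = 4 };

// How many texels a buffer of `numFloats` floats exposes under `format`.
// Asserts that the float count is a whole number of texels.
std::size_t texelCount(std::size_t numFloats, TexelFormat format) {
    const std::size_t perTexel = static_cast<std::size_t>(format);
    assert(numFloats % perTexel == 0);
    return numFloats / perTexel;
}

// Size in bytes to pass to glBufferData for the same buffer.
std::size_t bufferBytes(std::size_t numFloats) {
    return numFloats * sizeof(float);
}
```

For example, two matrices are 32 floats: that is 8 valid indices with GL_RGBA32F and 32 valid indices with GL_R32F, and the size passed to glBufferData is the same in both cases.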
In the draw/render step, there is still some uniform data that must be sent, but it is far less than what was originally being sent. It is necessary to send both the index of the data in the buffer and the active texture unit to use. Where and how these uniforms are defined is up to the user, but they must be set on every draw call.
glUniform1i(prog->getUniform("index"), index);
glUniform1i(prog->getUniform("texNum"), 0);
The next steps are performed in the vertex or fragment shader, depending on where you need the data. The most important step is making sure that the GLSL version is #version 150. If it is still an older version such as #version 120, the function needed to access texture data does not yet exist in GLSL. If previously using #version 120, attributes must be declared "in", and varyings must be declared "out" in the vertex shader and "in" in the fragment shader, but otherwise the versions are the same in this implementation.
In the shader, declare the two uniforms as shown below. Do not worry about the integer sent for texNum being used as a samplerBuffer, as this is handled automatically.
uniform int index;
uniform samplerBuffer texNum;
The final step is to read from the texture buffer using a function called texelFetch(<textureNumber>, <index>), where the first argument is the samplerBuffer uniform. Depending on how texelFetch is being used, it will likely be extremely beneficial to write a helper function that restores the data in the shader. In this instance, I created a helper function that rebuilds a mat4 from its component vec4s.
Important! If using the RGBA format as defined previously, each index moves by four floats, not one. Also, this implementation assumes that the matrix stored in the buffer is column-major; if it is not, this setup will not work. The code below takes an index, pulls out the four vec4 components of the Matrix4f stored in the buffer, and places them back into a mat4. Note that the index passed in is multiplied by four. This accounts for the four vec4s that make up each mat4 to be retrieved, and ensures that the index is moved far enough.
mat4 getMat4(int ind){
    mat4 tempMat;
    vec4 tempVec;
    for(int i = 0; i < 4; i++){
       tempVec = texelFetch(texNum, ind*4 + i);
       tempMat[i] = tempVec;
    }
    return tempMat;
}
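The index arithmetic in getMat4 can be checked on the CPU before anything is uploaded. The sketch below is a hypothetical C++ mirror of the shader helper: fetchTexel stands in for texelFetch on an RGBA32F buffer, and returning zeros for out-of-range reads is an assumption made to match the behavior described in this tutorial, not something the sketch proves about any driver.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // result[i] is column i, as in GLSL

// CPU stand-in for texelFetch on an RGBA32F buffer:
// texel t covers floats 4t through 4t + 3.
Vec4 fetchTexel(const std::vector<float>& buffer, std::size_t texel) {
    const std::size_t first = texel * 4;
    if (first + 4 > buffer.size()) {
        return Vec4{};  // assumed out-of-range behavior: all zeros
    }
    return Vec4{buffer[first], buffer[first + 1], buffer[first + 2], buffer[first + 3]};
}

// Mirror of the shader helper: matrix `ind` occupies texels 4*ind to 4*ind + 3.
Mat4 getMat4(const std::vector<float>& buffer, std::size_t ind) {
    Mat4 result{};
    for (std::size_t i = 0; i < 4; ++i) {
        result[i] = fetchTexel(buffer, ind * 4 + i);
    }
    return result;
}
```

Filling a test buffer with the values 0, 1, 2, and so on makes misaligned indices easy to spot: matrix 1 should begin with the value 16, not 4.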
Common Errors
Unfortunately, almost any mistake in this chain that is not an obvious syntax error will almost certainly compile AND execute without error or warning, which can make debugging difficult. The most common indicator of an error is that many or all texelFetch calls return zero or a vec4 entirely full of zeros. This can be tested either by fetching sample indices known to be filled properly, passing the result as the color to the fragment shader, and seeing if the model turns black, or by using the data as intended and seeing all or most points stick to <0, 0, 0>.
If the above issue exists, the GL calls defined above are in the correct order, and the data is confirmed to be intact in the malloced memory, there are a few points of interest where this issue can arise. Be sure to check:
1) The size of the buffer, in bytes, being passed in glBufferData(GL_TEXTURE_BUFFER, (sizeof( float ) * <numElements>), bufferData, GL_STATIC_DRAW); as there is no crash if this is too small, but a texelFetch beyond its bounds will return zeros.
2) That the texture number in glActiveTexture(GL_TEXTURE0); matches the number being passed to the shader. A mismatch can read from the wrong buffer or return all zeros if nothing exists there.
3) That the index being passed is scaled properly for the type of data read in each call to texelFetch. If you use a format that reads four values per texelFetch, it is necessary to scale the index by four to compensate, or the data will not line up properly. Additionally, if the index goes beyond the bounds of the buffer, it will return zeros.
These errors become more common as the complexity of the buffer increases. For example, if the binding pose of the bones and their frame poses are all stored in the same texture, it becomes easier to misalign the indices, which produces interesting effects on the model but does not generate an error.
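One way to keep indices aligned in a combined buffer is to compute them in a single place. The sketch below assumes a hypothetical layout, which is not necessarily the one I used: the bind pose for every bone comes first, followed by one block of bone matrices per animation frame. The struct and its member names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Assumed layout, in units of whole matrices:
// [bind pose: numBones] [frame 0: numBones] [frame 1: numBones] ...
struct BoneLayout {
    std::size_t numBones;
    std::size_t numFrames;

    // Matrix index of a bone's bind pose.
    std::size_t bindPoseIndex(std::size_t bone) const {
        assert(bone < numBones);
        return bone;
    }

    // Matrix index of a bone's pose in a given animation frame.
    std::size_t frameIndex(std::size_t frame, std::size_t bone) const {
        assert(frame < numFrames && bone < numBones);
        return numBones + frame * numBones + bone;
    }

    // Total number of floats the buffer must hold (16 per matrix).
    std::size_t totalFloats() const {
        return (numFrames + 1) * numBones * 16;
    }
};
```

The values returned here are matrix indices, so they are what would be passed to getMat4, which applies the multiplication by four itself. totalFloats() gives the element count to use when allocating and uploading the buffer.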
Hopefully this is helpful for getting started using Texture Buffers instead of Uniforms for using data in GLSL in OpenGL!