Command Streams with Lambda Functions

Feb 10, 2017


I’ve been working on a rendering backend to make creating quick-and-dirty OpenGL projects easier - basically a simple game engine that I’m comfortable breaking apart and toying with. Right now I’m splitting off my OpenGL backend to run on a separate thread, so that while the engine is simulating frame N+1, the render thread is drawing frame N. Since I don’t want a massive sync point between the two threads, and I don’t want to surround everything in mutexes, I’ll need two copies of a lot of data: one owned by the render thread, and one owned by the game thread. When the game thread changes something (say, the position of a mesh), the render thread needs to know about it by the next frame. This means syncing data between the two threads. I came up with a fairly nice way to do that, which I think is worth sharing. The technique is synthesized from two other sources: the Autodesk Stingray blog post on state reflection, and this post by Stefan Reinalter on command buckets in his Molecule engine.

In a nutshell, the problem is as follows: we have some data on the main thread of the program, which needs to get propagated to the render thread. This data usually has some associated action that needs to be taken: maybe it’s a buffer that needs to be copied to the GPU, or maybe it’s a file path that I’m telling the render thread to load a shader from. I use a command queue, which is essentially a chunk of memory that the main thread pushes commands onto, along with their associated data, and that the render thread then reads from. A command is a POD type that looks like this:

struct Cmd
{
	using DispatchFn = void(*)(Cmd*);
	DispatchFn dispatch;
	size_t command_size;
};

Each command has a pointer to a function that can be executed, and a command_size field which is used by the queue.

Commands that inherit from this struct get pushed onto a CommandStream object, that has an API like this:

class CommandStream
{
public:

	//Run every command in the queue
	void ExecuteAll();

	//Add a new command to the queue
	template<typename T>
	bool Push(const T& t);
};

In a bit, I’ll go through the details of these functions, and how the CommandStream stores the commands, but first, let’s see how easy it is to use. One of my main goals was to make the code easy to read and follow. The functions that the main thread calls to interact with the render thread don’t execute until later, and if you’re not careful, code that belongs together ends up scattered all over the place. This is where C++11’s lambda functions come into play! For example, say we have a function that wants to tell the renderer to update the position of some mesh in the game world. That might look something like

void UpdateMeshTransform(MeshHandle mesh, const Transform& transform)
{
	struct CmdType : Cmd
	{
		MeshHandle m;
		Transform t;
	} cmd;
	cmd.m = mesh;
	cmd.t = transform;

	cmd.dispatch = [](Cmd* cmd) {
		//CmdType derives from Cmd, so a static_cast is enough here.
		auto data = static_cast<CmdType*>(cmd);
		//Copy transform info to the appropriate mesh.
		render_meshes[data->m] = data->t;
	};

	render_commands.Push(cmd);
}

So here we create a type that inherits from the Cmd struct, which lets us put in arbitrary data. Then we use a captureless lambda to define a function that will be run by the render thread when it calls render_commands.ExecuteAll(). It is important that the lambda be captureless - this is what lets it convert to a plain function pointer. A nice consequence is that even though the lambda is defined here, the function pointer it converts to stays valid for as long as any ordinary function’s would - that is, forever. A capturing lambda, by contrast, is an object holding state and is only usable while that object is alive. The lambda could equally be replaced by a pointer to a function that is defined elsewhere, but what I like about this is that the behaviour is very clear when you read this function.
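To make the captureless requirement concrete, here is a minimal, self-contained sketch. CounterCmd and MakeAddCmd are hypothetical names invented for this illustration, and the Cmd struct is trimmed down to just the dispatch pointer.

```cpp
#include <cassert>

struct Cmd {
    using DispatchFn = void (*)(Cmd*);
    DispatchFn dispatch;
};

// Hypothetical command type, used only for this illustration.
struct CounterCmd : Cmd {
    int* target;
    int amount;
};

// Builds a command whose dispatch function is a captureless lambda.
inline CounterCmd MakeAddCmd(int* target, int amount) {
    CounterCmd cmd;
    cmd.target = target;
    cmd.amount = amount;
    // No captures, so the lambda converts implicitly to a function pointer.
    // Everything it needs travels inside the command itself.
    cmd.dispatch = [](Cmd* base) {
        auto* self = static_cast<CounterCmd*>(base);
        *self->target += self->amount;
    };
    // A capturing lambda would fail to compile here, because it has no
    // conversion to DispatchFn:
    //   cmd.dispatch = [amount](Cmd*) { /* ... */ };
    return cmd;
}
```

Calling cmd.dispatch(&cmd) on the returned command adds amount to *target. The compile error in the capturing case is the useful part: it forces all state into the command struct, which is exactly what gets copied into the stream.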

So how does the CommandStream work? Well, in order to make it safe for us to use, internally it has two buffers: a write_buffer_, and an execute_buffer_. When a new command is added, memory is allocated from a linear allocator - so every command will come directly after the previously added one (see this post for a sample implementation). The push function ends up being dead simple:

template<typename T>
inline bool Push(const T& t)
{
	static_assert(std::is_base_of<Cmd, T>::value, "Adding invalid command");
	
	auto block = write_buffer_->Allocate(sizeof(T));
	
	if (block.ptr == nullptr) {
		//Our allocator is out of memory.
		return false;
	}

	memcpy(block.ptr, &t, sizeof(T));
	auto cmd = reinterpret_cast<Cmd*>(block.ptr);
	cmd->command_size = block.length;
	return true;
}
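For readers who don’t have a linear allocator to hand, the following is a rough sketch of one that would satisfy the calls used in this post. Only the names Allocate, DeallocateAll, Begin, End and the ptr/length fields come from the code here; the class name, the Block struct name, the constructor and the alignment policy are my assumptions, not the implementation from the linked post.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Assumed shape of the value returned by Allocate().
struct Block {
    void* ptr;
    std::size_t length;
};

// Hypothetical bump allocator: hands out consecutive slices of one buffer.
class LinearAllocator {
public:
    explicit LinearAllocator(std::size_t capacity)
        : storage_(new unsigned char[capacity]), capacity_(capacity), used_(0) {}

    Block Allocate(std::size_t size) {
        // Round up so the next command also starts at an aligned address.
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t rounded = (size + align - 1) / align * align;
        if (rounded > capacity_ - used_) {
            return Block{nullptr, 0};  // Out of memory.
        }
        void* p = storage_.get() + used_;
        used_ += rounded;
        return Block{p, rounded};
    }

    // Frees everything at once by resetting the bump offset.
    void DeallocateAll() { used_ = 0; }

    unsigned char* Begin() const { return storage_.get(); }
    unsigned char* End() const { return storage_.get() + used_; }

private:
    std::unique_ptr<unsigned char[]> storage_;
    std::size_t capacity_;
    std::size_t used_;
};
```

Note that length reports the rounded size rather than the requested one. That matters because Push stores block.length as command_size, and the execute loop uses command_size to find the start of the next command.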

At the end of a frame, we have a sync point between the main thread and the render thread. At this sync, we swap the two buffers, so that the render thread can execute the commands recorded during the frame that just finished, while the game thread is free to start recording the next one. The swapping is simple:

inline void SwapBuffers()
{
	std::swap(execute_buffer_, write_buffer_); // Swap the pointers
	write_buffer_->DeallocateAll();            // Clear all the commands
}

Finally, since all the commands are right after each other in memory, execution is equally straightforward - loop over the buffer, and call the dispatch function for each one.

void ExecuteAll()
{
	auto read_pos = execute_buffer_->Begin();
	while (read_pos < execute_buffer_->End()) {
		Cmd* cmd = reinterpret_cast<Cmd*>(read_pos);
		cmd->dispatch(cmd);
		//Step over this command to reach the next one
		read_pos = read_pos + cmd->command_size;
	}
	}
}
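Putting the pieces together, here is a single-threaded sketch of the whole push, swap, execute cycle. To keep it short, it assumes a pair of std::vector byte buffers in place of the linear allocator, and MiniCommandStream, SetValueCmd and PushSetValue are hypothetical names for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <utility>
#include <vector>

struct Cmd {
    using DispatchFn = void (*)(Cmd*);
    DispatchFn dispatch;
    std::size_t command_size;
};

// Hypothetical stand-in for CommandStream, backed by two byte vectors.
class MiniCommandStream {
public:
    template <typename T>
    void Push(const T& t) {
        static_assert(std::is_base_of<Cmd, T>::value, "Adding invalid command");
        static_assert(std::is_trivially_copyable<T>::value, "Commands are copied with memcpy");
        // Round up so every command starts at a suitably aligned offset.
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t size = (sizeof(T) + align - 1) / align * align;
        const std::size_t offset = write_.size();
        write_.resize(offset + size);
        std::memcpy(write_.data() + offset, &t, sizeof(T));
        reinterpret_cast<Cmd*>(write_.data() + offset)->command_size = size;
    }

    // Called at the frame sync point: recorded commands become executable.
    void SwapBuffers() {
        std::swap(execute_, write_);
        write_.clear();
    }

    void ExecuteAll() {
        unsigned char* pos = execute_.data();
        unsigned char* end = pos + execute_.size();
        while (pos < end) {
            Cmd* cmd = reinterpret_cast<Cmd*>(pos);
            cmd->dispatch(cmd);
            pos += cmd->command_size;
        }
    }

private:
    std::vector<unsigned char> write_;
    std::vector<unsigned char> execute_;
};

// Hypothetical command: writes a value through a pointer when executed.
struct SetValueCmd : Cmd {
    int* target;
    int value;
};

inline void PushSetValue(MiniCommandStream& stream, int* target, int value) {
    SetValueCmd cmd;
    cmd.target = target;
    cmd.value = value;
    cmd.dispatch = [](Cmd* base) {
        auto* self = static_cast<SetValueCmd*>(base);
        *self->target = self->value;
    };
    stream.Push(cmd);
}
```

With this sketch, calling ExecuteAll() straight after a push does nothing, because the command is still in the write buffer. Only after SwapBuffers() does ExecuteAll() apply it, which mirrors the one-frame delay between the game thread and the render thread.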

And that’s it! A reasonably simple way to create a command stream that can be used to communicate between threads. Next, I want to expand on this to make Push thread-safe, so that multiple worker threads can be making render calls at the same time. Fortunately, this is as simple as making sure that the Allocate function is thread safe, which is rather nice. For now, I’ve gotten my render thread working separately, which I’m quite happy with. Hopefully in future posts I’ll be able to share some of the images that I’ll generate using this codebase!
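As a rough idea of what a thread-safe Allocate could look like, here is a sketch that bumps the offset with an atomic compare-and-swap. This is an assumption about one possible approach, not code from the engine; AtomicLinearAllocator, the Block struct name and the constructor are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>

struct Block {
    void* ptr;
    std::size_t length;
};

// Hypothetical lock-free bump allocator.
class AtomicLinearAllocator {
public:
    explicit AtomicLinearAllocator(std::size_t capacity)
        : storage_(new unsigned char[capacity]), capacity_(capacity), used_(0) {}

    Block Allocate(std::size_t size) {
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t rounded = (size + align - 1) / align * align;
        std::size_t old_used = used_.load(std::memory_order_relaxed);
        do {
            if (rounded > capacity_ - old_used) {
                return Block{nullptr, 0};  // Out of memory.
            }
            // On failure, compare_exchange_weak reloads old_used and we retry.
        } while (!used_.compare_exchange_weak(old_used, old_used + rounded,
                                              std::memory_order_relaxed));
        // This thread now exclusively owns [old_used, old_used + rounded).
        return Block{storage_.get() + old_used, rounded};
    }

    // Only safe while no other thread is allocating, e.g. at the frame sync.
    void DeallocateAll() { used_.store(0, std::memory_order_relaxed); }

private:
    std::unique_ptr<unsigned char[]> storage_;
    std::size_t capacity_;
    std::atomic<std::size_t> used_;
};
```

Relaxed ordering is used on the assumption that the end-of-frame sync point is what publishes the written commands to the render thread. Two things this does not give you: a defined order between commands pushed from different threads, and any protection if a writer is still filling in its block when the buffers are swapped.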

Update

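I’ve realised that this is actually a flawed design - I’m doing two copies for every command: one to fill out the local struct, and a second when Push copies it into the buffer. The solution would be to change the Push(const T&) function to look like T* Push<T>(), and call it before filling out the command struct, so that the fields get written directly into the stream’s memory. Simple enough, really.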
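A sketch of what that in-place version might look like follows. For brevity it assumes a single fixed-size buffer rather than the write/execute pair, and InPlaceStream, SetValueCmd and PushSetValue are hypothetical names for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>

struct Cmd {
    using DispatchFn = void (*)(Cmd*);
    DispatchFn dispatch;
    std::size_t command_size;
};

// Hypothetical single-buffer stream with an in-place Push.
class InPlaceStream {
public:
    // Constructs a T directly in the buffer and returns it for the caller to fill.
    template <typename T>
    T* Push() {
        static_assert(std::is_base_of<Cmd, T>::value, "Adding invalid command");
        static_assert(std::is_trivially_destructible<T>::value, "Commands are never destroyed");
        const std::size_t align = alignof(std::max_align_t);
        const std::size_t size = (sizeof(T) + align - 1) / align * align;
        if (size > sizeof(storage_) - used_) {
            return nullptr;  // Out of memory.
        }
        T* cmd = new (storage_ + used_) T;  // Placement new, no copy.
        cmd->command_size = size;
        used_ += size;
        return cmd;
    }

    void ExecuteAll() {
        unsigned char* pos = storage_;
        unsigned char* end = storage_ + used_;
        while (pos < end) {
            // Assumes the Cmd base sits at offset 0, as the code in this post does.
            Cmd* cmd = reinterpret_cast<Cmd*>(pos);
            cmd->dispatch(cmd);
            pos += cmd->command_size;
        }
    }

private:
    alignas(std::max_align_t) unsigned char storage_[1024];
    std::size_t used_ = 0;
};

struct SetValueCmd : Cmd {
    int* target;
    int value;
};

inline bool PushSetValue(InPlaceStream& stream, int* target, int value) {
    SetValueCmd* cmd = stream.Push<SetValueCmd>();
    if (cmd == nullptr) {
        return false;
    }
    // Fields are written straight into the stream's memory.
    cmd->target = target;
    cmd->value = value;
    cmd->dispatch = [](Cmd* base) {
        auto* self = static_cast<SetValueCmd*>(base);
        *self->target = self->value;
    };
    return true;
}
```

The trade-off is that the command sits in the stream half-initialised between Push and the last field assignment, so the caller has to finish filling it in before the buffers are swapped.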
