Rendering Terrain Part 7 – Fixing an Annoying Bug

Back in Part 1, I briefly mentioned a bug I was getting where my video card would reset itself every time I exited the program while full screen.
There’s plenty of information for when this happens to you with commercial software. I’ll just provide one link because this only happens when I close my own program, and I’ve never had it otherwise.

In Part 1, I had theorized that this bug was caused because we were in full screen mode and we were destroying the Graphics object that managed the display driver before we had switched back to windowed mode. I thought this was the case because, at the time, I wasn’t getting the error when shutting down from windowed mode. It’s possible that this is partially true, but it certainly wasn’t the whole story.
Around the time I started adding in support for 3D rendering, the bug reared its head again, this time when shutting down from windowed mode. Now it was happening every single time I shut the program down, no matter what.
As I mentioned in Part 1, the very first time the bug cropped up, I had fixed it by reorganizing the order in which I was releasing resources in the Graphics object’s destructor. I think I may have released my device before anything else. I forget exactly what, but it was something obvious like that when I looked at it.
So, with that in mind, I spent a fair bit of time swapping commands around in my Graphics destructor and in my Terrain destructor. Nothing seemed to make a difference.
I started commenting chunks of code out and eventually determined that the bug would go away as soon as I removed the Terrain, so I focused on that.
Thinking maybe it was the shaders, I cut the shaders down to the simplest they could be. No help.
I disabled everything related to initializing and drawing in 3D. 2D seemed to work fine. Hell, it wasn’t even giving the bug in full screen mode now! Don’t ask me why not.
I re-added the initialization code but not the draw code. No error message. OK, so now I knew it was something to do with my draw code for rendering the terrain in 3D. Let’s look at that:

void Terrain::Draw3D(ID3D12GraphicsCommandList* cmdList, XMFLOAT4X4 vp, XMFLOAT4 eye) {
	cmdList->SetPipelineState(mpPSO3D);
	cmdList->SetGraphicsRootSignature(mpRootSig3D);

	mCBData.viewproj = vp;
	mCBData.eye = eye;
	mCBData.height = mHeight;
	mCBData.width = mWidth;
	memcpy(mpCBVDataBegin, &mCBData, sizeof(mCBData));
	
	ID3D12DescriptorHeap* heaps[] = { mpSRVHeap };
	cmdList->SetDescriptorHeaps(_countof(heaps), heaps);
	cmdList->SetGraphicsRootDescriptorTable(0, mpSRVHeap->GetGPUDescriptorHandleForHeapStart());
	CD3DX12_GPU_DESCRIPTOR_HANDLE cbvHandle(mpSRVHeap->GetGPUDescriptorHandleForHeapStart(), 1, mSRVDescSize);
	cmdList->SetGraphicsRootDescriptorTable(1, cbvHandle);

	cmdList->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP); // describe how to read the vertex buffer.
	cmdList->IASetVertexBuffers(0, 1, &mVBV);
	cmdList->IASetIndexBuffer(&mIBV);
	
	cmdList->DrawIndexedInstanced(mIndexCount, 1, 0, 0, 0);
}

This chunk of code is all I needed to comment out to stop the error from popping up. Without that, all that gets rendered is a blank screen. But no errors. Uncomment the call to that function, and the display driver will crash on exit.
I know the program renders fine while it is active, without crashing. That makes me pretty certain that the memory isn’t somehow corrupted or otherwise mishandled. But I also know that all of the memory accessed here is technically initialized and then released on shutdown, whether this function is commented out or not. So it must have something to do with accessing that data, mustn’t it?
I couldn’t think of anything else to try, so I walked away from it and worked on something else for a while. When I came back to it, I had intended to devote an entire day to the bug, if necessary, but I wound up fixing it in about five minutes. Derp.

After sleeping on the problem, I realized I already had all of the pieces to the puzzle. I knew it had to be something related to memory. I knew initialization was fine. I knew rendering was fine. I knew shutdown and release of the resources was fine. And, most importantly, I knew what rendering a frame looked like.
When you render a frame, you first need to check that the GPU has actually finished rendering the last frame. When you are double, triple, or n-buffering, you need to make sure the GPU is finished with the last frame that used the buffer you now want to work on. In my case, the code that does this looks like so:

void Graphics::NextFrame() {
	// Add Signal command to set fence to the fence value that indicates the GPU is done with that buffer. maFenceValues[mBufferIndex] contains the frame count for that buffer.
	if (FAILED(mpCmdQ->Signal(maFences[mBufferIndex], maFenceValues[mBufferIndex]))) {
		throw GFX_Exception("CommandQueue Signal Fence failed on Render.");
	}

	// set the buffer index to point to the current back buffer.
	mBufferIndex = mpSwapChain->GetCurrentBackBufferIndex();

	// if the current value returned by the fence is less than the current fence value for this buffer, then we know the GPU is not done with the buffer, so wait.
	if (maFences[mBufferIndex]->GetCompletedValue() < maFenceValues[mBufferIndex]) {
		if (FAILED(maFences[mBufferIndex]->SetEventOnCompletion(maFenceValues[mBufferIndex], mFenceEvent))) {
			throw GFX_Exception("Failed to SetEventOnCompletion for fence in NextFrame.");
		}

		WaitForSingleObject(mFenceEvent, INFINITE);
	}

	++maFenceValues[mBufferIndex];
}
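
To see the fence logic in isolation, here is a minimal single-threaded sketch that needs no Direct3D at all. FakeFence, FakeQueue, and WaitForFence are hypothetical stand-ins invented for this illustration, not part of the project or of the D3D12 API. The only thing it models is the rule NextFrame relies on: a signal queued after a frame's work only reaches the fence once that work has drained, so "completed value < target" means the GPU still owns the buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for ID3D12Fence: just a counter the "GPU" advances.
struct FakeFence {
    std::uint64_t completed = 0;
    std::uint64_t GetCompletedValue() const { return completed; }
};

// Hypothetical stand-in for a command queue. Frames and fence signals are
// processed strictly in submission order, one item per Step() call.
class FakeQueue {
    struct Item {
        FakeFence* fence;      // nullptr means "a frame's worth of draw work"
        std::uint64_t value;   // value to write to the fence, if any
    };
    std::deque<Item> mPending;

public:
    // Queue one unit of pretend GPU work (a frame's draw calls).
    void ExecuteFrame() { mPending.push_back(Item{nullptr, 0}); }

    // Queue a signal: the fence receives `value` once all earlier items are done.
    void Signal(FakeFence* fence, std::uint64_t value) {
        mPending.push_back(Item{fence, value});
    }

    // Let the pretend GPU process the next queued item. Returns false when idle.
    bool Step() {
        if (mPending.empty()) return false;
        Item item = mPending.front();
        mPending.pop_front();
        if (item.fence) item.fence->completed = item.value;
        return true;
    }
};

// The same comparison NextFrame makes: wait only while the fence is behind
// `target`. Returns how many queued items had to drain before it caught up.
int WaitForFence(FakeQueue& queue, const FakeFence& fence, std::uint64_t target) {
    int drained = 0;
    while (fence.GetCompletedValue() < target && queue.Step()) {
        ++drained;
    }
    return drained;
}
```

Queue a frame, then a signal with value 1, and the fence still reads 0 until both items have been stepped through. Releasing the frame's resources at that point is exactly the mistake that a wait on the fence prevents.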

So what happens when I shut the program down? I immediately begin destroying the objects in the scene, then the scene, then the graphics object, then the window, releasing resources as I go. I hadn’t thought to check whether or not the GPU was actually done with those resources before I released them. Being that we’re triple buffering in this project, there’s a reasonably good chance that when I hit close, there is still a frame or two waiting to be rendered.
So I added the following method and call it from my Scene object’s destructor first thing, before anything else gets destroyed. It just loops through all of the back buffers and confirms that the GPU is done rendering all of the frames. Then, shutdown can continue as normal. Boom. Problem solved.

// Function to be called before shutting down. Ensures GPU is done rendering all frames so we can release graphics resources.
void Graphics::ClearAllFrames() {
	for (int i = 0; i < FRAME_BUFFER_COUNT; ++i) {
		// Add Signal command to set fence to the fence value that indicates the GPU is done with that buffer. maFenceValues[i] contains the frame count for that buffer.
		if (FAILED(mpCmdQ->Signal(maFences[i], maFenceValues[i]))) {
			throw GFX_Exception("CommandQueue Signal Fence failed in ClearAllFrames.");
		}

		// if the current value returned by the fence is less than the current fence value for this buffer, then we know the GPU is not done with the buffer, so wait.
		if (maFences[i]->GetCompletedValue() < maFenceValues[i]) {
			if (FAILED(maFences[i]->SetEventOnCompletion(maFenceValues[i], mFenceEvent))) {
				throw GFX_Exception("Failed to SetEventOnCompletion for fence in ClearAllFrames.");
			}

			WaitForSingleObject(mFenceEvent, INFINITE);
		}
	}
}
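
Why calling this "first thing" in the destructor matters comes down to a C++ rule: a destructor's body runs before any of the object's members are destroyed. The sketch below demonstrates that ordering with logging mocks. FakeGraphics, FakeTerrain, FakeScene, and RunShutdown are hypothetical names made up for the example, FlushGpu stands in for ClearAllFrames, and the ownership layout (the scene holds the terrain by value and the graphics object by reference) is an assumption, not necessarily how the project is structured.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records shutdown events in the order they happen.
using EventLog = std::vector<std::string>;

// Hypothetical stand-in for the Graphics object.
struct FakeGraphics {
    explicit FakeGraphics(EventLog& events) : events(events) {}
    // Plays the role of ClearAllFrames: pretend to wait for the GPU to go idle.
    void FlushGpu() { events.push_back("flush"); }
    EventLog& events;
};

// Hypothetical stand-in for an object that owns GPU resources, like Terrain.
struct FakeTerrain {
    explicit FakeTerrain(EventLog& events) : events(events) {}
    ~FakeTerrain() { events.push_back("release terrain"); }
    EventLog& events;
};

struct FakeScene {
    FakeScene(FakeGraphics& gfx, EventLog& events) : gfx(gfx), terrain(events) {}
    // The destructor body runs before member destructors, so the flush is
    // guaranteed to happen before terrain releases anything.
    ~FakeScene() { gfx.FlushGpu(); }
    FakeGraphics& gfx;
    FakeTerrain terrain;
};

// Builds a scene, tears it down, and returns the observed event order.
EventLog RunShutdown() {
    EventLog events;
    FakeGraphics gfx(events);
    {
        FakeScene scene(gfx, events);
    }  // scene is destroyed here: flush first, then its members
    return events;
}
```

If the flush were instead placed in a destructor that runs after the resource-owning objects are gone, the wait would come too late to help.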

My biggest concern after fixing this bug was why none of the example code I looked at seemed to do this. I looked at the example code from Microsoft and multiple tutorials. I tried searching on Google without finding anything that indicated other people had run into this. I don’t remember it ever being an issue before, although I only did a couple of projects in DirectX 10, and none in 11.
And then I realized every piece of example code I had looked at actually DOES check the buffers on shutdown. The Braynzar Soft tutorial even directly mentions it, saying:

Before we release anything, we want to make sure the GPU has finished with everything before we start releasing things.

My Bad. Read The Fucking Manual, kids.

For the latest version of the project, go to GitHub. Please be aware that the posts are going up on a delay as I am only posting twice a week. I’m coding pretty much every day, so the posts are behind the code. I may have updated the code since I wrote this post.

Traagen