Rendering Terrain Part 30 – Refactor Round Three

Well, I’m a day late, but I’ve finally completed my rewrite of the code for this project. I ran into a couple of bugs that I’ll talk briefly about. They took me a long time to find, but were really simple to fix once I realized what I was doing wrong.
Let’s go through some of the changes.

The big change was moving away from having a single monolithic Graphics class that handled everything. Instead, we now have a Device class which specifically handles interaction with the adapter. Management of descriptor heaps has been moved to a ResourceManager class, which also handles all buffers and loading of files.
The Back Buffers, which were previously contained within the Graphics class, have been moved to the new Frame class. Each Frame object represents everything needed to render the scene each frame: Constant buffers, the Shadow Atlas, Back Buffer, and a Depth/Stencil Buffer.
Because the ResourceManager takes care of all of the resources, the Frame and any objects in the scene just need to hold onto handles to these resources, without actually taking responsibility for clean up.

I’ve also moved loading of resources for the terrain back to the Terrain class; or rather, the Terrain class now contains a TerrainMaterial object and both the Terrain and TerrainMaterial talk to the ResourceManager to get files loaded into buffers.

Moving all of this around has greatly simplified the Terrain and Scene classes. As an example, here’s the old Terrain class declaration:

class Terrain {
public:
	Terrain();
	~Terrain();

	void Draw(ID3D12GraphicsCommandList* cmdList, bool Draw3D = true);

	UINT GetSizeOfVertexBuffer() { return mVertexCount * sizeof(Vertex); }
	UINT GetSizeOfIndexBuffer() { return mIndexCount * sizeof(UINT); }
	UINT GetSizeOfHeightMap() { return mWidth * mDepth * sizeof(float); }
	UINT GetSizeOfDisplacementMap() { return mDispWidth * mDispDepth * 4 * sizeof(float); }
	UINT GetSizeOfDetailMap() { return sizeof(float) * 4 * mDetailHeight * mDetailWidth; }
	UINT GetHeightMapWidth() { return mWidth; }
	UINT GetHeightMapDepth() { return mDepth; }
	UINT GetDisplacementMapWidth() { return mDispWidth; }
	UINT GetDisplacementMapDepth() { return mDispDepth; }
	UINT GetDetailMapWidth() { return mDetailWidth; }
	UINT GetDetailMapHeight() { return mDetailHeight; }
	int GetBaseHeight() { return mBaseHeight; }
	float* GetHeightMapTextureData() { return maImage; }
	float* GetDisplacementMapTextureData() { return maDispImage; }
	float* GetDetailMapTextureData(int index) { return maDetailImages[index]; }
	Vertex* GetVertexArray() { return maVertices; }
	UINT* GetIndexArray() { return maIndices; }
	float GetScale() { return mHeightScale; }
	
	void SetVertexBufferView(D3D12_VERTEX_BUFFER_VIEW vbv) { mVBV = vbv; }
	void SetIndexBufferView(D3D12_INDEX_BUFFER_VIEW ibv) { mIBV = ibv; }
	void SetHeightmapResource(ID3D12Resource* tex) { mpHeightmap = tex; }
	void SetDisplacementMapResource(ID3D12Resource* tex) { mpDisplacementMap = tex; }
	void SetDetailMapResource(ID3D12Resource* tex) { mpDetailMaps = tex; }
	void SetVertexBufferResource(ID3D12Resource* vb) { mpVertexBuffer = vb; }
	void SetIndexBufferResource(ID3D12Resource* ib) { mpIndexBuffer = ib; }

	// Once the GPU has completed uploading buffers to GPU memory, we need to free system memory.
	void DeleteVertexAndIndexArrays();

private:
	// Generates an array of vertices and an array of indices.
	void CreateMesh3D();
	// load the specified file containing the heightmap data.
	void LoadHeightMap(const char* fnHeightMap);
	// load the specified file containing a displacement map used for smaller geometry detail.
	void LoadDisplacementMap(const char* fnMap, const char* fnNMap);
	void LoadDetailDepthMap(int index, const char* fnDiffuse, const char* fnDepth);
	// calculate the minimum and maximum z values for vertices between the provide bounds.
	XMFLOAT2 CalcZBounds(Vertex topLeft, Vertex bottomRight);
	
	ID3D12Resource*				mpHeightmap;
	ID3D12Resource*				mpDisplacementMap;
	ID3D12Resource*				mpDetailMaps;
	ID3D12Resource*				mpVertexBuffer;
	ID3D12Resource*				mpIndexBuffer;
	D3D12_VERTEX_BUFFER_VIEW	mVBV;
	D3D12_INDEX_BUFFER_VIEW		mIBV;
	float*						maImage;
	float*						maDispImage;
	float*						maDetailImages[8];
	unsigned int				mWidth;
	unsigned int				mDepth;
	unsigned int				mDispWidth;
	unsigned int				mDispDepth;
	unsigned int				mDetailWidth;
	unsigned int				mDetailHeight;
	int							mBaseHeight;
	unsigned long				mVertexCount;
	unsigned long				mIndexCount;
	float						mHeightScale;
	Vertex*						maVertices;		// buffer to contain vertex array prior to upload.
	UINT*						maIndices;		// buffer to contain index array prior to upload.
};

And here’s the new:

class Terrain {
public:
	Terrain(ResourceManager* rm, TerrainMaterial* mat, const char* fnHeightmap, const char* fnDisplacementMap);
	~Terrain();

	void Draw(ID3D12GraphicsCommandList* cmdList, bool Draw3D = true);
	// Attach the resources needed for rendering terrain.
	// Requires the indices into the root descriptor table to attach the heightmap and displacement map SRVs and constant buffer CBV to.
	void AttachTerrainResources(ID3D12GraphicsCommandList* cmdList, unsigned int srvDescTableIndexHeightMap,
		unsigned int srvDescTableIndexDisplacementMap, unsigned int cbvDescTableIndex);
	// Attach the material resources. Requires root descriptor table index.
	void AttachMaterialResources(ID3D12GraphicsCommandList* cmdList, unsigned int srvDescTableIndex);

	UINT GetHeightMapWidth() { return m_wHeightMap; }
	UINT GetHeightMapDepth() { return m_hHeightMap; }
	
	// Once the GPU has completed uploading buffers to GPU memory, we need to free system memory.
	void DeleteVertexAndIndexArrays();

private:
	// Generates an array of vertices and an array of indices.
	void CreateMesh3D();
	// Create the vertex buffer view
	void CreateVertexBuffer();
	// Create the index buffer view
	void CreateIndexBuffer();
	// Create the constant buffer for terrain shader constants
	void CreateConstantBuffer();
	// load the specified file containing the heightmap data.
	void LoadHeightMap(const char* fnHeightMap);
	// load the specified file containing a displacement map used for smaller geometry detail.
	void LoadDisplacementMap(const char* fnMap);
	// calculate the minimum and maximum z values for vertices between the provide bounds.
	XMFLOAT2 CalcZBounds(Vertex topLeft, Vertex bottomRight);
	
	TerrainMaterial*			m_pMat;
	ResourceManager*			m_pResMgr;
	D3D12_VERTEX_BUFFER_VIEW	m_viewVertexBuffer;
	D3D12_INDEX_BUFFER_VIEW		m_viewIndexBuffer;
	D3D12_CPU_DESCRIPTOR_HANDLE m_hdlHeightMapSRV_CPU;
	D3D12_GPU_DESCRIPTOR_HANDLE m_hdlHeightMapSRV_GPU;
	D3D12_CPU_DESCRIPTOR_HANDLE m_hdlDisplacementMapSRV_CPU;
	D3D12_GPU_DESCRIPTOR_HANDLE m_hdlDisplacementMapSRV_GPU;
	D3D12_CPU_DESCRIPTOR_HANDLE m_hdlConstantsCBV_CPU;
	D3D12_GPU_DESCRIPTOR_HANDLE m_hdlConstantsCBV_GPU;
	unsigned char*				m_dataHeightMap;
	unsigned char*				m_dataDisplacementMap;
	unsigned int				m_wHeightMap;
	unsigned int				m_hHeightMap;
	unsigned int				m_wDisplacementMap;
	unsigned int				m_hDisplacementMap;
	int							m_hBase;
	unsigned long				m_numVertices;
	unsigned long				m_numIndices;
	float						m_scaleHeightMap;
	Vertex*						m_dataVertices;		// buffer to contain vertex array prior to upload.
	UINT*						m_dataIndices;		// buffer to contain index array prior to upload.
	TerrainShaderConstants*		m_pConstants;
};

There’s certainly still a fair amount going on under the hood, but the public interface has been greatly streamlined. There simply isn’t any need for all of the Get and Set methods anymore.
The new methods for rendering the Scene are also quite a bit shorter and simpler now. DrawShadowMap() has been reduced from this:

void Scene::DrawShadowMap(ID3D12GraphicsCommandList* cmdList) {
	cmdList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(mpShadowMap[mFrame], D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE, D3D12_RESOURCE_STATE_DEPTH_WRITE));

	CD3DX12_CPU_DESCRIPTOR_HANDLE handleDSV(mlDescriptorHeaps[1]->GetCPUDescriptorHandleForHeapStart(), mFrame, msizeofDSVDescHeapIncrement);
	cmdList->ClearDepthStencilView(handleDSV, D3D12_CLEAR_FLAG_DEPTH, 1.0f, 0, 0, nullptr);
	cmdList->OMSetRenderTargets(0, nullptr, false, &handleDSV);

	cmdList->SetPipelineState(mlPSOs[2]);
	cmdList->SetGraphicsRootSignature(mlRootSigs[2]);

	ID3D12DescriptorHeap* heaps[] = { mlDescriptorHeaps[0] };
	cmdList->SetDescriptorHeaps(_countof(heaps), heaps);

	// set the srv buffer.
	CD3DX12_GPU_DESCRIPTOR_HANDLE handleSRV(mlDescriptorHeaps[0]->GetGPUDescriptorHandleForHeapStart(), 0, msizeofCBVSRVDescHeapIncrement);
	cmdList->SetGraphicsRootDescriptorTable(0, handleSRV);

	// set the terrain shader constants
	CD3DX12_GPU_DESCRIPTOR_HANDLE handleCBV(mlDescriptorHeaps[0]->GetGPUDescriptorHandleForHeapStart(), 1, msizeofCBVSRVDescHeapIncrement);
	cmdList->SetGraphicsRootDescriptorTable(1, handleCBV);

	for (int i = 0; i < 4; ++i) {
		cmdList->RSSetViewports(1, &maShadowMapViewports[i]);
		cmdList->RSSetScissorRects(1, &maShadowMapScissorRects[i]);

		// fill in this cascade's shadow constants.
		ShadowMapShaderConstants constants;
		constants.shadowViewProj = DNC.GetShadowViewProjMatrix(i);
		constants.eye = C.GetEyePosition();
		DNC.GetShadowFrustum(i, constants.frustum);
		memcpy(maShadowConstantsMapped[mFrame][i], &constants, sizeof(ShadowMapShaderConstants));
		CD3DX12_GPU_DESCRIPTOR_HANDLE handleShadowCBV(mlDescriptorHeaps[0]->GetGPUDescriptorHandleForHeapStart(), 5 + mFrame * 4 + i, msizeofCBVSRVDescHeapIncrement);

		cmdList->SetGraphicsRootDescriptorTable(2, handleShadowCBV);

		// displacement map
		CD3DX12_GPU_DESCRIPTOR_HANDLE handleSRV2(mlDescriptorHeaps[0]->GetGPUDescriptorHandleForHeapStart(), 21, msizeofCBVSRVDescHeapIncrement);
		cmdList->SetGraphicsRootDescriptorTable(3, handleSRV2);

		// mDrawMode = 0/false for 2D rendering and 1/true for 3D rendering
		T.Draw(cmdList, true);
	}

	cmdList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(mpShadowMap[mFrame], D3D12_RESOURCE_STATE_DEPTH_WRITE, D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE));
}

To this:

void Scene::DrawShadowMap(ID3D12GraphicsCommandList* cmdList) {
	m_pFrames[m_iFrame]->BeginShadowPass(cmdList);

	cmdList->SetPipelineState(m_listPSOs[2]);
	cmdList->SetGraphicsRootSignature(m_listRootSigs[2]);

	ID3D12DescriptorHeap* heaps[] = { m_ResMgr.GetCBVSRVUAVHeap() };
	cmdList->SetDescriptorHeaps(_countof(heaps), heaps);

	// Tell the terrain to attach its resources.
	m_pT->AttachTerrainResources(cmdList, 0, 1, 2);

	for (int i = 0; i < 4; ++i) {
		// fill in this cascade's shadow constants.
		ShadowMapShaderConstants constants;
		constants.shadowViewProj = m_DNC.GetShadowViewProjMatrix(i);
		constants.eye = m_Cam.GetEyePosition();
		m_DNC.GetShadowFrustum(i, constants.frustum);
		m_pFrames[m_iFrame]->SetShadowConstants(constants, i);
		m_pFrames[m_iFrame]->AttachShadowPassResources(i, cmdList, 3);

		// mDrawMode = 0/false for 2D rendering and 1/true for 3D rendering
		m_pT->Draw(cmdList, true);
	}

	m_pFrames[m_iFrame]->EndShadowPass(cmdList);
}

As I mentioned above, not everything went smoothly during this rewrite. I ran into a couple of bugs that I had to work out. Some were obvious things that I just missed, while others involved my misunderstanding the pipeline.

The first bug I ran into was an obvious one that I just didn’t think about. When I moved the file loading to ResourceManager, I made it a generic PNG file loader. That means that all file data I’m loading will be four component unsigned char arrays instead of the float arrays I had previously been converting them to. That’s not a big deal. I can use the UNORM formats and the driver layer will automatically convert to a floating point value between 0 and 1. You can check out how that works here.
What I forgot to deal with was the fact that I had been reducing my height map to a one component float array. I didn’t have a way to do that now that I was happy with, so I had to switch it to the new format as well. This wasn’t a big deal once I realized I had missed it, but it led to some messed up results. I was passing in the four component data but telling the GPU that each component was a separate pixel, so it truncated the data.
I don’t have a screen shot with this issue as I discovered it while dealing with a more pressing problem.

That more pressing problem being that my code would not run. It would create as many buffers as I wanted it to, but it would only upload to one. The next time I tried to upload a buffer, the driver would unload the device on attempting to execute the command list, telling me that I had given it an invalid call. I narrowed the problem down to the UpdateSubresources() call, which now resides in the ResourceManager.UploadToBuffer() method.
This method originally reset the CommandList and CommandAllocator, created a new upload buffer, then called UpdateSubresources(), and finally executed the CommandList. It also was supposed to wait until the GPU signaled that it was done with the CommandList before returning so that it could clean up the upload buffer. Not the most efficient setup, but a good start, or so I thought.
Except for the whole removing the device thing. The way my code is set up, the first upload is always the TerrainMaterial. The next upload is the height map, although I can swap that for the displacement map with no change. There is really no difference between the three objects, at least in terms of the calls. The TerrainMaterial is significantly larger, since it is actually a Texture2DArray, rather than a single Texture2D. That difference is unimportant to this problem.
I wound up having to rewrite this as a more efficient single buffer. I initialize a larger upload buffer and an index variable that points to the next free byte in the buffer. We compare the size of the data we need to upload to the total size of the buffer. If it is too big, we create a new temporary upload buffer as before. If not, then we compare the size we need to how much space is actually left in our main buffer. If there is enough space left, we call UpdateSubresources() with the index offsetting us into the buffer. We then set a Fence so we know when that upload is done and return.
If there wasn’t enough space left in the buffer, then we wait for the GPU to finish with the last upload, which will trigger an event on the previous fence we set. We can then reset the index into the upload buffer and continue as before.

For whatever reason, this fixed the problem with the device getting removed. I can’t say I understand why. Although perhaps it has to do with the next bug I had with this function, because it still wasn’t working right. Now, the program was compiling and running. It would even render the height map fine in the 2D view, but I just got a blank screen in the 3D view. It was definitely switching to the 3D view, because I could see the blue background colour, but there was no terrain.
It made me think the data wasn’t getting into the buffers, so I started up Visual Studio’s Graphics debugger to take a look. And the terrain loaded fine. All of the data was there and it looked normal. So I shut down the Graphics debugger and tried running it again in the normal debugger. No terrain, blank blue screen. So if I try to use the Graphics debugger to look at the buffers and check the data, the buffers load fine so the data is all good. But if I don’t use the debugger, the buffers don’t seem to load any data, but I can’t check them either.
For a long time, I thought this was something wrong with the WaitForGPU() method, but eventually I ruled it out. Even adding in a long sleep didn’t seem to get around the problem. I did notice, however, that the problem appeared to go away if I loaded a small height map, something under 512×512. This will result in not only a smaller height map texture, but also a smaller mesh and resulting vertex and index buffers, the missing data. This just made me think again that it was taking too long to load the data and for some reason the WaitForGPU() method wasn’t working, but it had to be right. It’s too straightforward to not work!

void ResourceManager::WaitForGPU() {
	if (FAILED(m_pFence->SetEventOnCompletion(m_valFence, m_hdlFenceEvent))) {
		throw GFX_Exception("ResourceManager::WaitForGPU failed to SetEventOnCompletion.");
	}

	WaitForSingleObject(m_hdlFenceEvent, INFINITE);
}

It turns out that this was caused by my resetting the CommandAllocator every time I called UploadToBuffer(). Because Reset() was called every time I wanted to render a new frame, I assumed it would make sense to call it every time through this function. This isn’t accurate. You don’t want to call Reset() on a CommandAllocator while the GPU might still be working on executing a previous CommandList. This resets the memory associated with the commands you were running and effectively cuts the CommandList off before it is completed.
Because I am now treating my upload buffer like a queue, I’m not calling WaitForGPU() anymore, until I finish uploading all of the static buffers and textures.

Anyhow, it isn’t necessary to reset the CommandAllocator here, so I removed that from the method. In fact, I now believe, although I haven’t been able to confirm it through searching and I haven’t tried it myself yet, but I don’t think you ever need to reset a CommandAllocator unless you want to assign a different CommandList to it. Since any CommandAllocator in my code will always have the same CommandList associated with it, I don’t actually need to ever reset any of my CommandAllocators! Plus, there doesn’t seem to be any issue with resetting the same CommandList on different CommandAllocators each frame. My Scene only uses one CommandList and passes it to the CommandAllocator associated with each Frame.
The code to reset the CommandAllocator is still there in the latest upload as it isn’t hurting anything, but I’ve commented it out in my working copy and the application still runs perfectly fine. If this is indeed the case, I’m not sure what a program would have to look like to require you to reset the CommandAllocators. The tutorial code I’ve looked at always has NxM CommandAllocators, where N is the number of CommandLists and M is the number of frames. That would make me think that you’d normally have the same CommandList associated with the same CommandAllocator.

Anyway, here’s the current version of the UploadToBuffer() method. It’s one of the longer, more complex methods in the new code, mostly because I needed to account for memory requirements I talked about earlier.

void ResourceManager::UploadToBuffer(unsigned int i, unsigned int numSubResources, D3D12_SUBRESOURCE_DATA* data, D3D12_RESOURCE_STATES initialState) {
	if (i < 0 || i >= m_listResources.size()) {
		std::string msg = "ResourceManager::UploadToBuffer failed due to index " + std::to_string(i) + " out of bounds.";
		throw GFX_Exception(msg.c_str());
	}

	if (FAILED(m_pCmdList->Reset(m_pCmdAllocator, NULL))) {
		throw GFX_Exception("ResourceManager::UploadToBuffer: CommandList Reset failed.");
	}

	m_pCmdList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(m_listResources[i],
		initialState, D3D12_RESOURCE_STATE_COPY_DEST));

	auto size = GetRequiredIntermediateSize(m_listResources[i], 0, numSubResources);
	size = pow(2, ceilf(log(size) / log(2))); // round size up to the next power of 2 to ensure good alignment in upload buffer.

	if (size > DEFAULT_UPLOAD_BUFFER_SIZE) {
		// then we're going to have to create a new temporary buffer.
		ID3D12Resource* tmpUpload;
		m_pDev->CreateCommittedResource(tmpUpload, &CD3DX12_RESOURCE_DESC::Buffer(size), &CD3DX12_HEAP_PROPERTIES(D3D12_HEAP_TYPE_UPLOAD),
			D3D12_HEAP_FLAG_NONE, D3D12_RESOURCE_STATE_GENERIC_READ, nullptr);

		UpdateSubresources(m_pCmdList, m_listResources[i], tmpUpload, 0, 0, numSubResources, data);

		// set resource barriers to inform GPU that data is ready for use.
		m_pCmdList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(m_listResources[i],
			initialState, D3D12_RESOURCE_STATE_COPY_DEST));

		// close the command list.
		if (FAILED(m_pCmdList->Close())) {
			throw GFX_Exception("ResourceManager::UploadToBuffer: CommandList Close failed.");
		}

		// load the command list.
		ID3D12CommandList* lCmds[] = { m_pCmdList };
		m_pDev->ExecuteCommandLists(lCmds, __crt_countof(lCmds));

		// add fence signal.
		++m_valFence;
		m_pDev->SetFence(m_pFence, m_valFence);

		WaitForGPU();
		tmpUpload->Release();
	} else {
		if (size > DEFAULT_UPLOAD_BUFFER_SIZE - m_iUpload) {
			// then we need to wait for the GPU to finish with whatever it is currently uploading.
			// check to see if it is already done.
			if (m_pFence->GetCompletedValue() < m_valFence) {
				// then we're not done, so wait.
				WaitForGPU();
				
			}
			m_iUpload = 0;
		}

		UpdateSubresources(m_pCmdList, m_listResources[i], m_pUpload, m_iUpload, 0, numSubResources, data);
		m_iUpload += size;

		// set resource barriers to inform GPU that data is ready for use.
		m_pCmdList->ResourceBarrier(1, &CD3DX12_RESOURCE_BARRIER::Transition(m_listResources[i],
			D3D12_RESOURCE_STATE_COPY_DEST, initialState));

		// close the command list.
		if (FAILED(m_pCmdList->Close())) {
			throw GFX_Exception("ResourceManager::UploadToBuffer: CommandList Close failed.");
		}

		// load the command list.
		ID3D12CommandList* lCmds[] = { m_pCmdList };
		m_pDev->ExecuteCommandLists(lCmds, __crt_countof(lCmds));

		// add fence signal.
		++m_valFence;
		m_pDev->SetFence(m_pFence, m_valFence);
	}
}

I’m not sure what else to say about the code. It works. The last time I brought up performance, I said that with texturing on, our performance bottomed out at 32.1ms, 31.2fps. Now, the worst case I could find was 26.5ms, 37.8fps. That seems like a pretty significant improvement. Whether this is due to the code changes or to my not testing enough, I can’t be sure.
Either way, I’m just about done with this project. I have one or two things that I actually forgot I had said I wanted to do, so I’ll get those sorted out and my next post should be a concluding one. And then we can finally get on with another project!

For the latest version of the code, see GitHub.

Traagen

The Demon Throne

Games and Graphics Programming

Rendering Terrain Part 30 – Refactor Round Three