Rendering Terrain Part 26 – Improving Performance

At this point in the project, we have four cascades of shadow maps, each requiring its own render pass, dynamic level of detail with tessellation, displacement mapping, triplanar texturing for our normal maps, and height and slope based colour and normal mapping, plus distance based normal mapping as an attempt to improve performance. Our terrain isn't rendered to as high a level of detail in the shadow passes as it is in the main render pass, again to save on performance. We also dropped triplanar mapping for the displacement mapping as a performance measure. Those last two didn't have much impact on visual quality, but definitely helped the frame rate. Distance based normal mapping is noticeable, but acceptable, and it did knock a couple of milliseconds off of our render time, as we mentioned in Part 24. Our worst case frame time at this point is 29.2ms, or 34.3fps.
So what else can we do to try to save on performance? The first thing I wanted to try was moving from four separate textures to an array of textures. This is what my texture definitions looked like originally:

Texture2D<float4> detailmap1 : register(t3);
Texture2D<float4> detailmap2 : register(t4);
Texture2D<float4> detailmap3 : register(t5);
Texture2D<float4> detailmap4 : register(t6);

I changed that to this:

Texture2D<float4> detailmap[4] : register(t3);

Obviously, this changes the options I have when sampling these textures: I can now dynamically index the array, which I couldn't do with four separate declarations. The cool thing is that this required no change at all in my DirectX code, because my root signature's descriptor table already bound all four textures together in consecutive registers. If I had defined them in scattered registers, I'd have had to change that, but in my case the change only had to happen in the Pixel Shader. Did it make a difference? On its own, with no other changes, yes, but not a good one: our frame time worsened to 30.1ms, or 33.3fps. Given how I'm measuring my frame times, though, that's probably identical within a reasonable margin of error. The important thing is that this gives me more options to restructure my code to use less branching.
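
As a quick illustration, here's the sort of thing that syntax opens up. This little helper is hypothetical, not something in my code, and note that truly dynamic indexing of an array of resources like this generally means compiling against Shader Model 5.1:

// pick a detail map with a runtime index instead of writing a branch per texture
float3 sample_detail(uint index, float2 uv, SamplerState sam)
{
	return detailmap[index].Sample(sam, uv).xyz;
}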

But before I move on to that, I wanted to try one other option. The array-of-textures syntax above appears to have been added to HLSL in Shader Model 4.0 (DirectX 10), and there's a DirectX 11 tutorial on this method over at Rastertek. There is, however, another way to implement an array of textures: Texture2DArray. As far as I can tell, it was also introduced in SM4. I couldn't find any tutorials using it in DirectX 12, and the DirectX 10 and 11 tutorials weren't really applicable to DirectX 12 code. I'm not sure why there are two different ways of representing an array. From what little information I could gather, the idea was that, prior to DirectX 12 anyway, a Texture2DArray made it easier to bind multiple textures to the pipeline with a single call. In DirectX 12 we can already do that, and we are, via the descriptor table. Still, it's a bit different, so I figured why not learn it.
One big difference between the Texture2DArray and Texture2D[] syntax is that the former requires all textures to be the same size and format (that includes mip counts, but I haven't looked at mipmapping yet), whereas Texture2D[] allows textures of different sizes in the same array. Switching to Texture2DArray therefore meant finding new source textures to generate my normal maps from. I took the opportunity to try a different tool this time. I had been using MindTex 2, but I found I got better results with CrazyBump, which is free to download and has a simpler interface for adjusting settings.
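
On the HLSL side, the declaration collapses to a single Texture2DArray at the same register. The name here matches what the shader code later in this post samples from:

// all four detail maps become slices of one resource; every slice must share the same size and format
Texture2DArray<float4> detailmaps : register(t3);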

I should also mention that sampling a Texture2D[] looks something like this:

myTextures[0].Sample(mySampler, float2(u, v));

Pretty much what you’d expect. Sampling a Texture2DArray is slightly different:

myTextures.Sample(mySampler, float3(u, v, w));

The big difference is that the index into the array is the w component of a float3 instead of being a regular array index. Just something to remember. It has no other impact. You can’t do anything funky like pass a decimal value and get a weird result. I tried. It just rounds the number. 1.4 becomes 1; 1.5 becomes 2.
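
To make that concrete, here's roughly what that experiment looked like against the detailmaps declaration above (uv here is just a placeholder texture coordinate):

// fractional slice indices round to the nearest slice; there's no blending between slices
float3 a = detailmaps.Sample(displacementsampler, float3(uv, 1.4f)).xyz;	// samples slice 1
float3 b = detailmaps.Sample(displacementsampler, float3(uv, 1.5f)).xyz;	// samples slice 2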

There were also a number of changes needed in my DirectX code to use the Texture2DArray. First off, the Texture2DArray is a single ID3D12Resource rather than one resource per texture. This was a pretty minor change that I won't go into detail about; we're just replacing an array of resources with a single resource. Pretty straightforward.
Some changes did need to go into our Scene::InitTerrainResources() method, which actually creates the resource and SRV. Originally, creating the detail map texture buffers looked like this:

D3D12_SUBRESOURCE_DATA dataDetailTex[4]; 
ID3D12Resource* detailmap[4];
D3D12_RESOURCE_DESC	descDetailTex[4];
size_t sizeofDetailMap[4];

for (int i = 0; i < 4; ++i) {
	UINT detwidth = T.GetDetailMapWidth(i);
	UINT detdepth = T.GetDetailMapHeight(i);
	descDetailTex[i] = {};
	descDetailTex[i].MipLevels = 1;
	descDetailTex[i].Format = DXGI_FORMAT_R32G32B32A32_FLOAT;
	descDetailTex[i].Width = detwidth;
	descDetailTex[i].Height = detdepth;
	descDetailTex[i].Flags = D3D12_RESOURCE_FLAG_NONE;
	descDetailTex[i].DepthOrArraySize = 1;
	descDetailTex[i].SampleDesc.Count = 1;
	descDetailTex[i].SampleDesc.Quality = 0;
	descDetailTex[i].Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
		
	mpGFX->CreateDefaultBuffer(detailmap[i], &descDetailTex[i]);
	detailmap[i]->SetName(L"Terrain Detail Map Texture Buffer");
	sizeofDetailMap[i] = GetRequiredIntermediateSize(detailmap[i], 0, 1);
	// if we're uploading everything in one buffer, we need our textures to all be powers of 2
	// or else the command list won't close as data won't be aligned properly.
	sizeofDetailMap[i] = pow(2, ceilf(log(sizeofDetailMap[i]) / log(2)));

	// prepare detail map data for upload.
	dataDetailTex[i] = {};
	dataDetailTex[i].pData = T.GetDetailMapTextureData(i);
	dataDetailTex[i].RowPitch = detwidth * 4 * sizeof(float);
	dataDetailTex[i].SlicePitch = detdepth * detwidth * 4 * sizeof(float);
}

As already said, this no longer needs to be done four times. We can do most of it in one pass.

D3D12_SUBRESOURCE_DATA dataDetailTex[4];
ID3D12Resource* detailmap;
D3D12_RESOURCE_DESC	descDetailTex;
size_t sizeofDetailMap;

UINT detwidth = T.GetDetailMapWidth();
UINT detdepth = T.GetDetailMapHeight();
descDetailTex = {};
descDetailTex.MipLevels = 1;
descDetailTex.Format = DXGI_FORMAT_R32G32B32A32_FLOAT;
descDetailTex.Width = detwidth;
descDetailTex.Height = detdepth;
descDetailTex.Flags = D3D12_RESOURCE_FLAG_NONE;
descDetailTex.DepthOrArraySize = 4;
descDetailTex.SampleDesc.Count = 1;
descDetailTex.SampleDesc.Quality = 0;
descDetailTex.Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
	
mpGFX->CreateDefaultBuffer(detailmap, &descDetailTex);
detailmap->SetName(L"Terrain Detail Map Texture Array Buffer");
sizeofDetailMap = GetRequiredIntermediateSize(detailmap, 0, 4);
// if we're uploading everything in one buffer, we need our textures to all be powers of 2
// or else the command list won't close as data won't be aligned properly.
sizeofDetailMap = pow(2, ceilf(log(sizeofDetailMap) / log(2)));

// prepare detail map data for upload.
for (int i = 0; i < 4; ++i) {
	dataDetailTex[i] = {};
	dataDetailTex[i].pData = T.GetDetailMapTextureData(i);
	dataDetailTex[i].RowPitch = detwidth * 4 * sizeof(float);
	dataDetailTex[i].SlicePitch = detdepth * detwidth * 4 * sizeof(float);
}

So what are the important changes?
First off, notice the line setting the DepthOrArraySize value. Previously we just set that to 1; you only need a larger value if you're creating a texture array (or, for a Texture3D, to give the depth).
Next up, I missed this one at first, until I looked at the code included with Frank Luna's "Introduction to 3D Game Programming with DirectX 12". Check out the line where we get sizeofDetailMap: we're calling GetRequiredIntermediateSize() for the buffer we just created. The last argument is the number of subresources in the resource, and if you don't pass the right value, the function won't return the correct size. I missed changing that argument from a 1 to a 4, so the function was only giving me the space needed for one texture.
Next you’ll note we still have a little loop. We still need to create a D3D12_SUBRESOURCE_DATA structure for each texture in the array in order to actually pass the data in.

After that, I have a try/catch block where I attempt to load all the data in one upload buffer. If I can't create a single buffer big enough to hold everything, then I attempt to create upload buffers for each resource separately. I won't bother looking at the catch block for this. Here's what the try block originally looked like:

ID3D12Resource* upload;
mpGFX->CreateUploadBuffer(upload, &CD3DX12_RESOURCE_DESC::Buffer(
	sizeofHeightmap + sizeofVertexBuffer + sizeofIndexBuffer + sizeofConstantBuffer + sizeofDispMap +
	sizeofDetailMap[0] + sizeofDetailMap[1] + sizeofDetailMap[2] + sizeofDetailMap[3]));
mlTemporaryUploadBuffers.push_back(upload);
		
// upload heightmap data
UpdateSubresources(cmdList, heightmap, upload, 0, 0, 1, &dataTex);

// upload displacement map data
UpdateSubresources(cmdList, displacementmap, upload, sizeofHeightmap, 0, 1, &dataDispTex);

// upload detail map data
UpdateSubresources(cmdList, detailmap[0], upload, sizeofHeightmap + sizeofDispMap, 0, 1, &dataDetailTex[0]);
UpdateSubresources(cmdList, detailmap[1], upload, sizeofHeightmap + sizeofDispMap + sizeofDetailMap[0], 0, 1, &dataDetailTex[1]);
UpdateSubresources(cmdList, detailmap[2], upload, sizeofHeightmap + sizeofDispMap + sizeofDetailMap[0] + sizeofDetailMap[1], 0, 1, &dataDetailTex[2]);
UpdateSubresources(cmdList, detailmap[3], upload, sizeofHeightmap + sizeofDispMap + sizeofDetailMap[0] + sizeofDetailMap[1] + sizeofDetailMap[2], 0, 1, &dataDetailTex[3]);

// upload vertex buffer data
UpdateSubresources(cmdList, vertexbuffer, upload, sizeofHeightmap + sizeofDispMap + sizeofDetailMap[0] + sizeofDetailMap[1] + sizeofDetailMap[2] + sizeofDetailMap[3], 0, 1, &dataVB);

// upload index buffer data
UpdateSubresources(cmdList, indexbuffer, upload, sizeofHeightmap + sizeofDispMap + sizeofDetailMap[0] + sizeofDetailMap[1] + sizeofDetailMap[2] + sizeofDetailMap[3] + sizeofVertexBuffer, 0, 1, &dataIB);

// upload the constant buffer data
UpdateSubresources(cmdList, constantbuffer, upload, sizeofHeightmap + sizeofDispMap + sizeofDetailMap[0] + sizeofDetailMap[1] + sizeofDetailMap[2] + sizeofDetailMap[3] + sizeofVertexBuffer + sizeofIndexBuffer, 0, 1, &dataCB);

And the new version:

ID3D12Resource* upload;

mpGFX->CreateUploadBuffer(upload, &CD3DX12_RESOURCE_DESC::Buffer(
	sizeofHeightmap + sizeofVertexBuffer + sizeofIndexBuffer + sizeofConstantBuffer + sizeofDispMap + sizeofDetailMap));
mlTemporaryUploadBuffers.push_back(upload);

// upload heightmap data
UpdateSubresources(cmdList, heightmap, upload, 0, 0, 1, &dataTex);

// upload displacement map data
UpdateSubresources(cmdList, displacementmap, upload, sizeofHeightmap, 0, 1, &dataDispTex);

// upload detail map data
UpdateSubresources(cmdList, detailmap, upload, sizeofHeightmap + sizeofDispMap, 0, 4, dataDetailTex);

// upload vertex buffer data
UpdateSubresources(cmdList, vertexbuffer, upload, sizeofHeightmap + sizeofDispMap + sizeofDetailMap, 0, 1, &dataVB);

// upload index buffer data
UpdateSubresources(cmdList, indexbuffer, upload, sizeofHeightmap + sizeofDispMap + sizeofDetailMap + sizeofVertexBuffer, 0, 1, &dataIB);

// upload the constant buffer data
UpdateSubresources(cmdList, constantbuffer, upload, sizeofHeightmap + sizeofDispMap + sizeofDetailMap + sizeofVertexBuffer + sizeofIndexBuffer, 0, 1, &dataCB);

The big changes here are that we only have one sizeofDetailMap variable and we only need to call UpdateSubresources once for all textures in the array, passing it the entire array of D3D12_SUBRESOURCE_DATA structures and specifying the number of subresources.

Finally, we need to change how we define the Shader Resource View. We used to need four of these as well.

D3D12_SHADER_RESOURCE_VIEW_DESC descDetailSRV[4];
for (int i = 0; i < 4; ++i) {
	descDetailSRV[i] = {};
	descDetailSRV[i].Shader4ComponentMapping = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
	descDetailSRV[i].Format = descDetailTex[i].Format;
	descDetailSRV[i].ViewDimension = D3D12_SRV_DIMENSION_TEXTURE2D;
	descDetailSRV[i].Texture2D.MipLevels = descDetailTex[i].MipLevels;
	CD3DX12_CPU_DESCRIPTOR_HANDLE handleDetailSRV(mlDescriptorHeaps[0]->GetCPUDescriptorHandleForHeapStart(), 22 + i, msizeofCBVSRVDescHeapIncrement);
	mpGFX->CreateSRV(detailmap[i], &descDetailSRV[i], handleDetailSRV);
	T.SetDetailMapResource(i, detailmap[i]);
}

Now we only need one.

D3D12_SHADER_RESOURCE_VIEW_DESC descDetailSRV;
descDetailSRV = {};
descDetailSRV.Shader4ComponentMapping = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
descDetailSRV.Format = descDetailTex.Format;
descDetailSRV.ViewDimension = D3D12_SRV_DIMENSION_TEXTURE2DARRAY;
descDetailSRV.Texture2DArray.ArraySize = descDetailTex.DepthOrArraySize;
descDetailSRV.Texture2DArray.MipLevels = descDetailTex.MipLevels;
CD3DX12_CPU_DESCRIPTOR_HANDLE handleDetailSRV(mlDescriptorHeaps[0]->GetCPUDescriptorHandleForHeapStart(), 22, msizeofCBVSRVDescHeapIncrement);
mpGFX->CreateSRV(detailmap, &descDetailSRV, handleDetailSRV);
T.SetDetailMapResource(detailmap);

We just need to ensure we set the ViewDimension correctly and then set the applicable values for the Texture2DArray member.

There is actually one other small change that needs to be made: the Root Signature. Before, the descriptor range for these textures covered four descriptors. Now we can reduce that to one.

//rangesRoot[5].Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 4, 3);
rangesRoot[5].Init(D3D12_DESCRIPTOR_RANGE_TYPE_SRV, 1, 3);

Was this change worth it versus just sticking with Texture2D[]? In terms of performance, we went to 29.6ms, 33.8fps, which puts us right between the other two. Realistically, all three values are within a margin of error that makes them roughly equivalent. The thing I do like better about the Texture2DArray is that it should be a bit easier to add more textures to the array later. I've got some hard coded values right now, but those could easily be changed. And with fewer lines needed, the code is a bit easier to follow.

But let's move on to where we actually start shaving some milliseconds off of our frame time.
My first change was simply to the width of the height bands we're blending between.

float3 height_and_slope_based_normal(float height, float slope, float3 N, float3 V, float3 uvw) {
	float bounds = scale * 0.02f;
	float transition = scale * 0.6f;
	float greenBlendEnd = transition + bounds;
	float greenBlendStart = transition - bounds;
	float snowBlendEnd = greenBlendEnd + 2 * bounds;

	float3 N1 = get_normal_from_detailmap_slopebased(slope, N, V, uvw, 1, 0, displacementsampler);
	
	if (height < greenBlendStart) {
		// get grass/dirt values
		return N1;
	}

	float3 N2 = get_normal_from_detailmap(N, V, uvw, 2, displacementsampler);

	if (height < greenBlendEnd) {
		// get both grass/dirt values and rock values and blend
		float blend = (height - greenBlendStart) * (1.0f / (greenBlendEnd - greenBlendStart));
		return lerp(N1, N2, blend);
	}

	float3 N3 = get_normal_from_detailmap_slopebased(slope, N, V, uvw, 2, 3, displacementsampler);

	if (height < snowBlendEnd) {
		// get rock values and rock/snow values and blend
		float blend = (height - greenBlendEnd) * (1.0f / (snowBlendEnd - greenBlendEnd));
		return lerp(N2, N3, blend);
	}

	// get rock/snow values
	return N3;
}

The indices into the Texture2DArray are hard coded in this version. Look for the constants in the function calls.
I changed the 0.02 to a 0.005. Since scale is currently set to 512 and the blend areas are twice the bounds value, that gives us an initial blend region of 20.48 and a new blend region of 5.12. This made a negligible difference in performance at this stage, but I left it in and actually discovered that it had a sizeable impact in the final version, about 3ms.

Next, I decided I was being greedy in blending from the green band to a rock-only band and then having a second blend region to go from rock to rock and snow. I eliminated the intermediary rock band so there is only one blend region. This shaved about 2ms off of our frame time, bringing us to 27.5ms, 36.3fps.

I then made some significant changes to the structure of the height blending function, taking advantage of the fact that we can now use dynamic indexing to decide which maps to sample before sampling them. This moved our sampling to after the branching.

float3 height_and_slope_based_normal(float height, float slope, float3 N, float3 V, float3 uvw) {
	float bounds = scale * 0.005f;
	float transition = scale * 0.6f;
	float greenBlendStart = transition - bounds;
	float snowBlendEnd = transition + 2 * bounds;
	float index1 = 1, index2 = 0, index3 = -1, index4;
	
	if (height > snowBlendEnd) {
		index1 = 2;
		index2 = 3;
	} else if (height > greenBlendStart) {
		index3 = 2;
		index4 = 3;
	}

	float3 N1 = get_normal_from_detailmap_slopebased(slope, N, V, uvw, index1, index2, displacementsampler);

	if (index3 != -1) {
		float blend = (height - greenBlendStart) * (1.0f / (snowBlendEnd - greenBlendStart));

		float3 N2 = get_normal_from_detailmap_slopebased(slope, N, V, uvw, index3, index4, displacementsampler);
		return lerp(N1, N2, blend);
	}

	return N1;
}

This version of the function is much more efficient. Of course, eliminating the intermediate band and one of the blend regions simplifies things on its own, but now, when we are outside the blend zone, we only call the slope-based sampler once. Before, if we were above a certain height, we were calling it twice (plus a call to a basic triplanar method for the intermediate region). This change made a pretty significant difference: our frame time improved to 20.1ms, 49.7fps.

That's about all the improvement I could think to make to the height based function. I did try telling the compiler explicitly whether to perform dynamic branching (using the [branch] attribute) or to evaluate both sides of the branch and select the result ([flatten]). Neither option seemed to impact performance.
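
For reference, this is what applying those attributes looks like; the branch here is taken straight from the function above, and neither variant measurably changed the frame time:

// [branch] requests a real dynamic jump; [flatten] evaluates both sides and selects the result
[branch]
if (index3 != -1) {
	float blend = (height - greenBlendStart) * (1.0f / (snowBlendEnd - greenBlendStart));

	float3 N2 = get_normal_from_detailmap_slopebased(slope, N, V, uvw, index3, index4, displacementsampler);
	return lerp(N1, N2, blend);
}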

As for the slope based blending function, I couldn’t really see anywhere I could improve. Here’s the current function:

float3 sample_detailmap_slopebased(float slope, float3 N, float3 V, float3 uvw, float indexXY, float indexZ, SamplerState sam) {
	if (slope < 0.25f) {
		return detailmaps.Sample(sam, float3(uvw.xy, indexZ)).xyz;
	}

	float tighten = 0.4679f;
	float3 blending = saturate(abs(N) - tighten);
	// force weights to sum to 1.0
	float b = blending.x + blending.y + blending.z;
	blending /= float3(b, b, b);

	float3 x = detailmaps.Sample(sam, float3(uvw.yz, indexXY)).xyz;
	float3 y = detailmaps.Sample(sam, float3(uvw.xz, indexXY)).xyz;
	float3 z = detailmaps.Sample(sam, float3(uvw.xy, indexXY)).xyz;

	if (slope < 0.5f) {
		float3 z2 = detailmaps.Sample(sam, float3(uvw.xy, indexZ)).xyz;

		float blend = (slope - 0.25f) * (1.0f / (0.5f - 0.25f));

		return lerp(z2, x * blending.x + y * blending.y + z * blending.z, blend);
	}

	return x * blending.x + y * blending.y + z * blending.z;
}

I tried changing the values used to determine the blend area, but while that does affect the appearance of the terrain, it has no noticeable effect on performance. The problem with improving this function further is that flat and steep regions do quite different things. We don't need triplanar mapping in flat regions, so we can get away with a single sample. In steep regions, we need triplanar mapping, and in the blend region we need all four samples. I tried reducing the triplanar mapping to biplanar mapping (two samples instead of three), but I could not get it to look good. Exactly how well this runs will likely depend on the terrain: with my 1024×1024 test map, the worst case frame time was slightly better, but the average frame time was slightly higher. Given that terrain is more rounded and so more fragments fall into the slope blend region, that seems reasonable. As long as the worst case is not worse, we're ok.
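
For anyone curious, the biplanar idea was roughly the following: drop the projection with the smallest weight, renormalize the other two, and skip that plane's sample. This is just a sketch of the general approach, not the exact code I tried and threw away:

// zero out the weakest of the three triplanar weights, renormalize, and skip that plane's sample
float3 sample_detailmap_biplanar(float3 N, float3 uvw, float indexXY, SamplerState sam) {
	float tighten = 0.4679f;
	float3 blending = saturate(abs(N) - tighten);

	// drop the smallest contributor; the largest weight always survives the tighten
	if (blending.x <= blending.y && blending.x <= blending.z) blending.x = 0.0f;
	else if (blending.y <= blending.z) blending.y = 0.0f;
	else blending.z = 0.0f;

	// force the remaining two weights to sum to 1.0
	blending /= (blending.x + blending.y + blending.z);

	float3 result = 0.0f;
	if (blending.x > 0.0f) result += blending.x * detailmaps.Sample(sam, float3(uvw.yz, indexXY)).xyz;
	if (blending.y > 0.0f) result += blending.y * detailmaps.Sample(sam, float3(uvw.xz, indexXY)).xyz;
	if (blending.z > 0.0f) result += blending.z * detailmaps.Sample(sam, float3(uvw.xy, indexXY)).xyz;
	return result;
}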

Finally, we have our distance based normal mapping, where we reduce the amount of sampling we need based on distance from the camera.

float3 dist_based_normal(float height, float slope, float3 N, float3 V, float3 uvw) {
	float dist = length(V);

	float3 N1 = perturb_normal(N, V, uvw / 8, displacementmap, displacementsampler);

	if (dist > 150) return N;
	
	if (dist > 100) {
		float blend = (dist - 100.0f) / 50.0f;

		return lerp(N1, N, blend);
	}

	float3 N2 = height_and_slope_based_normal(height, slope, N1, V, uvw);

	if (dist > 50) return N1;

	if (dist > 25) {
		float blend = (dist - 25.0f) / 25.0f;

		return lerp(N2, N1, blend);
	}

	return N2;
}

This function really bothers me. I mentioned why in Part 24, and that hasn't changed: there's no reason I can see why I should have to define N1 and N2 where I do. You would think I could define them like so:

float3 dist_based_normal(float height, float slope, float3 N, float3 V, float3 uvw) {
	float dist = length(V);

	if (dist > 150) return N;

	float3 N1 = perturb_normal(N, V, uvw / 8, displacementmap, displacementsampler);	

	if (dist > 100) {
		float blend = (dist - 100.0f) / 50.0f;

		return lerp(N1, N, blend);
	}

	if (dist > 50) return N1;

	float3 N2 = height_and_slope_based_normal(height, slope, N1, V, uvw);

	if (dist > 25) {
		float blend = (dist - 25.0f) / 25.0f;

		return lerp(N2, N1, blend);
	}

	return N2;
}

That small change would improve our frame time to 15.1ms, 66fps. That finally gets us back over 60fps! But it also reintroduces the problem I showed in Part 24. Rather than take a new screenshot for this problem, I'll just show you the lines from the height based blending. The distance based blend lines look the same, but line up with the distance bands.
[Image: visible blend lines at the height band boundaries]
I even came up with this:

float3 dist_based_normal(float height, float slope, float3 N, float3 V, float3 uvw) {
	float dist = length(V);

	float3 norm = N;

	if (dist <= 100) {
		norm = perturb_normal(N, V, uvw / 8, displacementmap, displacementsampler);
	} else if (dist <= 150) {
		float blend = (dist - 100.0f) / 50.0f;

		norm = lerp(perturb_normal(N, V, uvw / 8, displacementmap, displacementsampler), N, blend);
	}

	if (dist <= 25) {
		norm = height_and_slope_based_normal(height, slope, norm, V, uvw);
	} else if (dist <= 50) {
		float blend = (dist - 25.0f) / 25.0f;

		norm = lerp(height_and_slope_based_normal(height, slope, norm, V, uvw), norm, blend);
	}

	return norm;
}

This version improved things even more, getting an even 14ms, 71.4fps, but those damn lines are still showing up where the normal calculation changes, even though the transitions should be smooth and the normal should be continuous.

Anyway, it's probably not worth fussing with. It isn't even an issue with diffuse textures; the only reason I'm blending normal maps like this is that I have two different steep materials. If you look at actual pictures of mountains, though, they're usually made up of the same types of rock from top to bottom. So I may modify my code to have only one choice of normal map for steep regions, and then add height based blending to pick which normal map the flat areas use. If I do make this change, I may even add an intermediate normal map for the slope based mapping, similar to what they do in this tutorial. That tutorial deals with diffuse maps, not normal maps, but my purpose is to be able to use the same algorithm for both. I'm also thinking of adding a toggle so I can switch back and forth between the colour palette and diffuse textures.
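
To give a rough idea of the shape of that change, here's a sketch. It isn't code I've written or tested yet, and the slice indices are placeholders for whichever maps end up as rock, grass, and snow:

// one fixed rock normal map for steep areas; height only picks the flat-area map
float3 single_rock_normal(float height, float slope, float3 N, float3 V, float3 uvw) {
	float bounds = scale * 0.005f;
	float transition = scale * 0.6f;

	// below the band the flat areas use the grass map (slice 0), above it the snow map (slice 3)
	float3 lowN = get_normal_from_detailmap_slopebased(slope, N, V, uvw, 2, 0, displacementsampler);
	float3 highN = get_normal_from_detailmap_slopebased(slope, N, V, uvw, 2, 3, displacementsampler);

	float blend = saturate((height - (transition - bounds)) / (2.0f * bounds));
	return lerp(lowN, highN, blend);
}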

I think I just figured out what I’m talking about in my next post.

As for improving performance further, I’m not sure where to go from here. These changes to blending may have an impact, for better or worse.
Up until adding triplanar mapping, this project was geometry limited, meaning the frame rate depended almost entirely on the size of the height map and the resulting static mesh we generate; smaller height maps meant vastly better frame rates. Now, our frame rate seems to be entirely dependent on the texture sampling we do, and switching to a smaller height map has no impact on performance. I was considering implementing something along the lines of geometry clipmaps, or at least a mesh centred on the camera with built-in levels of detail as it extends out. That's still something I'd like to do, though I wasn't really planning on implementing it in this project, and now I don't think it would actually improve performance anyway.
I may still revisit looking for a way to build all four shadow cascades in one pass. I feel like that should be faster, but when I played around with Geometry Shaders for that, it was far worse, as I mentioned in Part 16.

For now, however, this post is plenty long enough.

For the latest code, check out GitHub.

Traagen