HoloLens Terrain Generation Demo Part 10 – Converting Surface Mesh Data For Plane Finding

Times like these, I really wish I had just tackled this whole HoloLens project in Unity. Working on it in DirectX has been a challenge, to say the least. All of this Spatial Mapping stuff is pretty much covered in one tutorial for Unity. In contrast, there is almost no documentation available for working in DirectX directly. You’ve basically just got the MSDN pages, and all they really tell you is what members the various classes have. I haven’t found a single tutorial on this stuff by anyone.
Further, the open source code Microsoft provides for Plane Finding is a project specifically set up to compile to DLLs for use in Unity and on the HoloLens. I didn’t know how to make a DLL that Visual Studio would accept, so I wound up just copying all of the project files into my project.

With my complaining out of the way, let’s look at taking our Surface Mesh data and converting it into a format that Microsoft’s Plane Finding code can deal with. The algorithm is called via the FindPlanes() function:

vector<BoundedPlane> FindPlanes(
     _In_ INT32 numMeshes,
     _In_count_(numMeshes) MeshData* meshes,
     _In_ float snapToGravityThreshold);

As you can see, we give it one or more meshes in the MeshData format. What does that look like?

struct MeshData {
    DirectX::XMFLOAT4X4 transform;
    INT32 vertCount;
    INT32 indexCount;
    DirectX::XMFLOAT3* verts;
    DirectX::XMFLOAT3* normals;
    INT32* indices;
};

Thankfully, this lines up reasonably well with what we get in a SpatialSurfaceMesh object. One would hope so, given that both were provided by Microsoft. Unfortunately, while the members are similar, the data is stored in completely different types. The SpatialSurfaceMesh uses IBuffers to store the vertex, normal, and index data, and IBuffers are essentially raw byte data. We’re going to have to convert that data into the types we need.
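
Getting at those raw bytes is a little trick of its own. The conversion code below uses a GetDataFromIBuffer() helper for this; here’s a sketch of it, modeled on the helper in Microsoft’s Holographic samples, which queries the IBuffer for its IBufferByteAccess COM interface:

#include <robuffer.h>
#include <wrl.h>

// Get a raw byte pointer out of an IBuffer by querying it for the
// IBufferByteAccess interface. The pointer is only valid for as long
// as the IBuffer itself stays alive.
byte* GetDataFromIBuffer(Windows::Storage::Streams::IBuffer^ buffer) {
	byte* data = nullptr;
	Microsoft::WRL::ComPtr<IUnknown> bufferUnknown = reinterpret_cast<IUnknown*>(buffer);
	Microsoft::WRL::ComPtr<Windows::Storage::Streams::IBufferByteAccess> byteAccess;
	if (SUCCEEDED(bufferUnknown.As(&byteAccess))) {
		byteAccess->Buffer(&data);
	}
	return data;
}
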
The first step is to figure out in what format the Surface Mesh has stored our vertices, normals, and indices. Luckily, we’re actually defining this ourselves when we collect the mesh data from the HoloLens.

Concurrency::task<void> RealtimeSurfaceMeshRenderer::AddOrUpdateSurfaceAsync(Guid id, SpatialSurfaceInfo^ newSurface) {
	auto options = ref new SpatialSurfaceMeshOptions();
	options->IncludeVertexNormals = true;
	
	IVectorView<DirectXPixelFormat>^ supportedVertexPositionFormats = options->SupportedVertexPositionFormats;
	unsigned int formatIndex = 0;
	if (supportedVertexPositionFormats->IndexOf(DirectXPixelFormat::R16G16B16A16IntNormalized, &formatIndex)) {
		options->VertexPositionFormat = DirectXPixelFormat::R16G16B16A16IntNormalized;
	}
	IVectorView<DirectXPixelFormat>^ supportedVertexNormalFormats = options->SupportedVertexNormalFormats;
	if (supportedVertexNormalFormats->IndexOf(DirectXPixelFormat::R8G8B8A8IntNormalized, &formatIndex)) {
		options->VertexNormalFormat = DirectXPixelFormat::R8G8B8A8IntNormalized;
	}
	IVectorView<DirectXPixelFormat>^ supportedTriangleIndexFormats = options->SupportedTriangleIndexFormats;
	if (supportedTriangleIndexFormats->IndexOf(DirectXPixelFormat::R16UInt, &formatIndex)) {
		options->TriangleIndexFormat = DirectXPixelFormat::R16UInt;
	}
//...
}

Our vertices are stored as four normalized two-byte integers. Normals are four normalized one-byte integers. Indices are two-byte unsigned integers. For information on what all of that means, you can check out these Data Conversion Rules.
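
To illustrate what “normalized” means here: each two-byte vertex component is a signed integer mapped into the [-1, 1] range by dividing by 32767, which is exactly the conversion the XMLoad functions below perform for us. Done by hand for a single component, it would look like this:

// Illustration only: converting one component of an R16G16B16A16IntNormalized
// value to floating point, per the Data Conversion Rules for SNORM formats.
float Snorm16ToFloat(INT16 raw) {
	float f = static_cast<float>(raw) / 32767.0f;
	return f < -1.0f ? -1.0f : f; // -32768 clamps to -1.0
}
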
I managed to find a pretty handy forum topic that got me going on converting from this Surface Mesh data to our MeshData. It doesn’t exactly outline everything I need, so I may have a few pieces in my code that aren’t correct. Once I can render the planes, I can figure that stuff out. So far, the data I’m getting from the conversion seems reasonable.

// Return a MeshData object from the Raw data buffers.
MeshData SurfaceMesh::ConstructMeshData() {
	// we configured RealtimeSurfaceMeshRenderer to ensure that the data
	// we are receiving is in the correct format.
	// Vertex Positions: R16G16B16A16IntNormalized
	// Vertex Normals: R8G8B8A8IntNormalized
	// Indices: R16UInt (we'll convert them to R32Int from here, since HoloLens Spatial Mapping doesn't appear to support that format directly).

	MeshData newMesh;
	newMesh.vertCount = m_surfaceMesh->VertexPositions->ElementCount;
	newMesh.verts = new XMFLOAT3[newMesh.vertCount];
	newMesh.normals = new XMFLOAT3[newMesh.vertCount];
	newMesh.indexCount = m_surfaceMesh->TriangleIndices->ElementCount;
	newMesh.indices = new INT32[newMesh.indexCount];

	XMSHORTN4* rawVertexData = (XMSHORTN4*)GetDataFromIBuffer(m_surfaceMesh->VertexPositions->Data);
	XMBYTEN4* rawNormalData = (XMBYTEN4*)GetDataFromIBuffer(m_surfaceMesh->VertexNormals->Data);
	UINT16* rawIndexData = (UINT16*)GetDataFromIBuffer(m_surfaceMesh->TriangleIndices->Data);
	float3 vertexScale = m_surfaceMesh->VertexPositionScale;
	
	for (int index = 0; index < newMesh.vertCount; ++index) {
		// read the current position as an XMSHORTN4.
		XMSHORTN4 currentPos = XMSHORTN4(rawVertexData[index]);
		XMFLOAT4 vals;

		// XMLoadShortN4() knows how to convert an XMSHORTN4 into actual floating point coordinates.
		XMVECTOR vec = XMLoadShortN4(&currentPos);

		// Store that into an XMFLOAT4 so we can read the values.
		XMStoreFloat4(&vals, vec);

		// Scale by the vertex scale.
		XMFLOAT4 scaledPos = XMFLOAT4(vals.x * vertexScale.x, vals.y * vertexScale.y, vals.z * vertexScale.z, vals.w);

		// Then we need to down scale the vector, since it will be rescaled when rendering (i.e. divide by w).
		// do we?
		float4 downScaledPos = float4(scaledPos.x / scaledPos.w, scaledPos.y / scaledPos.w, scaledPos.z / scaledPos.w, scaledPos.w);

		newMesh.verts[index].x = downScaledPos.x;
		newMesh.verts[index].y = downScaledPos.y;
		newMesh.verts[index].z = downScaledPos.z;

		// now do the same for the normal.
		XMBYTEN4 currentNormal = XMBYTEN4(rawNormalData[index]);
		XMFLOAT4 norms;
		XMVECTOR norm = XMLoadByteN4(&currentNormal);
		XMStoreFloat4(&norms, norm);
		// No need to down scale the normal; dividing by w would do nothing useful here.
		newMesh.normals[index].x = norms.x;
		newMesh.normals[index].y = norms.y;
		newMesh.normals[index].z = norms.z;
	}

	for (int index = 0; index < newMesh.indexCount; ++index) {
		newMesh.indices[index] = rawIndexData[index];
	}

	XMStoreFloat4x4(&newMesh.transform, XMMatrixIdentity());

	return newMesh;
}

You’ll need DirectXPackedVector.h and the DirectX::PackedVector namespace for the XMSHORTN4 and XMBYTEN4 types. They save so much headache, though, as they automatically convert from normalized integers to floating point for us.
The bit I’m not entirely certain about is where we scale the vertex positions and then divide by w (down scale). For one, I’m not currently scaling the normals at all. Scaling the vertex positions without scaling the normals will have odd results, so I should probably fix that. My thought, however, is that the scaling should be combined with the transform matrix. That matrix takes us from object/mesh space to world space. I think it probably makes more sense to add the scaling in there, but I’m not sure.
You’ll probably also notice that this code currently sets the transform matrix to the Identity matrix. My reason for this is that ConstructMeshData() is currently being called from UpdateSurface(), so that every time the surface updates, we rebuild our MeshData structure for the SurfaceMesh. But UpdateSurface() doesn’t have access to a SpatialCoordinateSystem object and we would need that to create the transform.
So far, I’ve been building the MeshData object with the Identity matrix, and then updating the transform member each frame when we call UpdateTransform() on the Surface Mesh. This made sense to me because we’re already calculating the transform matrix, and also combining it with the scaling transformation, in UpdateTransform(). All I had to do was add a call to XMStoreFloat4x4() to save that matrix to our MeshData object.
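
The idea looks roughly like this (a sketch of the relevant part of UpdateTransform(), not the exact code; it assumes m_localMesh is the MeshData member we built above):

void SurfaceMesh::UpdateTransform(SpatialCoordinateSystem^ baseCoordinateSystem) {
	auto tryTransform = m_surfaceMesh->CoordinateSystem->TryGetTransformTo(baseCoordinateSystem);
	if (tryTransform != nullptr) {
		// Fold the vertex position scale into the mesh-to-world matrix,
		// so FindPlanes() sees correctly scaled geometry.
		float3 scale = m_surfaceMesh->VertexPositionScale;
		float4x4 meshToWorld = tryTransform->Value;
		XMMATRIX transform = XMMatrixScaling(scale.x, scale.y, scale.z) * XMLoadFloat4x4(&meshToWorld);
		XMStoreFloat4x4(&m_localMesh.transform, transform);
	}
}

Of course, if the scale lives in this matrix, it shouldn’t also be applied per-vertex in ConstructMeshData(), or we’d be scaling everything twice.
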
I do have a few problems with this, however. Firstly, when do we call the FindPlanes() function? We can’t do it in UpdateTransform(), even though that’s when we have an accurate transform matrix; the algorithm is simply too slow to run every frame. Trust me, I tried it. For testing purposes, I wound up just calling the function in UpdateSurface(), directly after building the MeshData object.

auto planeList = FindPlanes(1, &m_localMesh, 5.0f);

The good news is that it actually finds planes in the mesh data! So the algorithm at least appears to work. I still need to move things around, though. For one, that 5.0f at the end is the snapToGravityThreshold parameter. This is the maximum angle, in degrees, off horizontal at which the algorithm will snap a detected plane to horizontal. This is handy if your surfaces are being detected as slightly off horizontal, if you’re getting rounding errors, and so on. You can set it to 0.0f and you’ll get no snapping. I’ve set it fairly high for now and will likely adjust it later, or maybe expose it in the GUI. You can check out Util.cpp for the SnapToGravity() code.
Anyway, the point of bringing that parameter up is that it requires us to know which way is down. If we don’t have an accurate transform matrix, we can’t determine that.
Moving on, there is a MergePlanes() method as well. Looking at MergePlanes.cpp, it appears to look for planes that overlap and merge them, potentially reducing the total number of planes and producing larger bounded planes. This would be handy for finding good surfaces for our terrain. Right now, though, we would need to pull the list of planes out of each SurfaceMesh and combine them somewhere else, probably in the RealtimeSurfaceMeshRenderer; a sketch of that combining step follows.
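
Something like this, where GetPlanes() is a hypothetical accessor returning each SurfaceMesh’s most recent FindPlanes() result, and m_meshCollection is the renderer’s map of meshes:

// Gather every SurfaceMesh's plane list into one vector, ready to be
// handed off to MergePlanes(). GetPlanes() and m_meshCollection are
// hypothetical names, not existing members.
std::vector<BoundedPlane> RealtimeSurfaceMeshRenderer::CollectPlanes() {
	std::vector<BoundedPlane> allPlanes;
	for (auto& pair : m_meshCollection) {
		const std::vector<BoundedPlane>& meshPlanes = pair.second.GetPlanes();
		allPlanes.insert(allPlanes.end(), meshPlanes.begin(), meshPlanes.end());
	}
	return allPlanes;
}
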
Finally, we have the problem of rendering these planes. We need to be able to update our list of planes when surfaces change, which means updating DirectX buffers. It likely means changing the lengths of those buffers, which means releasing and recreating them. Plus, rendering them means a whole new set of shaders. The current structure of the SurfaceMesh class doesn’t really allow buffers to be created and destroyed on update. It needs the device to be passed in, which only happens when the RealtimeSurfaceMeshRenderer tells it to create everything.
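
For the buffer problem, what I have in mind is something along these lines (a sketch only; SurfacePlaneRenderer doesn’t exist yet, and m_device and m_vertexBuffer are placeholder ComPtr members):

// Release the old vertex buffer and build a new one sized for the
// current plane list.
void SurfacePlaneRenderer::CreateVertexBuffer(const std::vector<XMFLOAT3>& vertices) {
	m_vertexBuffer.Reset(); // release the previous buffer, if any.
	if (vertices.empty()) { return; }

	CD3D11_BUFFER_DESC desc(static_cast<UINT>(vertices.size() * sizeof(XMFLOAT3)), D3D11_BIND_VERTEX_BUFFER);
	D3D11_SUBRESOURCE_DATA initData = { vertices.data(), 0, 0 };
	DX::ThrowIfFailed(m_device->CreateBuffer(&desc, &initData, &m_vertexBuffer));
}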

What I’m currently planning is to have the surface planes update at a slower rate than the rest of the application, maybe once every couple of seconds. We can then request a list of planes from the RealtimeSurfaceMeshRenderer, which in turn will request a list of planes from each SurfaceMesh. We can give it the current coordinate system at that time, so that we get an accurate transform from which to determine horizontal, since we ultimately only care about horizontal surfaces with upward-facing normals for this project. The RealtimeSurfaceMeshRenderer can attempt to merge the planes it receives from its collection of SurfaceMesh objects. It can then return the aggregate list of planes to the Main program. All of this can be run asynchronously so that it doesn’t stall the application.
From there, we will pass this data to a SurfacePlaneRenderer object, which will handle updating the DirectX buffers when it receives the new list.
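
As a rough sketch of that whole flow (every name here is hypothetical; none of this exists in the project yet):

// Called once per frame from the main update loop; only does real work
// every couple of seconds.
void TerrainGenerationMain::UpdatePlanes(double elapsedSeconds, SpatialCoordinateSystem^ coordinateSystem) {
	m_timeSinceLastPlaneUpdate += elapsedSeconds;
	if (m_timeSinceLastPlaneUpdate < 2.0) { return; }
	m_timeSinceLastPlaneUpdate = 0.0;

	// Gather and merge planes off the main thread, then hand the result
	// to the plane renderer when the task completes.
	concurrency::create_task([this, coordinateSystem]() {
		return m_meshRenderer->GetPlanes(coordinateSystem);
	}).then([this](std::vector<BoundedPlane> planes) {
		m_planeRenderer->UpdatePlanes(planes);
	});
}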

I’m also a bit uncertain about how rendering the planes will work. The type returned from FindPlanes() and MergePlanes() is a vector of BoundedPlanes.

struct Plane {
    DirectX::XMFLOAT3 normal;
    FLOAT d;

    Plane() {}
    Plane(const DirectX::XMFLOAT3& normal, FLOAT d) : normal(normal), d(d) {}
    Plane(const DirectX::XMVECTOR& vec) { StoreVector(vec); }

    void StoreVector(const DirectX::XMVECTOR& vec) {
        XMStoreFloat4(reinterpret_cast<DirectX::XMFLOAT4*>(this), vec);
    }

    const DirectX::XMVECTOR AsVector() const {
        return XMLoadFloat4(reinterpret_cast<const DirectX::XMFLOAT4*>(this));
    }
};

struct BoundedPlane {
    Plane plane;
    DirectX::BoundingOrientedBox bounds;
    FLOAT area;
};

As you can see, a BoundedPlane is a Plane, an Oriented Bounding Box, and the area of the plane. The bounding box gives us a Center, which I am currently assuming will also be the center of the plane; Extents, three half-lengths which give the corners of the box when added to and subtracted from the Center along the box’s local axes; and an Orientation, a quaternion representing the rotation of the box, and therefore the plane, into what should be world space. I see two problems here.
The first is that this transform won’t be valid in any subsequent frame, because we don’t have a stationary frame of reference. I’m fairly certain that I’ll need to add a CoordinateSystem object to this structure so we can update the orientation each frame.
The second is that, in order to render a plane (as a rectangle), we need to figure out where its four corners are. I haven’t verified this yet, but I think they can be derived from the bounding box directly, as sketched below.
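
The idea is to rotate the box’s local x and y axes by the Orientation quaternion, then step out from the Center by the matching Extents. This assumes the box’s local z axis is the thin one, aligned with the plane normal, which I still need to check against the Plane Finding code:

// Recover the four corners of a plane's rectangle from its oriented
// bounding box. Assumes the box's local z axis lies along the plane normal.
void GetPlaneCorners(const BoundedPlane& boundedPlane, XMFLOAT3 corners[4]) {
	XMVECTOR center = XMLoadFloat3(&boundedPlane.bounds.Center);
	XMVECTOR orientation = XMLoadFloat4(&boundedPlane.bounds.Orientation);

	// The box's local x and y axes, rotated into world space and scaled
	// by the corresponding half-extents.
	XMVECTOR halfX = XMVector3Rotate(XMVectorSet(1.0f, 0.0f, 0.0f, 0.0f), orientation) * boundedPlane.bounds.Extents.x;
	XMVECTOR halfY = XMVector3Rotate(XMVectorSet(0.0f, 1.0f, 0.0f, 0.0f), orientation) * boundedPlane.bounds.Extents.y;

	XMStoreFloat3(&corners[0], center + halfX + halfY);
	XMStoreFloat3(&corners[1], center - halfX + halfY);
	XMStoreFloat3(&corners[2], center - halfX - halfY);
	XMStoreFloat3(&corners[3], center + halfX - halfY);
}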

In case you’re curious about how the Plane itself is being stored (a normal n and a distance d, describing the set of points p where n · p + d = 0), see this explanation of a plane.

So that’s where I’m at right now. I’ve figured out how to get our SurfaceMesh data into a format that Microsoft’s Plane Finding algorithm will accept. Now, I need to get to a point where I can actually render the planes I’m finding. We’ll see in my next post how close my solution winds up being to what I described here.

For the latest code, go to GitHub.
Traagen