Shape optimization#

Overview#

In this tutorial, we will optimize a triangle mesh to match a target shape specified by a set of reference renderings.

Gradients with respect to vertex positions are typically extremely sparse, since only vertices located on visibility discontinuities receive a contribution. As a consequence, naively optimizing a triangle mesh generally results in horribly tangled meshes.

To avoid this, we will use the method from the paper “Large Steps in Inverse Rendering of Geometry”. This method optimizes a latent variable that enables smoother gradients.

🚀 You will learn how to:

  • Use the “large steps” algorithm to optimize a shape

  • Use remeshing to refine the optimized shape

Setup#

As always, let’s import drjit and mitsuba and set a differentiation-aware variant.

[1]:
import drjit as dr
import mitsuba as mi
import matplotlib.pyplot as plt # We'll also want to plot some outputs
import os

mi.set_variant('cuda_ad_rgb', 'llvm_ad_rgb')

Setting up sensors#

In order to accurately recover a shape, we need several reference renderings taken from different viewpoints. As in the volume optimization tutorial, we render each viewpoint separately during the optimization.

Note that we also have to set the sample_border flag to True for the hdrfilm plugin. By enabling this option, Mitsuba will render regions slightly beyond the film’s boundaries, enabling us to track objects that enter or exit the frame.

Here we will generate 8 viewpoints evenly distributed on the sphere, using the Fibonacci lattice.

[2]:
from mitsuba import ScalarTransform4f as T

sensor_count = 8
sensors = []

golden_ratio = (1 + 5**0.5)/2
for i in range(sensor_count):
    theta = 2 * dr.pi * i / golden_ratio
    phi = dr.acos(1 - 2*(i+0.5)/sensor_count)

    d = 5
    origin = [
        d * dr.cos(theta) * dr.sin(phi),
        d * dr.sin(theta) * dr.sin(phi),
        d * dr.cos(phi)
    ]

    sensors.append(mi.load_dict({
        'type': 'perspective',
        'fov': 45,
        'to_world': T.look_at(target=[0, 0, 0], origin=origin, up=[0, 1, 0]),
        'film': {
            'type': 'hdrfilm',
            'width': 256, 'height': 256,
            'filter': {'type': 'gaussian'},
            'sample_border': True,
        },
        'sampler': {
            'type': 'independent',
            'sample_count': 128
        },
    }))

Scene construction#

Let’s now generate the reference renderings. We will load a scene with the target mesh and an environment map. Note the use of the direct_projective integrator, which properly accounts for visibility discontinuities when differentiating, as in the object pose estimation tutorial. There won’t be many shadows in our scene initially, so let’s turn off indirect visibility effects in the integrator to speed up the optimization.

[3]:
scene_dict = {
    'type': 'scene',
    'integrator': {
        'type': 'direct_projective',
        # Indirect visibility effects aren't that important here
        # let's turn them off and save some computation time
        'sppi': 0,
    },
    'emitter': {
        'type': 'envmap',
        'filename': "../scenes/textures/envmap2.exr",
    },
    'shape': {
        'type': 'ply',
        'filename': "../scenes/meshes/suzanne.ply",
        'bsdf': {'type': 'diffuse'}
    }
}

scene_target = mi.load_dict(scene_dict)

We can now generate the reference images; our goal is to reconstruct Blender’s Suzanne.

[4]:
def plot_images(images):
    fig, axs = plt.subplots(1, len(images), figsize=(20, 5))
    for i in range(len(images)):
        axs[i].imshow(mi.util.convert_to_bitmap(images[i]))
        axs[i].axis('off')
[5]:
ref_images = [mi.render(scene_target, sensor=sensors[i], spp=256) for i in range(sensor_count)]
plot_images(ref_images)

The starting geometry which we’ll optimize is a relatively dense sphere. More challenging scenes and target shapes might require a better initialization.

[6]:
scene_dict['shape']['filename'] = '../scenes/meshes/ico_10k.ply'
scene_source = mi.load_dict(scene_dict)

init_imgs = [mi.render(scene_source, sensor=sensors[i], spp=128) for i in range(sensor_count)]
plot_images(init_imgs)

Naive optimization#

It is tempting to apply the same steps as in previous tutorials, i.e. simply render the scene, compute a loss, and differentiate. Let’s try that first.

As usual, all that is needed is an Optimizer and a simple optimization loop.

[7]:
params = mi.traverse(scene_source)
opt = mi.ad.Adam(lr=3e-2)
opt['shape.vertex_positions'] = params['shape.vertex_positions']
[8]:
for it in range(5):
    total_loss = mi.Float(0.0)

    for sensor_idx in range(sensor_count):
        params.update(opt)

        img = mi.render(scene_source, params, sensor=sensors[sensor_idx], seed=it)

        # L2 Loss
        loss = dr.mean(dr.sqr(img - ref_images[sensor_idx]))
        dr.backward(loss)

        opt.step()

        total_loss += loss[0]

    print(f"Iteration {1+it:03d}: Loss = {total_loss[0]:6f}", end='\r')
Iteration 005: Loss = 0.340038
[9]:
final_imgs = [mi.render(scene_source, sensor=sensors[i], spp=128) for i in range(sensor_count)]
plot_images(final_imgs)

As you can see, even after only a few steps, this approach produces an unusable mess of triangles. This is a consequence of the sparsity of the visibility-related gradients: they are only present on edges that lie on the silhouette of the mesh for a given viewpoint.
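
To see this sparsity for yourself, the diagnostic cell below (our own addition, not part of the original pipeline) renders a single view with gradient tracking enabled on the vertex positions and counts how many vertices received any gradient at all. The exact pattern depends on the integrator and scene, but the visibility-driven part of the signal is concentrated on the silhouette vertices.

[ ]:
import numpy as np

# Diagnostic sketch: track gradients of the vertex positions directly
params = mi.traverse(scene_source)
dr.enable_grad(params['shape.vertex_positions'])
params.update()

# Render a single view and backpropagate an L2 loss
img = mi.render(scene_source, params, sensor=sensors[0], spp=16)
loss = dr.mean(dr.sqr(img - ref_images[0]))
dr.backward(loss)

# Count how many vertices received a non-zero gradient
grad = dr.grad(params['shape.vertex_positions']).numpy().reshape(-1, 3)
nonzero = np.count_nonzero(np.any(grad != 0, axis=1))
print(f"{nonzero} / {grad.shape[0]} vertices received a non-zero gradient")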

[10]:
# Reset the scene
scene_source = mi.load_dict(scene_dict)
params = mi.traverse(scene_source)

Large Steps Optimization#

Rather than directly optimizing Cartesian vertex coordinates, the work of [Nicolet et al. 2021] “Large Steps in Inverse Rendering of Geometry” introduces a latent representation that improves the smoothness of gradients. In short, a latent variable \(u\) is defined as \(u = (I + \lambda L) v\), where \(v\) denotes the vertex positions, \(L\) is the (combinatorial) Laplacian of the given mesh, and \(\lambda\) is a user-defined hyper-parameter. Intuitively, \(\lambda\) controls by how much the gradients are smoothed out over the surface.
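
To make this concrete, the sketch below spells out the mapping \(u = (I + \lambda L) v\) and its inverse with NumPy/SciPy. It is for illustration only; the sparse-matrix construction and function names are our own, not part of Mitsuba’s API. Note that mapping back to vertex positions requires solving a sparse linear system, which is exactly what the Cholesky-based helper introduced below does efficiently.

[ ]:
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def combinatorial_laplacian(n, faces):
    # Binary vertex adjacency assembled from the (F, 3) triangle index array
    i = np.concatenate([faces[:, 0], faces[:, 1], faces[:, 2]])
    j = np.concatenate([faces[:, 1], faces[:, 2], faces[:, 0]])
    rows, cols = np.concatenate([i, j]), np.concatenate([j, i])
    A = sp.coo_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n)).tocsr()
    A.data[:] = 1.0  # collapse duplicate entries from edges shared by two faces
    D = sp.diags(np.asarray(A.sum(axis=1)).ravel())  # vertex degrees
    return D - A  # combinatorial graph Laplacian

def to_differential(v, faces, lambda_):
    # Forward mapping: u = (I + lambda * L) v
    M = sp.identity(v.shape[0]) + lambda_ * combinatorial_laplacian(v.shape[0], faces)
    return M @ v

def from_differential(u, faces, lambda_):
    # Inverse mapping: solve the sparse system (I + lambda * L) v = u
    M = sp.identity(u.shape[0]) + lambda_ * combinatorial_laplacian(u.shape[0], faces)
    return spla.splu(M.tocsc()).solve(u)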

This approach is readily available in Mitsuba through the mi.ad.LargeSteps class. It requires cholespy, a Python package for solving sparse linear systems using Cholesky factorizations.

[ ]:
!pip install cholespy

The LargeSteps helper is instantiated from the starting shape’s vertex positions and faces. If the mesh has duplicate vertices (e.g. due to face normals or UV seams), it will internally convert the mesh to a “unique” representation.

[12]:
lambda_ = 25
ls = mi.ad.LargeSteps(params['shape.vertex_positions'], params['shape.faces'], lambda_)

We also use a slightly modified version of the Adam optimizer that uses a uniform second moment for all parameters. This is enabled by setting uniform=True when instantiating the optimizer.

[13]:
opt = mi.ad.Adam(lr=1e-1, uniform=True)
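
Roughly speaking (this is our paraphrase, not an exact statement from either reference), standard Adam rescales each coordinate by its own second-moment estimate, i.e. \(\theta \leftarrow \theta - \eta\,\hat{m}/(\sqrt{\hat{v}} + \epsilon)\) elementwise, whereas the uniform variant shares a single scalar second moment across all coordinates, e.g. \(\theta \leftarrow \theta - \eta\,\hat{m}/(\sqrt{\lVert \hat{v} \rVert_\infty} + \epsilon)\). Every latent coordinate then receives the same effective step size, which avoids per-coordinate distortion of the smoothed update direction.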

The LargeSteps class provides a couple of utility methods to convert back and forth between the Cartesian and differential representations of the vertex coordinates. The LargeSteps.to_differential method is used here to initialize the latent variable.

[14]:
opt['u'] = ls.to_differential(params['shape.vertex_positions'])

The optimization loop must also be slightly changed: we now need to update the shape from the latent variable, which is done with LargeSteps.from_differential.

[15]:
iterations = 100 if 'PYTEST_CURRENT_TEST' not in os.environ else 5
for it in range(iterations):
    total_loss = mi.Float(0.0)

    for sensor_idx in range(sensor_count):
        # Retrieve the vertex positions from the latent variable
        params['shape.vertex_positions'] = ls.from_differential(opt['u'])
        params.update()

        img = mi.render(scene_source, params, sensor=sensors[sensor_idx], seed=it)

        # L1 Loss
        loss = dr.mean(dr.abs(img - ref_images[sensor_idx]))

        dr.backward(loss)
        opt.step()

        total_loss += loss[0]

    print(f"Iteration {1+it:03d}: Loss = {total_loss[0]:6f}", end='\r')
Iteration 100: Loss = 0.040712

Intermediate result#

[16]:
# Update the mesh after the last iteration's gradient step
params['shape.vertex_positions'] = ls.from_differential(opt['u'])
params.update();

Let us again plot the result of our optimization.

[17]:
final_imgs = [mi.render(scene_source, sensor=sensors[i], spp=128) for i in range(sensor_count)]
plot_images(final_imgs)

Remeshing#

The result above can be further improved with the help of remeshing. By increasing the tessellation of the mesh, we will be able to recover more details of the target shape. Intuitively, the intent of this step is similar to other “coarse-to-fine” optimization strategies. For example, in the caustics and volume optimization tutorials, we progressively increase the resolution of the texture being optimized.

We will use the Botsch-Kobbelt remeshing algorithm provided by the gpytoolbox package:

[ ]:
!pip install gpytoolbox

Since the gpytoolbox package expects NumPy arrays, we will first convert the mesh data to the correct format.

[19]:
import numpy as np
v_np = params['shape.vertex_positions'].numpy().reshape((-1,3)).astype(np.float64)
f_np = params['shape.faces'].numpy().reshape((-1,3))

The Botsch-Kobbelt remeshing algorithm takes a “target edge length” as input, which controls the desired tessellation of the mesh. Since we want to increase the resolution, we will set it to half the mean edge length of the current mesh.

[20]:
# Compute average edge length
l0 = np.linalg.norm(v_np[f_np[:,0]] - v_np[f_np[:,1]], axis=1)
l1 = np.linalg.norm(v_np[f_np[:,1]] - v_np[f_np[:,2]], axis=1)
l2 = np.linalg.norm(v_np[f_np[:,2]] - v_np[f_np[:,0]], axis=1)

target_l = np.mean([l0, l1, l2]) / 2

We can now run the Botsch-Kobbelt remeshing algorithm. It runs for a user-specified number of iterations, which we set to 5 here. Further details about this algorithm can be found in the package’s documentation.

[21]:
from gpytoolbox import remesh_botsch

v_new, f_new = remesh_botsch(v_np, f_np, i=5, h=target_l, project=True)

The new vertices and faces must now be passed to our Mitsuba Mesh. If the mesh has other attributes (e.g. UV coordinates), they also need to be updated. By default, Mitsuba will reset these to 0 if the vertex or face count is altered.

[22]:
params['shape.vertex_positions'] = mi.Float(v_new.flatten().astype(np.float32))
params['shape.faces'] = mi.Int(f_new.flatten())
params.update();

Since the mesh topology has changed, we also need to compute a new latent variable.

[23]:
ls = mi.ad.LargeSteps(params['shape.vertex_positions'], params['shape.faces'], lambda_)
opt = mi.ad.Adam(lr=1e-1, uniform=True)
opt['u'] = ls.to_differential(params['shape.vertex_positions'])

So far we had disabled indirect visibility derivatives, as they weren’t really needed. However, the mesh now shadows itself a fair amount, so indirect visibility effects have become more important and may help with optimizing the last few details of the shape.

[24]:
integrator = mi.load_dict({
    'type': 'direct_projective'
})

Let’s continue the optimization.

[25]:
iterations = 100 if 'PYTEST_CURRENT_TEST' not in os.environ else 5
for it in range(iterations):
    total_loss = mi.Float(0.0)
    for sensor_idx in range(sensor_count):
        # Retrieve the vertex positions from the latent variable
        params['shape.vertex_positions'] = ls.from_differential(opt['u'])
        params.update()

        img = mi.render(scene_source, params, sensor=sensors[sensor_idx], seed=it, integrator=integrator)

        # L1 Loss
        loss = dr.mean(dr.abs(img - ref_images[sensor_idx]))
        dr.backward(loss)
        total_loss += loss[0]
    opt.step()

    print(f"Iteration {1+it:03d}: Loss = {total_loss[0]:6f}", end='\r')
Iteration 100: Loss = 0.015494

Final result#

We recover the final state from the latent variable:

[26]:
params['shape.vertex_positions'] = ls.from_differential(opt['u'])
params.update();

Finally, let’s compare our end result (bottom) to our reference views (top).

[27]:
final_imgs = [mi.render(scene_source, sensor=sensors[i], spp=128) for i in range(sensor_count)]

fig, ax = plt.subplots(nrows=2, ncols=sensor_count, figsize=(5*sensor_count, 10))
ax[0][0].set_title("Reference", y=0.3, x=-0.1, rotation=90, fontsize=20)
ax[1][0].set_title("Optimized shape", y=0.2, x=-0.1, rotation=90, fontsize=20)
for i in range(sensor_count):
    ax[0][i].imshow(mi.util.convert_to_bitmap(ref_images[i]))
    ax[0][i].axis('off')
    ax[1][i].imshow(mi.util.convert_to_bitmap(final_imgs[i]))
    ax[1][i].axis('off')

Note that the results could be further improved by, e.g., using more input views, or a less aggressive step size together with more iterations.

See also#