Shape optimization#


In this tutorial, we will optimize a triangle mesh to match a target shape specfied using a set of reference renderings.

Gradients with regards to vertex positions are typically extremely sparse, since only vertices located on visibility discontinuities receive a contribution. As a consequence, naively optimizing a triangle mesh generally results in horrible, tangled meshes.

To avoid this, we will use the method from the paper “Large Steps in Inverse Rendering of Geometry”. This method optimizes a latent variable that enables smoother gradients.

🚀 You will learn how to:

  • Create a sensor that batches several viewpoints together

  • Use the “large steps” algorithm to optimize a shape

  • Use remeshing to refine the optimized shape


As always, let’s import drjit and mitsuba and set a differentiation-aware variant.

import matplotlib.pyplot as plt
import os

import drjit as dr
import mitsuba as mi

mi.set_variant('cuda_ad_rgb', 'llvm_ad_rgb')

Batched rendering#

In order to accurately recover a shape, we need several reference renderings, taken from different viewpoints. Rather than rendering each viewpoint separately during the optimisation like in the volume optimisation tutorial, we will use the batch sensor, that allows us to render all views at once. There is a performance benefit to this approach: the just-in-time compiler only needs to trace the rendering process once instead of multiple times (once for each viewpoint).

Note that we also have to set the sample_border flag to True for the hdrfilm plugin. By enabling this option, Mitsuba will render regions slightly beyond the film’s boundaries, enabling us to track objects that enter or exit the frame.

sensor_count = 8
sensor = {
    'type': 'batch',
    'film': {
        'type': 'hdrfilm',
        'width': 256 * sensor_count, 'height': 256,
        'filter': {'type': 'gaussian'},
        'sample_border': True

The batch sensor expects a list of nested sensors. Here we will generate 8 viewpoints evenly distributed on the sphere, using the Fibonacci lattice.

from mitsuba import ScalarTransform4f as T

golden_ratio = (1 + 5**0.5)/2
for i in range(sensor_count):
    theta = 2 * dr.pi * i / golden_ratio
    phi = dr.acos(1 - 2*(i+0.5)/sensor_count)

    d = 5
    origin = [
        d * dr.cos(theta) * dr.sin(phi),
        d * dr.sin(theta) * dr.sin(phi),
        d * dr.cos(phi)

    sensor[f"sensor_{i}"] = {
        'type': 'perspective',
        'fov': 45,
        'to_world': T.look_at(target=[0, 0, 0], origin=origin, up=[0, 1, 0])

Scene construction#

Let’s now generate the reference renderings. We will load a scene with the previous sensor, the target mesh, and an environment map. Note the use of the direct_reparam integrator, that properly accounts for visibility discontinuities when differentiating, as in the object pose estimation tutorial.

scene_dict = {
    'type': 'scene',
    'integrator': {
        'type': 'direct_reparam',
    'sensor': sensor,
    'emitter': {
        'type': 'envmap',
        'filename': "../scenes/textures/envmap2.exr",
    'shape': {
        'type': 'ply',
        'filename': "../scenes/meshes/suzanne.ply",
        'bsdf': {'type': 'diffuse'}

scene_target = mi.load_dict(scene_dict)

We can now generate the reference image, our goal is to reconstruct Blender’s Suzanne. When using the batch sensor, the output is a single image of all viewpoints stitched together horizontally.

def plot_batch_output(out: mi.TensorXf):
    fig, ax = plt.subplots(figsize=(5*sensor_count, 5))
ref_img = mi.render(scene_target, spp=256)

The starting geometry which we’ll optimize is a relatively dense sphere. More challenging scenes and target shapes might require a better initialization.

scene_dict['shape']['filename'] = '../scenes/meshes/ico_10k.ply'
scene_source = mi.load_dict(scene_dict)

init_img = mi.render(scene_source, spp=128)

Naive optimization#

It is tempting to apply the same steps as in previous tutorials, i.e. simply render the scene, compute a loss, and differentiate. Let’s try that first.

As per usual, all that is needed is an Optimizer and a simple optimization loop.

params = mi.traverse(scene_source)
opt =
opt['shape.vertex_positions'] = params['shape.vertex_positions']
for it in range(5):
    loss = mi.Float(0.0)


    img = mi.render(scene_source, params, seed=it, spp=16)

    # L1 Loss
    loss = dr.mean(dr.abs(img - ref_img))

    print(f"Iteration {1+it:03d}: Loss = {loss[0]:6f}", end='\r')
Iteration 005: Loss = 0.065677
final_img = mi.render(scene_source, spp=128)

As you can see, even after only a few steps, this approach produces an unusable mess of triangles. This is a consequence of the sparsity of the visibility-related gradients. Indeed, these are only present on edges that are on the silhouette of the mesh for a given viewpoint.

# Reset the scene
scene_source = mi.load_dict(scene_dict)
params = mi.traverse(scene_source)

Large Steps Optimization#

Rather than directly optimizing cartesian vertex coordinates, the work of [Nicolet et al. 2021]Large Steps in Inverse Rendering of Geometry” introduces a latent representation that improves the smoothnes of gradients. In short, a latent variable \(u\) is defined as \(u = (I + \lambda L) v\), where \(v\) denotes the vertex positions, \(L\) is the (combinatorial) Laplacian of the given mesh, and \(\lambda\) is a user-defined hyper-parameter. Intuitively, this parameter \(\lambda\) defines by how much the gradients should be smoothed out on the surface.

This approach is readily available in Mitsuba in the class. It requires cholespy, a Python package to solve sparse linear systems with Cholesky factorisations.

[ ]:
!pip install cholespy

The LargeSteps helper is instanciated from the starting shapes’ vertex positions and faces. If the mesh has duplicate vertices (e.g. due to face normals or UV seams), it will internally convert the mesh to a “unique” representation.

lambda_ = 25
ls =['shape.vertex_positions'], params['shape.faces'], lambda_)

We also use a slightly modified version of the Adam optimizer, that uses a uniform second moment for all parameters. This can be done by setting uniform=True when instantiating the optimizer.

opt =, uniform=True)

The LargeSteps class has a couple utility methods that can be used to convert back and forth between cartesian and differential representation of the vertex coordinates. The LargeSteps.to_differential method is used here to initialize the latent variable.

opt['u'] = ls.to_differential(params['shape.vertex_positions'])

The optimisation loop must also be slighlty changed. We now need to update the shape using the latent variable, this additional step can be done by using LargeSteps.from_differential.

iterations = 100 if 'PYTEST_CURRENT_TEST' not in os.environ else 5
for it in range(iterations):
    loss = mi.Float(0.0)

    # Retrieve the vertex positions from the latent variable
    params['shape.vertex_positions'] = ls.from_differential(opt['u'])

    img = mi.render(scene_source, params, seed=it, spp=16)

    # L1 Loss
    loss = dr.mean(dr.abs(img - ref_img))

    print(f"Iteration {1+it:03d}: Loss = {loss[0]:6f}", end='\r')
Iteration 100: Loss = 0.005003

Intermediate result#

# Update the mesh after the last iteration's gradient step
params['shape.vertex_positions'] = ls.from_differential(opt['u'])

Let us again plot the result of our optimization.

final_img = mi.render(scene_source, spp=128)


The result above can be further improved with the help of remeshing. By increasing the tesselation of the mesh, we will be able to recover more details of the target shape. Intuitively, the intent of this step is similar to other “coarse-to-fine” optimization strategies. For example, in the caustics or the volume optimization tutorial we increase the resolution of texture that is being optimized over time.

We will use the Botsch-Kobbelt remeshing algorithm provided by the gpytoolbox package:

[ ]:
!pip install gpytoolbox

Since the gpytoolbox package expects NumPy arrays, we will first convert the mesh data to the correct format.

import numpy as np
v_np = params['shape.vertex_positions'].numpy().reshape((-1,3)).astype(np.float64)
f_np = params['shape.faces'].numpy().reshape((-1,3))

The Botsch-Kobbelt remeshing algorithm takes a “target edge length” as input argument. This controls the desired tesselation of the mesh. Since we want to increase resolution, we will set this as half of the mean edge length of the current mesh.

# Compute average edge length
l0 = np.linalg.norm(v_np[f_np[:,0]] - v_np[f_np[:,1]], axis=1)
l1 = np.linalg.norm(v_np[f_np[:,1]] - v_np[f_np[:,2]], axis=1)
l2 = np.linalg.norm(v_np[f_np[:,2]] - v_np[f_np[:,0]], axis=1)

target_l = np.mean([l0, l1, l2]) / 2

We can now run the Botsch-Kobbelt remeshing algorithm. It runs for a user-specified number of iterations, which we set to 5 here. Further details about this algorithm can be found it the package’s documentation.

from gpytoolbox import remesh_botsch

v_new, f_new = remesh_botsch(v_np, f_np, i=5, h=target_l, project=True)

The new vertices and faces must now be passed to our Mitsuba Mesh. If the mesh has other attributes (e.g. UV coordinates), they also need to be updated. By default, Mitsuba will reset these to 0 if the vertex or face count is altered.

params['shape.vertex_positions'] = mi.Float(v_new.flatten().astype(np.float32))
params['shape.faces'] = mi.Int(f_new.flatten())

Since the mesh topology has changed, we also need to compute a new latent variable.

ls =['shape.vertex_positions'], params['shape.faces'], lambda_)
opt =, uniform=True)
opt['u'] = ls.to_differential(params['shape.vertex_positions'])

Let’s continue the optimization.

iterations = 150 if 'PYTEST_CURRENT_TEST' not in os.environ else 5
for it in range(iterations):
    loss = mi.Float(0.0)

    # Retrieve the vertex positions from the latent variable
    params['shape.vertex_positions'] = ls.from_differential(opt['u'])

    img = mi.render(scene_source, params, seed=it, spp=16)

    # L1 Loss
    loss = dr.mean(dr.abs(img - ref_img))

    print(f"Iteration {1+it:03d}: Loss = {loss[0]:6f}", end='\r')
Iteration 150: Loss = 0.003975

Final result#

We recover the final state from the latent variable:

params['shape.vertex_positions'] = ls.from_differential(opt['u'])

Finally, let’s compare our end result (bottom) to our reference views (top).

final_img = mi.render(scene_source, spp=128)

fig, ax = plt.subplots(nrows=2, figsize=(5*sensor_count, 10))
ax[0].set_title("Reference", y=0.3, x=-0.005, rotation=90, fontsize=20)
ax[1].set_title("Optimized shape", y=0.2, x=-0.005, rotation=90, fontsize=20)
for i, img in enumerate((ref_img, final_img)):

Note that the results could be further improved by e.g. using more input views, or using a less agressive step size and more iterations.

See also#