Performance of numpy operations in Blender

This article is a follow-up to this one, where we introduced the foreach_get()/foreach_set() methods and had a quick look at NumPy.

I want to present a small add-on that implements three different methods to determine the distance from the active camera to the closest vertex in the active mesh. As we will see, the methods differ in how they access vertex data and calculate distances, which significantly impacts their performance and memory trade-offs.

1. NAIVE Implementation

What It Does

The NAIVE approach iterates over mesh.vertices using a standard Python loop and calculates distances with per-vertex Vector math in pure Python. When we measure performance, it gives us a baseline against which to compare the other methods.

The relevant code is straightforward: to get the vertex coordinates we loop over all vertices, read the co attribute (a Vector object), and return them as a Python list:

    from bpy.types import Mesh, Object
    from mathutils import Vector

    def get_vertex_positions(obj: Object) -> list[Vector]:
        mesh: Mesh = obj.data  # type: ignore
        return [v.co for v in mesh.vertices]

The code to determine the closest distance is also straightforward:

    from typing import Any, Tuple

    import numpy as np
    import numpy.typing as npt

    def get_closest_vertex_index_to_camera_naive(
        world_verts: npt.NDArray[np.float32],
        cam_pos: npt.NDArray[np.float32],
    ) -> Tuple[int, float | np.floating[Any]]:
        closest_distance = np.inf
        closest_index = -1
        for vertex_index, vertex_pos in enumerate(world_verts):
            direction = vertex_pos - cam_pos
            distance = np.linalg.norm(direction)
            if distance < closest_distance:
                closest_distance = distance
                closest_index = vertex_index
        return closest_index, closest_distance

Note that the type annotations here mention NumPy NDArrays, so that it will be possible to pass them, but in the NAIVE code path we actually pass plain Python lists. If we passed NumPy arrays, this code would also work.

We use np.linalg.norm() here just to make it possible to accept ndarrays; had we gone for a completely naive Python/Blender approach, we would have used distance = direction.length instead.
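For comparison, a version that stays entirely in the standard library (no NumPy, no Blender types; the function name and the use of plain tuples are my own for this sketch) could look like this:

```python
import math

def get_closest_vertex_index_naive_python(world_verts, cam_pos):
    # Track the best candidate seen so far; math.dist is the
    # pure-Python counterpart of direction.length / np.linalg.norm.
    closest_distance = math.inf
    closest_index = -1
    for vertex_index, vertex_pos in enumerate(world_verts):
        distance = math.dist(vertex_pos, cam_pos)
        if distance < closest_distance:
            closest_distance = distance
            closest_index = vertex_index
    return closest_index, closest_distance
```

This keeps the same loop structure, so it shares the same per-vertex overhead as the NAIVE method.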

Key Trade-offs

  • Pros: It's the simplest and most readable method, and doesn't require NumPy.
  • Cons: It incurs high per-vertex Python overhead (for attribute access and method calls), making it slow on large meshes.
  • Performance: Acceptable for small meshes (hundreds to low thousands of vertices). It scales poorly for 10,000+ vertices.

2. FOREACH (mesh.foreach_get / foreach_set)

What It Does

The FOREACH method uses mesh.foreach_get to perform a bulk copy of vertex coordinates into a flat, numeric buffer (typically a NumPy array). However, in this code path the closest vertex is still found with a Python loop, meaning only the data transfer is accelerated. (The values are stored in an ndarray, which is why the get_closest_vertex_index_to_camera_naive() function accepts those as well.)

    def get_vertex_positions_np(obj: Object) -> npt.NDArray[np.float32]:
        mesh: Mesh = obj.data  # type: ignore
        coords = np.empty(len(mesh.vertices) * 3, dtype=np.float32)
        mesh.vertices.foreach_get("co", coords)
        return coords.reshape(-1, 3)
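The distance functions expect world-space coordinates, while foreach_get returns local coordinates. The article doesn't show that conversion step, but one common way to build world_verts is to multiply by obj.matrix_world in NumPy using homogeneous coordinates; a sketch, with the function name my own:

```python
import numpy as np

def to_world_space(local_coords: np.ndarray, matrix_world: np.ndarray) -> np.ndarray:
    """Transform an (N, 3) array of local coordinates by a 4x4 world matrix."""
    n = local_coords.shape[0]
    # Append a column of ones to get homogeneous (N, 4) coordinates.
    verts_h = np.ones((n, 4), dtype=local_coords.dtype)
    verts_h[:, :3] = local_coords
    # Row-vector convention: multiply by the transposed matrix, drop w.
    return (verts_h @ matrix_world.T)[:, :3]
```

In Blender the matrix would come from np.array(obj.matrix_world).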

Key Trade-offs

  • Pros: The bulk copy significantly reduces Python attribute overhead for reading vertex data, making it much faster than NAIVE for data transfer.
  • Cons: If the distance calculation remains in a Python loop, you still keep the performance-limiting Python-loop overhead. This method also requires NumPy.
  • Performance: A good improvement over NAIVE for medium meshes, but it doesn't achieve the speed of fully vectorized numerical processing.

3. BROADCAST (NumPy Vectorized)

What It Does

The BROADCAST method also reads vertex coordinates into a NumPy array using foreach_get. Crucially, it then performs distance calculations using vectorized NumPy operations. This means no Python per-vertex loop is used; instead, np.argmin() finds the closest index.

    def get_closest_vertex_index_to_camera(
        world_verts: npt.NDArray[np.float32],
        cam_pos: npt.NDArray[np.float32],
    ) -> Tuple[int, float | np.floating[Any]]:
        dists = np.linalg.norm(world_verts - cam_pos, axis=1)
        i = np.argmin(dists)
        return i, dists[i]

This function is not only a lot shorter, without a Python loop in sight, but it also nicely shows the power of NumPy.

world_verts is an N x 3 array, and cam_pos a length-3 array. NumPy's broadcasting rules interpret this to mean we want to subtract the camera position from each vertex position, resulting in an N x 3 array of direction vectors; np.linalg.norm() with axis=1 then collapses each row into its length, giving an array of N distances.

np.argmin() then takes this array of distances and returns the index of the smallest one.

All this happens inside optimized C/C++ code, sidestepping the Python loop performance penalty completely.
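To make the shapes concrete, here is a tiny stand-alone example with made-up values:

```python
import numpy as np

world_verts = np.array([[0.0, 0.0, 0.0],
                        [3.0, 4.0, 0.0],
                        [1.0, 0.0, 0.0]], dtype=np.float32)  # shape (N, 3), N = 3
cam_pos = np.array([0.0, 0.0, 0.0], dtype=np.float32)        # shape (3,)

diff = world_verts - cam_pos          # broadcast: (N, 3) - (3,) -> (N, 3)
dists = np.linalg.norm(diff, axis=1)  # collapse each row -> shape (N,)
i = int(np.argmin(dists))             # index of the smallest distance
```

With the camera at the origin, the second vertex is at distance 5 (a 3-4-5 triangle) and the first, at distance 0, wins.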

Key Trade-offs

  • Pros: Distances are computed in highly optimized C code (via BLAS/NumPy internals), resulting in minimal Python overhead. This is the fastest approach for large meshes.
  • Cons: It requires extra memory for NumPy arrays and temporary arrays (verts_h, world_verts, dists). It also requires NumPy and careful management of array shapes and data types.
  • Performance: Best for large meshes (tens of thousands of vertices and up). While array allocation overhead may make it slightly slower for tiny meshes, it's generally still acceptable.
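To verify that the loop-based and vectorized functions agree, and to get a feel for the speed difference on your own machine, a small harness like the following can be used; the names and sizes here are illustrative, not taken from the add-on:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
world_verts = rng.random((10_000, 3)).astype(np.float32)
cam_pos = np.array([0.5, 0.5, 0.5], dtype=np.float32)

def closest_naive(verts, cam):
    # Per-vertex Python loop, as in the NAIVE method.
    best_i, best_d = -1, np.inf
    for i, v in enumerate(verts):
        d = np.linalg.norm(v - cam)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d

def closest_vectorized(verts, cam):
    # Broadcast subtraction plus argmin, as in the BROADCAST method.
    dists = np.linalg.norm(verts - cam, axis=1)
    i = int(np.argmin(dists))
    return i, dists[i]

t0 = time.perf_counter()
res_naive = closest_naive(world_verts, cam_pos)
t1 = time.perf_counter()
res_vec = closest_vectorized(world_verts, cam_pos)
t2 = time.perf_counter()
print(f"naive: {t1 - t0:.4f}s, vectorized: {t2 - t1:.4f}s")
```

Both functions should report the same closest vertex; only the time taken differs.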

Performance results

If we look at the measured performance, we see a clear linear dependence on the number of vertices for all methods:

Performance graph

Note that both axes are logarithmic, to make it possible to graph results spanning many orders of magnitude.

The naive method is too slow to go beyond 1 million vertices: the last point, at 1.5M verts, already takes more than 4 seconds.

The method that uses foreach_get() is about twice as fast, but that still wouldn't get us anywhere near 10 million vertices.

However, the method that uses NumPy operations to calculate the distances and find the minimum is a whole lot faster. The last point on the yellow line clocks in at about 1 second for 25 million vertices, over a 60x improvement on the naive method.

The actual numbers may of course differ on your computer, but the overall trend is clear: using NumPy is the way to go.

Practical Guidance & Rule of Thumb

Since the code isn't all that much more complicated, always use the available NumPy functionality. Only when you are severely constrained by memory might you consider the other approaches, because the extra buffer for the foreach_get() method and any temporary arrays that NumPy creates can add up when working with large numbers of vertices.
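If memory pressure is a concern, one option (not part of the add-on, just a sketch) is to compare squared distances: this skips the square root over all vertices, and np.einsum computes the per-row dot products without allocating a separate squared-difference array:

```python
import numpy as np

def closest_vertex_squared(world_verts: np.ndarray, cam_pos: np.ndarray):
    # One temporary for the per-vertex direction vectors.
    diff = world_verts - cam_pos
    # Per-row dot product = squared distance, without an
    # intermediate diff * diff array.
    sq_dists = np.einsum("ij,ij->i", diff, diff)
    i = int(np.argmin(sq_dists))
    # Take a single square root, only for the winning vertex.
    return i, float(np.sqrt(sq_dists[i]))
```

Since the square root is monotonic, comparing squared distances yields the same closest index as comparing distances.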