
Colinearity tests in Blender meshes using Numpy

I re-implemented the algorithm used in the select_colinear_edges add-on to select all edges that are co-linear with already selected edges, and I thought a little write-up with some details could be useful for some people.

Warning! Long read! (≈ 20 min)

The challenge

If we want to select a path of co-linear edges all we have to do is start from any already selected edge, check if its neighbor is co-linear and if it is, select it and proceed from there. If we are careful not to examine any edges more than once, this algorithm will be quite fast and the time will depend on the number of directly connected edges that prove to be co-linear. And even in a large mesh this is likely to be a small number.
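In code, that flood fill might look something like the sketch below: a minimal bmesh version with hypothetical names, not the add-on's actual implementation. Because each candidate edge shares a vertex with the edge we reached it from, a simple parallel test is enough to establish co-linearity here.

import bmesh

def grow_colinear(bm, eps=1e-6):
    # flood fill: starting from the already selected edges, keep
    # selecting neighboring edges that are co-linear with the edge
    # we reached them from
    stack = [e for e in bm.edges if e.select]
    while stack:
        edge = stack.pop()
        d1 = (edge.verts[1].co - edge.verts[0].co).normalized()
        for v in edge.verts:
            for other in v.link_edges:
                if other.select:
                    continue  # already visited (or initially selected)
                d2 = (other.verts[1].co - other.verts[0].co).normalized()
                if abs(d1.dot(d2)) > 1.0 - eps:  # parallel + shared vertex
                    other.select = True
                    stack.append(other)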

But what if we do not require those connected edges to form an unbroken path?

Then for all initially selected edges we would have to test all other edges in the mesh for co-linearity, something that can take a very long time if the mesh contains millions of vertices and everything is implemented in Python using just the mathutils module.

The algorithm

How do you determine if two edges are co-linear?

The first step is to see if they are parallel. This is done by calculating the dot product of the two normalized direction vectors. If this product is very close to 1 or -1 we consider them parallel.


(The dot product of two normalized vectors is the cosine of the angle between them)

Being parallel is a necessary condition but not a sufficient one to determine if two edges are co-linear. We also need to check if they are on the same line. This is done by first calculating the vector from any one of the two vertices in one edge to any one of the vertices in the other edge.

E3 is parallel to E1 but the light blue between vector is not parallel to E1

If the length of this vector is zero, then the chosen vertices coincide, and the edges are co-linear. If not, we check the dot product of this (normalized) vector and the direction vector of one of the edges, and if it is very close to 1 or -1, the edges are co-linear.

This means that for all edges we need to calculate the normalized direction vector, and for all initially selected edges we need to calculate this between vector to all other edges.
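Put together, the test for a single pair of edges looks something like this sketch (plain Numpy, hypothetical names; note that it thresholds the cosine directly, whereas the add-on code below thresholds the angle):

import numpy as np

def edges_colinear(p0, p1, q0, q1, eps=1e-6):
    # normalized direction vectors of both edges
    d1 = (p1 - p0) / np.linalg.norm(p1 - p0)
    d2 = (q1 - q0) / np.linalg.norm(q1 - q0)
    if abs(np.dot(d1, d2)) < 1.0 - eps:
        return False               # not parallel, so not co-linear
    between = q0 - p0              # vector between one vertex of each edge
    n = np.linalg.norm(between)
    if n < eps:
        return True                # the chosen vertices coincide
    return abs(np.dot(d1, between / n)) > 1.0 - eps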

A numpy based solution

Numpy can work efficiently on vast arrays of numbers and is bundled with Blender. By using Numpy we can avoid two notoriously slow things: Python loops and calling functions.

Our function looks like this (see the function colinear_edges() in this file):


def colinear_edges(selected: np.ndarray, indices, coords, threshold):
    colinear = np.zeros_like(selected)

    # calculate direction vectors for each edge
    edge_dirs = coords[indices[:, 1]] - coords[indices[:, 0]]
    edge_dirs = edge_dirs / np.linalg.norm(edge_dirs, axis=1)[:, np.newaxis]

    for e in selected.nonzero()[0]:
        # get the direction vector of the selected edge
        dir1 = edge_dirs[e]
        # check all other edges for colinearity
        angles = np.arccos(np.clip(np.dot(dir1, edge_dirs.T), -1.0, 1.0))
        parallel = (angles < threshold) | (np.abs(angles - np.pi) < threshold)
        v1 = coords[indices[e, 0]]
        w1 = coords[indices[:, 0]]
        # vector between start points
        between = w1 - v1
        # if the vector between start points is zero, they share a vertex, so colinear
        between_length = np.linalg.norm(between, axis=1)
        connected = between_length < 1e-6
        angles_between = np.abs(
            np.arccos(
                np.clip(
                    np.dot(dir1, (between / between_length[:, np.newaxis]).T), -1.0, 1.0
                )
            )
        )
        bparallel = (angles_between < threshold) | (
            np.abs(angles_between - np.pi) < threshold
        )
        # colinear if they are parallel and either share a vertex or the angle between the direction vector and the vector between start points is less than the threshold
        colinear |= (connected | bparallel) & parallel

    return colinear

Let's explain a few important steps.

The function is called with four arguments: a boolean array that indicates which edges are currently selected, an array with indices (2 for each edge) that indexes the third argument, an array with vertex coordinates, and a threshold value we'll discuss later. All those arrays come from a Blender Mesh object, and we will see how later in this article.

Line 5+6: Here we calculate all direction vectors from the vertex coordinates of each edge in one go, and then normalize them in a single statement by dividing each vector by its norm (i.e. its length).

Line 8-10: We loop over each selected edge and get its direction vector.

Line 12: Then we calculate the angles with all other direction vectors. This is done by calculating the dot product between the direction vector and all other direction vectors in one go (note that we need to transpose the array of vectors for this to work). We clip the dot products between -1 and 1 to guard against any floating point inaccuracies and then use the arccos() function to calculate the angle (remember that the dot product represents the cosine of the angle between two vectors).

Line 13: Then the angle is checked against the threshold, and if it is smaller (or very close to π, because we don't care in which direction the vectors point) we deem the edges parallel.

Line 14-17: Then we take a vertex v1 from the selected edge and the corresponding vertices w1 of all other edges, and calculate the between vectors.

Line 19+20: We calculate the length of all those between vectors, and for each of them determine if this length is so small that we consider the vertices coincident.

Line 21-27: Then we calculate all angles between the direction vector and the between vectors in the same way we did before.

Line 28-30: We then determine if the between vectors are parallel with the direction vector (or anti-parallel, because we don't care about that distinction).

Line 32: Finally we combine the logic and say two edges are co-linear if they are parallel AND (their chosen vertices are coincident OR the between vector is parallel to the direction vector). The result is OR-ed into the colinear array because we do this for each edge that was initially selected and want to return the combined set.

Calling colinear_edges()

If we have a Blender Mesh object we can access obj.data.edges and obj.data.vertices. The select_colinear() function takes references to those properties and uses them to efficiently retrieve all the indices, selection states and vertex coordinates with the foreach_get() method (Line 7-9). It stores them in arrays we have created first (Line 4-6).
foreach_get() expects flat arrays, so we reshape them into their expected shapes where needed (Line 10+11), before we call the colinear_edges() function discussed earlier (Line 13).
The result is a flat array with the new selection state of each edge, which we store in the select attribute of the mesh edges with the foreach_set() method (Line 14).
And finally we return the number of selected edges by counting all non-zero values (True is considered non-zero too, so this works fine for arrays of booleans).
def select_colinear(edges, vertices, threshold):
    n_edges = len(edges)
    n_vertices = len(vertices)
    indices = np.empty(2 * n_edges, dtype=int)
    coords = np.empty(3 * n_vertices, dtype=float)
    selected = np.zeros(n_edges, dtype=bool)
    edges.foreach_get("vertices", indices)
    edges.foreach_get("select", selected)
    vertices.foreach_get("co", coords)
    coords = coords.reshape((n_vertices, 3))
    indices = indices.reshape((n_edges, 2))

    colinear = colinear_edges(selected, indices, coords, threshold)
    edges.foreach_set("select", colinear)
    return np.count_nonzero(colinear)

Summary

Using the foreach_get() / foreach_set() methods lets us easily access mesh properties in bulk, which allows us to use Numpy to implement an algorithm that calculates co-linearity without Python loops (except for the loop over all initially selected edges).

In exchange for a modest increase in complexity we gain a lot of performance: although your mileage may vary of course, I could easily (in < 0.1 second) select all co-linear edges when picking one edge in a default cube that was subdivided 500 times (≈ 3.5 million edges). Fast enough for me 😀




Calculating the number of connections in a tree, or why numpy is awesome but a good algorithm even more so

the issue

Imagine you are creating trees, either pure data structures or ones that try to mimic real trees, and that you are interested in the number of connected branch segments. How would you calculate these numbers for each node in the tree?
If we have a small tree like the one in the illustration, where we have placed the node index inside each sphere, we might store for each node the index of its parent, like in the list: p = [-1, 0, 1, 2, 2, 4]. Note that node number 0 has -1 as the index of its parent to indicate it has no parent, because it is the root node. Also, nodes 3 and 4 have the same parent node because they represent a fork in a branch.

naive solution

To calculate the number of connected nodes for each node in the tree we could simply iterate over all the nodes and traverse the list of parent nodes until we reach the root node, all the while adding 1 to the number of connections. This might look like this (the numbers on the left are profiling information):
      hits       time      t/hit

      7385       4037      0.5   for p in parents:
    683441    1627791      2.4    while p >= 0:
    676057     410569      0.6     connections[p] += 1
    676057     351262      0.5     p = parents[p]
The result has the following values: c = [5, 4, 3, 0, 1, 0]. The tips have no connections, while the root counts all nodes except itself, so our 6-node tree has a root with 5 connections. This is also shown in the illustration (on the right).
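For reference, a self-contained version of this naive approach (with hypothetical names) is shown below; the profiled snippet above is its inner loop.

def count_connections(parents):
    connections = [0] * len(parents)
    for p in parents:        # start at each node's parent ...
        while p >= 0:        # ... and walk up until we pass the root
            connections[p] += 1
            p = parents[p]
    return connections

print(count_connections([-1, 0, 1, 2, 2, 4]))  # [5, 4, 3, 0, 1, 0]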
If we time this simple algorithm for a moderately sized tree of just over 7000 nodes (which might look like the one in the image below, to give a sense of the complexity), I find that it takes about 1.16 seconds on my machine.

numpy implementation

Now we all know that numpy is good at working with large arrays of numbers (and it is conveniently already bundled with Blender), so we might try some simple changes. Almost identical code, but with numpy arrays instead of Python lists:
      hits       time      t/hit

      7385       4563     0.6   for p in self.parent[:self.nverts]:
    683441    1695443     2.5    while p >= 0:
    676057    2004637     3.0     self.connections[p] += 1
    676057     456995     0.7     p = self.parent[p] 
But in fact this is slower! (2.68 seconds on my machine.) If we look at the timings from the excellent kernprof line profiler (the numbers on the left of the code snippets) we see that numpy spends an awful lot of time on both the iteration over the parents and especially the indexing of the connections array. Apparently indexing a single element in a numpy array is slower than the equivalent operation on a Python list. (There is a good explanation of why indexing a single element of a numpy array takes so long on StackOverflow.)
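You can see this single-element indexing overhead for yourself with a quick micro-benchmark (a hypothetical snippet, timings will vary):

import timeit
import numpy as np

lst = list(range(1000))
arr = np.arange(1000)

# indexing one element is noticeably slower on the numpy array,
# because every access has to construct a new numpy scalar object
print(timeit.timeit('x[500]', globals={'x': lst}))
print(timeit.timeit('x[500]', globals={'x': arr}))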

reworking the algorithm

It must be possible to do this faster, right? Yes it is; have a look at the following code:
      hits       time      t/hit

         1           2      2.0   p = self.parent[:self.nverts]
         1          19     19.0   p = p[p>=0]
       137          97      0.7   while len(p):
       136        1684     12.4    c = np.bincount(p)
       136        1013      7.4    self.connections[:len(c)] += c
       136        3395     25.0    p = self.parent[p]
       136        1547     11.4    p = p[p>=0]
This produces the same result (trust me, I'll explain in a minute), yet uses only milliseconds (7 to be precise). The trick is to use as much numpy code with looping and indexing built in as possible: bincount() counts the number of occurrences of each value and stores the count in the corresponding bin. So if 3 nodes have node 0 as their parent, c[0] will be 3. Next we add this array in one go to the cumulative number of connections, and finally we get the parent indices, also in a single statement, and remove the -1 entries (those nodes that reached the root in their tree traversal). We repeat this as long as there are nodes that have not yet reached the root.
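Wrapped up as a stand-alone function (again with hypothetical names), the same idea looks like this; every pass through the loop moves each 'walker' one step up the tree, while bincount() tallies all of them at once:

import numpy as np

def count_connections_np(parents):
    parents = np.asarray(parents)
    connections = np.zeros(len(parents), dtype=int)
    p = parents[parents >= 0]   # drop the root's -1 entry
    while len(p):
        c = np.bincount(p)      # how many walkers sit at each node
        connections[:len(c)] += c
        p = parents[p]          # every walker takes one step up
        p = p[p >= 0]           # walkers that passed the root are done
    return connections

print(count_connections_np([-1, 0, 1, 2, 2, 4]))  # [5 4 3 0 1 0]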

conclusion

Numpy might often be faster than pure Python, but it pays to check, using careful profiling. Using an appropriate algorithm that does away with as much Python overhead as possible can however give you a sizeable speed improvement, although such an algorithm might not be immediately obvious to find.

Space tree pro

This bit of research is not just academic: I am currently investigating options to speed up tree generation as implemented in my Space Tree Pro Blender add-on (available on BlenderMarket).


Random vertex colors using Numpy

The bit of research presented in a previous article was not completely academic: speed improvements using Numpy are very welcome when, for example, generating random vertex colors for very large meshes.
With the results of those measurements in the back of my mind, and spurred by a discussion on BlenderArtists, I created a new version of the random vertex colors add-on (available on GitHub), which reduced the time to generate random vertex colors for approximately 1 million faces from 3.2 seconds to 0.8 seconds on my machine (or even 0.4s for a 32-bit variant), i.e. up to 8 times faster, which is not bad.

Some explanation

Assigning random vertex colors to faces means that you have to assign the same random color to all the loops of a polygon (for the difference between vertices and loops, check the BMesh design document; the same applies to the regular Mesh objects we use in the code below). This can be done in a straightforward manner once you have located the vertex color layer:
from random import random

mesh = context.scene.objects.active.data
vertex_colors = mesh.vertex_colors.active.data
polygons = mesh.polygons
for poly in polygons:
    color = [random(), random(), random()]
    for loop_index in range(poly.loop_start, poly.loop_start + poly.loop_total):
        vertex_colors[loop_index].color = color
Straightforward as this may be, a loop inside a loop is time consuming, and so is generating lots of random numbers, especially because Python does not do this in parallel even if you have a processor with multiple cores. Fortunately for us, Blender comes bundled with Numpy, a library that can manipulate huge arrays of numbers in a very efficient manner. This allows for a much more efficient approach (although, as shown previously, a significant speed increase is only noticeable for large meshes).
npolygons = len(polygons)

startloop = np.empty(npolygons, dtype=np.int64)
numloops = np.empty(npolygons, dtype=np.int64)
polygon_indices = np.empty(npolygons, dtype=np.int64)

polygons.foreach_get('index', polygon_indices)
polygons.foreach_get('loop_start', startloop)
polygons.foreach_get('loop_total', numloops)

colors = np.random.random_sample((npolygons,3))
We can even reduce storage (and get an additional speedup) if we change the types to 32-bit variants. There will be no loss of accuracy, as these are the sizes used by Blender internally. (If you were to do a lot of additional calculations this might be different, of course.) The change only alters the declarations of the arrays:
startloop = np.empty(npolygons, dtype=np.int32)
numloops = np.empty(npolygons, dtype=np.int32)
polygon_indices = np.empty(npolygons, dtype=np.int32)

polygons.foreach_get('index', polygon_indices)
polygons.foreach_get('loop_start', startloop)
polygons.foreach_get('loop_total', numloops)

colors = np.random.random_sample((npolygons,3)).astype(np.float32)
As shown above, we start out by creating Numpy arrays that will hold the startloop indices and the number of loops in each polygon, as well as an array that will hold the polygon indices themselves. This last one isn't strictly needed for assigning random values, because we don't care which random color we assign to which polygon, but for other scenarios it might make more sense, so we keep it here. We get all these indices from the Mesh object using the fast foreach_get method. We then use Numpy's built-in random_sample function to generate the random colors (3 random floats between 0 and 1) for all polygons.
loopcolors = np.empty((nloops,3)) # or loopcolors = np.empty((nloops,3), dtype=np.float32)
loopcolors[startloop] = colors[polygon_indices]
numloops -= 1
nz = np.flatnonzero(numloops)
while len(nz):
    startloop[nz] += 1
    loopcolors[startloop[nz]] = colors[polygon_indices[nz]]
    numloops[nz] -= 1
    nz = np.flatnonzero(numloops)
The real work is done in the code above: we first create an empty array to hold the colors for all individual loops (nloops is the total number of loops in the mesh). Then we assign to the loops at the startloop indices the color of the polygon they belong to; note that startloop and polygon_indices are arrays with the same length, i.e. the number of polygons. We also have an array numloops which holds the number of loops for each polygon. We decrement this number of loops by one for all polygons in one go (line 3) and create an array of indices of those elements in numloops that are still greater than zero (line 4). If we are still left with one or more indices (line 5), we increment the startloop index for all those entries (line 6).
Line 7 then assigns, again in one go, a polygon color to the loop at the incremented index for all polygons that still have a nonzero number of loops left. And finally we again decrement our numloops counters. Note that this all works because the loop indices of all loops of a polygon are consecutive.
loopcolors = loopcolors.flatten()
vertex_colors.foreach_set("color", loopcolors)
The final lines flatten the array of loop colors to a 1-D array and write the colors back to the Mesh object with the fast foreach_set method.

Discussion

Now even though we end up with about four times as much code, it is much faster while still quite readable, as long as you get your head around using index arrays. Basically, our single while loop executes the inner loop of assigning colors to loops for all polygons in parallel. The penalty for this speed increase is memory consumption: we use 5 large numpy arrays here.

Copying vertices to Numpy arrays in Blender

If you want to do a lot of mathematical work on very large Blender meshes, plain Python might be too slow. Fortunately Blender comes bundled with Numpy, which allows for fast calculations on vast arrays of numerical data.
To use Numpy you would need to copy, for example, vertex coordinates first before performing any calculations, and then copy the results back again. This consumes extra memory and time. You wouldn't do this for a single operation like determining the average position of all vertices, because plain Python can perform that efficiently enough that the gain would not compensate for the extra setup time. However, when calculating many iterations of some erosion algorithm it might well be worth the cost.
In order to help in deciding whether this setup cost is worth the effort I decided to produce some benchmark values. My aim was to find the fastest method from several alternatives and to produce actual timings. Now your timings will be different of course but the trends will be comparable and I have provided the source code for my benchmark programs on GitHub.

results, copying to a Numpy array

First we have a look at how much time it takes to copy the vertex coordinates from a mesh to a Numpy array. Given the array of vertices me.vertices and a Numpy array verts, we have several options:
# blue line
verts = np.array([v.co for v in me.vertices])

# red line
verts = np.fromiter((x for v in me.vertices for x in v.co),
                     dtype=np.float64)
verts.shape = (len(me.vertices), 3)

# orange line
verts = np.fromiter((x for v in me.vertices for x in v.co),
                     dtype=np.float64,
                     count=len(me.vertices)*3)
verts.shape = (len(me.vertices), 3)

# green line
count = len(me.vertices)
shape = (count, 3)
verts = np.empty(count*3, dtype=np.float64)
me.vertices.foreach_get('co', verts)
verts.shape = shape
The blue line represents the creation of a Numpy array from a list. Even with a list comprehension this has the disadvantage of creating an intermediate copy, which is also a Python list. Apparently this is a costly operation, as this naive method performs worst of all.
The red line shows a huge increase in performance as we use a generator expression to access the coordinates. This does not create an intermediate list, but the result is a one-dimensional array. Changing the shape of an array in Numpy involves no copying, so it is very cheap.
We can still improve a little on this timing by preallocating the size of the Numpy array, as shown by the orange line.
The green line shows the fastest option: Blender provides a fast method to access attributes in a property collection, and this method wins hands down. (Thanks to Linus Yng for pointing this out.)

results, copying from a Numpy array

Copying data back from a Numpy array can also be done in more than one way; the most interesting ones are shown below:
# blue line
for i,v in enumerate(me.vertices):
    v.co = verts[i]

# green line
for n,v in zip(verts, me.vertices):
    v.co = n

# light blue line
import itertools

def f2(n,v):
    v.co = n

for v in itertools.starmap(f2, zip(verts, me.vertices)): pass

# orange line
verts.shape = count*3
me.vertices.foreach_set('co', verts)
verts.shape = shape
The conventional methods do not perform significantly differently, but again Blender's built-in foreach_set method outperforms all other options by a mile, just like its foreach_get counterpart.


Discussion

The actual timings on my machine (an Intel Core i7 running at 4GHz) indicate that we can copy about 1.87 million vertex coordinates per second from a Blender mesh to a Numpy array, and about 1.12 million vertex coordinates per second from a Numpy array to a Blender mesh, using 'conventional methods'. The 50% difference might be due to the fact that the coordinate data in a Blender mesh is scattered over internal memory (as compared to Numpy's contiguous block of RAM), and writing to scattered memory incurs a bigger hit from cache misses and the like than reading from it. Note that a vertex location consists of three 64-bit floating point numbers.
Using foreach_get and foreach_set this performance is greatly improved, to 13.8 million and 10.8 million vertex coordinates per second respectively (on my machine).
Overall, this performance means that on my machine even a simple scaling operation on a million vertices might already be faster with Numpy, as shown in the first example below, which on my machine is about 7x faster. (Not that scaling is such a perfect example: chances are Blender's scale operator is faster, but it gives you an idea of how much there is to gain, even for simple operations that are not natively available in Blender.)
    # numpy code with fast Blender foreach
    fac = np.array([1.1, 1.1, 1.1])
    count = len(me.vertices)
    shape = (count, 3)

    verts = np.empty(count*3, dtype=np.float64)
    me.vertices.foreach_get('co', verts)
    verts.shape = shape
    np.multiply(verts,fac,out=verts)
    verts.shape = count*3
    me.vertices.foreach_set('co', verts)
    verts.shape = shape

    # 'classic code'
    fac = Vector((1.1, 1.1, 1.1))

    for v in me.vertices:
        v.co[0] *= fac[0]
        v.co[1] *= fac[1]
        v.co[2] *= fac[2]

Benchmark availability

The .blend file with code and instructions can be downloaded from my GitHub repository.





Simulating erosion in Blender part I: Thermal erosion

For some time I wanted to create a script that could erode meshes to complement landscape add-ons like ANT.

The first step is presented here: erode.py, a Python script that simulates thermal erosion, i.e. the landslides on steep hill slopes and the gradual breaking up of large boulders by freeze/thaw processes (called diffusion in the code).

It's currently a stand-alone script and not yet a proper Blender add-on, but it does read and produce raw mesh data that can be exported/imported in Blender (with the bundled raw import/export add-on). It can read hand-crafted heightmaps, but it is also possible to create some artificial shapes. In its current form the script is intended to verify the algorithms used and to get a feel for the simulation parameters.
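To give a feel for what the landslide part of such a simulation boils down to, here is a toy sketch of a single thermal erosion step on a heightmap (not the actual code from erode.py): wherever a cell rises more than a stable talus height above a neighbor, part of the excess material slides down. Note that np.roll wraps around at the borders, whereas the script itself assumes zero boundary conditions.

import numpy as np

def thermal_step(h, talus=0.03, amount=0.5):
    # toy thermal erosion: move material down slopes steeper than 'talus'
    delta = np.zeros_like(h)
    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nb = np.roll(np.roll(h, dy, axis=0), dx, axis=1)   # shifted neighbor heights
        excess = np.maximum(h - nb - talus, 0.0) * amount * 0.25
        delta -= excess                                    # material leaves the high cell
        delta += np.roll(np.roll(excess, -dy, axis=0), -dx, axis=1)  # and lands one cell downhill
    return h + delta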

Code availability and dependencies

The script is available on GitHub. It needs Python 3.2 or newer and depends heavily on NumPy. The code is even multithreaded when the NumExpr package is installed as well, and this can make a huge difference for larger grids.
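For those unfamiliar with NumExpr: it compiles an array expression once and evaluates it with a fast multithreaded loop. A minimal illustration of the kind of expression it accelerates (not code from erode.py):

import numpy as np
import numexpr as ne

a = np.random.rand(1000000)
b = np.random.rand(1000000)
# numexpr parses the string once and evaluates it using all available cores
c = ne.evaluate('0.5 * a + b * b')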

Examples

Some examples are shown below. The first mesh is a mesh generated with ANT. In the second one we applied a few iterations of just the landslide algorithm, which removes the steepest slopes and in the final example we also applied diffusion, softening the landscape as a whole.

Usage

The script isn't very user friendly at the moment but does have a fair number of options:


usage: erode.py [-h] [-I ITERATIONS] [-Kd KD] [-Kh KH] [-Kp KP] [-ri] [-ro]
[-i] [-t] [-infile INFILE] [-outfile OUTFILE] [-Gn GRIDSIZE]
[-Gp GRIDPEAK] [-Gs GRIDSHELF] [-Gm GRIDMESA] [-Gr GRIDRANDOM]
[-m THREADS] [-u] [-a] [-n]

Erode a terrain while assuming zero boundary conditions.

optional arguments:
-h, --help show this help message and exit

-I ITERATIONS the number of iterations
-Kd KD Diffusion constant
-Kh KH Maximum stable cliff height
-Kp KP Avalanche probability for unstable cliffs

-ri use Blender raw format for input
-ro use Blender raw format for output
-i use an inputfile (instead of just a synthesized grid)
-infile INFILE input filename
-outfile OUTFILE output filename
-t do not write anything to an output file

-Gn GRIDSIZE Gridsize (always square)
-Gp GRIDPEAK Add peak with given height
-Gs GRIDSHELF Add shelf with given height
-Gm GRIDMESA Add mesa with given height
-Gr GRIDRANDOM Add random values between 0 and given value

-m THREADS number of threads to use
-u perform unittests
-a show some statistics of input and output meshes
-n use numexpr optimizations

Exporting and using an ANT generated mesh

1. Create a landscape mesh, for example with ANT, but of course you can sculpt one as well or import a DEM.
2. Export the mesh with File->Export->Raw (make sure you enabled the add-on).
3. Erode it (see the example command line below).
4. Import the result with File->Import->Raw.

Assuming you exported the grid to a file called untitled.raw, you could erode that file and put the result in a file called out.raw using the following command line:


python erode.py -Kd 0.001 -Kh 0.03 -i -infile untitled.raw -ri -outfile out.raw -ro -I 10

(Which model parameters (especially Kh) to use depends of course on the input mesh.)

Future developments/Roadmap

Erosion due to weathering isn't all that interesting in itself, but it serves to test the basic data structures in the script. Water erosion is the final goal, so the roadmap for this script is short: fluvial erosion (the creation of canyons and gullies and the deposition of sediment due to rain) will be next, and when that works I'll rework the script into a proper add-on. After that we'll see if there is demand for other features. I would like to add meandering, but based on my current research I think that's quite some effort, so I'll see if I can find the time.

Reference

This work was inspired by the work of Musgrave et al. A more in-depth and physically meaningful source worth reading is the textbook by Pelletier.