On the usefulness of Python type annotations in Blender add-ons

Yes, this is a bit of a rant, and my opinion may be a bit controversial, but I think Python type annotations are overrated when used in Blender add-on development, and perhaps even in general. Let me explain.

Annotation

The idea of annotation in Python is to add extra information to anything, be it a variable, function parameter or return type, or whatever.

One of its uses is to add type information1 so that static type checkers can verify that variables are assigned values of a suitable type and IDEs can show additional information about a parameter, for example2.

Because Python is a dynamically typed language this is not enforced at runtime, but exactly because Python is dynamically typed, adding extra information about what types to expect can be incredibly useful, letting IDEs catch potential problems while you write your code.

For simple functions/methods with parameters of simple types, this is quite clear:

def multiply(a: float, b: float) -> float:
    return a * b

If we accidentally passed a string to this function somewhere in our code, the IDE would complain. It is also quite readable, but that starts to change once we are dealing with more complex types:

from typing import Any, Iterable

def convert(a: list[Iterable[Any]]) -> tuple[tuple[Any, ...], ...]:
    """Convert a list of iterables to a tuple-of-tuples."""
    return tuple(tuple(it) for it in a)

Not only do we have to import some types explicitly, the annotation also starts to become difficult to parse for human brains, which goes a bit against the spirit of the Zen of Python: "Readability counts."

Properties

In Blender we have another issue that goes beyond readability, and that is the use of annotations to describe properties. Take a look at this code snippet:
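The snippet was along these lines (a minimal sketch; the operator and property names are just placeholders):

```python
import bpy

class SimpleOperator(bpy.types.Operator):
    bl_idname = "object.simple_operator"
    bl_label = "Simple Operator"

    # not a plain type hint: the annotation holds a property definition
    amount: bpy.props.FloatProperty(name="Amount", default=1.0)
```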

VS Code is not happy. Its type checker, Pylance, reports "Call expression not allowed in type expression". And that is true according to the typing spec, but Blender uses annotations for something completely different here: amount is not simply a float but a property that can hold a float value and at the same time knows how to display itself to the user, what its default is, and a lot more.

So the annotation here is used not for type checking, but for defining both the base type of the property and its behavior, something that couldn't be described with a simple type annotation like amount: float.

There is a solution of sorts for this, and that is to inform the type checker to disregard the annotation as a type hint:
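Again a sketch of what that looks like (the example() method is a made-up illustration):

```python
import bpy

class SimpleOperator(bpy.types.Operator):
    bl_idname = "object.simple_operator"
    bl_label = "Simple Operator"

    amount: bpy.props.FloatProperty(name="Amount", default=1.0)  # type: ignore

    def example(self):
        # the checker no longer knows (or cares) what amount is
        self.amount = "some string"
```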

Yes, this does get rid of the red squiggly line, but as you can see from the example() method, it also blocks type checking: we can now happily assign, say, a string to amount, which would cause havoc at runtime.

More type checking

Now type checking does have its benefits, and if you don't have to look at the type annotations yourself but let your IDE do the hard work, it arguably is beneficial to have libraries with comprehensive type annotations. The bpy module does not have them, and as far as I know there are no plans to add them, but some people have stepped in to fill the gap.

The fake-bpy-module provides a stub implementation that you can install inside a virtual environment to develop your Blender code outside of Blender while having the full benefits of comprehensively type annotated code.

It shouldn't be confused with Blender-as-a-module, which lets you run Blender from Python directly for use in studio pipelines etc.; fake-bpy-module doesn't contain implementation code, it just provides type annotations. So when you have it installed in your environment and import bpy, your IDE knows all about the types, but you couldn't run your code outside Blender. Run your code inside Blender, however, and the import picks up Blender's bundled bpy module and everything works just fine.

Now this is great, but be aware that this isn't all teddy bears and unicorns: you still have the property annotation issue, and sometimes the choices made in fake-bpy-module aren't very convenient:

When I typed def execute( in VS Code, it auto-expanded the definition based on the type annotation of the execute() method it finds in the Operator class. At first glance this is fine; however, the return type annotation is now a copy of the type annotation of the execute() method in the superclass (i.e. Operator). So if that annotation were to change, perhaps because a new possible status string was added, we (and our IDE) wouldn't know we could use it when we change our own code.

This apparently happens because the return type is not annotated with a class or type alias, but with a set, and this set definition is what the IDE copies:

def execute(
    self, context: Context
) -> set[bpy.stub_internal.rna_enums.OperatorReturnItems]:
    ...

I am no expert, and I would have to think about how this could be fixed (if it can be fixed at all) before perhaps submitting a PR. Blender's Python API is large and complex, so I don't blame the module maintainer at all; on the contrary, their efforts for the community are greatly appreciated.

Conclusion

Should we use type annotations in Blender add-ons? Probably, but not everywhere.

I would argue that type annotations are a benefit if you factor out functions, which is a good idea for (unit) testing anyway. For top-level stuff, like defining an Operator class and such, you do not gain much. I simply keep an execute() method as short as possible and define the actual functionality in separate modules/functions. You can then even put # type: ignore at the top of the file with the operator definition and registration code for minimal annoyance. With this setup you even avoid the property issue: properties don't get flagged when passed to a function defined elsewhere.
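That layout could look something like this (file, module and function names are made up for the example):

```python
# type: ignore
# operators.py: keep this file short, the checker skips it entirely
import bpy

from . import core  # a hypothetical module holding the real, fully annotated logic


class OBJECT_OT_example(bpy.types.Operator):
    bl_idname = "object.example"
    bl_label = "Example"

    amount: bpy.props.FloatProperty(default=1.0)

    def execute(self, context):
        # all actual work (and all type checking) happens in core
        core.do_the_work(context.active_object, self.amount)
        return {"FINISHED"}
```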

In the end it's all up to personal preferences of course, but if you want to use an IDE and not disable type checking altogether, you will have to make some choices.

1: Some people argue that this is its only use, but they are clearly misguided: from the Python glossary, "...used by convention as a type hint" (emphasis mine). And this is important for Blender add-ons, see the section on properties.

2: Among other things: some Python compilers, such as mypyc and Cython, can use them to generate optimized code for example.

Blender add-on development for beginners: A video series

Hi everybody,

as you might know I more or less stopped all my commercial work on Blender add-ons, but that doesn't mean I stopped caring about Blender. So I started working on a video series on Blender add-on development for beginners:

It is completely free and has a GitHub repository with code.

The first video in the series is an introduction that explains what we are going to cover and what you need to know before you start. The series is for beginner add-on developers, but it is still coding of course, so you need to know a bit of Python already. I have tried to keep the Python code simple and readable, and we avoid nerdy stuff as much as possible.

The other videos demonstrate how to build working add-ons from scratch.  They are not necessarily very useful in and of themselves, but they show all kinds of relevant concepts and building blocks that are needed in any add-on, and that can be used in your own add-ons. By the end of the first module you should already be able to create an add-on that creates a functional menu item that performs an action on the active object. And even better, you will see that this requires only a few lines of code because Blender's Python API is very well thought out and very powerful: everything you can do as a user can be done in Python as well (and more!), and links to relevant parts of the docs are provided in the video descriptions.

If you like the series and can afford it, consider leaving me a tip on Ko-Fi. Feedback and suggestions are just as welcome, so leave any remarks or ideas in the video comments and/or create an issue in the repository. The idea is to use this feedback to create more videos in the future.


UPDATED - Blempy: easy and fast access to Blender attributes

In a previous post I mentioned future improvements for the blempy package, and the future is now 😀



I also moved the code to its own repository for easier maintenance, and created quite extensive documentation as well.

I am not going to reproduce all that text here; suffice it to say that I implemented the unified attribute access I hinted at previously, so you can for example transparently access vertex color data in a few lines of code:


import blempy

proxy = blempy.UnifiedAttribute(mesh, "Color")
proxy.get()  # retrieve the attribute data as numpy arrays behind the scenes
for polygon_loops in proxy:
    polygon_loops[:, :3] *= 0.5  # halve RGB, leave alpha untouched
proxy.set()  # copy the modified data back to the mesh

(This code will reduce the vertex color brightness by half for all polygons in the mesh object.)

Check the documentation for some more examples.


Blempy: easy and fast access to Blender attributes

This code has been improved and moved to a new repo, see this article.



While writing about the foreach_get()/foreach_set() methods and the advantages of using numpy, I realized that we often need a lot of boilerplate code just to access attributes in this way.

This was especially obvious when looking at some old code that deals with a lot of mesh attributes. In hindsight the code for that add-on isn't all that well-structured1, partly because it needs access to all kinds of attributes.

So, long story short, I decided to create a little package to streamline access to attributes and in that way make the code more readable and easier to maintain. The result is a work in progress, a package called blempy (a portmanteau of Blender and numpy2).

1 Although to be fair, it is also more than 9 years old and saw numerous updates. 
2 Yes, I don't have any imagination.


An example

Perhaps the most inconvenient attributes are those connected to loops, i.e. face corners. Examples are uv coordinates and vertex colors, and they are inconvenient to work with because the attributes themselves are stored separately from the other face data.

To access them we would need to download three separate arrays of data: the loop_start and loop_total attributes associated with each polygon (face), and the actual data itself. If we want to access the loop data for a specific polygon, we need its loop_start and loop_total values to calculate the range of indices into the actual array of attributes, which is quite a bit of code.
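Stripped of the Blender calls, that index arithmetic looks roughly like this (the arrays are hypothetical stand-ins for what foreach_get() would download: two triangles, one RGBA color per loop):

```python
import numpy as np

# Hypothetical stand-ins for data downloaded with foreach_get():
loop_start = np.array([0, 3])  # index of the first loop of each polygon
loop_total = np.array([3, 3])  # number of loops of each polygon
colors = np.arange(6 * 4, dtype=np.float32).reshape(6, 4)  # one RGBA value per loop

def polygon_loop_colors(i: int) -> np.ndarray:
    """Return the colors of all loops belonging to polygon i."""
    start = loop_start[i]
    return colors[start : start + loop_total[i]]

print(polygon_loop_colors(1).shape)  # the three loop colors of polygon 1
```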

The LoopVectorAttributeProxy class abstracts all that away and takes care of downloading everything needed as numpy arrays. The API is currently quite bare, but it does already provide a convenient iterator to access all the loops of a polygon in one go.

For example, to set a unique but uniform grey value for each face we need only a handful of lines of code (the code assumes that the mesh variable points to a valid bpy.types.Mesh):

from random import random

from blempy import LoopVectorAttributeProxy

proxy = LoopVectorAttributeProxy(mesh, "vertex_colors.active.data", "color")
proxy.get()
for loops in proxy:
    grey = random()
    loops[:] = [grey, grey, grey, 1.0]
proxy.set()

As shown, we first create a proxy for the active vertex color layer, and specifically its color attribute, and we then retrieve all the necessary data with the get() method, that will store everything behind the scenes as numpy arrays.

We then get, for each polygon, an array with the colors of all the loops associated with that polygon, and generate a random grey value as well. We then assign the same color (i.e. a list of RGBA values) to all those loop colors. Numpy takes care of both converting the python list to a numpy array and broadcasting that same RGBA array to all items in the loops array. By assigning the same color to all face corners (loops) in a polygon, that polygon gets a uniform color.
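That broadcasting step in isolation (plain numpy, no Blender needed):

```python
import numpy as np

loops = np.zeros((5, 4), dtype=np.float32)  # RGBA colors of five loops of one polygon
grey = 0.3
loops[:] = [grey, grey, grey, 1.0]  # the list is converted and broadcast to every row
print(loops[0])  # every row is now [grey, grey, grey, 1.0]
```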

And finally we copy everything back to the mesh object.

Future development

As mentioned, the whole package is still a little bare-bones, but I hope to extend its functionality in the near future. In particular I would like to create some classes that can simply be initialized with a Mesh (or even a BMesh) and a unified attribute name, to get rid of even more boilerplate. That, and some more convenience functions to manipulate specific types of attributes, like (position) vectors, normals, uv coordinates, etc.

Suggestions are welcome by the way, just create an issue in the repository so that we can discuss it. Don't forget to add the blempy label so it can be distinguished from other issues.

Installation

blempy is available as a package on pypi and can be installed in the usual way:

python -m pip install blempy

But that would install the package in the default location. That is fine if you use Blender as a module, but if you are developing for the "complete" Blender, you need to install it inside your Blender environment. A refresher on how to do that can be found in this old article.

If you are developing an add-on that uses the blempy package, it is probably easiest to bundle it with your add-on, i.e. simply copy the blempy folder from the repository into your own project. By copying, you automatically pin the version, so that if blempy gets a breaking update you won't have to deal with it immediately.

Performance of 4x4 matrix multiplications in numpy

In a previous article we compared the performance of different implementations of 3x3 matrix - vector multiplications. The conclusion was that even large arrays of vectors can be multiplied with high performance if we use highly optimized numpy functions, in particular the np.dot() function.

transforms

Warning: technical read ahead!


Multiplying by a 3x3 matrix allows us to perform scaling and rotation of vectors, but often we want more. For example, when converting between coordinate systems, like from world to local, we might not only need to rotate and scale, but translation might come into play as well.

Now you can of course do the 3x3 multiplication first and then add any translation vector in a second step, but that means we would need to process the array of vectors twice. This could add up quite a bit in processing costs, but there is a way to perform such a transformation in a single step: the combination of scaling, rotation and translation (technically an affine transformation) can be packed into a single 4x4 matrix, which can then be multiplied with a vector of length 4.

In Blender for example, each object has a matrix_world attribute that encodes the scale, rotation and position of an object in world space as a 4x4 matrix. This means that we could for example calculate the position of a vertex coordinate (which is stored in local space, i.e. relative to the origin of the object) in global space by multiplying its coordinate vector by this matrix (or go the other way by inverting the matrix first). Vertex coordinates are vectors of length 3 of course, so they need to be extended to 4 elements by adding a 1. This ensures that the translation is factored in when multiplying with the 4x4 matrix, as the translation is stored in the 4th column. (If we wanted to transform a normal, we would extend it with a 0 instead, because normals do not change under translation.)
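A small numpy sketch of that packing (the rotation, scale and translation values are arbitrary examples):

```python
import numpy as np

angle = np.pi / 2  # 90 degree rotation around the z-axis...
c, s = np.cos(angle), np.sin(angle)
rs = 2.0 * np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])  # ...combined with a uniform scale of 2
t = np.array([10.0, 0.0, 0.0])          # translation

m = np.eye(4)    # pack both into a single 4x4 affine matrix:
m[:3, :3] = rs   # rotation/scale in the upper-left 3x3 block
m[:3, 3] = t     # translation in the 4th column

point = np.array([1.0, 0.0, 0.0, 1.0])   # a point, extended with a 1
normal = np.array([0.0, 1.0, 0.0, 0.0])  # a direction, extended with a 0

print(m @ point)   # rotated, scaled and translated
print(m @ normal)  # rotated and scaled only, the translation is ignored
```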

There is a lot more to matrix multiplication, so my advice would be to read a good book on the subject (perhaps Mathematics for 3D Game Programming and Computer Graphics by Eric Lengyel), but I hope it is clear that 4x4 matrix multiplication is very relevant for any 3D application.

So what would the performance be if we'd want to do this in python?

numbers

Let's compare the numbers from the 3x3 matrix multiplication to those of the 4x4 multiplication. The numbers are the number of seconds it takes to perform a million matrix x vector multiplications (Mops). Lower is better, and going to the right we often see a slight improvement (decrease) when working on larger arrays of vectors, though this is probably not significant for very large arrays. Also, some numbers are missing for very slow implementations combined with large arrays, because they would take too long.

3x3 — seconds / Mops

The first table is from the previous article:

Method                 100000  200000  500000  1000000  2000000  5000000  10000000
naive                  2.4320  2.4365  2.1950  2.2037   2.1824   2.1994   -
comprehension          5.0050  5.0010  5.0108  5.0688   5.0245   4.9647   -
np_dot                 0.5940  0.5885  0.5820  0.5853   0.5811   0.5871   -
array_np_dot           0.0040  0.0035  0.0022  0.0077   0.0035   0.0034   0.0032
array_np_einsum        0.0710  0.0445  0.0400  0.0401   0.0407   0.0408   0.0403
array_np_dot_in_place  0.0030  0.0035  0.0030  0.0033   0.0049   0.0045   0.0044

4x4 — seconds / Mops

A similar table for the 4x4 multiplications:

Method                 100000  200000  500000  1000000  2000000  5000000  10000000
naive_4x4              4.0480  3.7010  3.6120  3.5078   -        -        -
comprehension_4x4      7.3080  7.2580  7.3198  7.3427   -        -        -
np_dot                 0.3200  0.3195  0.3196  0.3195   -        -        -
array_np_dot           0.0060  0.0035  0.0026  0.0036   0.0045   0.0039   0.0038
array_np_einsum        0.0760  0.0460  0.0450  0.0452   0.0463   0.0459   0.0457
array_np_dot_in_place  0.0040  0.0035  0.0036  0.0047   0.0066   0.0057   0.0056

Slowdown: 4x4 compared to 3x3 (expected: about 1.9x slower)

If we divide the results we can see the relative slowdown of 4x4 multiplication compared to 3x3 multiplication:

Method                 100000  200000  500000  1000000  2000000  5000000  10000000
naive                  1.66    1.52    1.65    1.59     -        -        -
comprehension          1.46    1.45    1.46    1.45     -        -        -
np_dot                 0.54    0.54    0.55    0.55     -        -        -
array_np_dot           1.50    1.00    1.18    0.47     1.29     1.17     1.18
array_np_einsum        1.07    1.03    1.13    1.13     1.14     1.12     1.13
array_np_dot_in_place  1.33    1.00    1.20    1.42     1.35     1.25     1.27


observations

The first thing we note is that 4x4 multiplication is indeed almost always slower, and that is in itself no surprise. After all, a 4x4 matrix - vector multiplication takes 4 x (4 multiplications + 3 additions) = 28 floating point operations, compared to the 3 x (3 multiplications + 2 additions) = 15 operations of the 3x3 case. So we would expect a slowdown by a factor of 28 / 15 ≈ 1.9 based on the number of floating point operations alone.

But what we see is that even for the pure python naive and comprehension based implementations the slowdown stays well below 1.9. This is likely because of the relatively large fixed cost of the function calls made in python and the setup cost for the loops and generators, which is the same in both cases.

Even more surprising perhaps is that when we use the np.dot() function to individually multiply each vector with the matrix, we see a speed increase instead of a decrease. Because there is no python loop setup here whatsoever and the number of function calls is the same, something different must be in play. My conjecture is that processor caching plays an important role here, and perhaps even choices made by the underlying numpy implementation to perform those multiplications in a different way. It is hard to tell without looking at the numpy code (and even then I am probably nowhere near knowledgeable enough to say something relevant); just be aware that on a different machine this result might be quite different (tests were performed on an AMD Ryzen 7 7700X). I might redo these timings on even larger arrays and/or with more repetitions to verify this a bit more.

The array based implementations see even less of a slowdown compared to the pure python ones. This is nice, but again not so easy to explain. Why would almost twice as many floating point operations slow things down by only about 25% or so, even when the python function call overhead is the same?

conclusions

4x4 matrix multiplications are almost as fast as 3x3 multiplications, which is nice if you want to perform transforms that also involve translations, like converting global to local coordinates.

However, if we really want to understand why this is so fast, we might want to take a closer look at the numpy code itself.

Snap! Add-on tested on Blender 5.0

The current version of Snap! seems to work without change on Blender 5.0 🥳

Availability

It is available, complete with a manual, from the releases section of this repository.

Performance of matrix vector multiplications in numpy

Matrix multiplication is an excellent example of an operation in Blender that is performed frequently and can benefit enormously from using numpy. If you want to perform transformations like scaling or rotation on vertex coordinates or normals, or any kind of vector, you will probably be using matrix x vector multiplication.

The Blender API has a mathutils module that provides convenient classes like Vector and Matrix, but if you are dealing with millions of vertices you are better off using the numpy package that comes bundled with Blender. Properties like vertex coordinates can easily be retrieved as numpy arrays (see this previous article), which can then be manipulated efficiently with all the available numpy functions.

In this article I'll explore some different implementations of matrix x vector multiplication and measure the performance on large arrays of vectors. The code for these experiments is available as tests/test_matrix_multiplication.py in this GitHub repository.

Warning: long read!


matrix multiplication

Multiplying a vector with a matrix is straightforward: to calculate an element in the result vector, we take the corresponding column in the matrix and calculate the dot product with the input vector, i.e. we multiply each element pair and sum the results.



For example, if we want to calculate the second element in the result vector, we take the second column in the matrix and multiply each element in the column with the corresponding element in the input vector and sum those together.

pure python

An (almost) pure python implementation may look like this:

import numpy as np

def multiply_vector_matrix(a, x):
    y = np.ndarray(3, dtype=np.float32)
    for k in range(3):
        y[k] = 0
        for j in range(3):
            y[k] += a[j, k] * x[j]
    return y

We could have used a python list for the input and result vectors and a list of lists for the matrix, but because we want to compare this to numpy functions later, we choose to use ndarray from the beginning, which also allows us to work with the 32 bit floats that Blender uses instead of doubles.

As you can see, the implementation follows the algorithm sketched in the previous section to the letter and contains a loop within a loop. As is to be expected, this implementation is slow: 100 thousand iterations take almost a quarter of a second, 0.2432s to be precise.

pure python with built-ins and list comprehension

As you might know, loops in python are slow, so why not use the built-in sum() function in combination with list comprehension to save us a loop?

def multiply_matrix_vector_comprehension(a, x):
    y = np.ndarray(3, dtype=np.float32)
    for k in range(3):
        y[k] = sum(a[j, k] * x[j] for j in range(3))
    return y

The generator inside the sum saves us a loop, but perhaps a bit surprisingly, this implementation is about twice as slow: 0.5005s for 100 thousand iterations. Apparently creating a generator for such a small number (3) of elements and calling a function is more expensive than just looping and calculating.

numpy

Now let's turn our attention to numpy. Numpy has a dot() function that does exactly what we want. Multiplying a single matrix with a single vector can be implemented like this:

def multiply_matrix_vector_np_dot(a, x):
    return np.dot(a, x)

Yes, that is just a single function call, all the looping and calculating is implemented in C/C++ behind the scenes. Performance is therefore almost an order of magnitude faster: 0.0594s for 100 thousand vectors.

Promising, but it wouldn't be enough if we wanted to work with tens of millions of vectors. Can we do better?

numpy v. stacks of vectors

In the previous implementation, if we wanted to multiply a matrix with a list of vectors we would have to call the function for each individual vector in turn, resulting in a lot of overhead from the function call itself and any loop surrounding it.

But numpy functions are way more powerful than that: they can figure out how to broadcast our 3x3 matrix to apply it to each individual vector in turn, and return the results as an array of vectors again. This saves a boatload of function calls and we also don't need a loop.

The implementation looks suspiciously like the previous one:

def multiply_matrix_vector_array_np_dot(a, x):
    return np.dot(x, a)

This time however, x should be an array of vectors, and if you look closely you will see that the arguments passed to np.dot() are reversed. This is necessary to match up the length of the individual vectors (3) with the height of the matrix. (We could have done this in many different ways, for example by transposing the individual arrays, but this is the simplest way.)

You might call this function something like this:

a = ...  # some 3x3 matrix
x = np.array([[1, 0, 0], [2, 0, 0], [3, 0, 0], [4, 0, 0], ...])  # a long list of vectors
result = multiply_matrix_vector_array_np_dot(a, x)

The result array would have the same shape as x and hold all the transformed vectors.
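A runnable sketch of the shape bookkeeping (random data as a hypothetical stand-in for real vertex coordinates):

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.random((3, 3))  # some 3x3 matrix
x = rng.random((5, 3))  # five vectors of length 3

batched = np.dot(x, a)                            # one call handles all vectors
per_vector = np.array([np.dot(v, a) for v in x])  # equivalent, but one call each

print(batched.shape)  # same shape as x
assert np.allclose(batched, per_vector)
```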

If we do this for 100k vectors, we see that the speed increase is enormous compared to calling dot() for each individual vector: almost two orders of magnitude, at 0.0004s. This allows us to easily scale up to millions of vectors.

is there more to gain?

Numpy also has an np.einsum() function that allows for a more descriptive way to denote what combination of multiplication and summation operations to perform on arrays. This is often used to perform multiplications of large, complex arrays (tensors) in the realm of machine learning, so maybe we can use this here too:

def multiply_matrix_vector_array_np_einsum(a, x):
    return np.einsum("ij,jk->ik", x, a)

We are not going to cover this in any detail in this article, but if you look at the index notation you will see that it expects a list of i vectors with j columns, and multiplies those with a matrix with j rows and k columns, resulting in a list of i vectors of k columns. Because the index j is repeated, einsum knows to calculate the dot product of each input vector with the columns of the matrix. This may sound confusing if you are unfamiliar with this kind of notation, so this article might give you a head start.
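A quick way to convince yourself that the index notation does the same thing as the np.dot() version (arbitrary random data):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((3, 3))  # the matrix
x = rng.random((4, 3))  # four vectors

# "ij,jk->ik": sum over the shared index j, i.e. a dot product per vector
result = np.einsum("ij,jk->ik", x, a)
assert np.allclose(result, np.dot(x, a))
```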

Nevertheless, although promising, the result disappoints: 0.0071s for 100k vectors, or about an order of magnitude slower than np.dot().

The final option I tried was to store the result vectors in the original array, calculating everything in place, to see whether saving a costly allocation of a new array made any difference. Many numpy functions take an out argument to select a destination array, and the code might look like this:

def multiply_vector_matrix_array_np_dot_in_place(a: MAT3x3, x):
    x = np.dot(x, a, out=x)
    return x

Unfortunately, although it might save memory, performance is exactly the same as for a regular call (within the margin of error).

results

To get a more thorough understanding I have measured the elapsed time for different numbers of vectors, as shown in the table below:

Method                 color   100000  200000  500000  1000000  2000000  5000000  10000000
naive                  red     0.2432  0.4873  1.0975  2.2037   4.3647   10.9972  -
comprehension          blue    0.5005  1.0002  2.5054  5.0688   10.0490  24.8236  -
np_dot                 yellow  0.0594  0.1177  0.2910  0.5853   1.1621   2.9355   -
array_np_dot           green   0.0004  0.0007  0.0011  0.0077   0.0069   0.0168   0.0323
array_np_einsum        orange  0.0071  0.0089  0.0200  0.0401   0.0814   0.2042   0.4025
array_np_dot_in_place  cyan    0.0003  0.0007  0.0015  0.0033   0.0097   0.0227   0.0442

(Elapsed times will be different on different computers. The ten million vector results are missing for the per-vector function calls because I don't have that kind of patience 😁. Color refers to the lines in the graphs below.)

This is a bit easier to interpret when we graph those results:



All implementations scale linearly with the number of vectors, but the versions that make use of numpy's broadcasting capabilities, i.e. do only a single function call, are so much faster that they are nearly indistinguishable from the x-axis.

Those performance differences are a bit easier to see if we transform all data to the number of seconds it takes to perform a million matrix x vector operations (note that the vertical axis is logarithmic):



The flat lines are again indicative of the linear behavior of the operations: the time to perform a single matrix x vector multiplication doesn't depend much on the total number of vectors (except for slight random variations at low numbers of vectors).

conclusion

Python might be slow, but we can still perform computationally expensive operations on large arrays with blazing speed if we leverage the power of numpy. Finding the correct function in the documentation might be a challenge, but because we don't need python loops, save on individual function calls, and benefit from highly optimized implementations, we can easily manipulate millions of vectors.

final remarks

Newer versions of numpy also have an np.matvec() function that I did not check, because Blender 5.0 bundles a version of numpy (1.26.4) that doesn't have this function.

Even though einsum() might be slower, in many situations its expressive power is very convenient. Some examples of what can be done with it can be found here.


No AI was hurt in the writing of this article; all text and research was created using old skool wetware.