Floor boards add-on: minor feature added

Prompted by a question from Trevor I added a Restricted random mode to the uv randomization. It still randomizes the position of planks on the uv-map but keeps them inside the original bounds. It does this simply by not moving the uv-map of a plank at all when the move would result in any portion of the plank ending up outside the bounds. The result is a more compact randomized uv-map, which should make it somewhat easier to use with non-tiling textures, i.e. it reduces the area of the texture that is not used.
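The core of the check can be sketched as follows (a minimal illustration with made-up names; the actual add-on code on GitHub operates on Blender's uv layers directly and differs in detail):

import numpy as np

def restricted_random_offset(uvs, bounds_min, bounds_max):
    # uvs: an (n, 2) array with the uv coordinates of a single plank
    offset = np.random.uniform(-1.0, 1.0, 2)
    moved = uvs + offset
    # leave the plank where it is if any part would end up outside the bounds
    if (moved < bounds_min).any() or (moved > bounds_max).any():
        return uvs
    return moved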

The new version is of course available on GitHub.

The difference is illustrated in the image below, where I have contrasted the random uv-map in orange with the restricted uv-map in green.

random vertex colors using Numpy

The bit of research presented in a previous article was not completely academic: speed improvements using Numpy are very welcome when, for example, generating random vertex colors for very large meshes.
With the results of those measurements in the back of my mind, and spurred by a discussion on BlenderArtists, I created a new version of the random vertex colors add-on (available on GitHub), which reduced the time to generate random vertex colors for approximately 1 million faces from 3.2 seconds to 0.8 seconds on my machine (or even 0.4s for a 32 bit variant), i.e. up to 8 times faster, which is not bad.

Some explanation

Assigning random vertex colors to faces means that you have to assign the same random color to all the loops of a polygon (for the difference between vertices and loops, check the BMesh design document; the same applies to the regular Mesh objects we use in the code below). This can be done in a straightforward manner once you have located the vertex color layer:
from random import random

mesh = context.scene.objects.active.data
vertex_colors = mesh.vertex_colors.active.data
polygons = mesh.polygons
for poly in polygons:
    # one random color per polygon ...
    color = [random(), random(), random()]
    # ... assigned to every loop of that polygon
    for loop_index in range(poly.loop_start, poly.loop_start + poly.loop_total):
        vertex_colors[loop_index].color = color
Straightforward as this may be, a loop inside a loop is time consuming, and so is generating lots of random numbers, especially because Python does not do this in parallel even if you have a processor with multiple cores. Fortunately for us, Blender comes bundled with Numpy, a library that can manipulate huge arrays of numbers in a very efficient manner. This allows for a much more efficient approach (although, as shown previously, a significant speed increase is only noticeable for large meshes).
import numpy as np

npolygons = len(polygons)

startloop = np.empty(npolygons, dtype=np.int)
numloops = np.empty(npolygons, dtype=np.int)
polygon_indices = np.empty(npolygons, dtype=np.int)

# bulk-copy the loop indices and counts with the fast foreach_get
polygons.foreach_get('index', polygon_indices)
polygons.foreach_get('loop_start', startloop)
polygons.foreach_get('loop_total', numloops)

# one random RGB color (3 floats in [0,1)) per polygon
colors = np.random.random_sample((npolygons,3))
We can even reduce storage (and get an additional speedup) if we change the types to 32 bit variants. There will be no loss of accuracy as these are the sizes used by Blender internally. (If you were to do a lot of additional calculations this might be different, of course.) The change only alters the declarations of the arrays:
startloop = np.empty(npolygons, dtype=np.int32)
numloops = np.empty(npolygons, dtype=np.int32)
polygon_indices = np.empty(npolygons, dtype=np.int32)

polygons.foreach_get('index', polygon_indices)
polygons.foreach_get('loop_start', startloop)
polygons.foreach_get('loop_total', numloops)

# cast to 32 bit floats, the size Blender uses internally
colors = np.random.random_sample((npolygons,3)).astype(np.float32)
As shown above, we start out by creating Numpy arrays that will hold the startloop indices and the number of loops in each polygon, as well as an array that will hold the polygon indices themselves. This last one isn't strictly needed for assigning random values, because we don't care which random color we assign to which polygon, but for other scenarios it might make more sense, so we keep it here. We get all these indices from the Mesh object using the fast foreach_get method. We then use Numpy's built-in random_sample function to generate the random colors (3 random floats between 0 and 1) for all polygons.
nloops = len(vertex_colors)

loopcolors = np.empty((nloops,3)) # or loopcolors = np.empty((nloops,3), dtype=np.float32)
loopcolors[startloop] = colors[polygon_indices]
numloops -= 1
nz = np.flatnonzero(numloops)
while len(nz):
    startloop[nz] += 1
    loopcolors[startloop[nz]] = colors[polygon_indices[nz]]
    numloops[nz] -= 1
    nz = np.flatnonzero(numloops)
The real work is done in the code above: we first create an empty array to hold the colors for all individual loops. Then we assign to each loop at index startloop the color of the polygon it belongs to; note that startloop and polygon_indices are arrays of the same length, i.e. the number of polygons. We also have an array numloops, which holds the number of loops for each polygon. We decrement these counts by one for all polygons in one go (numloops -= 1) and create an array of the indices of those elements in numloops that are still greater than zero (np.flatnonzero). As long as we are left with one or more such indices, we increment startloop for exactly those polygons, assign, again in one go, the polygon color to the loop at the incremented index, and once more decrement the numloops counters. Note that this all works because the loop indices of all loops of a polygon are consecutive.
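To make this concrete, consider a hypothetical mesh consisting of a triangle (loops 0-2) and a quad (loops 3-6); the arrays then evolve like this:

# startloop = [0, 3], numloops = [3, 4], colors = [c0, c1]
# initial assignment: loopcolors[0] = c0, loopcolors[3] = c1
# after numloops -= 1: numloops = [2, 3], nz = [0, 1]
# pass 1: startloop = [1, 4], loopcolors[1] = c0, loopcolors[4] = c1, numloops = [1, 2]
# pass 2: startloop = [2, 5], loopcolors[2] = c0, loopcolors[5] = c1, numloops = [0, 1], nz = [1]
# pass 3: startloop = [2, 6], loopcolors[6] = c1, numloops = [0, 0], nz is empty and we are done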
loopcolors = loopcolors.flatten()
vertex_colors.foreach_set("color", loopcolors)
The final lines flatten the array of loop colors to a 1-D array and write back the colors to the Mesh object with the fast foreach_set method.

Discussion

Now even though we end up with about four times as much code, it is much faster while still quite readable, as long as you get your head around using index arrays. Basically, our single while loop executes the inner loop of assigning colors to loops for all polygons in parallel. The penalty for this speed increase is memory consumption: we use five large Numpy arrays here.
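If index arrays are new to you: the key Numpy idiom is that an array of integer indices can be used to read and write many elements at once. A tiny standalone illustration (values made up for the example):

import numpy as np

a = np.array([10.0, 20.0, 30.0, 40.0])
idx = np.array([3, 0])
print(a[idx])   # selects elements 3 and 0 in one go: [ 40.  10.]
a[idx] += 1     # assignment through an index array also works elementwise
print(a)        # [ 11.  20.  30.  41.]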

Copying vertices to Numpy arrays in Blender

If you want to do a lot of mathematical work on very large Blender meshes, plain Python might be too slow. Fortunately, Blender comes bundled with Numpy, which allows for fast calculations on vast arrays of numerical data.
To use Numpy you need to copy, for example, vertex coordinates first, before performing any calculations, and then copy the results back again. This consumes extra memory and time. You wouldn't do this for a single operation like determining the average position of all vertices, because Python can perform that efficiently enough that the gain wouldn't compensate for the extra setup time. However, when calculating many iterations of some erosion algorithm, it might well be worth the cost.
In order to help decide whether this setup cost is worth the effort, I decided to produce some benchmark values. My aim was to find the fastest method among several alternatives and to produce actual timings. Your timings will be different of course, but the trends will be comparable, and I have provided the source code for my benchmark programs on GitHub.
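The measurements below can be reproduced with a harness along these lines (a minimal sketch with a made-up benchmark function; the actual benchmark code on GitHub is more elaborate):

import time

def benchmark(label, f, repeats=5):
    # take the best wall clock time over a few runs to reduce noise
    best = float('inf')
    for _ in range(repeats):
        t0 = time.perf_counter()
        f()
        best = min(best, time.perf_counter() - t0)
    print('%s: %.3fs' % (label, best))

# for example, timing the foreach_get variant on a mesh me:
# benchmark('foreach_get', lambda: me.vertices.foreach_get('co', verts))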

results, copying to a Numpy array

First we have a look at how much time it takes to copy the vertex coordinates from a mesh to a Numpy array. Given the array of vertices me.vertices and a Numpy array verts, we have several options (the colors refer to the lines in the benchmark graph):
import numpy as np

# blue line
verts = np.array([v.co for v in me.vertices])

# red line
verts = np.fromiter((x for v in me.vertices for x in v.co),
                     dtype=np.float64)
verts.shape = (len(me.vertices), 3)

# orange line
verts = np.fromiter((x for v in me.vertices for x in v.co),
                     dtype=np.float64,
                     count=len(me.vertices)*3)
verts.shape = (len(me.vertices), 3)

# green line
count = len(me.vertices)
shape = (count, 3)
verts = np.empty(count*3, dtype=np.float64)
me.vertices.foreach_get('co', verts)
verts.shape = shape
The blue line represents creating a Numpy array from a list. Even with list comprehension this has the disadvantage of creating an intermediate copy, which is also a Python list. Apparently this is a costly operation, as this naive method performs the worst of all.
The red line shows a huge increase in performance as we use a generator expression to access the coordinates. This does not create an intermediate list, but the result is a one dimensional array. Changing the shape of an array in Numpy involves no copying, so it is very cheap.
We can still improve a little on this timing by preallocating the size of the Numpy array with the count argument. The result is shown by the orange line.
The green line shows the fastest option: Blender provides a fast foreach_get method to access attributes in a property collection, and this method wins hands down. (Thanks to Linus Yng for pointing this out.)

results, copying from a Numpy array

Copying data back from a Numpy array can also be done in more than one way, the most interesting ones are shown below:
import itertools

# blue line
for i,v in enumerate(me.vertices):
    v.co = verts[i]

# green line
for n,v in zip(verts, me.vertices):
    v.co = n

# light blue line
def f2(n,v):
    v.co = n

for v in itertools.starmap(f2, zip(verts, me.vertices)): pass

# orange line (count and shape as defined in the previous section)
verts.shape = count*3
me.vertices.foreach_set('co', verts)
verts.shape = shape
The conventional methods do not perform significantly differently from each other, but again Blender's built-in foreach_set method outperforms all other options by a mile, just like its foreach_get counterpart.


Discussion

The actual timings on my machine (an Intel Core i7 running at 4GHz) indicate that we can copy about 1.87 million vertex coordinates per second from a Blender mesh to a Numpy array, and about 1.12 million vertex coordinates per second from a Numpy array to a Blender mesh, using 'conventional methods'. The 50% difference might be due to the fact that the coordinate data in a Blender mesh is scattered over internal memory (as compared to Numpy's contiguous block of RAM), and writing to scattered memory incurs a bigger hit from cache misses and the like than reading from it. Note that a vertex location consists of three 64 bit floating point numbers.
Using foreach_get and foreach_set, this performance is greatly improved, to 13.8 million and 10.8 million vertex coordinates per second respectively (on my machine).
Overall, this means that on my machine even a simple scaling operation on a million vertices may already be faster with Numpy, as shown in the first example below, which is about 7x faster than the classic code. (Not that scaling is a perfect example: chances are Blender's scale operator is faster still, but it gives you an idea of how much there is to gain, even for simple operations that are not natively available in Blender.)
    # numpy code with fast Blender foreach
    count = len(me.vertices)
    shape = (count, 3)

    fac = np.array([1.1, 1.1, 1.1])

    verts = np.empty(count*3, dtype=np.float64)
    me.vertices.foreach_get('co', verts)
    verts.shape = shape
    np.multiply(verts,fac,out=verts)  # scale all coordinates in place
    verts.shape = count*3
    me.vertices.foreach_set('co', verts)
    verts.shape = shape

    # 'classic code'
    fac = Vector((1.1, 1.1, 1.1))

    for v in me.vertices:
        v.co[0] *= fac[0]
        v.co[1] *= fac[1]
        v.co[2] *= fac[2]

Benchmark availability

The .blend file with code and instructions can be downloaded from my GitHub repository.