Random points on a unit sphere in OSL, code and benchmarks

In preparation to adapting the wood knot shader to have randomly oriented cylindrical knots rather than spherical knots I needed a function that generates vectors that are uniformly distributed over the surface of a unit sphere. This is not as simple as creating a vector with three random components but several methods exist that will produce vectors with a correct distribution.

A straight forward method (implemented in the function random_sphere() shown below) consists of generating correctly distributed spherical coordinates and the converting them to cartesian coordinates. This works fine but because trigonometric functions (like acos() and sincos()) used here) may be expensive, several alternatives exist that do not use these functions.

In the code presented here we have implemented the methods of Marsaglia (random_sphere1()) and Cook (random_sphere2()). Both are rejection methods: they discard some random numbers when they would lead to invalid vectors. This is wasteful so the question is: are these methods really more efficient on modern hardware where trigonometric functions are implemented as cpu operations?

Some timings

The timings presented below were measured on a 64-bit Amd 6-core cpu with the shader provided below. They might be completely different for other CPUs and probably even more so once OSL shaders will be able to run on a GPU.

nrandom_sphererandom_sphere1 random_sphere2
100 3.6 3.4 7.5
500 10.4 10.1 -
First thing to note is that the timings for 100 and 500 vectors do not scale proportionally because some of the dots we draw with our shader overlap, effectively reducing the number of vectors we have to generate for each shading sample. The other thing is that random_sphere2 is much slower than the other implementations, probably because we generate more random numbers and reject a lot more combinations.

As for the difference between the other two methods: the difference is probably significant but too small to make a real difference on my CPU. I'll probably check again when I get another CPU or OSL shaders will run on a GPU, but for now I stick with the most straightforward method.

Code and node setup


// generate random unit vectors randomly distributed over a sphere
// straight forward method
vector random_sphere(point p, int n){
float t = M_2PI*noise("cell",p,n*2+0);
float u = 2*noise("cell",p,n*2+1)-1;
float s,c,a;
sincos(t,s,c);
a = sqrt(1-u*u);
float x = a*c;
float y = a*s;
float z = u;
return vector(x,y,z);
}

// marsaglia's method
vector random_sphere1(point p, int m){
vector v = 0;
float repeat = 0;
int n = m + 1;
while(1){
repeat++;
float r0 = 2*noise("cell",p,repeat*n*2+0)-1;
float r1 = 2*noise("cell",p,repeat*n*2+1)-1;
float r02 = r0*r0;
float r12 = r1*r1;
float sr2 = r02 + r12;
if( sr2 < 1 ){
float x = 2*r0*sqrt(1-r02-r12);
float y = 2*r1*sqrt(1-r02-r12);
float z = 1-2*sr2;
v = vector(x,y,z);
break;
}
}
return v;
}

// cook's method
vector random_sphere2(point p, int m){
vector v = 0;
float repeat = 0;
int n = m + 1;
while(1){
repeat++;
float r0 = 2*noise("cell",p,repeat*n*4+0)-1;
float r1 = 2*noise("cell",p,repeat*n*4+1)-1;
float r2 = 2*noise("cell",p,repeat*n*4+2)-1;
float r3 = 2*noise("cell",p,repeat*n*4+3)-1;
float r02 = r0*r0;
float r12 = r1*r1;
float r22 = r2*r2;
float r32 = r3*r3;
float sr2 = r02 + r12 + r22 + r32;
if( sr2 < 1 ){
float x = 2*(r1*r3 + r0*r2)/sr2;
float y = 2*(r2*r3 - r0*r1)/sr2;
float z = (r02 + r32 - r12 - r22)/sr2;
v = vector(x,y,z);
break;
}
}
return v;
}

shader sphere_test(
vector p = P,
int n = 100,
float R = 0.03,
int method = 0,

output float Fac = 0
){
for(int i=0; i < n; i++){
vector v = 0;
if( method == 0 ){
v= random_sphere(point(0,0,0),i);
}else if( method == 1){
v= random_sphere1(point(0,0,0),i);
}else{
v= random_sphere2(point(0,0,0),i);
}
if( distance(point(0,0,0),1000*v,p ) < R ){
Fac = 1;
break;
}
}
}

The way we generate random numbers might seem strange but not only do we want to be able to generate any number of vectors for a given cell but in the rejection methods we also want to be able to generate replacements that are guaranteed to be different, hence the multiplication by the number of times we have to repeat.

The image shows the node setup used to verify the shader and time the different implementtations

No comments:

Post a Comment