Barnes-Hut Algorithm
In this blog post we will cover some of the basics of the Barnes Hut algorithm. This is completely new to me, it is not an algorithm I've used/studied before (and I am by no means an astrophysicist). Nonetheless it has piqued my interest so I have decided to write about it. In this blog I will be talking about 2 dimensions unless otherwise stated, this just makes the resulting code run a little quicker and output easier to visualise. Modifying the 2d code to be 3d (or even higher dimension) requires only minor revisions.
The N-Body Problem
It is well understood that for $N > 2$ that the problem of determining trajectories under gravitation for N-bodies cannot be solved analytically. Instead we must rely on computation to create trajectories. From Newton's equations we can write the force $\pmb{F_{i,j}}$ between two bodies $i, j$ with masses $m_i, m_j$ and positions $\pmb{p_i}, \pmb{p_j}$ as: $$ \pmb{F_{i,j}} = \frac{G m_i m_j (\pmb{p_i} - \pmb{p_j})} {\lVert \pmb{p_i} - \pmb{p_j} \rVert^3} $$
For $N$ bodies we therefore have $\frac{N(N-1)}{2}$ forces to be calculated. For small $N$ this is not a problem, however if we want to model galaxy dynamics with thousands (or more) of bodies this becomes a computational issue.
Barnes Hut Process
With Barnes Hut we can move from $O(N^2)$ complexity to $O(Nln(N))$ - this is a significant improvement for large $N$.
The crux of the Barnes Hut scheme is to approximate the effect of "far away" bodies but calculate exactly the forces attributed to "near" bodies. To do this a quadtree data structure is used (in 3d an octree or higher dimensions a $2^d$tree). To do this the space is partitioned into 4 equal squares at depth 1 (or 8 cubes in 3d or $2^d$ hypercubes in higher dimension) - at depth 2 each of these squares is then partitioned into 4 equal squares and so on. From this a tree is constructed with each node representing the number of bodies within the quadrant at that depth. This is repeated recursively until a depth is reached where each quadrant has 1 or 0 bodies in it. It is easier to understand this with a visual example:
With this data structure in place we can begin to simulate. For each quadrant in the quadtree we can calculate a centre of mass (in the obvious way). Then given a body we can calculate the force acting on it from a given quadrant by applying the Newtonian force calculation using the centre of mass. We could, for example, apply this to the 4 quadrants at depth 1 to approximate the force acting on the body. This of course might not be particularly accurate if there are many bodies present. To overcome this the Barnes Hut algorithm specifies a critical distance $\theta$ if the distance beween the body and the centre of mass of the quadrant is greater than $\theta$ then it is used as an approximation, if not the algorithm moves to the next depth of the quadtree and tries again, this happens recursively until the distance becomes below $\theta$ or there is only 1 body in the quadrant (as such setting $\theta = 0$ returns to the $O(N^2)$ brute force approach).
While this is presented as an algorithm for the N-body problem it can be used in other situations that involve a spatial dimension and many entities. I am currently considering an application to speed up an agent based model (potentially the subject of another blog post).
Implementation
Initially I will code up an example in Python, I may revisit this and create a more efficient implementation in Cython/C++/etc. I will ignore all "complications" for example I will not give each body a size nor will it consider collisions, etc. Each body is a "point mass" in this example. For now I'll also ignore the issues relating to plotting trajectories.
We first create a node class to represent nodes within the quadtree. Using this data structure we will then apply a verlet time step to calculate positions:
from copy import deepcopy
import numpy as np
class node:
"""
A class for a node within the quadtree.
We use the terminology "child" for nodes in the next depth level - consistent with tree nomenclature
If a node is "childless" then it represents a body
"""
def __init__(self, x, y, px, py, m):
"""
Initializes a childless node
m - Mass of node
x - x-coordinate centre of mass
y - y-coordinate centre of mass
px - x- coordinate of momentum
py - y-coordinate of momentum
pos - centre of mass array
mom - momentum array
child - child node
s - side-length (depth=0 s=1)
relpos = relative position
"""
self.m = m
self.pos = np.array([x,y])
self.mom = np.array([px,py])
self.child = None
def next_quad(self):
"""
Places node in next quadrant and returns quadrant number
"""
self.s = 0.5*self.s
return self.divide_quad(1) + 2*self.divide_quad(0)
def divide_quad(self, i):
"""
Places node in next level quadrant and recomputes relative position
"""
self.relpos[i] *= 2.0
if self.relpos[i] < 1.0:
quadrant = 0
else:
quadrant = 1
self.relpos[i] -= 1.0
return quadrant
def reset_quad(self):
"""
Repositions to the zeroth depth quadrant (full space)
"""
self.s = 1.0
self.relpos = self.pos.copy()
def dist(self, other):
"""
Calculates distance between node and another node
"""
return np.linalg.norm(self.pos - other.pos)
def force_ap(self, other):
"""
Force applied from current node to other
"""
d = self.dist(other)
return (self.pos - other.pos) * (self.m * other.m / d**3)
def add_body(body, node):
"""
Adds body to a node of quadtree. A minimum quadrant size is imposed
to limit the recursion depth.
"""
new_node = body if node is None else None
min_quad_size = 1.e-5
if node is not None and node.s > min_quad_size:
if node.child is None:
new_node = deepcopy(node)
new_node.child = [None for i in range(4)]
quad = node.next_quad()
new_node.child[quad] = node
else:
new_node = node
new_node.m += body.m
new_node.pos += body.pos
quad = body.next_quad()
new_node.child[quad] = add_body(body, new_node.child[quad])
return new_node
def force_on(body, node, theta):
if node.child is None:
return node.force_ap(body)
if node.s < node.dist(body) * theta:
return node.force_ap(body)
return sum(force_on(body, c, theta) for c in node.child if c is not None)
def verlet(bodies, root, theta, G, dt):
for body in bodies:
force = G * force_on(body, root, theta)
body.mom += dt * force
body.pos += dt * body.mom / body.m
def model_step(bodies, theta, g, step):
root = None
for body in bodies:
body.reset_quad()
root = add_body(body, root)
verlet(bodies, root, theta, g, step)
########## Main Code ##########
# Parameters
Theta = 0.7
G = 1.e-6
dt = 1.e-2
N_bodies = 100
N_steps = 1000
# Fix Seed for Initialization
np.random.seed(123)
# Initial Conditions
Masses = np.random.random(N_bodies)*10
X0 = np.random.random(N_bodies)
Y0 = np.random.random(N_bodies)
PX0 = np.random.random(N_bodies) - 0.5
PY0 = np.random.random(N_bodies) - 0.5
# Initialize
Bodies = [node(x0, y0, pX0, pY0, masses) for (x0, y0, pX0, pY0, masses) in zip(X0, Y0, PX0, PY0, Masses)]
# Main Model Loop
def Model_Loop_BH(n):
for i in range(n):
model_step(Bodies, Theta, G, dt)
%timeit Model_Loop_BH(N_steps)
This code is not the most efficient possible for this model (in all likelihood one would only use Barnes-Hut for large galaxy simulations and so it's unlikely that Python would be the best choice). For 100 bodies and 1000 time steps this code runs on my machine in around 8s.
We can also code up a "brute force" approach (calculating forces between all pairs of bodies) to compare, it is likely that the cost of computing the quadtree could be more costly than just computing all forces directly when the number of bodies is suitably small. We could run these codes multiple times to find where this is the case.
import numpy as np
# from numba import njit
G = 1.e-6
dt = 1.e-2
N_bodies = 100
N_steps = 1000
# Fix Seed for Initialization
np.random.seed(123)
# Initial Conditions
Masses = np.random.random(N_bodies)*10
X = np.random.random(N_bodies)
Y = np.random.random(N_bodies)
PX = np.random.random(N_bodies) - 0.5
PY = np.random.random(N_bodies) - 0.5
pos = np.array((X,Y))
mom = np.array((PX, PY))
#@njit
def force_array(pos_arr, m_array):
n = pos_arr.shape[1]
force_arr = np.zeros((2 ,n, n))
for i in range(n):
for j in range(i):
force_arr[:, i, j] = G * m_array[i] * m_array[j] * (pos[:,i] - pos[:, j]) / np.abs((pos[:,i] - pos[:, j]))**3
force_arr[:, j, i] = - force_arr[:, i, j]
return force_arr
#@njit
def update_mom(step, mom_arr, force_arr):
n = mom_arr.shape[1]
del_mom = np.zeros_like(mom_arr)
for i in range(n):
for j in range(n):
del_mom[:, i] += step * force_arr[:, i, j]
return mom_arr + del_mom
#@njit
def update_pos(step, pos_arr, new_mom, m_arr):
return pos_arr + step * new_mom / m_arr
#@njit
def main_loop(n, pos_arr, mom_arr):
for i in range(n):
force = force_array(pos_arr, Masses)
mom_arr = update_mom(dt, mom_arr, force)
pos_arr = update_pos(dt, pos_arr, mom_arr, Masses)
return pos_arr
%timeit main_loop(N_steps, pos, mom)
This brute force code is not implemented in a particularly efficient way, however it is good enough for illustration purposes. Even for relatively small model sizes (100 bodies for 1000 time steps) the code takes considerably longer than the Barnes-Hut algorithm, with timeit giving an approximate timing of around 78s - about 10 times slower than the Barnes-Hut implementation.
If we use Numba with the njit decorator we can improve this to 3s - which while quicker is not considerably better than the Barnes-Hut algorithm. As the number of bodies increases (and perhaps some fine tuning of the Barnes-Hut parameters) we would expect even more drastic improvements.
import numpy as np
from numba import njit
G = 1.e-6
dt = 1.e-2
N_bodies = 100
N_steps = 1000
# Fix Seed for Initialization
np.random.seed(123)
# Initial Conditions
Masses = np.random.random(N_bodies)*10
X = np.random.random(N_bodies)
Y = np.random.random(N_bodies)
PX = np.random.random(N_bodies) - 0.5
PY = np.random.random(N_bodies) - 0.5
pos = np.array((X,Y))
mom = np.array((PX, PY))
@njit
def force_array(pos_arr, m_array):
n = pos_arr.shape[1]
force_arr = np.zeros((2 ,n, n))
for i in range(n):
for j in range(i):
force_arr[:, i, j] = G * m_array[i] * m_array[j] * (pos[:,i] - pos[:, j]) / np.abs((pos[:,i] - pos[:, j]))**3
force_arr[:, j, i] = - force_arr[:, i, j]
return force_arr
@njit
def update_mom(step, mom_arr, force_arr):
n = mom_arr.shape[1]
del_mom = np.zeros_like(mom_arr)
for i in range(n):
for j in range(n):
del_mom[:, i] += step * force_arr[:, i, j]
return mom_arr + del_mom
@njit
def update_pos(step, pos_arr, new_mom, m_arr):
return pos_arr + step * new_mom / m_arr
@njit
def main_loop(n, pos_arr, mom_arr):
for i in range(n):
force = force_array(pos_arr, Masses)
mom_arr = update_mom(dt, mom_arr, force)
pos_arr = update_pos(dt, pos_arr, mom_arr, Masses)
return pos_arr
%timeit main_loop(N_steps, pos, mom)
Conclusion
We have seen that even for relatively modest models the Barnes-Hut algorithm provides fairly efficient computation for the N-body problem. We would expect the improvement to be even more marked for larger models. However Python on its own is not the best place to implement this and a C++ implementation would offer better performance. With some work a Cython approach should be possible and offer performance improvements. The code would require some major re-working if Numba is to be used instead owing to the reliance on classes to generate the quadtree.