The N-Body Problem

It is well understood that for $N > 2$ that the problem of determining trajectories under gravitation for N-bodies cannot be solved analytically. Instead we must rely on computation to create trajectories. From Newton's equations we can write the force $\pmb{F_{i,j}}$ between two bodies $i, j$ with masses $m_i, m_j$ and positions $\pmb{p_i}, \pmb{p_j}$ as: $$ \pmb{F_{i,j}} = \frac{G m_i m_j (\pmb{p_i} - \pmb{p_j})} {\lVert \pmb{p_i} - \pmb{p_j} \rVert^3} $$

For $N$ bodies we therefore have $\frac{N(N-1)}{2}$ forces to be calculated. For small $N$ this is not a problem, however if we want to model galaxy dynamics with thousands (or more) of bodies this becomes a computational issue.

Barnes Hut Process

With Barnes Hut we can move from $O(N^2)$ complexity to $O(Nln(N))$ - this is a significant improvement for large $N$.

The crux of the Barnes Hut scheme is to approximate the effect of "far away" bodies but calculate exactly the forces attributed to "near" bodies. To do this a quadtree data structure is used (in 3d an octree or higher dimensions a $2^d$tree). To do this the space is partitioned into 4 equal squares at depth 1 (or 8 cubes in 3d or $2^d$ hypercubes in higher dimension) - at depth 2 each of these squares is then partitioned into 4 equal squares and so on. From this a tree is constructed with each node representing the number of bodies within the quadrant at that depth. This is repeated recursively until a depth is reached where each quadrant has 1 or 0 bodies in it. It is easier to understand this with a visual example:

With this data structure in place we can begin to simulate. For each quadrant in the quadtree we can calculate a centre of mass (in the obvious way). Then given a body we can calculate the force acting on it from a given quadrant by applying the Newtonian force calculation using the centre of mass. We could, for example, apply this to the 4 quadrants at depth 1 to approximate the force acting on the body. This of course might not be particularly accurate if there are many bodies present. To overcome this the Barnes Hut algorithm specifies a critical distance $\theta$ if the distance beween the body and the centre of mass of the quadrant is greater than $\theta$ then it is used as an approximation, if not the algorithm moves to the next depth of the quadtree and tries again, this happens recursively until the distance becomes below $\theta$ or there is only 1 body in the quadrant (as such setting $\theta = 0$ returns to the $O(N^2)$ brute force approach).

While this is presented as an algorithm for the N-body problem it can be used in other situations that involve a spatial dimension and many entities. I am currently considering an application to speed up an agent based model (potentially the subject of another blog post).

Implementation

Initially I will code up an example in Python, I may revisit this and create a more efficient implementation in Cython/C++/etc. I will ignore all "complications" for example I will not give each body a size nor will it consider collisions, etc. Each body is a "point mass" in this example. For now I'll also ignore the issues relating to plotting trajectories.

We first create a node class to represent nodes within the quadtree. Using this data structure we will then apply a verlet time step to calculate positions:

from copy import deepcopy
import numpy as np

class node:
    """
    A class for a node within the quadtree.
    
    We use the terminology "child" for nodes in the next depth level - consistent with tree nomenclature
    If a node is "childless" then it represents a body
    """
    def __init__(self, x, y, px, py, m):
        """
        Initializes a childless node
        
        m - Mass of node
        x - x-coordinate centre of mass
        y - y-coordinate centre of mass
        px - x- coordinate of momentum
        py - y-coordinate of momentum
        pos - centre of mass array
        mom - momentum array
        child - child node
        s - side-length (depth=0 s=1)
        relpos = relative position
        """
        self.m = m
        self.pos = np.array([x,y])
        self.mom = np.array([px,py])
        self.child = None
        
    def next_quad(self):
        """
        Places node in next quadrant and returns quadrant number
        """
        self.s = 0.5*self.s
        return self.divide_quad(1) + 2*self.divide_quad(0)
    
    def divide_quad(self, i):
        """
        Places node in next level quadrant and recomputes relative position
        """
        self.relpos[i] *= 2.0
        if self.relpos[i] < 1.0:
            quadrant = 0
        else:
            quadrant = 1
            self.relpos[i] -= 1.0
        return quadrant
    
    def reset_quad(self):
        """
        Repositions to the zeroth depth quadrant (full space)
        """
        self.s = 1.0
        self.relpos = self.pos.copy()
        
        
    def dist(self, other):
        """
        Calculates distance between node and another node
        """
        return np.linalg.norm(self.pos - other.pos)
    
    def force_ap(self, other):
        """
        Force applied from current node to other
        """
        d = self.dist(other)
        return (self.pos - other.pos) * (self.m * other.m / d**3)
    

def add_body(body, node):
    """
    Adds body to a node of quadtree. A minimum quadrant size is imposed
    to limit the recursion depth.
    """
    new_node = body if node is None else None
    min_quad_size = 1.e-5
    if node is not None and node.s > min_quad_size:
        if node.child is None:
            new_node = deepcopy(node)
            new_node.child = [None for i in range(4)]
            quad = node.next_quad()
            new_node.child[quad] = node
        else:
            new_node = node

        new_node.m += body.m
        new_node.pos += body.pos
        quad = body.next_quad()
        new_node.child[quad] = add_body(body, new_node.child[quad])
    return new_node

def force_on(body, node, theta):
    if node.child is None:
        return node.force_ap(body)
    
    if node.s < node.dist(body) * theta:
        return node.force_ap(body)

    return sum(force_on(body, c, theta) for c in node.child if c is not None)

def verlet(bodies, root, theta, G, dt):
    for body in bodies:
        force = G * force_on(body, root, theta)
        body.mom += dt * force
        body.pos += dt * body.mom / body.m

def model_step(bodies, theta, g, step):
    root = None
    for body in bodies:
        body.reset_quad()
        root = add_body(body, root)
    verlet(bodies, root, theta, g, step)

########## Main Code ##########
# Parameters
Theta = 0.7
G = 1.e-6
dt = 1.e-2
N_bodies = 100
N_steps = 1000

# Fix Seed for Initialization
np.random.seed(123)

# Initial Conditions
Masses = np.random.random(N_bodies)*10
X0 = np.random.random(N_bodies)
Y0 = np.random.random(N_bodies)
PX0 = np.random.random(N_bodies) - 0.5
PY0 = np.random.random(N_bodies) - 0.5

# Initialize
Bodies = [node(x0, y0, pX0, pY0, masses) for (x0, y0, pX0, pY0, masses) in zip(X0, Y0, PX0, PY0, Masses)]

# Main Model Loop
def Model_Loop_BH(n):
    for i in range(n):
        model_step(Bodies, Theta, G, dt)

%timeit Model_Loop_BH(N_steps)

7.97 s ± 446 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This code is not the most efficient possible for this model (in all likelihood one would only use Barnes-Hut for large galaxy simulations and so it's unlikely that Python would be the best choice). For 100 bodies and 1000 time steps this code runs on my machine in around 8s.

We can also code up a "brute force" approach (calculating forces between all pairs of bodies) to compare, it is likely that the cost of computing the quadtree could be more costly than just computing all forces directly when the number of bodies is suitably small. We could run these codes multiple times to find where this is the case.

import numpy as np
# from numba import njit

G = 1.e-6
dt = 1.e-2
N_bodies = 100
N_steps = 1000

# Fix Seed for Initialization
np.random.seed(123)

# Initial Conditions
Masses = np.random.random(N_bodies)*10
X = np.random.random(N_bodies)
Y = np.random.random(N_bodies)
PX = np.random.random(N_bodies) - 0.5
PY = np.random.random(N_bodies) - 0.5
pos = np.array((X,Y))
mom = np.array((PX, PY))

#@njit
def force_array(pos_arr, m_array):
    n = pos_arr.shape[1]
    force_arr = np.zeros((2 ,n, n))
    for i in range(n):
        for j in range(i):
                force_arr[:, i, j] = G * m_array[i] * m_array[j] * (pos[:,i] - pos[:, j]) / np.abs((pos[:,i] - pos[:, j]))**3
                force_arr[:, j, i] = - force_arr[:, i, j]
    return force_arr

#@njit
def update_mom(step, mom_arr, force_arr):
    n = mom_arr.shape[1]
    del_mom = np.zeros_like(mom_arr)
    for i in range(n):
        for j in range(n):
            del_mom[:, i] += step * force_arr[:, i, j]
    return mom_arr + del_mom

#@njit
def update_pos(step, pos_arr, new_mom, m_arr):
    return pos_arr + step * new_mom / m_arr

#@njit
def main_loop(n, pos_arr, mom_arr):
    for i in range(n):    
        force = force_array(pos_arr, Masses)
        mom_arr = update_mom(dt, mom_arr, force)
        pos_arr = update_pos(dt, pos_arr, mom_arr, Masses)
    return pos_arr

%timeit main_loop(N_steps, pos, mom)

1min 18s ± 1.89 s per loop (mean ± std. dev. of 7 runs, 1 loop each)

This brute force code is not implemented in a particularly efficient way, however it is good enough for illustration purposes. Even for relatively small model sizes (100 bodies for 1000 time steps) the code takes considerably longer than the Barnes-Hut algorithm, with timeit giving an approximate timing of around 78s - about 10 times slower than the Barnes-Hut implementation.

If we use Numba with the njit decorator we can improve this to 3s - which while quicker is not considerably better than the Barnes-Hut algorithm. As the number of bodies increases (and perhaps some fine tuning of the Barnes-Hut parameters) we would expect even more drastic improvements.

import numpy as np
from numba import njit

G = 1.e-6
dt = 1.e-2
N_bodies = 100
N_steps = 1000

# Fix Seed for Initialization
np.random.seed(123)

# Initial Conditions
Masses = np.random.random(N_bodies)*10
X = np.random.random(N_bodies)
Y = np.random.random(N_bodies)
PX = np.random.random(N_bodies) - 0.5
PY = np.random.random(N_bodies) - 0.5
pos = np.array((X,Y))
mom = np.array((PX, PY))

@njit
def force_array(pos_arr, m_array):
    n = pos_arr.shape[1]
    force_arr = np.zeros((2 ,n, n))
    for i in range(n):
        for j in range(i):
                force_arr[:, i, j] = G * m_array[i] * m_array[j] * (pos[:,i] - pos[:, j]) / np.abs((pos[:,i] - pos[:, j]))**3
                force_arr[:, j, i] = - force_arr[:, i, j]
    return force_arr

@njit
def update_mom(step, mom_arr, force_arr):
    n = mom_arr.shape[1]
    del_mom = np.zeros_like(mom_arr)
    for i in range(n):
        for j in range(n):
            del_mom[:, i] += step * force_arr[:, i, j]
    return mom_arr + del_mom

@njit
def update_pos(step, pos_arr, new_mom, m_arr):
    return pos_arr + step * new_mom / m_arr

@njit
def main_loop(n, pos_arr, mom_arr):
    for i in range(n):    
        force = force_array(pos_arr, Masses)
        mom_arr = update_mom(dt, mom_arr, force)
        pos_arr = update_pos(dt, pos_arr, mom_arr, Masses)
    return pos_arr

%timeit main_loop(N_steps, pos, mom)

3 s ± 33.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Conclusion

We have seen that even for relatively modest models the Barnes-Hut algorithm provides fairly efficient computation for the N-body problem. We would expect the improvement to be even more marked for larger models. However Python on its own is not the best place to implement this and a C++ implementation would offer better performance. With some work a Cython approach should be possible and offer performance improvements. The code would require some major re-working if Numba is to be used instead owing to the reliance on classes to generate the quadtree.