This is the fourth blog post in a series - you can find the previous blog post here


Setup

In this blog we will limit ourselves to trying to find minimum energy states of the Ising model, although we could also use these schemes to estimate other thermodynamic properties. The models will be in 2 dimensions on a square lattice since the results are easier to display; it should be fairly obvious how the code could be modified for other dimensions. We will also use Gaussian distributed interaction strengths since we do not require mathematical tractability when simulating.

Interactions

Unlike the Sherrington-Kirkpatrick model we will have to be a bit "clever" in defining the interaction strengths. I have decided to do this using 4 (NxN) arrays representing the up, down, left and right interaction strengths. This is a bit wasteful in terms of memory use but I think it is worth the trade-off to improve readability of the code. We then have the following symmetries:

up[i,j] == down[i-1,j]
left[i,j] == right[i,j-1]

Where these indices exist. For the boundary conditions we will take the following:

up[0,j] == 0
down[N-1,j] == 0
left[i,0] == 0
right[i,N-1] == 0

For all possible $i, j$. This amounts to a free (open) boundary condition: edge sites will have only 3 neighbours and corner sites only 2. This can introduce some instability for small spin glasses but this is fine for our purposes. Modifying this to allow for periodic boundary conditions (i.e. a toroidal geometry) would not be particularly difficult.
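For example, under a periodic geometry the interaction arrays might be set up along the lines of the sketch below (the wrap-around bonds are purely illustrative - everything else in this post uses the free boundaries described above):

import numpy as np

# Sketch: periodic (toroidal) interactions using the same
# up/down/left/right array layout as the code later in this post.
# Every site now has 4 neighbours; "missing" boundary bonds wrap
# around to the opposite edge of the lattice.
N = 32
up = np.random.randn(N, N)         # bond from (i,j) to ((i-1) % N, j)
left = np.random.randn(N, N)       # bond from (i,j) to (i, (j-1) % N)
down = np.roll(up, -1, axis=0)     # symmetry: down[i,j] == up[(i+1) % N, j]
right = np.roll(left, -1, axis=1)  # symmetry: right[i,j] == left[i, (j+1) % N]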

Mixing, Convergence, etc.

The methods presented below are Markov-Chain Monte-Carlo (MCMC) methods. These methods are notoriously finicky and require parameter tuning, running multiple chains, etc. in order to ensure good performance. So as not to get distracted in this blog post I will not dwell on these concerns, but please keep them in mind when reading on. For now we will run a single chain and look at the results it produces; in practice one would not do this.

Local Update Methods

We begin by looking at local update methods. In the Sherrington-Kirkpatrick blog post we looked at one such (very bad) method: the greedy gradient descent. In local update methods we update only 1 site at each simulation step.

Metropolis-Hastings

First we look at the "classic" approach to simulating the Ising model. In many texts and online references this will be the only method that is presented, which leads some to believe (incorrectly) that the simulation method is somehow part of the Ising model itself. This is the method I presented (without elaboration) in my Cython/Numba blog post.

We present Metropolis-Hastings in its most general form first (our application to the Ising model will appear below):

  1. Initialize the system in state $x_t$
  2. Sample a new proposed state from some distribution $Q(x'_{t+1} | x_t)$
  3. Accept this new state with probability $min\left[\alpha, 1 \right]$, else $x_{t+1} = x_t$, where: $$ \alpha = \frac{P(x'_{t+1})Q(x_t |x'_{t+1})} {P(x_t)Q(x'_{t+1} |x_{t})} $$
  4. Repeat steps 2. and 3.

In this context $P(.)$ represents the probability distribution we wish to sample from. We notice that this does not need to be normalized (i.e. we can ignore any difficult partition functions), so this is a (relatively) easy algorithm to implement. We find that the sequence $x_t$ forms a Markov chain - hence the term Markov-Chain Monte Carlo (MCMC).

We can see that Metropolis-Hastings consists of 2 steps: a proposal step and then an acceptance/rejection step. There are essentially no constraints on $Q(.)$ so we can use something easy to sample from; we only need to be careful that its support contains the support of $P(.)$ - otherwise we will not be able to sample all possible values as required. Selecting a suitable $Q(.)$ is nonetheless at the heart of the algorithm: the best performance will be where $Q$ and $P$ are very similar. Since $P$ is generally unknown in advance this often requires a bit of trial and error. There are many ways to test whether the algorithm is doing a "good job" but we will not cover them here (in a later set of blog posts I should probably go deeper into MCMC methods - here we're just concerned with the Ising model in particular).
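To make the general recipe concrete, here is a minimal sketch of Metropolis-Hastings for a toy one-dimensional target; the unnormalized density target() and the Gaussian random-walk proposal are illustrative choices and are not used anywhere else in this post:

import numpy as np

# Unnormalized target density we want to sample from (illustrative choice)
def target(x):
    return np.exp(-0.5 * x**2) + 0.5 * np.exp(-0.5 * (x - 3)**2)

# A minimal Metropolis-Hastings sampler with a symmetric
# Gaussian random-walk proposal (so the Q(.) terms cancel)
def metropolis_hastings(n_samples, x0=0.0, step=1.0):
    samples = np.zeros(n_samples)
    x = x0
    for t in range(n_samples):
        # Proposal step
        x_prop = x + step * np.random.randn()
        # Acceptance step: accept with probability min(1, P(x')/P(x))
        if np.random.random() < min(1.0, target(x_prop) / target(x)):
            x = x_prop
        samples[t] = x
    return samples

samples = metropolis_hastings(10000)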

Moving back to the Ising model picture, we can apply the Metropolis-Hastings methodology via:

  1. First pick a random site
  2. Calculate the change in energy associated with "flipping" this site's spin (going from +1 to -1 or vice versa)
  3. If the energy decreases accept the flipped state and start again with a new site
  4. If energy increases accept the flipped state with some acceptance probability or keep the same state
  5. Begin again

We can see that in contrast to the "greedy hill climber" approach in the Sherrington-Kirkpatrick blog post we are not simply descending into a local minimum; there is some probability that we can jump to a state of higher energy and escape a local minimum to find a "better" one. In theory, if we wait long enough we should find ourselves in a global minimum state. We now derive the acceptance probability as given by Metropolis-Hastings:

Recall that the Gibbs distribution for the Ising model with a given configuration is: $$ P(\sigma) = \frac{exp(-\beta H_{\sigma})}{Z_{\beta}} $$

Since we are uniformly selecting new states the $Q(.)$ terms in $\alpha$ cancel. If we denote the proposed state as $\hat{\sigma}$ then the relative likelihood is: $$ \alpha = \frac{P(\hat{\sigma})}{P(\sigma)} = \frac{exp(-\beta H_{\hat{\sigma}})}{exp(-\beta H_{\sigma})} = exp\left(-\beta \left(H_{\hat{\sigma}} - H_{\sigma}\right)\right) = exp(-\beta \Delta H) $$

To summarise, by Metropolis-Hastings we then accept the new configuration with probability: $$ p_{flip} = min\left[1, exp(-\beta \Delta H) \right] $$

In this particular case we can calculate $\Delta H$ quite efficiently since we are changing the spin of only 1 site; the contributions of all other sites "cancel out". We can write this down as: $$ \Delta H = 2\sigma_{i,j} \sum_{(x,y) \in \langle i, j \rangle} J_{x,y} \sigma_{x,y} $$

This introduces some efficiency in a code implementation since we do not have to calculate the energy of the whole configuration each time we want to update a site's spin.

An example implementation of this can be seen below:

# An implementation of an Ising spin-glass of size NxN
# With free boundary conditions using Metropolis-Hastings
# Connectivity is initialized as a Gaussian distribution N(0, s^2)
# Updates occur at randomly selected sites

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Fix random seed
np.random.seed(123)

# Set size of model N and initial spins
N = 32
spins = np.random.choice([-1, 1], (N,N))

# Fix number of timesteps and some containers
timesteps = 10000
mag = np.zeros(timesteps+1)
energy = np.zeros(timesteps+1)

# Initialize interaction arrays
# Have 4 arrays: up, down, left right
# These represent the interaction strengths to the 
# up/down/left/right neighbours of a site
# There is a symmetry between these matrices
# This is not the most memory efficient solution
s_h = 1
s_v = 1
up = np.zeros((N,N))
down = np.zeros((N,N))
left = np.zeros((N,N))
right = np.zeros((N,N))

up[1:N,:] = np.random.randn(N-1,N) * s_v
down[0:N-1,:] = up[1:N,:]
left[:,1:N] = np.random.randn(N,N-1) * s_h
right[:,0:N-1] = left[:,1:N]

mag[0] = spins.sum()

for i in range(N):
    for j in range(N):
        if i == 0:
            up_neighbour = 0
            down_neighbour = spins[i+1,j]
        elif i == N-1:
            up_neighbour = spins[i-1,j]
            down_neighbour = 0
        else:
            up_neighbour = spins[i-1,j]
            down_neighbour = spins[i+1,j]
        if j == 0:
            left_neighbour = 0
            right_neighbour = spins[i,j+1]
        elif j == N-1:
            left_neighbour = spins[i,j-1]
            right_neighbour = 0
        else:
            left_neighbour = spins[i,j-1]
            right_neighbour = spins[i,j+1]
        
        energy[0] += spins[i,j]*(up[i,j]*up_neighbour + down[i,j]*down_neighbour + left[i,j]*left_neighbour + right[i,j]*right_neighbour)

# Avoid double count - each neighbour pair counted twice
# above since we loop over every site. The negative sign
# matches the Hamiltonian convention H = -sum(J*s*s) used
# in the energy change calculation below
energy[0] /= -2

# Fix beta (inverse temperature) - from analysis we know that
# system in glassy-phase for T<s so beta>1/s. Performance
# of random updates isn't good so don't select temperature
# too low
beta = 1

# Define proposal step
def proposal(s_array):
    _N = s_array.shape[0]
    return np.random.choice(_N, 2)

def energy_change(spin_site, bt, s_array, up_array, down_array, left_array, right_array):
    i = spin_site[0]
    j = spin_site[1]
    
    if i == 0:
        up_neighbour = 0
        down_neighbour = s_array[i+1,j]
    elif i == N-1:
        up_neighbour = s_array[i-1,j]
        down_neighbour = 0
    else:
        up_neighbour = s_array[i-1,j]
        down_neighbour = s_array[i+1,j]
    if j == 0:
        left_neighbour = 0
        right_neighbour = s_array[i,j+1]
    elif j == N-1:
        left_neighbour = s_array[i,j-1]
        right_neighbour = 0
    else:
        left_neighbour = s_array[i,j-1]
        right_neighbour = s_array[i,j+1]
    
    dE_tmp = 2*s_array[i,j]*(up_array[i,j]*up_neighbour + down_array[i,j]*down_neighbour + left_array[i,j]*left_neighbour + right_array[i,j]*right_neighbour)
    return dE_tmp

def acceptance(bt, energy):
    # Returns -1 if the proposed flip is accepted, +1 if it is rejected
    if energy <= 0:
        return -1
    else:
        # Accept an energetically unfavourable flip with probability exp(-beta*dE)
        prob = np.exp(-bt*energy)
        if prob > np.random.random():
            return -1
        else:
            return 1


# Define update step
dE = 0
dM = 0

def update(bt, s_array, up_array, down_array, left_array, right_array):
    global dE
    global dM
    
    # Proposal Step
    site = proposal(s_array)
    
    # Calculate energy change
    dE = energy_change(site, bt, s_array, up_array, down_array, left_array, right_array)
    dM = -2*s_array[site[0],site[1]]
    
    # Acceptance step
    accept = acceptance(bt, dE)
    
    if accept == -1:
        s_array[site[0], site[1]] *= -1
    else:
        dE = 0
        dM = 0
    
    return s_array

def _main_loop(ts, s_array, up_array, down_array, left_array, right_array):
    s_temp = s_array.copy()
    for i in range(ts):
        update_step = update(beta, s_temp, up_array, down_array, left_array, right_array)
        s_temp = update_step
        energy[i+1] = energy[i] + dE
        mag[i+1] = mag[i] + dM

#### Run Main Loop
_main_loop(timesteps, spins, up, down, left, right)
mag = mag / (N**2)
energy = energy / (N**2)

# plot magnetism and energy evolving in time
fig, ax1 = plt.subplots()
ax1.set_xlabel("Time step")
ax1.set_ylabel("Magnetism", color='blue')
ax1.plot(mag, color='blue')

ax2 = ax1.twinx()
ax2.set_ylabel("Energy", color='red')
ax2.plot(energy, color='red')

plt.show()

Gibbs Sampling

Many presentations of the Ising model only show the Metropolis-Hastings scheme; as such there is a misconception that Metropolis-Hastings sampling is somehow part of the Ising model itself. This is not true - an alternative to Metropolis-Hastings is Gibbs sampling. We can describe Gibbs sampling in general terms as:

  1. Pick an initial state $\pmb{x_t} = \left(x^1_t, x^2_t, ... , x^N_t\right)$
  2. For each of the $N$ dimensions sample: $x^i_{t+1} \sim P\left(x^i_{t+1} | x^1_{t+1}, x^2_{t+1}, ... , x^{i-1}_{t+1}, x^{i+1}_t, ... , x^N_t\right)$ and define the next state as: $\pmb{x_{t+1}} = \left(x^1_{t+1}, x^2_{t+1}, ... , x^N_{t+1}\right)$
  3. Repeat sampling for each successive state

We can see this is just a special case of Metropolis-Hastings. If we denote $\pmb{x_t^{-j}} = \left(x_t^1, x_t^2, ... , x_t^{j-1}, x_t^{j+1}, ..., x_t^N\right)$ (all components apart from the jth), then we can set: $$Q(x_{t+1}^j, \pmb{x_t^{-j}} | \pmb{x_t} ) = P(x_{t+1}^j | \pmb{x_t^{-j}})$$ In the Metropolis-Hastings scheme, the acceptance probability then becomes: \begin{align} min\left[1 , \frac{P(x_{t+1}^j, \pmb{x_t^{-j}}) Q(\pmb{x_t} | x_{t+1}^j, \pmb{x_t^{-j}})}{P(\pmb{x_t}) Q(x_{t+1}^j, \pmb{x_t^{-j}} | \pmb{x_t})} \right] & = min \left[1 , \frac{P(x_{t+1}^j, \pmb{x_t^{-j}}) P(x_{t}^j | \pmb{x_t^{-j}})}{P(\pmb{x_t}) P(x_{t+1}^j | \pmb{x_t^{-j}})} \right] \\ &= min\left[1 , \frac{P(x_{t+1}^j | \pmb{x_t^{-j}}) P(\pmb{x_t^{-j}}) P(x_{t}^j | \pmb{x_t^{-j}})}{P(x_{t}^j | \pmb{x_t^{-j}}) P(\pmb{x_t^{-j}}) P(x_{t+1}^j | \pmb{x_t^{-j}})} \right] \\ &= min \left[1, 1 \right] = 1 \end{align} Which results in the Gibbs procedure as described above.

We can implement Gibbs sampling in the context of an Ising model as:

  1. Pick a random site $(i,j)$
  2. Set spin to +1 with probability $p_{ij}$, or -1 with probability $(1-p_{ij})$
  3. Begin again

The probability is defined as: $$ p_{ij} = \frac{1}{1+ exp(- \beta \Delta H / \sigma_{i,j})} $$ Where $\Delta H$ is, as before, the energy change associated with flipping the current spin $\sigma_{i,j}$.

At lower temperatures this should behave similarly to Metropolis-Hastings. However the Metropolis-Hastings method has approximately twice the probability of accepting energetically unfavourable states, so the Gibbs sampler may be slightly less efficient.

Note: this is true for the Ising model with the methods as described - it is not a general point on Gibbs/Metropolis-Hastings. When sampling from higher dimensional distributions Gibbs samplers update one dimension at a time (conditioned on all the others) whereas Metropolis-Hastings proposes points in the full high-dimensional space. In some situations the Gibbs sampler can perform significantly better.

Since this is only a minor update to the Metropolis-Hastings code we will not present a full implementation here, but a minimal sketch of the changed update step is shown below.
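The sketch reuses the proposal() and energy_change() helpers from the Metropolis-Hastings code above; for brevity it does not track the running energy/magnetisation the way update() does:

# Sketch of a single Gibbs update of a randomly chosen site
def gibbs_update(bt, s_array, up_array, down_array, left_array, right_array):
    site = proposal(s_array)
    i, j = site[0], site[1]
    # Energy change associated with flipping the current spin at (i,j)
    dH = energy_change(site, bt, s_array, up_array, down_array, left_array, right_array)
    # Conditional probability that the spin at (i,j) is +1 (p_ij above)
    p_up = 1.0 / (1.0 + np.exp(-bt * dH / s_array[i, j]))
    s_array[i, j] = 1 if np.random.random() < p_up else -1
    return s_array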

Glauber Dynamics (and Heat-Bath)

Another confusion I have seen is that Metropolis-Hastings is the only acceptance-rejection scheme. This is not true; we can also define alternative acceptance probabilities. One such example is Glauber dynamics, where we can express the acceptance probability as: $$ p_{flip} = \frac{1}{2}\left( 1 - tanh\left( \beta \Delta H / 2\right) \right) = \frac{exp(-\beta \Delta H/2)}{exp(\beta \Delta H/2) + exp(-\beta \Delta H/2)} $$ We will not derive this here nor simulate using Glauber dynamics (though we could modify 1 or 2 lines of the code above to achieve this, as sketched below) - it is just to illustrate another option. For the Ising model Glauber dynamics has a lower acceptance probability for all possible states, as such its performance is likely to be slightly worse than Metropolis-Hastings.

For the Ising model Glauber-Dynamics is identical to the Heat-Bath method.
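Concretely, the acceptance() function from the Metropolis-Hastings code above could be swapped out for something like the following sketch (the function name is illustrative; the convention that -1 means "accept" is kept):

# Glauber/heat-bath acceptance: accept the flip with probability
# 1 / (1 + exp(beta * dE)) = (1/2) * (1 - tanh(beta * dE / 2))
def acceptance_glauber(bt, energy):
    prob = 1.0 / (1.0 + np.exp(bt * energy))
    return -1 if np.random.random() < prob else 1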

Simulated Annealing and Simulated Tempering

We now move onto our first "improvement" to the Metropolis-Hastings scheme: simulated annealing.

The concept is very simple and takes inspiration from physical systems. Essentially we just start the system at a high temperature and gradually cool it down. This makes intuitive sense: at higher temperatures we have an increased chance of jumping out of local minima and getting closer to a better overall minimum, however we will struggle to actually locate that minimum at high temperature for the same reason. In contrast, at a low temperature we will be able to locate a nearby local minimum very accurately but will not be able to jump out of it to find a better one. By starting off "hot" the samples will jump between many local minima; as we slowly cool down there should be fewer jumps between minima and the system should eventually get stuck in the basin of a "good" local minimum (not far from the global minimum, ideally); as we cool the temperature further we should get closer and closer to that minimum. This is similar to the process of annealing metals by heating them and then cooling them to reorganize the crystalline structure.

Simulated annealing has proved very useful in the field of combinatorial optimization in situations where we want to quickly generate "good" solutions (not necessarily "best"). Variations on the idea can be seen in many areas (e.g. variable learning rate algorithms in deep learning can be thought of as a form of simulated annealing). We can also note there is nothing in the method that is particular to Metropolis-Hastings (or the other variations presented) - it can be used with pretty much any simulation method.

We have not described "how" we would want to decrease the temperature over time. This is part of the "issue" with simulated annealing (and many MCMC algorithms in general) - there is just as much art as there is science to implementing them. There are no real hard and fast rules for getting good results; in essence one just has to try various options until something works (or, on well studied problems, borrow schemes from others).

We will take a simple approach where we decrease the temperature by 10% every 500 steps, starting from a temperature of 4 (this was chosen arbitrarily - it is not a suggestion of what might work well in this situation!) We can make use of the code example above for Metropolis-Hastings to give a compact implementation:

# An implementation of a Metropolis-Hastings algorithm
# with simulated annealing applied to a 2d Ising spin glass

# Fix random seed
np.random.seed(123)

# Set size of model N and initial spins
N = 32
spins = np.random.choice([-1, 1], (N,N))

# Set up initial beta (initial temperature of 4)
beta = 1/4

# Re-initialize the result containers; mag[0]/energy[0] should be
# recomputed from the new spin configuration as in the example above
mag = np.zeros(timesteps+1)
energy = np.zeros(timesteps+1)
mag[0] = spins.sum()

def _main_loop_SA(ts, bt_initial, s_array, up_array, down_array, left_array, right_array):
    s_temp = s_array.copy()
    bt_live = bt_initial
    for i in range(ts):
        # Cool by 10% (i.e. increase beta) every 500 steps
        if i > 0 and i % 500 == 0:
            bt_live *= 1/0.9
        update_step = update(bt_live, s_temp, up_array, down_array, left_array, right_array)
        s_temp = update_step
        energy[i+1] = energy[i] + dE
        mag[i+1] = mag[i] + dM

#### Run Main Loop
_main_loop_SA(timesteps, beta, spins, up, down, left, right)
mag = mag / (N**2)
energy = energy / (N**2)

# plot magnetism and energy evolving in time
fig, ax1 = plt.subplots()
ax1.set_xlabel("Time step")
ax1.set_ylabel("Magnetism", color='blue')
ax1.plot(mag, color='blue')

ax2 = ax1.twinx()
ax2.set_ylabel("Energy", color='red')
ax2.plot(energy, color='red')

plt.show()

We can see that the system settled down to a lower energy state more quickly and smoothly than with the vanilla Metropolis-Hastings scheme.

Although simulated annealing can improve on vanilla Metropolis-Hastings it can still struggle to find a global minimum of the system. There are various "hacks" that can further improve this however - one such example being the concept of restarting. Again this is a very "obvious" thing to try: we store the "best" (lowest energy) state we've visited so far in a simulation, and if we "get stuck" somewhere above this energy level we "jump back" to this best state and try again.
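A rough sketch of how restarting might be bolted onto the Metropolis-Hastings loop above is given below; it reuses the update() helper and the global dE, and the patience threshold of 500 steps (and the function itself) are purely illustrative assumptions:

# Sketch: keep track of the lowest-energy configuration seen so far and
# jump back to it whenever no improvement has been made for `patience` steps
def main_loop_with_restarts(ts, bt, initial_energy, s_array,
                            up_array, down_array, left_array, right_array,
                            patience=500):
    s_temp = s_array.copy()
    current_energy = initial_energy
    best_energy = initial_energy
    best_spins = s_temp.copy()
    since_improvement = 0
    for i in range(ts):
        s_temp = update(bt, s_temp, up_array, down_array, left_array, right_array)
        current_energy += dE
        if current_energy < best_energy:
            best_energy = current_energy
            best_spins = s_temp.copy()
            since_improvement = 0
        else:
            since_improvement += 1
        if since_improvement >= patience:
            # Restart from the best state seen so far
            s_temp = best_spins.copy()
            current_energy = best_energy
            since_improvement = 0
    return best_spins, best_energy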

We can extend the simulated annealing idea further to the concept of "simulated tempering". Here we treat the temperature of the system as a variable in itself; the temperature can go up as well as down during the simulation. This can further improve convergence properties since it allows the system to escape energy barriers more easily by increasing the temperature. This can remove the need for restarting since a higher temperature is always available as an option.

One such simulated tempering scheme is "parallel tempering" - as the name suggests this involves running many Markov chains in parallel and "jumping" between chains as the algorithm progresses. In some instances the cost of running multiple chains is less than the improvement in performance. Again however there is an art to selecting the correct number of chains and temperatures to run in parallel; most times there is no substitute for just trying things and running tests on the results. In the interests of brevity we will not present a full code here, but we note that for 2 chains at temperatures $T_1$ and $T_2$ the proposed update is to switch the states between the two chains (or equivalently swap the temperatures of the 2 chains). The acceptance probability is then: $$p_{flip} = min\left[1, exp((H_1-H_2)(\beta_1 - \beta_2)) \right] $$ Where $H_1$ is the energy as defined by the Hamiltonian for the chain with temperature $T_1 = 1/\beta_1$ and similarly for $H_2$. This can easily be adapted to more chains and temperatures.
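For illustration, a minimal sketch of the swap step between 2 chains is given below; the function and its bookkeeping are assumptions for illustration, and only the acceptance probability comes from the formula above:

# Sketch: propose swapping the configurations of two chains running at
# inverse temperatures beta_1 and beta_2 with current energies H_1 and H_2
def propose_swap(beta_1, H_1, spins_1, beta_2, H_2, spins_2):
    p_swap = min(1.0, np.exp((H_1 - H_2) * (beta_1 - beta_2)))
    if np.random.random() < p_swap:
        # Swap the configurations (equivalently, swap the temperatures)
        spins_1, spins_2 = spins_2, spins_1
        H_1, H_2 = H_2, H_1
    return (spins_1, H_1), (spins_2, H_2)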

Cluster Update Methods

We now have a few options for simulating the Ising model, however they are by no means perfect. The issue still remains of falling into local optima instead of the global optimum. From our previous mathematical study we know that energy minima for the Ising model are "far away" from each other - that is, they have very few overlapping spins. By flipping individual spins 1 by 1 it is very hard to make the chains explore the energy landscape fully. The natural way to solve this is to flip multiple spins simultaneously at each step. From the general definition of the Metropolis-Hastings method there is nothing stopping us from following this line of reasoning.

Unfortunately this makes things much harder; the complications arise in finding a valid scheme for flipping multiple spins at once. We have glossed over the mathematical foundations of MCMC here, but the proposal/acceptance probabilities need to be selected in a "smart" way in order for the resulting Markov chain to have certain properties. When looking at more than 1 spin at a time in the Ising model this proves fairly difficult. This is evidenced by the original Metropolis-Hastings scheme being proposed in 1953 yet the first multi-spin method not being proposed and justified until the late 1980s.

Wolff Algorithm

The main idea of the algorithm is to look for "clusters" of spin sites with the same spin. We then decide to flip the spin of all the sites within this cluster at once. We then pick a new cluster and repeat this process as necessary. The pseudocode for this algorithm as it applies to the Ising model is:

  1. A site $x$ with spin $\sigma_x$ is selected at random and added to an empty cluster
  2. For each neighbour $y$ of $x$ such that $\sigma_y = \sigma_x$ we add $y$ to the cluster stack with probability $p_{xy} = 1 - exp(-2\beta J_{xy})$ else move onto next neighbour
  3. After all neighbours are exhausted select next site in the cluster stack and repeat the previous step until the cluster stack is exhausted
  4. Once the cluster is fully specified flip the spins of all sites in the cluster and begin again

We can see that, like the Gibbs sampling algorithm, the Wolff algorithm is "rejection free" - that is, every constructed cluster is flipped. We also note that there is nothing in this method that is incompatible with simulated annealing/tempering - these techniques are often used together.

Some of the more computer-science focussed readers may be thinking: "creating clusters is computationally expensive, will a brute force local update method not be better?" This is a valid concern; there are 2 things to consider here. Firstly, there are many efficient cluster generating algorithms from percolation theory which can help speed up this process (we presented a very simple method above for clarity). Secondly, the local update methods will take a very long time to make "big jumps" away from the current configuration - even though there is some probability of making unfavourable moves, most of the time these will not be accepted; to jump to a different local energy minimum we would need many such unfavourable moves, which means we could be waiting for a very long time! This is why spin glass models form a good test bed for optimization algorithms: they are among the simplest models to define with "difficult" energy landscapes.

We should find this algorithm performs better in general, especially near the 2nd order phase transition where, for local methods, successive samples become increasingly correlated (whereas we would want more independent samples). We will try to produce plots of the 2nd order thermodynamic variables: heat capacity and susceptibility. We should expect to see an approximate discontinuity at the critical temperature. To do this we will re-run the Wolff algorithm for each target temperature. For each target temperature we will run 1500 "burn" steps and then evaluate our variables over the next 2500 steps. This should be long enough to get some reasonable results.
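For reference, the heat capacity and susceptibility (per spin) will be estimated with the standard fluctuation formulas over the evaluation steps, which is what the main loop in the code below computes:

$$ C = \frac{\beta^2}{N^2}\left( \langle H^2 \rangle - \langle H \rangle^2 \right), \qquad \chi = \frac{\beta}{N^2}\left( \langle |M|^2 \rangle - \langle |M| \rangle^2 \right) $$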

Note that the Wolff algorithm does not "converge" to a low energy state like the preceding algorithms; instead it samples from the entire space in a "smart way" - even if it finds itself in the global energy minimum there is still a relatively high probability of escaping. As such the graphs we produced before will look more "wiggly" - I've heard the term "fat caterpillar" used to describe the trace of a well mixed MCMC algorithm. If we were interested in finding a ground state we could keep track of the configuration corresponding to the lowest observed energy so far (we will not do this in our code however).

In the example below we'll run a constant $J=1$ to try and reproduce the heat capacity we found analytically in the previous blog post as a proof of concept. I will leave the functionality for general interaction strengths should you wish to experiment.

An implementation of this method can be seen below (note: this is a very slow code since we're using Python lists! It would be a prime candidate for being sped up using Cython/Numba/etc.):

# An implementation of the Wolff algorithm
# applied to a 2d Ising model on an NxN lattice
# with free boundary conditions, swept over a range of temperatures

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Fix random seed
np.random.seed(123)

# Set size of model NxN and initial spins
N = 32
spins_initial = np.random.choice([-1, 1], (N,N))

# Fix time-steps
burn = 1500
evaluation = 2500
time_steps = burn + evaluation

# Initialize interaction arrays
# Have 4 arrays: up, down, left right
# These represent the interaction strengths to the 
# up/down/left/right neighbours of a site
# There is a symmetry between these matrices
# This is not the most memory efficient solution
s_h = 1
s_v = 1
up = np.zeros((N,N))
down = np.zeros((N,N))
left = np.zeros((N,N))
right = np.zeros((N,N))

# Using J=1 constant so graphs are easier to generate
# Swap in the commented-out Gaussian couplings to give an EA spin glass
up[1:N,:] = 1 #np.random.randn(N-1,N) * s_v
down[0:N-1,:] = up[1:N,:]
left[:,1:N] = 1 #np.random.randn(N,N-1) * s_h
right[:,0:N-1] = left[:,1:N]

# Create function to find neighbour sites
def nbr_udlr(s_site, s_array):
    _N = s_array.shape[0]
    i = s_site[0]
    j = s_site[1]
    
    if i == 0:
        up_site = 0
        down_site = [i+1, j]
    elif i == _N-1:
        up_site = [i-1,j]
        down_site = 0
    else:
        up_site = [i-1,j]
        down_site = [i+1, j]
    if j == 0:
        left_site = 0
        right_site = [i,j+1]
    elif j == _N-1:
        left_site = [i,j-1]
        right_site = 0
    else:
        left_site = [i,j-1]
        right_site = [i,j+1]
    
    return [up_site, down_site, left_site, right_site]

# Create function to return interaction strength
def int_strength(s_site, udlr, up_array, down_array, left_array, right_array):
    if udlr == 0:
        return up_array[s_site[0], s_site[1]]
    if udlr == 1:
        return down_array[s_site[0], s_site[1]]
    if udlr == 2:
        return left_array[s_site[0], s_site[1]]
    if udlr == 3:
        return right_array[s_site[0], s_site[1]]

def energy_calc(s_array, up_array, down_array, left_array, right_array):
    _N = s_array.shape[0]
    energy = 0
    for i in range(_N):
        for j in range(_N):
            if i == 0:
                up_neighbour = 0
                down_neighbour = s_array[i+1,j]
            elif i == _N-1:
                up_neighbour = s_array[i-1,j]
                down_neighbour = 0
            else:
                up_neighbour = s_array[i-1,j]
                down_neighbour = s_array[i+1,j]
            if j == 0:
                left_neighbour = 0
                right_neighbour = s_array[i,j+1]
            elif j == _N-1:
                left_neighbour = s_array[i,j-1]
                right_neighbour = 0
            else:
                left_neighbour = s_array[i,j-1]
                right_neighbour = s_array[i,j+1]
        
            energy += s_array[i,j]*(up_array[i,j]*up_neighbour + down_array[i,j]*down_neighbour + left_array[i,j]*left_neighbour + right_array[i,j]*right_neighbour)
    return -energy/2
    
def wolff_step(bt, s_array, up_array, down_array, left_array, right_array):
    _N = s_array.shape[0]
    initial_site = np.random.choice(_N, 2)
    initial_site = [initial_site[0], initial_site[1]]
    old_spin = s_array[initial_site[0], initial_site[1]]
    cluster = [initial_site]
    stack = [initial_site]
    
    while stack != []:
        site = stack[np.random.choice(len(stack))]
       
        # Cycle neighbours
        nbr = nbr_udlr(site, s_array)
        for i in range(4):
            nbr_live = nbr[i]
            if nbr_live == 0:
                continue
            nbr_spin = s_array[nbr_live[0], nbr_live[1]]
            if nbr_spin == old_spin:
                if nbr_live not in cluster:
                    p = 1 - np.exp(-2*bt*int_strength(site, i, up_array, down_array, left_array, right_array))
                    if np.random.random() < p:
                        cluster.append(nbr_live)
                        stack.append(nbr_live)
        stack.remove(site)
    
    for site in cluster:
        s_array[site[0], site[1]] *= -1
    return s_array

###### Main Code
# Create useful constants
N1 = evaluation*N*N
N2 = evaluation*evaluation*N*N

# Define temp ranges
temp_steps = 20
temp_min = 1.75
temp_max = 2.75

temp_array = np.linspace(temp_min, temp_max, num=temp_steps)
M = np.zeros(temp_steps)
E = np.zeros(temp_steps)
C = np.zeros(temp_steps)
X = np.zeros(temp_steps)

for t in range(temp_steps):
    spins = spins_initial.copy()
    M1 = 0
    M2 = 0
    E1 = 0
    E2 = 0
    beta = 1/temp_array[t]
    for i in range(time_steps):
        spins = wolff_step(beta, spins, up, down, left, right)
        # Only accumulate statistics after the burn-in period
        if i >= burn:
            mag_tmp = abs(spins.sum())
            M1 += mag_tmp
            M2 += mag_tmp**2
            energy_tmp = energy_calc(spins, up, down, left, right)
            E1 += energy_tmp
            E2 += energy_tmp**2

    # Compute averages (per spin) over the evaluation steps
    M[t] = M1 / N1
    E[t] = E1 / N1
    C[t] = (E2/N1 - E1**2/N2)*beta**2
    X[t] = (M2/N1 - M1**2/N2)*beta

# Create plots
fig, axs = plt.subplots(2, 2, figsize=(10,10), gridspec_kw={'hspace': 0.25, 'wspace': 0.25})
axs[0, 0].scatter(temp_array, M, color='Red')
axs[0, 0].set_title("Magnetism")
axs[0, 0].set(xlabel="T", ylabel="Magnetism")

axs[0, 1].scatter(temp_array, E, color='Blue')
axs[0, 1].set_title("Energy")
axs[0, 1].set(xlabel="T", ylabel="Energy")

axs[1, 0].scatter(temp_array, X, color='Red')
axs[1, 0].set_title("Susceptibility")
axs[1, 0].set(xlabel="T", ylabel="Susceptibility")

axs[1, 1].scatter(temp_array, C, color='Blue')
axs[1, 1].set_title("Heat Capacity")
axs[1, 1].set(xlabel="T", ylabel="Heat Capacity")

plt.show()

The plots here are a bit noisy but they loosely match our previous theoretical findings.

Swendsen-Wang Algorithm

Another cluster algorithm is Swendsen-Wang. Unlike the Wolff algorithm, Swendsen-Wang looks at multiple clusters concurrently and applies a spin-flip decision to every cluster. It was proposed 2 years prior to the Wolff method.

In pseudo code it can be presented as:

  1. From an initialized spin configuration for each neighbour pair of sites $\langle x, y \rangle$ we specify a bond: $b_{x,y} \in \{0, 1 \}$ Where we sample according to: \begin{align} & \mathbb{P}(b_{x,y} = 0 | \sigma_x \neq \sigma_y) = 1 \\ & \mathbb{P}(b_{x,y} = 1 | \sigma_x \neq \sigma_y) = 0 \\ & \mathbb{P}(b_{x,y} = 0 | \sigma_x = \sigma_y) = exp(-2 \beta J_{xy}) \\ & \mathbb{P}(b_{x,y} = 1 | \sigma_x = \sigma_y) = 1 - exp(-2 \beta J_{xy}) \end{align}
  2. Generate clusters using bonds. If there exists a bond between sites $b_{x,y} = 1$ then the sites belong to same cluster
  3. For each cluster with probability 1/2 flip all spins within the cluster to get a new configuration
  4. Repeat process of generating bonds and clusters

We can see this is slightly different to the Wolff algorithm since it looks at multiple clusters within a given step. The performance of Swendsen-Wang is slightly worse than that of Wolff since it has a lower probability of flipping large clusters (in the case of Ising models). Both algorithms have been adapted and applied to alternative spin glass models (as well as models outside of spin glasses).

We won't present a full implementation of this method here since we already looked at the Wolff algorithm as an example of a cluster algorithm, but a rough sketch of a single Swendsen-Wang sweep is included below.
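The sketch assumes the same "down"/"right" interaction-array layout and free boundaries used earlier and uses a simple union-find to label the clusters; it is purely illustrative and not optimised:

import numpy as np

# Union-find helpers for labelling clusters
def _find(parent, a):
    # Find the root of site a (with path compression)
    while parent[a] != a:
        parent[a] = parent[parent[a]]
        a = parent[a]
    return a

def _union(parent, a, b):
    ra, rb = _find(parent, a), _find(parent, b)
    if ra != rb:
        parent[rb] = ra

# One Swendsen-Wang sweep on an NxN lattice with free boundaries
def swendsen_wang_step(bt, s_array, down_array, right_array):
    _N = s_array.shape[0]
    parent = list(range(_N * _N))
    # Step 1: place bonds between aligned neighbour pairs
    for i in range(_N):
        for j in range(_N):
            # Bond to the "down" neighbour (i+1, j)
            if i < _N - 1 and s_array[i, j] == s_array[i + 1, j]:
                if np.random.random() < 1 - np.exp(-2 * bt * down_array[i, j]):
                    _union(parent, i * _N + j, (i + 1) * _N + j)
            # Bond to the "right" neighbour (i, j+1)
            if j < _N - 1 and s_array[i, j] == s_array[i, j + 1]:
                if np.random.random() < 1 - np.exp(-2 * bt * right_array[i, j]):
                    _union(parent, i * _N + j, i * _N + (j + 1))
    # Steps 2 and 3: flip each resulting cluster with probability 1/2
    flip = {}
    for site in range(_N * _N):
        root = _find(parent, site)
        if root not in flip:
            flip[root] = np.random.random() < 0.5
        if flip[root]:
            s_array[site // _N, site % _N] *= -1
    return s_array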

Conclusion

In this blog post we have looked at a variety of MCMC methods for simulating Ising models. We started by looking at various local update methods, which we now know do not behave optimally. We extended these ideas to cluster update methods, which can show better performance for Ising models. We also looked at the very intuitive simulated annealing and simulated tempering methods, which have been used in optimization problems far outside the realms of spin glass models or even statistical physics in general.


This is the fourth blog post in a series - you can find the next blog post here