<h1>Challenges in modelling interventions for COVID-19</h1>
<p>Mike Irvine, 2020-03-15, <a href="https://sempwn.github.io/blog/2020/03/15/covid19">https://sempwn.github.io/blog/2020/03/15/covid19</a></p>
<link rel="stylesheet" type="text/css" href="https://sempwn.github.io/css/covid19/main.css" />
<figure class="figure">
<img class="center-block img-responsive" src="https://sempwn.github.io/img/covid19/virus.jpg" alt="covid-19 virus" />
<figcaption class="figure-caption text-center">
Photo by <a href="https://unsplash.com/@fusion_medical_animation?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Fusion Medical Animation on Unsplash.</a>
</figcaption>
</figure>
<h3 id="introduction">Introduction</h3>
<p class="lead">
To begin with a disclaimer, all simulations within this post are for educational
purposes only. Although I have tried to use parameters consistent with the current
covid-19 outbreak where possible, there are other factors such as incubation period,
heterogeneity of the population, or importation of cases that I haven't explicitly
included.
</p>
<p>Among the many things the current covid-19 pandemic has shown is how difficult it is
to predict whether an outbreak of an infectious disease will grow into an
epidemic, and what the potential impact of subsequent interventions might be. Using models we
can build up a picture of this uncertainty and factor in elements
we don’t know about the disease, such as whether individuals can be asymptomatic carriers.</p>
<p>Another recent point of debate, exemplified particularly in the <a href="https://www.theguardian.com/commentisfree/2020/mar/15/observer-view-on-the-government-coronavirus-strategy-must-face-scrutiny">UK</a>, is
whether it is best to build up herd immunity or to impose strict
controls on movement early in the epidemic, as has been done in <a href="https://time.com/5802293/coronavirus-covid19-singapore-hong-kong-taiwan/">Singapore, Hong Kong, and Taiwan</a>. The apps below
will allow you to explore the impact of intervention both early and later
in the epidemic.</p>
<h3 id="outbreak-control">Outbreak control</h3>
<p>We have to begin by defining the most important number in infectious disease
epidemiology, the $R_0$. Its full definition is:</p>
<blockquote class="lead text-center">
<p class="mb-0">The average number of secondary cases for every
primary case in a completely susceptible population
</p>
</blockquote>
<p>This is a little technical, so let’s break down each part of the definition.
A <strong>completely susceptible population</strong> is one in which all individuals are able to be
infected by the virus and no one has any prior immunity. The <strong>secondary cases</strong>
following a <strong>primary case</strong> are the individuals infected by that single primary
individual. The <strong>average number</strong> is also important here. For example, if an infection
has an $R_0$ of 2, an infected individual would on average infect 2 others, but any
particular individual could infect more or fewer.</p>
<p>At the start of an epidemic, many random infection events can make it incredibly
difficult to predict how many cases we would expect even a week later. To explore this,
the simulation below shows a series of infection events following one infected
individual (the technical name for this type of model is a branching process).
Using the slider you can change the reproduction number $R_0$ and simulate three generations
ahead (for this purpose we can assume a generation is 7 days, so we are simulating three
weeks ahead).</p>
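The branching-process simulation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the app itself is written in d3, and the Poisson offspring distribution here is an assumption not stated in the post.

```python
import numpy as np

def simulate_generations(r0, generations=3, rng=None):
    """Total cases after `generations` generations, starting from a single
    infected individual, assuming Poisson-distributed secondary cases."""
    rng = rng or np.random.default_rng()
    infected, total = 1, 1
    for _ in range(generations):
        # each currently infected individual infects a Poisson(r0) number of people
        infected = int(rng.poisson(r0, size=infected).sum()) if infected else 0
        total += infected
    return total
```

Running this repeatedly for a fixed $R_0$ reproduces the spread of outcomes that the slider and simulate button let you explore.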
<hr />
<div class="form-inline">
<div class="form-group col-md-6">
<label id="f-inputr0-label" for="f-inputr0" class="ol-form-label">R0: 2.50</label>
<i class="fa fa-question-circle" href="#" data-toggle="tooltip" data-placement="bottom" title="
Basic reproduction number. Average number
of secondary cases following a primary case.
"></i>
<input id="f-inputr0" type="range" min="0" max="400" value="250" class="slider" />
</div>
<div class="form-group col-md-3">
<button id="r0DiagramSimulate" class="btn btn-primary">Simulate!</button>
</div>
</div>
<div id="r0Diagram"></div>
<hr />
<p>Early estimates of the $R_0$ for covid-19 are <a href="https://www.thelancet.com/journals/laninf/article/PIIS1473-3099%2820%2930144-4/fulltext">around 2.5</a>, although there is a large
amount of uncertainty around this. Even if we knew the $R_0$ exactly, because of
the randomness in how infection events transpire we would still see some scenarios
where there are a large number of cases after three generations and others where
there aren’t any.</p>
<p>Now let’s consider this process happening several times, where we keep simulating
what would transpire if one individual is infected. This builds up the
probability of an outbreak occurring and of how large it would be.
Using the app below, you can change the initial $R_0$ and observe the final
number of cases after three generations. As more simulations are run, a pattern
begins to build up that describes the distribution of all possible infection
scenarios for that particular $R_0$.</p>
<hr />
<div class="form-group">
<label id="input-outbreakr0-label" for="input-outbreakr0" class="ol-form-label">R0: 2.50</label>
<i class="fa fa-question-circle" href="#" data-toggle="tooltip" data-placement="bottom" title="
Basic reproduction number. Average number
of secondary cases following a primary case.
"></i>
<input id="input-outbreakr0" type="range" min="0" max="400" value="250" class="slider" />
</div>
<div id="outbreak-simulation"></div>
<hr />
<p>Although I mentioned above that current estimates of the $R_0$ for covid-19 are
around 2.5, this assumes no intervention against its spread. Many
factors may help to limit the spread and reduce the $R_0$, including social distancing,
self-quarantining, and contact tracing. Try reducing the $R_0$ above and see
how it impacts the probability of an outbreak occurring. You’ll notice that if the
$R_0$ is below 1 then the probability of the epidemic taking off drops to zero.</p>
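For a branching process the probability of an outbreak taking off can also be computed directly. Assuming a Poisson offspring distribution (an assumption, not stated in the post), the extinction probability $q$ satisfies $q = e^{R_0(q-1)}$, which can be solved by fixed-point iteration:

```python
import math

def outbreak_probability(r0, iters=500):
    """Probability a single case sparks a large outbreak, assuming a
    Poisson offspring distribution: the extinction probability q is the
    smallest solution of q = exp(r0 * (q - 1))."""
    q = 0.0
    for _ in range(iters):
        q = math.exp(r0 * (q - 1.0))
    return 1.0 - q
```

For $R_0$ below 1 this returns essentially zero, matching the observation above; for $R_0 = 2.5$ the outbreak probability is roughly 0.89.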
<h3 id="later-epidemic-case-management">Later epidemic case management</h3>
<p>Many countries are now observing sustained community-based transmission where the
majority of new cases are from individuals becoming infected in their own community
and not individuals who had recently travelled abroad. In this situation outbreak
control is no longer feasible and so other measures must be used including encouraging
social distancing and hand washing. In more extreme cases, countries <a href="https://en.wikipedia.org/wiki/2020_Italy_coronavirus_lockdown">such as Italy</a> have begun to impose national quarantines
for a given period.</p>
<p>Both the timing and duration of these interventions can be incredibly important,
not only for controlling the total number of individuals who become infected but also the
location and height of the epidemic peak, when the most individuals are infected in a given week. The idea is
to <a href="https://www.washingtonpost.com/graphics/2020/world/corona-simulator/">“flatten the curve”</a>
so as not to overwhelm a nation’s healthcare services and give them more time to
respond.</p>
<p>The simulator below lets you explore the consequences of an intervention event where
the risk of transmission is reduced. The top-left graph shows the curve of the
epidemic where there is intervention and the counterfactual scenario where no intervention occurs.
The top-right shows the total number of infected individuals at the end of
an epidemic and the number that remained susceptible. The bottom graph shows
the effective $R_0$ at each point in time; this is the average number of individuals
infected by a case at that moment. Try moving the $R_0$ slider below to see how
this impacts the size of the epidemic and its peak.</p>
<!--R0 slider -->
<div class="form-group form-control-lg">
<label id="inputr0-label" for="inputr0" class="col-6 col-form-label">R0: 2.50</label><i class="fa fa-question-circle" href="#" data-toggle="tooltip" data-placement="bottom" title="
Basic reproduction number. Average number
of secondary cases following a primary case.
"></i>
<div class="col-12">
<input id="range-inputr0" type="range" min="0" max="400" value="250" class="slider" />
</div>
</div>
<!-- Graph -->
<div class="row justify-content-center">
<div class="col-lg-8">
<div id="SIRGraphDiv"></div>
</div>
<div class="col-lg-4">
<div id="totalGraphDiv"></div>
</div>
</div>
<div class="row justify-content-center">
<div id="reffGraphDiv"></div>
</div>
<!-- Parameters -->
<div class="row justify-content-center">
<div class="col-6">
<div class="form-group form-control-lg">
<label id="input-rt-label" for="input-rt" class="col-9 col-form-label">Reduction in transmission: 0%</label>
<i class="fa fa-question-circle" href="#" data-toggle="tooltip" data-placement="bottom" title="
Percentage reduction of R0 during the intervention period.
">
</i>
<div class="col-12">
<input id="range-input-rt" type="range" min="0" max="100" value="0" class="slider" />
</div>
</div>
</div>
<div class="col-6">
<div class="form-group form-control-lg">
<label id="input-start-label" for="input-start" class="col-9 col-form-label">Start: day 0</label>
<i class="fa fa-question-circle" href="#" data-toggle="tooltip" data-placement="bottom" title="
Start of intervention
">
</i>
<input id="range-input-start" type="range" min="0" max="100" value="0" class="slider" />
</div>
<div class="form-group form-control-lg">
<label id="input-duration-label" for="input-duration" class="col-9 col-form-label">Intervention duration: 0 days</label>
<i class="fa fa-question-circle" href="#" data-toggle="tooltip" data-placement="bottom" title="
Duration of social isolation event
">
</i>
<input id="range-input-duration" type="range" min="0" max="365" value="0" class="slider" />
</div>
</div>
</div>
<p>The sliders above control how much the intervention reduces the spread of the infection,
when the intervention starts, and its total duration. In this simulation,
as soon as an intervention stops the $R_0$ returns to its initial value, which,
depending on when the intervention starts, can lead to the epidemic being delayed
or to a double-peaked epidemic. If the intervention begins too long after the
peak then it has little impact on the overall epidemic.</p>
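A minimal version of this intervention experiment can be sketched with a standard SIR model in which the transmission rate is reduced during an intervention window. This is an illustrative sketch, not the post's actual simulator (which is written in JavaScript); the 7-day infectious period matches the generation time assumed earlier, and the simple Euler integration is my choice.

```python
import numpy as np

def run_sir(r0=2.5, reduction=0.0, start=0, duration=0,
            gamma=1 / 7, n_days=365, dt=0.1):
    """SIR model (Euler steps) where transmission is reduced by the
    fraction `reduction` during the window [start, start + duration)."""
    s, i = 0.999, 0.001          # susceptible and infected fractions
    infected = []
    for step in range(int(n_days / dt)):
        t = step * dt
        beta = r0 * gamma
        if start <= t < start + duration:
            beta *= 1.0 - reduction
        ds = -beta * s * i
        di = beta * s * i - gamma * i
        s, i = s + ds * dt, i + di * dt
        infected.append(i)
    return s, max(infected)      # final susceptible fraction, epidemic peak

# counterfactual (no intervention) vs. a 50% reduction for 60 days from day 30
s_base, peak_base = run_sir()
s_int, peak_int = run_sir(reduction=0.5, start=30, duration=60)
```

Comparing `peak_base` and `peak_int` shows the "flattening" effect: the intervention lowers the peak even though the epidemic may continue once it is lifted.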
<h2 id="conclusions">Conclusions</h2>
<p>As <a href="https://personalpages.manchester.ac.uk/staff/thomas.house/blog/blog.html">Thomas House from Manchester University</a> has also commented, initial interventions that lower the effective
reproduction number may only be delaying those individuals from becoming infected;
however, they do reduce the peak of the epidemic. Many factors impact the overall epidemiology
of a virus and how that translates into the total number of cases. This is especially
problematic in the current pandemic, where estimates of infectivity, incubation period,
and recovery time all have large uncertainty. Moreover, it is not clear how much
current or future interventions will impact the ability of covid-19 to spread.</p>
<script src="https://cdnjs.cloudflare.com/ajax/libs/raphael/2.2.7/raphael.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/flowchart/1.7.0/flowchart.min.js"></script>
<!-- D3 script -->
<script type="text/javascript" src="https://d3js.org/d3.v3.min.js"></script>
<!-- Bootstrap -->
<script src="https://code.jquery.com/jquery-3.2.1.slim.min.js" integrity="sha384-KJ3o2DKtIkvYIK3UENzmM7KCkRr/rE9/Qpg6aAZGJwFDMVNA/GpGFF93hXpG5KkN" crossorigin="anonymous"></script>
<!-- bootstrap spinner -->
<script src="https://sempwn.github.io/js/spinner.js"></script>
<!-- Plot.ly js -->
<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.11.0/umd/popper.min.js" integrity="sha384-b/U6ypiBEHpOf/4+1nzFpr53nxSS+GLCkfwBdFNTxtclqqenISfwAzpKaMNFNmj4" crossorigin="anonymous"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0-beta/js/bootstrap.min.js" integrity="sha384-h0AbiXch4ZDo7tp9hKZ4TsHbi047NrKGLO3SEJAg45jXxnGIfYzk4Si90RDIqNm1" crossorigin="anonymous"></script>
<script src="https://sempwn.github.io/js/covid19/flowcharts.js"></script>
<script src="https://sempwn.github.io/js/covid19/histogrammer.js"></script>
<script src="https://sempwn.github.io/js/covid19/SIR.js"></script>
<script type="text/x-mathjax-config">
MathJax.Hub.Config({
tex2jax: {inlineMath: [['$','$'], ['\\(','\\)']]}
});
</script>
<script type="text/javascript" async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-MML-AM_CHTML">
</script>
<h1>Testing binary classifiers</h1>
<p>2018-10-24, <a href="https://sempwn.github.io/blog/2018/10/24/roc">https://sempwn.github.io/blog/2018/10/24/roc</a></p>
<style>
.bar rect {
fill: steelblue;
}
.bar text {
fill: #fff;
font: 10px sans-serif;
}
body {
font: 10px sans-serif;
}
.line {
stroke: #000;
stroke-width: 1.5px;
}
.axis path,
.axis line {
fill: none;
stroke: #000;
shape-rendering: crispEdges;
}
.overlay {
fill: none;
pointer-events: all;
}
.focus circle {
fill: none;
}
</style>
<h2 id="introduction-to-binary-classifiers">Introduction to binary classifiers</h2>
<p>Classification is a classic problem in machine learning and statistics, where we
have some data and wish to choose a single category for each data point. The simplest
form of this is binary classification, where each data point can represent one of
two states. For example, is an email spam or not spam? Does a medical test mean a patient
does or doesn’t have a particular disease? Or does a picture contain a <a href="https://www.kaggle.com/c/dogs-vs-cats">cat or a dog</a>? Consider a series of emails that have been hand-labelled as
either spam or not spam. A simple (and probably pretty poor) binary classifier
would be to check whether the email contains the word “spam” itself. In this way we could automatically
classify each email and then compare against the hand-labelled category.</p>
<p>No classifier is perfect. Sometimes spam is missed or a medical test gives a wrong result.
In order to determine how well a classifier can perform a number of different performance statistics
have been developed. This article briefly goes through some of the main ones with interactive graphs to
demonstrate how they’re applied and some of their consequences.</p>
<h2 id="the-statistics">The statistics</h2>
<p>For a population of data-points (e.g. emails, patients, or images) and a binary classifier (e.g. checking the email contains the word spam), each data-point can be divided into one of four categories.
These are given by whether the data-point is positive or negative (e.g. actually is spam or has a disease)
against whether the point tested positive or negative (e.g. the email contains the word “spam”). The four groupings are then</p>
<ul>
<li><strong>True Positive (TP).</strong> A point that tested positive and is actually positive.</li>
<li><strong>True Negative (TN).</strong> A point that tested negative and is actually negative.</li>
<li><strong>False Positive (FP).</strong> A point that tested positive, but is actually negative.</li>
<li><strong>False Negative (FN).</strong> A point that tested negative, but is actually positive.</li>
</ul>
<p>The diagram below splits the population up into positive (in orange) and negative (in blue),
with those that tested positive darker than those that tested negative. Each area represents the
total number for each category.</p>
<hr />
<div id="positive-negative-explorer" class="row"></div>
<hr />
<p>The above diagram shows some of the main classifier statistics used:</p>
<ul>
<li><strong>Sensitivity (or recall).</strong> The proportion of actually positive points that tested positive.</li>
<li><strong>Specificity.</strong> The proportion of actually negative points that tested negative.</li>
<li><strong>Precision</strong> (or positive predictive value). The probability that a point that tested positive is actually positive.</li>
</ul>
<p>Each button above shows how to calculate these in practice.</p>
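As a sketch, these statistics can be computed directly from the four confusion-matrix counts. The helper below is hypothetical and not part of the post's interactive code:

```python
def classifier_stats(tp, fp, tn, fn):
    """Binary-classifier statistics from the four confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),   # recall: positives that tested positive
        "specificity": tn / (tn + fp),   # negatives that tested negative
        "precision": tp / (tp + fp),     # positive predictive value
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }
```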
<h2 id="putting-it-into-practice">Putting it into practice</h2>
<p>Thinking about these statistics is important, especially when the prevalence of
positive cases is low. If only one in a thousand emails is spam, then even a
test with fairly high specificity will mean most emails that test positive actually aren’t spam.</p>
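This low-prevalence effect can be made concrete with Bayes' rule: precision depends on the prevalence as well as on the test itself. The sketch below assumes nothing beyond the definitions given above:

```python
def precision_from_rates(prevalence, sensitivity, specificity):
    """Precision (positive predictive value) via Bayes' rule."""
    true_pos = sensitivity * prevalence                 # rate of true positives
    false_pos = (1 - specificity) * (1 - prevalence)    # rate of false positives
    return true_pos / (true_pos + false_pos)
```

For example, with one in a thousand emails being spam and a test that is 99% sensitive and 99% specific, `precision_from_rates(0.001, 0.99, 0.99)` is only about 0.09: roughly nine out of ten flagged emails are not spam.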
<p>The interactive diagram below simulates the consequences of varying <strong>prevalence</strong>, <strong>sensitivity</strong>,
and <strong>specificity</strong> for a population of points (shown as circles). Use the sliders first to determine how prevalent a positive case is along with the properties of the test. Then simulate the positives using the <strong>set positives</strong> button. Next apply the <strong>test</strong> with the specified <strong>sensitivity</strong> and <strong>specificity</strong>. Finally <strong>sort</strong> the data points to calculate the statistics.</p>
<hr />
<div id="test-treat" class="row"></div>
<hr />
<p>The data points are sorted into a <a href="https://en.wikipedia.org/wiki/Confusion_matrix">confusion matrix</a>, with testing negative and positive
arranged in columns and the points actually being negative and positive arranged into rows. Once sorted the true positive, false positive, true negative and false negatives can be calculated as entries in each part of the matrix. A number of statistics can then be calculated including <strong>sensitivity</strong>, <strong>specificity</strong> (coming from the sample as opposed to the underlying test), and the <strong>precision</strong>. Others include the <a href="https://en.wikipedia.org/wiki/Positive_and_negative_predictive_values">negative predictive value</a>, the accuracy (proportion of points correctly classified) and the <a href="https://en.wikipedia.org/wiki/F1_score">F1 score</a>.</p>
<h2 id="the-receiver-operator-characteristic">The receiver operator characteristic</h2>
<p>For some purposes a higher number of false positives or false negatives can be tolerated. Often a binary classifier will report a continuous value instead of true positive or negative (such as a probability or a concentration of a species in a blood test). In order to classify each point as being negative or positive, we are free to set a threshold on these values wherever we like. For example, if there is a high cost with missing a positive point then we might set this threshold low, so anything above it is classified as positive. In general, we want to be able to determine how well a classifier is performing over a range of these thresholds. This can be done using the <a href="https://en.wikipedia.org/wiki/Receiver_operating_characteristic">Receiver Operating Characteristic</a> (ROC) curve.</p>
<p>The diagram below demonstrates the ROC for varying classifiers. We can imagine that a classifier assigns a value to each point. If the point is positive this value is determined by a distribution (we’re using a normal distribution here, but the exact shape of the distribution doesn’t matter) and if the point is negative, the classifier value is determined by a different distribution. The sliders below can change the mean and variance of the positive distribution.</p>
<hr />
<div id="main-roc" class="row"></div>
<hr />
<p>The diagram on the left shows how the classifier translates each point into their corresponding values depending on whether they’re positive or negative. The diagram on the right maps out for each possible threshold, the corresponding false positive rate (1 - specificity) and true positive rate (sensitivity). By experimenting, you can see that a good classifier can maximize the true positive rate, whilst minimizing the false positive rate. The area underneath the whole curve (AUC) then summarizes how well the classifier performs over the whole range of possible thresholds. It turns out that, if you pick at random a positive point and a negative point, then the probability the value of the positive point is higher than the negative point is equal to the AUC (see below for more of an explanation).</p>
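The interpretation of the AUC as P(Y &gt; X) suggests a direct way to estimate it: compare every positive point's value against every negative point's. This is a sketch; libraries such as scikit-learn compute the same quantity more efficiently from the sorted ROC.

```python
import numpy as np

def empirical_auc(pos, neg):
    """AUC as the probability that a randomly chosen positive point's
    value exceeds a randomly chosen negative point's (ties count half)."""
    pos, neg = np.asarray(pos, float), np.asarray(neg, float)
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```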
<p>You can also use the diagram above to explore how the shape of the ROC relates to
where the classifier may be failing. For example, if the ROC curve dips below the
diagonal at any point, this indicates that around that threshold negative cases
are being assigned higher values than positive cases.</p>
<h2 id="wrapping-up">Wrapping up</h2>
<p>There is a plethora of statistics for binary classifiers. Each has its subtleties, and they can be used in conjunction to diagnose issues and assess how well a classifier is performing.</p>
<h3 id="understanding-the-math-behind-the-auc">Understanding the math behind the AUC</h3>
<p>I mentioned briefly above that the probability of a positive case’s value being greater than a negative case’s is equal to the AUC. To see this, we can integrate over the ROC curve, writing the true positive rate $tpr$ and the false positive rate $fpr$ at threshold value $x$ in terms of the probability densities $f_0$ for a negative point and $f_1$ for a positive point,</p>
<p>\(\begin{align} AUC &= \int_0^1 tpr \, d(fpr), \\ &= \int_{-\infty}^\infty tpr(x) f_0(x) \, dx, \\ &= \int_{-\infty}^\infty \int_{x}^\infty f_1(y) \, dy \, f_0(x) \, dx, \\ &= \int_{-\infty}^\infty \int_{-\infty}^\infty I(y > x) f_1(y) f_0(x) \, dx \, dy, \\ &= P(Y > X). \end{align}\)</p>
<h2 id="acknowledgements">Acknowledgements</h2>
<p>All the interactive examples were coded in <a href="https://d3js.org">d3</a>, which has many fantastic <a href="https://bl.ocks.org">examples</a> for
data visualization.</p>
<!-- jQuery -->
<script src="https://sempwn.github.io/js/jquery.min.js"></script>
<!-- Plugin JavaScript -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery-easing/1.3/jquery.easing.min.js"></script>
<!-- d3 js v4 -->
<script src="https://d3js.org/d3.v4.min.js"></script>
<!-- numeric -->
<script src="https://sempwn.github.io/js/numeric.js"></script>
<!-- Plotly.js -->
<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
<!-- slider -->
<script src="https://sempwn.github.io/js/bootstrap-slider.js"></script>
<!-- main js -->
<script src="https://sempwn.github.io/js/roc.js"></script>
<script src="https://sempwn.github.io/js/test_treat.js"></script>
<script src="https://sempwn.github.io/js/true_false_chart.js"></script>
<h1>Probabilistic programming 4: Markov Chain Monte Carlo</h1>
<p>2018-01-11, <a href="https://sempwn.github.io/blog/2018/01/11/mcmc-4">https://sempwn.github.io/blog/2018/01/11/mcmc-4</a></p>
<div id="main-div">
</div>
<div id="button-group"></div>
<h2 id="introduction">Introduction</h2>
<p>At the heart of practical Bayesian inference is Monte Carlo sampling. For a background
on this see the previous blog posts: <a href="https://sempwn.github.io/">Monte Carlo Method</a>, <a href="https://sempwn.github.io/blog/2017/06/04/mcmc-2">Markov Chains</a> and <a href="https://sempwn.github.io/blog/2017/07/04/mcmc-3">Bayesian inference</a>. It can often be hard to see
exactly what a sampler is doing, which makes it difficult to diagnose problems.
The above tool uses several example two-dimensional posterior distributions to demonstrate
how issues such as strong dependency between two parameters or multi-modality can
lead to poor performance of certain samplers and how to counteract this.</p>
<p>The walker (black circle) moves around the probability landscape producing random samples that are recorded
by the two histograms along the axes. The contours and colours represent the probability
surface. At each step, the walker will propose a new location (red circle) either independent
of the landscape (Metropolis-Hastings) or with some dependency on it (Slice and Hamiltonian).
The walker may then accept the proposed location and the new position is recorded.
Further descriptions for each of the samplers can be found below.</p>
<h3 data-toggle="collapse" data-target="#metropolis-hastings" class="clickable panel panel-default panel-heading" style="cursor: pointer;">Metropolis-Hastings</h3>
<div id="metropolis-hastings" class="accordion collapse">
<p>
<a href="https://en.wikipedia.org/wiki/Metropolis–Hastings_algorithm">Metropolis-Hastings</a> is one of the more intuitive sampling algorithms.
The idea is to perform a random walk and selectively move to a new position
dependent on whether the new position has a higher probability compared to the
current position. If you only moved when the probability was higher, the random
walker would quickly end up in a local maximum and would no longer move. This means
in order to sample the region properly you would occasionally want to move to an area
of lower probability. At each step the walker draws a potential new position
with some distribution around the current position (two-dimensional normal in this case).
The probability of the potential new position is then compared to the probability of the
current position: if it is higher then the walker moves; otherwise the walker moves
with a probability dependent on the ratio of the new position's probability to the old's. So if the new
position has only a slightly lower probability than the current one it is more likely to
be accepted than if it has a far lower probability.
</p>
<p>
There can be many aspects that need fine-tuning for Metropolis-Hastings. One of
the key aspects is how far on average the random walker steps at each iteration
(step-size). You can adjust the step-size in the above tool. Notice that when the
step-size is too small, the walker poorly explores landscape and it would take
many iterations to build up sufficient samples. However, if the step-size is too
large then the walker's proposed positions are rarely accepted and it can become stuck.
The skewed distribution shows an extreme case of this, where there is only one
direction for the walker to move in with comparable probability. There exists a
sweet spot for the step-size, however this can be problem-dependent and it may not
be clear what is optimal.
</p>
</div>
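A random-walk Metropolis-Hastings sampler of the kind described above fits in a few lines of Python. This is a sketch, not the interactive tool's d3 code; `log_p` stands for whatever log-probability surface you want to sample.

```python
import numpy as np

def metropolis_hastings(log_p, x0, n_samples=20000, step=1.0, rng=None):
    """Random-walk Metropolis sampler with a Gaussian proposal."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        proposal = x + step * rng.standard_normal(x.shape)
        # accept with probability min(1, p(proposal) / p(x))
        if np.log(rng.uniform()) < log_p(proposal) - log_p(x):
            x = proposal
        samples.append(x.copy())
    return np.array(samples)
```

Try varying `step` against a simple target (such as a standard normal) to see the small-step and large-step failure modes described above.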
<h3 data-toggle="collapse" data-target="#slice-sampling" class="clickable panel panel-default panel-heading" style="cursor: pointer;">Slice Sampling</h3>
<div id="slice-sampling" class="accordion collapse">
<p>
<a href="https://en.wikipedia.org/wiki/Slice_sampling">Slice sampling</a> tries to take a more global approach to
sampling than with Metropolis-Hastings. There are two parts to generating a new
sample, an expansion and contraction phase.
</p>
<p>
In the expansion phase, units of a given
step-size are taken around the current position of the walker and added to an interval (you can control the step-size with the slider above). These step-sizes
are added until the interval contains points with a smaller probability than the current position.
</p>
<p>
In the contraction phase points are uniformly sampled from the constructed interval and the interval is cut
at the point if its probability is less than the current position of the walker.
Finally a point is randomly sampled from the interval and the walker is updated.
</p>
<p>
You'll notice that slice sampling is much more efficient when the surface is multimodal.
It still struggles with the bimodal surface as parameters are being updated independently.
For the skew distribution slice sampling is also fairly inefficient as it has to take many steps
to traverse the surface.
</p>
</div>
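The expansion ("step out") and contraction ("shrink") phases can be sketched in one dimension as follows. This is a sketch of the standard scheme, not the tool's code; as noted above, the tool applies such updates to each parameter of the two-dimensional surface in turn.

```python
import numpy as np

def slice_sample_1d(log_p, x0, n_samples=4000, w=2.0, rng=None):
    """One-dimensional slice sampler with step-out and shrinkage."""
    rng = rng or np.random.default_rng(0)
    x = float(x0)
    samples = []
    for _ in range(n_samples):
        log_y = log_p(x) + np.log(rng.uniform())      # height of the slice
        # expansion: widen the interval until its ends fall below the slice
        left = x - w * rng.uniform()
        right = left + w
        while log_p(left) > log_y:
            left -= w
        while log_p(right) > log_y:
            right += w
        # contraction: sample from the interval, cutting it at rejected points
        while True:
            x_new = rng.uniform(left, right)
            if log_p(x_new) > log_y:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        samples.append(x)
    return np.array(samples)
```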
<h3 data-toggle="collapse" data-target="#HMC" class="clickable panel panel-default panel-heading" style="cursor: pointer;">Hamiltonian Monte Carlo</h3>
<div id="HMC" class="accordion collapse">
<p>
Hamiltonian Monte Carlo (HMC) uses the gradient of the probability surface
to provide a more efficient sampling scheme. The idea is to imagine the walker
as being a massive object in a potential landscape (imagine a ball rolling down a hill).
In the beginning the walker is given a "kick" with random strength and in a random direction (the step-size slider controls how large the kick is).
The walker then uses this momentum to travel through the potential landscape for a set number of time-steps. The walker is then updated with the same rule as for Metropolis-Hastings, i.e. if the new probability is higher then accept; otherwise accept randomly with a probability dependent on the ratio of the new probability to the old.
</p>
<p>
You'll observe that HMC is far more efficient than the other methods for sampling the skew distribution and
works well with the other unimodal distributions. For the multimodal distributions you still need to provide a sufficient kick in order for the walker to jump between the regions of high probability.
</p>
</div>
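The kick-then-travel scheme described above can be sketched as follows, using leapfrog integration for the travel phase and the Metropolis rule for the final accept step. This is a sketch under the assumption that `log_p` and its gradient `grad_log_p` are supplied by the caller.

```python
import numpy as np

def hmc(log_p, grad_log_p, x0, n_samples=3000, step=0.2, n_leapfrog=10, rng=None):
    """Hamiltonian Monte Carlo with leapfrog integration."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(x.shape)              # the random "kick"
        x_new, p_new = x.copy(), p.copy()
        p_new += 0.5 * step * grad_log_p(x_new)       # leapfrog: half momentum step,
        for _ in range(n_leapfrog):                   # then alternating full steps
            x_new += step * p_new
            p_new += step * grad_log_p(x_new)
        p_new -= 0.5 * step * grad_log_p(x_new)       # turn the last step into a half step
        # Metropolis accept/reject on the change in total "energy"
        h_old = -log_p(x) + 0.5 * (p @ p)
        h_new = -log_p(x_new) + 0.5 * (p_new @ p_new)
        if np.log(rng.uniform()) < h_old - h_new:
            x = x_new
        samples.append(x.copy())
    return np.array(samples)
```

The product `step * n_leapfrog` controls how far each "kick" carries the walker, which is what the step-size slider varies in the tool above.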
<h3 id="conclusion">Conclusion</h3>
<p>It can be tempting to try and find a single sampler that suits all purposes.
Whilst some are able to perform sampling well in most cases (e.g. <a href="https://arxiv.org/abs/1111.4246">NUTS</a>) there
will still be cases where they may fall down (becoming stuck, or inefficiently
sampling the posterior). It is important to note that the examples given are
for two-dimensional cases only and typically modern Bayesian inference will
use many more dimensions, where other problems can start to creep in. Hopefully
the above tool sheds some light on the issues around sampling schemes used in
Bayesian inference.</p>
<h3 id="acknowledgements">Acknowledgements</h3>
<p>All the interactive examples were coded in <a href="https://d3js.org">d3</a>, which has many fantastic <a href="https://bl.ocks.org">examples</a> for
data visualization.</p>
<p>This post was broadly inspired by this <a href="http://www.benfrederickson.com/numerical-optimization/">blog post</a> on numerical optimization and
this <a href="http://playground.tensorflow.org/#activation=tanh&batchSize=10&dataset=circle&regDataset=reg-plane&learningRate=0.03&regularizationRate=0&noise=0&networkShape=4,2&seed=0.42620&showTestData=false&discretize=false&percTrainData=50&x=true&y=true&xTimesY=false&xSquared=false&ySquared=false&cosX=false&sinX=false&cosY=false&sinY=false&collectStats=false&problem=classification&initZero=false&hideText=false">tool</a> on exploring training of neural nets.</p>
<script src="https://d3js.org/d3.v4.min.js"></script>
<script src="https://d3js.org/d3-contour.v1.min.js"></script>
<script src="https://d3js.org/d3-scale-chromatic.v1.min.js"></script>
<!-- mcmc4 -->
<script src="https://sempwn.github.io/js/mcmc4.js"></script>
<h1>When zombies attack: The infectious disease modelling app</h1>
<p>2017-11-02, <a href="https://sempwn.github.io/blog/2017/11/02/when-zombies-attack">https://sempwn.github.io/blog/2017/11/02/when-zombies-attack</a></p>
<style>
.hovereffect {
width: 100%;
height: 100%;
float: left;
overflow: hidden;
position: relative;
text-align: center;
cursor: default;
}
.hovereffect .overlay {
position: absolute;
overflow: hidden;
width: 80%;
height: 80%;
left: 10%;
top: 10%;
border-bottom: 1px solid #FFF;
border-top: 1px solid #FFF;
-webkit-transition: opacity 0.35s, -webkit-transform 0.35s;
transition: opacity 0.35s, transform 0.35s;
-webkit-transform: scale(0,1);
-ms-transform: scale(0,1);
transform: scale(0,1);
}
.hovereffect:hover .overlay {
opacity: 1;
filter: alpha(opacity=100);
-webkit-transform: scale(1);
-ms-transform: scale(1);
transform: scale(1);
}
.hovereffect img {
display: block;
position: relative;
-webkit-transition: all 0.35s;
transition: all 0.35s;
}
.hovereffect:hover img {
filter: url('data:image/svg+xml;charset=utf-8,<svg xmlns="http://www.w3.org/2000/svg"><filter id="filter"><feComponentTransfer color-interpolation-filters="sRGB"><feFuncR type="linear" slope="0.6" /><feFuncG type="linear" slope="0.6" /><feFuncB type="linear" slope="0.6" /></feComponentTransfer></filter></svg>#filter');
filter: brightness(0.6);
-webkit-filter: brightness(0.6);
}
.hovereffect h2 {
text-transform: uppercase;
text-align: center;
position: relative;
font-size: 17px;
background-color: transparent;
color: #FFF;
padding: 1em 0;
opacity: 0;
filter: alpha(opacity=0);
-webkit-transition: opacity 0.35s, -webkit-transform 0.35s;
transition: opacity 0.35s, transform 0.35s;
-webkit-transform: translate3d(0,-100%,0);
transform: translate3d(0,-100%,0);
}
.hovereffect a, .hovereffect p {
color: #FFF;
padding: 1em 0;
opacity: 0;
filter: alpha(opacity=0);
-webkit-transition: opacity 0.35s, -webkit-transform 0.35s;
transition: opacity 0.35s, transform 0.35s;
-webkit-transform: translate3d(0,100%,0);
transform: translate3d(0,100%,0);
}
.hovereffect:hover a, .hovereffect:hover p, .hovereffect:hover h2 {
opacity: 1;
filter: alpha(opacity=100);
-webkit-transform: translate3d(0,0,0);
transform: translate3d(0,0,0);
}
.hovereffect {
cursor: pointer;
}
</style>
<p>I recently participated in a science outreach event where I developed a
web app to introduce infectious disease modelling to the general public. The
main idea is that there has been a recent zombie outbreak and, given some data
and models, the participant is asked to investigate various hypotheses on
its transmission, estimate the transmissibility, and finally simulate what types
of interventions would be required for its control.</p>
<p><a href="https://sempwn.github.io/zombie-game/">Here</a> is the link to the app, or you can
navigate to a section of the app from the images below.</p>
<div class="col-lg-4 col-md-4 col-sm-4 col-xs-6 clickableDiv" data-href="https://sempwn.github.io/zombie-game/">
<div class="hovereffect">
<img class="img-responsive" src="https://sempwn.github.io/img/zombie_outbreak/branch.png" alt="instructions" />
<div class="overlay">
<h2>Introduction</h2>
<p>
<a href="https://sempwn.github.io/zombie-game/">Click here</a>
</p>
</div>
</div>
</div>
<div class="col-lg-4 col-md-4 col-sm-4 col-xs-6 clickableDiv" data-href="https://sempwn.github.io/zombie-game/fitting.html">
<div class="hovereffect">
<img class="img-responsive" src="https://sempwn.github.io/img/zombie_outbreak/curve.png" alt="estimation" />
<div class="overlay">
<h2>Model fitting</h2>
<p>
<a href="https://sempwn.github.io/zombie-game/fitting.html">Click here</a>
</p>
</div>
</div>
</div>
<div class="col-lg-4 col-md-4 col-sm-4 col-xs-6 clickableDiv" data-href="https://sempwn.github.io/zombie-game/simulation.html">
<div class="hovereffect">
<img class="img-responsive" src="https://sempwn.github.io/img/zombie_outbreak/map.png" alt="simulation" />
<div class="overlay">
<h2>Simulation</h2>
<p>
<a href="https://sempwn.github.io/zombie-game/simulation.html">Click here</a>
</p>
</div>
</div>
</div>
<!-- jQuery -->
<script src="https://sempwn.github.io/js/jquery.min.js"></script>
<script>
$(".clickableDiv").click(function() {
window.location = $(this).data("href");
return false;
});
</script>
<h2><a href="https://sempwn.github.io/blog/2017/09/05/hidden-populations">Measuring hidden populations</a> (2017-09-05)</h2>
<style>
.bar rect {
fill: steelblue;
}
.bar text {
fill: #fff;
font: 10px sans-serif;
}
</style>
<figure class="figure">
<img class="center-block img-responsive" src="https://sempwn.github.io/img/hidden_pop/cat.jpg" alt="complex models" />
<figcaption class="figure-caption text-center">
<a href="https://www.flickr.com/photos/83613432@N02/8257427196/in/photolist-dzFtqq-9TJEtN-S9iPAL-fBxxcz-4dr7ba-VF1XSV-2wabB-8yhtUw-rqfwx-5LWXVp-dXMfvk-huuB91-qZD7X9-dEFtkj-8uYCTR-6cdS9r-9BJArW-obqCLX-nRvggo-gWefq-aohS4v-9GSk2N-bmEZdU-9Mt6Yw-A8Ush-jZwHdv-5T3vWi-rwgu8V-qHYvyr-bZMUa1-SovhJw-frbsN-4Fy9rU-gWeey-QJp1sp-PT4pw-rdr7va-6Xgfjr-fc6Bg7-5pe6hY-scNmY-4TzqwM-cg2brA-6Csnwt-5m47cs-nwZRvp-4XZL1G-6oof9c-h6T5GF-hUNpb">Cat hiding.</a>
</figcaption>
</figure>
<p>Often the data that comes to us is only partially observed: website users out of
all potential customers, patients displaying symptoms out of a total population
of infected individuals, the number of animals observed in a study etc. One of
the challenges with this type of data is to be able to determine the size and
characteristics of the total population based on our (<a href="https://en.wikipedia.org/wiki/Sampling_bias#Historical_examples">hopefully unbiased</a>)
sample.</p>
<p>How can we determine the total population size from the type of data we observe? We’ll
look at two examples here: zero-truncated count data and mark-recapture
data. Zero-truncated data comes about when we observe the same individuals multiple
times (but obviously don’t measure the individuals we don’t see). For example
from a <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4011782/">health registry</a>,
<a href="http://onlinelibrary.wiley.com/doi/10.1111/1467-9574.00232/abstract">police arrest records</a>
or <a href="http://www.personal.soton.ac.uk/dab1f10/jrssc.pdf">number of non-clonal tomato plants</a>.</p>
<p>Mark-recapture data is slightly different. Here there are two or more distinct phases
or surveys in the observations and the number detected in each survey plus the
overlap is used to estimate the whole population. Traditionally these techniques
are used in ecology, where an animal is marked or tagged and released and then
recaptured at a later date (e.g. measuring a population of <a href="http://onlinelibrary.wiley.com/doi/10.1890/0012-9658%282006%2987%5B2925:ATPDUP%5D2.0.CO;2/full">tigers</a>).
These techniques are also used where individuals may be captured by different data
sources, such as in estimating prevalence of <a href="http://jech.bmj.com/content/58/9/766.short">injection drug use</a>.</p>
<p>Let’s explore these concepts with a simulation. Below is our “field” with
individuals of a species (circles) moving around randomly. In the centre is
a larger circle denoting our field of view. If an individual wanders into our
field of view then we measure it (denoted by a change of colour). However, we
can’t directly observe the individuals outside the field of view who haven’t been
measured already. You can see that it would take a long time for all individuals
to wander into the field of view making it impractical to view all individuals
this way.</p>
<div id="demo"></div>
<h3 id="zero-truncated-poisson">Zero-truncated Poisson</h3>
<p>First we can imagine that we record the number of times each individual wanders
into our field of view. We can use these statistics to build up an estimate of
the total population by making some assumptions around how these counts are distributed.
Assuming there’s a small, but constant probability of any individual wandering into
the field of view at any time leads to a Poisson distribution of counts i.e. the probability
of a randomly sampled (from the entire population) individual’s count being $x$ is</p>
<h3 id="px--x--frace-lambdalambdaxx">\(P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}.\)</h3>
<p>However, we don’t observe the individuals with zero counts so our data is truncated.
The probability of not being observed ($p_0$) is $e^{-\lambda}$.
Therefore the probability of a random variate $X$ being observed $x$ times is</p>
<h3 id="px--x--x--0--fracpxxpx0---fraclambdaxx-frace-lambda1-e-lambda">\(P(X = x | X > 0) = \frac{P(X=x)}{P(X>0)} = \frac{\lambda^x}{x!} \frac{e^{-\lambda}}{1-e^{-\lambda}}\)</h3>
<p>For our data ${x_0,\ldots,x_{n-1} }$, the associated log-likelihood is</p>
<h3 id="-nlambda--nlog1-e-lambda---lambda-sum_i0n-1logx_i--sum_i0n-1logx_i">\(-n\lambda -n\log(1-e^{-\lambda}) + \log(\lambda) \sum_{i=0}^{n-1}x_i -\sum_{i=0}^{n-1}\log(x_i!)\)</h3>
<p>Using the zero-truncated Poisson model, we can estimate the odds of not being
observed against being observed, $p_0/(1-p_0)$. Empirically, this is the number
of individuals who weren’t observed ($f_0$) divided by the number
that were observed ($N_{obs}$). Combining these together we get the following,
<h3 id="hatf_0--fracp_01-p_0n_obs">\(\hat{f_0} = \frac{p_0}{1-p_0}N_{obs}.\)</h3>
<p>So now all we need is an estimate of $\lambda$, which we can obtain by maximizing the
likelihood. The tool below shows this in action; note that for this estimator
to work you need to have observed at least one individual twice or more.
Note also that the estimator doesn’t depend on the size of the capture circle, as
this is incorporated into the rate $\lambda$.</p>
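As an illustration (separate from the d3 tool below), here is a minimal Python sketch of this estimator: fit $\lambda$ by maximizing the zero-truncated log-likelihood over a simple grid, then inflate the observed count by the estimated odds $p_0/(1-p_0)$. The simulated population size (500) and observation rate (0.8) are arbitrary choices for the example.

```python
import math
import random

def zt_loglik(lam, counts):
    """Zero-truncated Poisson log-likelihood for rate lam."""
    n = len(counts)
    return (-n * lam - n * math.log(1.0 - math.exp(-lam))
            + math.log(lam) * sum(counts)
            - sum(math.lgamma(x + 1) for x in counts))

def fit_lambda(counts):
    """Maximize the likelihood over a simple grid of candidate rates."""
    grid = [0.01 * i for i in range(1, 1000)]
    return max(grid, key=lambda lam: zt_loglik(lam, counts))

def estimate_total(counts):
    """Observed individuals plus the estimated number unobserved, f0."""
    lam = fit_lambda(counts)
    p0 = math.exp(-lam)
    n_obs = len(counts)
    return n_obs + (p0 / (1.0 - p0)) * n_obs

def poisson_draw(lam, rng):
    """Poisson sample via Knuth's method (fine for small lam)."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

# Simulate 500 individuals each observed Poisson(0.8) times; we only
# record those seen at least once (the zero counts are truncated).
rng = random.Random(1)
counts = [c for c in (poisson_draw(0.8, rng) for _ in range(500)) if c > 0]
print(round(estimate_total(counts)))  # should land reasonably close to 500
```

A grid search keeps the sketch dependency-free; in practice you would use a numerical optimizer or a package that implements this estimator directly.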
<div id="sim"></div>
<h3 id="mark-recapture">Mark recapture</h3>
<p>We now look at a slightly different type of study where there are two distinct phases
of measuring a population. In an ecological study this could be where an initial
survey is conducted to measure the population size of a particular species and
each individual is tagged. At a later stage the population is measured again and the
number of individuals as well as the number of tagged individuals are recorded.</p>
<p>Let’s set-up some notation for the problem: $N$ is the unknown total population size,
$n$ is the number of individuals marked in the first survey, $K$ is the number of individuals
observed in the second survey and $k$ is the number of individuals observed in the
second survey who were tagged in the first survey.</p>
<p>A simple estimator of the population size is known as the <a href="https://en.wikipedia.org/wiki/Lincoln_index">Lincoln estimator</a>
(although why it’s called this is a slight <a href="http://bit-player.org/2010/the-thrill-of-the-chase">mystery</a> and
is probably an example of <a href="https://en.wikipedia.org/wiki/Stigler%27s_law_of_eponymy">Stigler’s law of eponymy</a>).
This estimator assumes that the probability of being observed in the second
survey is the same as in the first survey. The proportion of the population marked
in the first survey is $n/N$, and the proportion of the second survey found to be marked is
$k/K$. If the second survey is a representative sample, we can equate these and get the
following estimator,
<h3 id="hatn--fracnkk">\(\hat{N} = \frac{nK}{k}.\)</h3>
<p>It turns out this only performs well for large sample sizes. For smaller sample
sizes we can use the Chapman estimator:</p>
<h3 id="hatn--fracn1k1k1---1">\(\hat{N} = \frac{(n+1)(K+1)}{k+1} - 1.\)</h3>
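Both estimators are one-liners; a quick sketch with hypothetical survey numbers:

```python
def lincoln(n, K, k):
    """Lincoln estimator N = nK/k; requires at least one recapture (k > 0)."""
    return n * K / k

def chapman(n, K, k):
    """Chapman estimator; less biased for small samples, defined even when k = 0."""
    return (n + 1) * (K + 1) / (k + 1) - 1

# Hypothetical surveys: 40 marked, 50 caught the second time, 10 of them marked.
print(lincoln(40, 50, 10))  # 200.0
print(chapman(40, 50, 10))  # ≈ 189.09
```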
<p>Use the tool below to explore these estimators with our simulation.</p>
<div id="sim2"></div>
<p>You can see under what conditions the Chapman estimator outperforms the Lincoln
estimator. Note, however, that neither gives us an estimate of its own uncertainty.
If we wanted to actually put this into practice then we might consider something
like the R package <a href="https://cran.r-project.org/web/packages/multimark/index.html">multimark</a>
or in Python using <a href="http://docs.pymc.io">PyMC</a> (e.g. this <a href="https://github.com/pymc-devs/pymc/wiki/Mt">example</a>).</p>
<h3 id="acknowledgements">Acknowledgements</h3>
<p>All the interactive examples were coded in <a href="https://d3js.org">d3</a>, which has many fantastic <a href="https://bl.ocks.org">examples</a> for
data visualization.</p>
<!-- jQuery -->
<script src="https://sempwn.github.io/js/jquery.min.js"></script>
<!-- Plugin JavaScript -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery-easing/1.3/jquery.easing.min.js"></script>
<!-- d3 js v4 -->
<script src="https://d3js.org/d3.v4.min.js"></script>
<!-- numeric -->
<script src="https://sempwn.github.io/js/numeric.js"></script>
<!-- Plotly.js -->
<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
<!-- slider -->
<script src="https://sempwn.github.io/js/bootstrap-slider.js"></script>
<!-- venn -->
<script src="https://sempwn.github.io/js/venn.js"></script>
<!-- main js -->
<script src="https://sempwn.github.io/js/capture.js"></script>
<h2><a href="https://sempwn.github.io/blog/2017/07/04/mcmc-3">Probabilistic programming 3: Bayesian probability</a> (2017-07-04)</h2>
<h3 id="introduction">Introduction</h3>
<div class="text-center" id="venn"></div>
<p>This is part three in a series on probabilistic programming. <a href="https://sempwn.github.io/blog/2017/04/20/mcmc">Part one</a> introduces
Monte Carlo simulation and <a href="https://sempwn.github.io/blog/2017/06/04/mcmc-2">part two</a> introduces the concept of the Markov chain.
In this post I’ll introduce the concept of Bayes rule, which
is the main machinery at the heart of Bayesian inference.</p>
<p>The diagram above represents the probabilities of two events: A and B. A could be testing
positive for an infection and B could be actually having the infection. These events are
clearly not independent, so we need some way of establishing their relationship, as
the probability of observing one depends on observing the other.
How can we quantify this? One way is to look at the intersection of both events
or in set notation $A \cap B$. We call the corresponding probability of both events
happening $P(A,B)$. What if we already know that an event has occurred? Say we know event
$A$ already, what would the probability of $B$ be given we know $A$ is true? Looking at the above
diagram we see this would be the area of the intersection divided by the total area of $A$.
We can represent this formula as $P(B | A) = P(A,B)/P(A)$, which can be read as “the probability
of B given A is the probability of A and B divided by the probability of A”.</p>
<p>What if we now wanted to know what the probability of A is given B? Well, we can just swap
the symbols around in the previous formula to get $P(A | B) = P(A,B)/P(B)$. You’ll notice
that we can now define the probability of A and B based on conditional probabilities,
$P(A|B)P(B)$ or $P(B|A)P(A)$. If we equate these two and re-arrange we get the following:</p>
<h2 id="pab--fracpbapapb">\(P(A|B) = \frac{P(B|A)P(A)}{P(B)}\)</h2>
<p>This is known as Bayes’ rule and is the entire basis of Bayesian statistics. Its
power comes from the ability to take the probability of B given A
and invert the relationship to give the probability of A given B.</p>
<h3 id="sensitivity-and-specificity">Sensitivity and specificity</h3>
<p>One application of this rule is in DNA or disease testing. We can discover the
probability of testing positive given that an individual actually has the disease
through repeated measurements of a given test. However, what we’d really want to
know is whether someone actually has a disease given that they tested positive. Test accuracy
can be described in terms of sensitivity and specificity. Sensitivity is the probability
of testing positive given a positive case and specificity is the probability of testing negative
given a negative case. We can invert this relationship using Bayes rule:</p>
<h3 id="pve--ve-test--fracptextve-test--textve-ptextveptextve-test">\(P(+ve | +ve test) = \frac{P(\text{+ve test} | \text{+ve}) P(\text{+ve})}{P(\text{+ve test})}.\)</h3>
<p>There’s a trick here where we can define the probability of a positive test by considering all the
associated conditional probabilities (known as the <a href="https://en.wikipedia.org/wiki/Law_of_total_probability">law of total probability</a>),</p>
<h3 id="ptextve-test--ptextve-test--textveptextve--ptextve-test--text-veptext-ve">\(P(\text{+ve test}) = P(\text{+ve test} | \text{+ve})P(\text{+ve}) + P(\text{+ve test} | \text{-ve})P(\text{-ve}).\)</h3>
<p>So it turns out in order to find the probability of being positive given a positive test we need to know what the underlying
probability of actually being positive is (known as the base rate). We can see what the consequences of this are in the interactive diagram below where we imagine that 1000 people have been tested and we also know their disease status. Try playing with the base rate, sensitivity and specificity.</p>
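The calculation behind the diagram can be sketched directly in Python (the 99%/1% numbers are the illustrative values discussed below):

```python
def prob_pos_given_pos_test(sensitivity, specificity, base_rate):
    """P(+ve | +ve test) by Bayes' rule, with the denominator expanded
    using the law of total probability."""
    p_pos_test = (sensitivity * base_rate
                  + (1.0 - specificity) * (1.0 - base_rate))
    return sensitivity * base_rate / p_pos_test

# 99% sensitivity and specificity, but a 1% base rate:
print(round(prob_pos_given_pos_test(0.99, 0.99, 0.01), 3))  # 0.5
```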
<div class="form-group text-center">
<label for="sensitivity">sensitivity</label>
<input id="sensitivity" data-slider-id="sensitivity" type="text" data-slider-min="0.01" data-slider-max="0.99" data-slider-step="0.01" data-slider-value="0.5" />
<div id="sensitivity-value" data-value="0.5"></div>
<label for="specificity">specificity</label>
<input id="specificity" data-slider-id="specificity" type="text" data-slider-min="0.01" data-slider-max="0.99" data-slider-step="0.01" data-slider-value="0.5" />
<div id="specificity-value" data-value="0.5"></div>
<label for="base-rate">base rate</label>
<input id="base-rate" data-slider-id="base-rate" type="text" data-slider-min="0.01" data-slider-max="0.99" data-slider-step="0.01" data-slider-value="0.1" />
<div id="base-rate-value" data-value="0.1"></div>
</div>
<div id="sankey"></div>
<p>If the base rate is low (1%), then even with a 99% sensitivity and specificity
the probability of actually being positive if you tested positive is approximately
50%. This is an example of the <a href="https://en.wikipedia.org/wiki/Prosecutor%27s_fallacy">prosecutor’s fallacy</a> and shows
how important it is to consider the base rate of whatever you’re testing for.</p>
<p>This type of reasoning can be applied to a diverse set of problems. If you know the probability that someone has a
certain gene given they have developed cancer, you can compute the probability that someone will develop cancer given they have that gene. In a spam filter, you can calculate the probability that an email is spam given it contains a set of keywords, in terms of the probability of an email containing those keywords given it’s spam.</p>
<h2 id="causal-belief">Causal belief</h2>
<figure class="figure">
<img class="center-block img-responsive" src="https://sempwn.github.io/img/mcmc3/umbrella.jpg" alt="complex models" />
<figcaption class="figure-caption text-center">
<a href="https://www.flickr.com/photos/kurotango/19272216640/in/photolist-vn29zf-ToVFYW-d1n8Zj-CPZ4oQ-BgWzTH-p1rMAp-fPty97-P7fWzg-pehe6o-vWgFGN-b7ins6-owxUw2-61W5d-dS5daw-2ApBw-diMjmG-spo8Yk-9eKrdT-dRgRA4-5NSwAh-eXR1wH-8WTuiQ-6ou6jp-qffzKr-pg845T-jz2te8-aPFfjD-dgDk1m-oDpMpZ-8DfaZa-HCSiNr-JYErZD-e7T7Ap-aay1w3-6u3sML-5PCfi-zyPoRy-8Vtbyt-6bCsrJ-eaFCJv-dZ84H1-ecrVVi-9qjn36-8WF7YL-bTBxTp-4xB5aU-xjdmG-2JT8v-7TsRtA-74TUgA">Umbrella.</a>
</figcaption>
</figure>
<p>We don’t have to stop with one event dependent on another. We could also consider
the impact of multiple events on one another in terms of their probabilities. This
generalization is called <a href="https://en.wikipedia.org/wiki/Bayesian_network">Bayesian network theory</a>.</p>
<p>Let’s consider the example where we’re deciding whether to take an umbrella with us or not based
on the weather forecast. Causally, this looks like the following diagram:</p>
<figure class="figure">
<img class="center-block img-responsive" src="https://sempwn.github.io/img/mcmc3/diagram.png" alt="diagram" />
<figcaption class="figure-caption text-center">
Causal Bayesian diagram. The probability of rain is dependent on it being
forecast, and the probability of using an umbrella is dependent on it
raining.
</figcaption>
</figure>
<p>Here we’re modelling the conditional dependencies of three events: whether
rain is forecast, whether it’s actually raining and whether you end up using
an umbrella. Bayes rule gives us a way of inverting this dependency, by for example
considering the probability of using an umbrella given rain was forecast. If
A = rain forecast, B = raining and C = using an umbrella then the joint probability
(the probability of all three events occurring) can be written as</p>
<h3 id="pabc--papbapcb">\(P(A,B,C) = P(A)P(B|A)P(C|B)\)</h3>
<p>You can see how changes in these probabilities impact the conditional dependencies
in the tool below.</p>
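The factorization above makes the whole network easy to compute by brute force. Here is a small sketch with hypothetical conditional probabilities: we build the joint $P(A,B,C) = P(A)P(B|A)P(C|B)$ and then answer a query like $P(C|A)$ by marginalizing over $B$.

```python
# Hypothetical probabilities: A = rain forecast, B = raining, C = take umbrella.
p_forecast = 0.5                      # P(A)
p_rain = {True: 0.8, False: 0.2}      # P(B = raining | A = forecast?)
p_umbrella = {True: 0.9, False: 0.1}  # P(C = umbrella | B = raining?)

def joint(a, b, c):
    """P(A=a, B=b, C=c) = P(A) P(B|A) P(C|B)."""
    pa = p_forecast if a else 1.0 - p_forecast
    pb = p_rain[a] if b else 1.0 - p_rain[a]
    pc = p_umbrella[b] if c else 1.0 - p_umbrella[b]
    return pa * pb * pc

# P(umbrella | forecast) by marginalizing over whether it actually rains:
num = sum(joint(True, b, True) for b in (True, False))
den = sum(joint(True, b, c) for b in (True, False) for c in (True, False))
print(round(num / den, 2))  # 0.8*0.9 + 0.2*0.1 = 0.74
```

Brute-force enumeration is exponential in the number of events; real Bayesian network libraries use smarter inference, but for three events this is all we need.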
<div class="form-group text-center">
<label for="prob-rain">Forecasted probability of rain</label>
<input id="prob-rain" data-slider-id="prob-rain" type="text" data-slider-min="0.01" data-slider-max="0.99" data-slider-step="0.01" data-slider-value="0.5" />
<div id="prob-rain-value" data-value="0.5"></div>
<label for="rtos">Probability no rain given not forecasted</label>
<input id="rtos" data-slider-id="rtos" type="text" data-slider-min="0.01" data-slider-max="0.99" data-slider-step="0.01" data-slider-value="0.5" />
<div id="rtos-value" data-value="0.5"></div>
<label for="rtow">Probability use umbrella given it's raining</label>
<input id="rtow" data-slider-id="rtow" type="text" data-slider-min="0.01" data-slider-max="0.99" data-slider-step="0.01" data-slider-value="0.1" />
<div id="rtow-value" data-value="0.1"></div>
<label for="stow">Probability of rain given rain forecasted</label>
<input id="stow" data-slider-id="stow" type="text" data-slider-min="0.01" data-slider-max="0.99" data-slider-step="0.01" data-slider-value="0.1" />
<div id="stow-value" data-value="0.1"></div>
</div>
<div class="text-center" id="sankey-two"></div>
<div class="text-center" id="venn-two"></div>
<p>This may seem like a slightly trivial example, but this type of model has a
huge amount of application. See for example, <a href="https://en.wikipedia.org/wiki/Latent_Dirichlet_allocation">Latent Dirichlet Allocation</a> in
natural language processing, <a href="https://en.wikipedia.org/wiki/Bayesian_hierarchical_modeling">Bayesian Hierarchical Models</a>
or <a href="https://en.wikipedia.org/wiki/Restricted_Boltzmann_machine">Restricted Boltzmann Machines</a>.</p>
<h2 id="inference">Inference</h2>
<p>One of the biggest applications of Bayes rule is in Bayesian inference. This
is where we have some data $D$ (e.g. the number of heads in 100 coin flips, or the heights of individuals in a population) and a model parameterised by a set of parameters
$\theta$ (e.g. probability of heads for a certain coin or mean and variance of population height) and we wish to calculate the probability $P(\theta|D)$. That is the
probability of a certain set of parameters given the data we have observed (e.g. probability coin produces heads is 50% given we’ve observed 100 coin flips that produced 75 heads).
Applying Bayes rule, we see that</p>
<h3 id="ptheta--d--fracpd--theta-pthetapd">\(P(\theta | D) = \frac{P(D | \theta) P(\theta)}{P(D)}\).</h3>
<p>We can therefore write down our desired probability in terms of the <em>probability
of observing some data given some model parameters</em> (known as the likelihood) and
the <em>probability of a given set of parameters</em> (known as the prior). We also have
a tricky <em>probability of observing the data $D$</em> to deal with; let’s ignore this
for now.</p>
<p>What if we observed some new data $D_2$? Say we decided to flip the coin again
100 times or we get more data on a population. We can apply Bayes’ rule again in
a similar fashion as before to produce,</p>
<h3 id="ptheta--d_2d--fracpd_2--theta--pd--theta-pthetapd-pd_2">\(P(\theta | D_2,D) = \frac{P(D_2 | \theta ) P(D | \theta) P(\theta)}{P(D_2, D)}\).</h3>
<p>We can then update our posterior each time we observe new data.</p>
<p>This is all a bit abstract, so let’s look at an actual example. Imagine we have
discovered a new cure for a disease and we want to estimate its efficacy, but we
only had a limited amount of data to date. We can assume each patient is independent
of one another (is this reasonable? Could how a trial has been set-up change this?)
and say that someone is cured with the treatment with a probability $p$. Each
individual then has a probability of being cured $p$ and not being cured $1-p$.
Now what if our trial had $N$ patients, what is the probability of seeing $x$
people cured? We can calculate this using the binomial distribution (this is
just a distribution that counts up all the ways $x$ people can be cured out
of $N$ people and sums up the probability for each event). We now have a
likelihood, but we also need a prior in order to perform inference. There’s
a trick where we can take a prior that has a special shape, so that we
can write down the posterior analytically. This trick is called <a href="https://en.wikipedia.org/wiki/Conjugate_prior">conjugate priors</a>. For this example
we can interpret the prior as having observed a previous trial where $y$ people
were cured out of $M$ individuals.</p>
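A sketch of this conjugate update with hypothetical trial sizes. Starting from a uniform Beta(1, 1), a prior trial with $y$ cured out of $M$ gives a Beta$(y+1,\,M-y+1)$ prior, and a new trial with $x$ cured out of $N$ updates it to a Beta$(y+x+1,\,M-y+N-x+1)$ posterior:

```python
# Hypothetical trials: prior trial cured 6 of 10; new trial cured 9 of 10.
y, M = 6, 10
x, N = 9, 10

# Beta-binomial conjugacy: posterior Beta(alpha, beta) parameters.
alpha = y + x + 1
beta = (M - y) + (N - x) + 1

posterior_mean = alpha / (alpha + beta)
print(round(posterior_mean, 3))  # 16 / 22 ≈ 0.727
```

Note how the posterior mean sits between the two trials' raw cure rates (0.6 and 0.9), weighted by how much data each contributed.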
<p>The tool below allows us to explore the consequences of this. We can change
how many patients were in the prior dataset and also in the current dataset
as well as change how many individuals were cured or not (by clicking on them).
Below that is a plot of the likelihood for the new dataset as well as the
posterior representing the probability of the cure rate incorporating both
datasets.</p>
<h3 class="text-center">Prior</h3>
<div id="prior-circles"></div>
<h3 class="text-center">Data</h3>
<div id="data-circles"></div>
<div id="prob-graph"></div>
<p>Some interesting things spring out of this. First, with only a small amount
of data, even if all patients are cured we don’t jump to the conclusion that
the cure rate is 100%. Similarly, if we have little new information then this
doesn’t change our posterior beliefs all that much.</p>
<p>This works nicely for such a simple example, but what if patients came from
different populations where we know the cure rate is different? This leads on to
<a href="https://en.wikipedia.org/wiki/Bayesian_hierarchical_modeling">Bayesian Hierarchical Modelling</a>, but makes the inference calculation a lot
more involved.</p>
<p>The power of Bayesian probability comes from its ability to deal with
combining new information with other information or beliefs and its ability to
deal with small or missing data.</p>
<h3 id="acknowledgements">Acknowledgements</h3>
<p>The code for the Venn diagram can be found <a href="https://github.com/benfred/venn.js">here</a>.</p>
<p>All the interactive examples were coded in <a href="https://d3js.org">d3</a>, which has many fantastic <a href="https://bl.ocks.org">examples</a> for
data visualization.</p>
<style>
.node rect {
cursor: move;
fill-opacity: .9;
shape-rendering: crispEdges;
}
.node text {
pointer-events: none;
text-shadow: 0 1px 0 #fff;
}
.link {
fill: none;
stroke: #000;
stroke-opacity: .2;
}
.link:hover {
stroke-opacity: .5;
}
</style>
<!-- jQuery -->
<script src="https://sempwn.github.io/js/jquery.min.js"></script>
<!-- Plugin JavaScript -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery-easing/1.3/jquery.easing.min.js"></script>
<!-- d3 js v4 -->
<script src="https://d3js.org/d3.v4.min.js"></script>
<!-- sankey -->
<script src="https://sempwn.github.io/js/sankey.js"></script>
<!-- Plotly.js -->
<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
<!-- slider -->
<script src="https://sempwn.github.io/js/bootstrap-slider.js"></script>
<!-- venn -->
<script src="https://sempwn.github.io/js/venn.js"></script>
<!-- main js -->
<script src="https://sempwn.github.io/js/mcmc3.js"></script>
<h2><a href="https://sempwn.github.io/blog/2017/06/04/mcmc-2">Probabilistic programming 2: Markov Chains</a> (2017-06-04)</h2>
<h3 id="introduction">Introduction</h3>
<div class="row">
<div class="col-md-8 col-md-offset-2 col-xs-10 col-xs-offset-1">
<div class="text-center">
<div id="simple-walker-example"></div>
</div>
</div>
</div>
<p>This is part two of a blog post on probabilistic programming. The first part of
the blog can be found <a href="https://sempwn.github.io/blog/2017/04/20/mcmc">here</a>.</p>
<p>Markov chains are mathematical constructs with a wide range of applications in
physics, mathematical biology, speech recognition, statistics and many others.
The simplest way to think about them is considering the above animation. A person (the circle)
is trying to find out where their friend lives in a neighbourhood block. Unfortunately
all the houses (the squares) look the same and have no numbers. Each time they get to
a house they knock on the door, but then immediately forget where they are. They can
then randomly choose to go left or right before trying another house. We could ask how long on
average it would take for them to find their friend’s house, or what the probability is that they’d
find the house after a certain number of steps. This can be easily computed as long as the person
forgets where they are each time they visit a house. This is the <a href="https://en.wikipedia.org/wiki/Markov_property">Markov property</a>: the person
only has knowledge of their current state (house) and no memory of their previous states.
This property is crucial for keeping the computations of the system tractable, and
it turns out that lots of systems have it.</p>
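To make this concrete, here is a small Python simulation (separate from the animation above) of the forgetful walker on a ring of houses; the ring size, target, and trial count are arbitrary choices for the example:

```python
import random

def knocks_until_found(n_houses, target, rng):
    """Walk left or right at random on a ring of houses, knocking each time,
    until the friend's house is reached. Only the current position matters
    (the Markov property): the walker keeps no memory of past houses."""
    pos, knocks = 0, 0
    while pos != target:
        pos = (pos + rng.choice((-1, 1))) % n_houses
        knocks += 1
    return knocks

rng = random.Random(42)
trials = [knocks_until_found(5, 2, rng) for _ in range(5000)]
print(sum(trials) / len(trials))  # averages around 6 for this configuration
```

For a symmetric walk on a ring of $n$ houses starting distance $d$ from the target, the expected hitting time is $d(n-d)$, which for $n=5$, $d=2$ gives 6, matching the simulation.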
<h3 id="cereal-toy-collector">Cereal toy collector</h3>
<figure class="figure">
<img class="center-block img-responsive" src="https://sempwn.github.io/img/mcmc2/cereal.jpg" alt="complex models" />
<figcaption class="figure-caption text-center">
<a href="https://www.flickr.com/photos/j0annie/15535885191/in/photolist-pERtpv-31eME-gx746-zajaP-q2x7DD-gUMcE-a4icZ-onbTxb-7Nnc-byFmWe-zH1rW-p9HQ93-p9uhrv-ph39UU-qe2np6-rud3Q1-pWtnjw-qzUq8F-pCUMxd-qzQwim-pD99sc-qiuj3v-qikHhW-pCUNTj-qiumHv-qisGzk-qxCd1u-qikEVG-qimsrJ-jJYgZ-8ML189-4CvrPN-6vXBPn-9ga8DM-p7HFiy-5cWLPV-oYm1Rt-p7HN8W-qzQxG3-qimpAd-qzQvwG-pD99Zz-nDJxTK-pD98AH-qxCfCU-qzUo7r-qxCg8w-qzJCBK-dgot1w-p9KDMk">Bowl of cereal.</a>
</figcaption>
</figure>
<p>One simple system to think about is the <a href="https://en.wikipedia.org/wiki/Coupon_collector%27s_problem">coupon collector’s problem</a>. Let’s think
about this in terms of toys that are given away in <a href="https://en.wikipedia.org/wiki/Cereal_box_prize">packets of cereal</a>.
Imagine a cereal company has a promotion on their cereal and are giving away four
toys with their cereal. When you buy one cereal packet you’ll receive one toy, but you don’t
know what toy you’ll receive until you buy the packet.</p>
<p>Let’s simulate this to think about the problem. Below is an overview of our system:
the toys we have collected so far (not including duplicates) are in the top row; the toy we
have just received is below, along with a counter for how many cereal boxes we’ve bought. Below that is our abstract way
of thinking about this problem. If all we care about is how long it takes to complete the collection, then instead of keeping
track of all the specific toys we have collected, we can simply track
how many unique toys we’ve collected so far. This forms a Markov chain, with the number of unique toys collected so far as our state. The probability of transitioning to a new state (receiving a toy we haven’t already got) depends only on the current number of unique toys, so the Markov property holds.</p>
<div class="row">
<div class="col-md-8 col-md-offset-2 col-xs-10 col-xs-offset-1">
<div class="text-center">
<div class="btn-group">
<button id="buy-cereal" type="button" class="btn btn-default btn-lg"><span class="glyphicon glyphicon-shopping-cart" aria-hidden="true"></span>Buy Cereal</button>
<button id="reset-cereal" type="button" class="btn btn-primary btn-lg"><span class="glyphicon glyphicon-step-backward" aria-hidden="true"></span>Reset</button>
</div>
<div id="cereal-toy-example"></div>
</div>
</div>
</div>
<p>Try simulating cereal purchases a few times. You should be able to estimate on average how much cereal you’d need to buy to
complete a collection. Sometimes we’re lucky and can do this in just four purchases. Other times we’re unlucky and have to buy many more. We can work out how much our purchases might vary by calculating the variation in the number of purchases required to complete a collection.</p>
<h3 id="analysing-the-coupon-collector-problem">Analysing the coupon collector problem</h3>
<p>We can actually work out what the expected number of boxes we need to purchase to
complete the collection by hand. First let’s think about what the probability
is of moving from 0 unique toys to 1 unique toy. This is one, as we’ll always
gain a toy we haven’t collected before. Moving from 1 to 2, the probability is $3/4$
of gaining a new unique toy. Similarly, transitioning from 2 to 3 is $1/2$ and from
3 to 4 is a $1/4$.</p>
<p>What is the probability that it takes exactly $x$ purchases to gain a new unique toy, if the
probability of a new unique toy in one purchase is $p$?
This is the probability of not purchasing a new unique toy $x-1$ times followed
by purchasing a new unique toy, or $(1-p)^{x-1}p$. This is the geometric distribution,
and we can quickly calculate its expectation and variance as,</p>
\[\begin{align}
\mathbb{E}[X] &= \frac{1}{p}\\
\text{Var}[X] &= \frac{1-p}{p^2}
\end{align}\]
<p>As all these events are independent of one another (the amount of cereal you
have to buy to get the second toy is independent of the amount you have to
buy to get the third toy, for example), we can calculate the expected number and
variance of the total amount of cereal we have to buy to complete the collection
using the sums of the individual expectations and variances:</p>
\[\begin{align}
\mathbb{E}[Y] &= \sum_{i=1}^{4}\mathbb{E}[X_i] \\
\text{Var}[Y] &= \sum_{i=1}^{4}\text{Var}[X_i]
\end{align}\]
<p>So inputting our values for $p$ ($1$, $3/4$, $1/2$, $1/4$), we find an expected number
of purchases of around $8.3$ with variance $14.4$. We can check these values by
simulating the process below multiple times and recording the mean and variance.</p>
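<p>As a cross-check on these numbers, here is a small Monte Carlo sketch in Python. The helper name and the number of simulated collections are my own illustrative choices, not part of the interactive demo above:</p>

```python
import random

def purchases_to_complete(n_toys, rng):
    """Buy cereal boxes until all n_toys unique toys are collected."""
    seen, boxes = set(), 0
    while len(seen) < n_toys:
        seen.add(rng.randrange(n_toys))  # each box holds one uniform random toy
        boxes += 1
    return boxes

rng = random.Random(42)
samples = [purchases_to_complete(4, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
# theory: E[Y] = 1 + 4/3 + 2 + 4 = 25/3 ≈ 8.33 and Var[Y] ≈ 14.4
```

<p>With twenty thousand simulated collections the sample mean and variance should land close to the theoretical $8.3$ and $14.4$.</p>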
<div class="row">
<div class="col-md-8 col-md-offset-2 col-xs-10 col-xs-offset-1">
<div class="text-center">
<div class="btn-group">
<button id="buy-cereals" type="button" class="btn btn-default btn-lg"><span class="glyphicon glyphicon-shopping-cart" aria-hidden="true"></span>Buy Cereals</button>
<button id="reset-cereals" type="button" class="btn btn-primary btn-lg"><span class="glyphicon glyphicon-step-backward" aria-hidden="true"></span>Reset</button>
</div>
<div id="cereal-multiple"></div>
</div>
</div>
</div>
<h3 id="walker-on-a-graph">Walker on a graph</h3>
<p>Markov chains don’t need to have a finite number of states. Consider our random
walker from the beginning. Instead of just visiting four houses, imagine that there
are an infinite number of houses on the block. Another way to think about this is as a
gambler’s winnings (as long as the gambler can go into an infinite amount of debt).
Let’s imagine the gambler plays a simple game where they flip a coin: if it’s heads
they gain one dollar and if it’s tails they lose a dollar. Let’s imagine that this is a fair
coin, so the probability of a win is a half. We could ask how long it might take for the gambler
to lose all their money and end up at zero. It turns out this has a surprising answer.</p>
<p>First let’s simulate this process a few times below.</p>
<div class="row">
<div class="col-md-10 col-md-offset-1 col-xs-10 col-xs-offset-1">
<div class="text-center">
<div class="btn-group">
<button id="walker-run" type="button" class="btn btn-default btn-lg"><span class="glyphicon glyphicon-play" aria-hidden="true"></span>Run</button>
<button id="walker-reset" type="button" class="btn btn-primary btn-lg"><span class="glyphicon glyphicon-step-backward" aria-hidden="true"></span>Reset</button>
</div>
<div id="graphing-walker-example"></div>
</div>
</div>
</div>
<p>We can actually figure out some properties of this without the need to do
multiple simulations. The first trick is to figure out the expected winnings
for the gambler after one game. We know that with probability $p$ the winnings go
up one and with probability $1-p$ they go down one. So the expectation is
$1\times p + (-1)\times(1-p)$, which is $2p-1$. Note that when the probability of gaining a dollar
is a half, the expected step is zero. This is because half the time
they lose a dollar and half the time they gain a dollar, so the two cancel.</p>
<p>Does this mean that after a hundred steps we would expect the winnings to be 0?
Clearly, if we experiment by running a few simulations this doesn’t appear to
be the case. There is in fact quite a bit of variation between each simulation.
What’s the variation in one time-step? Starting at 0 if we’re just as likely to
go up one or down one then we could guess the variance is 1. We can work this
out, using the formal definition of the variance
\(\text{Var}[X] = \mathbb{E}[X^2]-\mathbb{E}[X]^2\).
The first expectation is $1^2\times p + (-1)^2 \times (1-p) = 1$ and the second term is
$(1\times p + (-1) \times (1-p))^2 = (2p-1)^2$. This gives a formula for the variance as
$1-(2p-1)^2$.</p>
<p>The great thing about the Markov property is that the probability of moving to a given state
depends only on the previous state. For this simple model the probability of moving up or down
is actually independent of the state. So to calculate the variance after $t$ games
we just need to sum up the variance for each step. In other words, the formula
turns out to be
\((1-(2p-1)^2) t\)</p>
<p>The variance is then at its maximum when $p$ is a half, i.e. when there’s equal
chance of moving up or down. After one hundred time steps the variance is one
hundred. This gives a probability of being at position 0 after one hundred
steps of only $8\%$. In fact, over time, the probability of being at the origin
at any given step diminishes. It turns out that the expected time to return to $0$ is actually infinite:
the gambler can take arbitrarily long excursions, as there’s nothing bounding their winnings
(or losses).</p>
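<p>These claims are easy to check numerically. Below is a rough Python sketch (the function name, seed, and sample counts are illustrative assumptions) that simulates many hundred-step walks and compares the sample mean, variance, and fraction ending at zero against the theory:</p>

```python
import random

def final_position(t, p, rng):
    """Winnings after t coin flips: +1 with probability p, otherwise -1."""
    return sum(1 if rng.random() < p else -1 for _ in range(t))

rng = random.Random(1)
t, n = 100, 5000
finals = [final_position(t, 0.5, rng) for _ in range(n)]
mean = sum(finals) / n                                  # theory: (2p-1)*t = 0
var = sum((x - mean) ** 2 for x in finals) / n          # theory: (1-(2p-1)**2)*t = 100
at_zero = sum(1 for x in finals if x == 0) / n          # theory: about 8%
```

<p>The empirical variance should come out near one hundred and the fraction of walks sitting at zero after one hundred steps near $8\%$.</p>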
<h3 id="walker-with-drift">Walker with drift</h3>
<p>Imagine instead that the coin has a slight bias, i.e. $p\neq 1/2$. We can see from
our variance formula above that the more biased the coin, the lower the variance of
the gambler’s winnings. We can see the impact of a slight bias by varying the
probability of a win in the simulation below: the further the probability moves
from a half, the smaller the variance becomes.</p>
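<p>The variance formula itself makes this quantitative. A tiny sketch (helper name and sample probabilities are my own choices):</p>

```python
# Variance of the gambler's winnings after t games, (1 - (2p-1)**2) * t.
def winnings_variance(p, t):
    return (1 - (2 * p - 1) ** 2) * t

# the spread after 100 games shrinks as the coin becomes more biased
spreads = {p: winnings_variance(p, 100) for p in (0.5, 0.6, 0.8, 0.99)}
```

<p>At $p=1/2$ the variance is at its maximum of $100$; by $p=0.99$ it has fallen below $4$.</p>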
<div class="row">
<div class="col-md-10 col-md-offset-1 col-xs-10 col-xs-offset-1">
<div class="text-center">
<div class="form-group">
<label for="prob-walker">Probability</label>
<input id="prob-walker" data-slider-id="prob-walker" type="text" data-slider-min="0.01" data-slider-max="0.99" data-slider-step="0.01" data-slider-value="0.5" />
<div id="prob-walker-value" data-value="0.5"></div>
</div>
<div class="btn-group">
<button id="drift-walker-run" type="button" class="btn btn-default btn-lg"><span class="glyphicon glyphicon-play" aria-hidden="true"></span>Run</button>
<button id="drift-walker-reset" type="button" class="btn btn-primary btn-lg"><span class="glyphicon glyphicon-step-backward" aria-hidden="true"></span>Reset</button>
</div>
<div id="graphing-drift-walker-example"></div>
</div>
</div>
</div>
<h3 id="application-pagerank">Application: PageRank</h3>
<p>Finally let’s look at a real application of a Markov chain with the
<a href="https://en.wikipedia.org/wiki/PageRank">PageRank</a> algorithm. This is the original
algorithm used by Google to rank web pages. A more in depth look can be found
<a href="http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm">here</a> .</p>
<p>We can think about the algorithm in terms of a random walker “walking” through
web pages by clicking on links. When it arrives at a web page, it finds all the
links and then picks one at random to click on. For simplicity let’s imagine that
every page has a link back to the page that linked to it. Each time the walker
passes through a page we record this by adding a point to the page. After a long
time the pages can be ranked by how many times the walker has visited them. This
gives a way of quantifying how “central” a page is on the web. One issue is
that the rank number for each page keeps increasing. We can add a damping factor
by making a page’s point smaller the later in the walk the walker visits it.</p>
<p>We can simulate the process on a small random network below. The colour denotes
how recently the walker had visited that page and the size denotes its PageRank.</p>
<div class="row">
<div class="col-md-10 col-md-offset-1 col-xs-10 col-xs-offset-1">
<div class="text-center">
<div class="btn-group">
<button id="network-play" type="button" class="btn btn-default btn-lg"><span class="glyphicon glyphicon-play" aria-hidden="true"></span>Run</button>
<button id="network-reset" type="button" class="btn btn-primary btn-lg"><span class="glyphicon glyphicon-step-backward" aria-hidden="true"></span>Reset</button>
</div>
<div id="network-example"></div>
</div>
</div>
</div>
<p>In reality there are far quicker ways to calculate the PageRank than
to just simulate it. By setting the problem up as a Markov chain, though, we’re
able to use a lot of mathematical machinery to calculate it efficiently.</p>
<!-- jQuery -->
<script src="https://sempwn.github.io/js/jquery.min.js"></script>
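<p>One of those quicker approaches is power iteration on the chain’s transition probabilities. Here is a minimal Python sketch on a made-up four-page web; the link structure, damping factor of $0.85$, and iteration count are illustrative assumptions, not the demo’s actual code:</p>

```python
# Power iteration for PageRank on a hypothetical four-page web.
# links[i] lists the pages that page i links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
n, d = 4, 0.85  # d is the damping factor commonly used with PageRank

rank = [1.0 / n] * n
for _ in range(100):
    new = [(1 - d) / n] * n
    for page, outs in links.items():
        for target in outs:
            # each page shares its rank equally among its outgoing links
            new[target] += d * rank[page] / len(outs)
    rank = new
# the ranks sum to one; page 2 receives the most links, so it ranks highest
```

<p>Because every page here has outgoing links, the total rank is conserved at each step, and the iteration converges to the chain’s stationary distribution.</p>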
<!-- Plugin JavaScript -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery-easing/1.3/jquery.easing.min.js"></script>
<!-- Bootstrap -->
<script src="https://sempwn.github.io/js/bootstrap-slider.js"></script>
<!-- D3 js -->
<script src="https://d3js.org/d3.v3.js"></script>
<!-- main js -->
<script src="https://sempwn.github.io/js/mcmc2.js"></script>IntroductionProbabilistic programming 1: Monte Carlo Method2017-04-20T00:00:00+00:002017-04-20T00:00:00+00:00https://sempwn.github.io/blog/2017/04/20/mcmc<h3 id="introduction">Introduction</h3>
<p>This is the first post in a series on <a href="https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo">Markov Chain Monte Carlo</a> (MCMC), a powerful technique used in performing inference on probabilistic models. We’ll unpack
what each of these terms means: what a <a href="https://en.wikipedia.org/wiki/Markov_chain">Markov Chain</a>
is, what <a href="https://en.wikipedia.org/wiki/Monte_Carlo_algorithm">Monte Carlo simulation</a>
is, and finally how it all fits together in the framework of MCMC.</p>
<h3 id="background-monte-carlo-method">Background: Monte Carlo method</h3>
<p>The main idea behind the Monte Carlo method is to simulate randomly from a
probability distribution when it is difficult or impossible to compute the
probability directly.</p>
<p>Imagine we have a complex deterministic model such as those used in hydrodynamic
flow or climate modelling. We might have many inputs into this model, all of which
are likely to have some uncertainty around them. A way to understand how this
uncertainty impacts the model prediction is to simulate from the model many times,
using inputs drawn from their uncertainty distributions. Notice that this automatically
down-weights rare scenarios: if the probability of an input is low then it is
unlikely to be selected, and therefore unlikely to contribute much to the model prediction.</p>
<p>The method came about from the Manhattan project. Nicholas Metropolis and his
team developed the technique and needed a code name for it. They decided to
name it after the Monte Carlo casino where the uncle of Stanislaw Ulam
(another member of the team) often gambled. The intuition is that if you want to
understand, say, the probability of winning at roulette given a certain
strategy, then one solution is to just play it many, many times, recording
how often you win and lose. Then, after going through your entire savings, you
divide the number of times you won by the total number of times you
played, and that’s your estimate of the probability of a win. If you don’t want
to burn through all your money to understand this, you can create a computer
simulation of a roulette wheel and use that to perform your experiment.</p>
<h3 id="simple-example-the-binomial-process">Simple example: The binomial process</h3>
<p>This is a bit abstract so let’s look at a simple example of performing a series
of coin tosses. Each coin toss can be heads with probability $p$ (normally
0.5 is chosen for a fair coin, although there’s some <a href="http://statweb.stanford.edu/~susan/papers/headswithJ.pdf">evidence that isn’t always the case</a> ).
In a series of $N$ coin tosses the probability of $k$ heads follows a binomial distribution
like this</p>
\[P(k|N,p) = \binom{N}{k} p^k(1-p)^{N-k}.\]
<p>Let’s imagine we didn’t have a formula for the probability. We could instead
repeatedly simulate coin tosses, recording the number of heads in order to build
up an empirical distribution that asymptotically converges to the binomial
distribution. You can use the tool below to simulate this, changing the probability and speed
of the simulation.</p>
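<p>The same experiment is easy to script. This Python sketch (function name and trial counts are my own illustrative choices) simulates batches of ten tosses and compares the empirical frequency of five heads against the binomial formula:</p>

```python
import random
from math import comb

def count_heads(N, p, rng):
    """Number of heads in N simulated coin tosses."""
    return sum(1 for _ in range(N) if rng.random() < p)

rng = random.Random(7)
N, p, trials = 10, 0.5, 20000
counts = [0] * (N + 1)
for _ in range(trials):
    counts[count_heads(N, p, rng)] += 1

empirical = counts[5] / trials
exact = comb(N, 5) * p**5 * (1 - p) ** (N - 5)  # binomial P(k=5|N=10,p=0.5)
```

<p>With twenty thousand trials the empirical estimate should sit within a percentage point or so of the exact value of about $0.246$.</p>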
<div class="row">
<div class="col-md-8 col-md-offset-2 col-xs-10 col-xs-offset-1">
<div class="text-center">
<div class="btn-group">
<button id="start-binom" type="button" class="btn btn-default btn-lg"><span class="glyphicon glyphicon-play" aria-hidden="true"></span>Start</button>
<button id="reset-binom" type="button" class="btn btn-primary btn-lg"><span class="glyphicon glyphicon-step-backward" aria-hidden="true"></span>Reset</button>
</div>
<div class="form-group">
<label for="speed-binom">Speed</label>
<input id="speed-binom" data-slider-id="speed-binom" type="text" data-slider-min="0.1" data-slider-max="1" data-slider-step="0.1" data-slider-value="0.5" data-tooltip="hide" />
<div id="speed-binom-value" data-value="0.5"></div>
<label for="prob-binom">Probability</label>
<input id="prob-binom" data-slider-id="prob-binom" type="text" data-slider-min="0.1" data-slider-max="0.9" data-slider-step="0.1" data-slider-value="0.5" />
<div id="prob-binom-value" data-value="0.5"></div>
</div>
</div>
<div class="text-center">
<div id="bin-proc"></div>
</div>
</div>
</div>
<h3 id="calculating-pi">Calculating pi</h3>
<p>Another example of Monte Carlo simulation is in the calculation of $\pi$. One example
of this is <a href="https://en.wikipedia.org/wiki/Buffon%27s_needle">Buffon’s needle</a>, which can be done
by throwing down matchsticks and observing how often they cross a series of parallel lines.</p>
<p>It’s maybe easier to think instead about random points falling onto a square of length
one and seeing how often the points fall inside a circle of diameter one centered
in the middle of the square. The probability of the point landing in the circle is the
same as the area of the circle divided by the area of the square. From basic geometry, the circle area
is
\(\pi r^2 = \pi (1/2)^2 = \pi/4\)
and the area of the square is one. Therefore the probability that a random point
lands in the circle is $\pi/4$. Note that to check whether a point is contained
in the circle, you just need to check whether the sum of the squares of its
coordinates, measured from the circle’s centre, is less than the squared radius,
so this method doesn’t explicitly use $\pi$ anywhere.</p>
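<p>In code the whole estimator is only a few lines. A Python sketch (the seed and sample size are arbitrary choices of mine):</p>

```python
import random

rng = random.Random(0)
n, inside = 100_000, 0
for _ in range(n):
    x, y = rng.random(), rng.random()  # uniform point in the unit square
    # inside the circle of diameter one when the squared distance from the
    # centre (0.5, 0.5) is below the squared radius (1/2)**2
    if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:
        inside += 1
pi_estimate = 4 * inside / n  # converges to pi as n grows
```

<p>The error shrinks like $1/\sqrt{n}$, so a hundred thousand points typically gives two to three correct digits of $\pi$.</p>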
<p>You can simulate this process in the tool below.</p>
<div class="row">
<div class="col-md-8 col-md-offset-2 col-xs-10 col-xs-offset-1">
<div class="text-center">
<div class="btn-group">
<button id="start-pi" type="button" class="btn btn-default btn-lg"><span class="glyphicon glyphicon-play" aria-hidden="true"></span>Start</button>
<button id="reset-pi" type="button" class="btn btn-primary btn-lg"><span class="glyphicon glyphicon-step-backward" aria-hidden="true"></span>Reset</button>
</div>
</div>
<div id="pimc"></div>
</div>
</div>
<h3 id="the-permutation-method">The permutation method</h3>
<p>Finally, let’s look at a less trivial example. Suppose that we conducted a
public health intervention and we want to compare the outcome between a group
who did not receive the intervention and those who did. We could look at an outcome
measure (blood pressure, BMI etc.) and compare the mean in both groups. How do
we know if the difference of the means is significant? We might use something
like a <a href="https://en.wikipedia.org/wiki/Student%27s_t-test">t-test</a>, but this
introduces a few assumptions about the data such as <a href="https://en.wikipedia.org/wiki/Normality_test">normality</a>.</p>
<p>Let’s assume we don’t know where the data came from. We could instead repeatedly
divide the data we have into two groups at random and compare the difference
of means between the two groups. Dividing into two groups is the same as random sampling
without replacement, which is why this technique is known as the <a href="https://en.wikipedia.org/wiki/Resampling_(statistics)#Permutation_tests">permutation test</a>.</p>
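<p>The procedure fits in a short function. Below is a hedged Python sketch: the function name, the permutation count, and the two made-up data sets are all illustrative assumptions, not real study data:</p>

```python
import random

def permutation_test(a, b, n_perm, rng):
    """Fraction of random relabellings whose absolute difference of means is
    at least as large as the observed difference (an approximate p-value)."""
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # shuffling then splitting = sampling w/o replacement
        left, right = pooled[: len(a)], pooled[len(a):]
        if abs(sum(left) / len(left) - sum(right) / len(right)) >= observed:
            count += 1
    return count / n_perm

rng = random.Random(3)
group_a = [0.1, -0.3, 0.2, 0.05, -0.1, 0.15, 0.0, -0.2]  # made-up data near 0
group_b = [2.1, 1.8, 2.3, 1.9, 2.2, 2.0, 1.7, 2.4]       # made-up data near 2
p_value = permutation_test(group_a, group_b, 2000, rng)
```

<p>For groups this clearly separated, almost no random relabelling matches the observed gap, so the estimated p-value is very small.</p>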
<p>Below we have a tool that randomly samples a set of data from two normal distributions, with
one mean centered at zero and another that you can vary. You can also vary the population size
of each group. When the simulation starts a random permutation occurs and then the new difference
in the means of both groups is recorded. If it’s greater than the original difference, then this simulation is counted.
These counts are then used to estimate the probability that a randomly selected permutation has a difference
of means at least as large as the original data. You can see after many iterations this value converges and
you can use a pre-determined p-value to see whether this is significant or not.</p>
<div class="row" id="lattice-epidemic-tool">
<div class="col-md-8 col-md-offset-2 col-xs-10 col-xs-offset-1">
<div class="text-center">
<div class="btn-group">
<button id="start-bootstrap" type="button" class="btn btn-default btn-lg"><span class="glyphicon glyphicon-play" aria-hidden="true"></span>Start</button>
<button id="reset-bootstrap" type="button" class="btn btn-primary btn-lg"><span class="glyphicon glyphicon-step-backward" aria-hidden="true"></span>Reset</button>
</div>
<div class="form-group">
<label for="mean-bootstrap">Group Mean</label>
<input id="mean-bootstrap" data-slider-id="mean-bootstrap" type="text" data-slider-min="0.0" data-slider-max="5" data-slider-step="0.1" data-slider-value="0.5" data-tooltip="hide" />
</div>
<div class="form-group">
<label for="sample-bootstrap">Sample Number</label>
<input id="sample-bootstrap" data-slider-id="sample-bootstrap" type="text" data-slider-min="5" data-slider-max="20" data-slider-step="1" data-slider-value="10" />
</div>
</div>
<div id="bootstrap-example"></div>
</div>
</div>
<p>As always, there are some caveats with this technique. Try playing around with small population sizes when the means of the
two distributions are the same, to see if you can still end up with something significant. The moral is to always be suspicious
when the size of the data is small.</p>
<h4 id="acknowledgements">Acknowledgements</h4>
<p>For the calculation of pi example, I adapted code from <a href="https://bl.ocks.org/bricedev/33de9b3c78b442938d52">here</a>.</p>
<p>All the interactive examples were coded in <a href="https://d3js.org">d3</a>, which has many fantastic <a href="https://bl.ocks.org">examples</a> for
data visualization.</p>
<!-- jQuery -->
<script src="https://sempwn.github.io/js/jquery.min.js"></script>
<!-- Plugin JavaScript -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery-easing/1.3/jquery.easing.min.js"></script>
<!-- Bootstrap -->
<script src="https://sempwn.github.io/js/bootstrap-slider.js"></script>
<!-- D3 js -->
<script src="https://d3js.org/d3.v3.js"></script>
<!-- main js -->
<script src="https://sempwn.github.io/js/mcmc.js"></script>IntroductionPresenting & Communicating models: Creating online web applications2017-04-12T00:00:00+00:002017-04-12T00:00:00+00:00https://sempwn.github.io/blog/2017/04/12/online_web_tools<p>This blog came about from a recently published <a href="http://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0005206">article</a> I had in
<a href="http://journals.plos.org/plosntds/">PLoS NTD</a> that I also recently gave as a talk.
The main idea is that it’s becoming increasingly easy to create front-ends or
dashboards for an epidemic model, or models in general, and so I want to lay out some
of the tools that we could potentially utilise. Creating a dashboard for a model comes with
its own set of challenges and questions: How much can a user play around with the model?
What should be outputted, and in what format? How can you easily show what’s going into
the model?</p>
<p>These are difficult questions to answer, and would certainly be answered differently
on a case-by-case basis. In the article, and here, we tried to lay out some of the things that
need to be considered on a conceptual level, along with some of the advantages and disadvantages
of this approach. Finally, I go through a few different technologies for generating a
modelling tool and then talk through some examples.</p>
<h3 id="challenges-in-communicating-models">Challenges in communicating models</h3>
<p>When developing user-friendly interfaces, we probably want to consider the following:</p>
<ul>
<li>Access—for users with limited modelling expertise.</li>
<li>Speed—analyses produced quickly without expensive computer resources.</li>
<li>Characterisation of uncertainty—usually through repeated runs of the model, resulting in a higher processing burden.</li>
<li>Ease of use—requires design choices, including instructive inputs.</li>
<li>Clarity of presentation—limiting misunderstanding of the model and its outputs.</li>
<li>Responsiveness to needs—flexibility to iteratively update the interface through a consultation with intended end-users.</li>
<li>Range of users—different users have different needs, and it is challenging to survey and understand all of these needs.</li>
</ul>
<h3 id="advantages--disadvantages-for-a-web-interface">Advantages & disadvantages for a web interface</h3>
<p>The main advantage is that it can give access to the model for non-expert users.
They are also able to generate results from the model in real-time using their
own processing power. Interactive input and output can be tailored directly to
disease-specific goals, such as seeing what impact increasing the coverage of a
vaccine has.</p>
<p>There are some things to consider, however. There could be potential misinterpretation
or misuse of results due to lack of expert guidance; for example, the dynamics
of breaking transmission are likely to be highly locally specific, and the
modelling results should be considered in this context.
Partly for this reason, only a limited set of parameters can be changed in the model,
and end-users don’t have full access to the model through the interface. This also means
the tool is either limited in how far it can be tailored to local settings, or can only
deal with very generic and perhaps unrealistic scenarios.</p>
<h3 id="can-a-browser-really-run-complex-model-simulations">Can a browser really run complex model simulations?</h3>
<p><img class="center-block img-responsive" src="https://sempwn.github.io/img/online_tools/image1.png" alt="complex models" /></p>
<p>The short answer is yes! The <a href="https://en.wikipedia.org/wiki/Browser_wars#Second_Browser_War">Second Browser War</a> led to the development of Google’s <a href="https://developers.google.com/v8/">V8</a> engine.
Other browsers followed suit, and now most browsers can run standard ECMAScript efficiently.
There is also an increasing trend of single-page applications shifting computation
from server to client, meaning more libraries and tools are available with this
in mind.</p>
<p>Some examples include</p>
<ul>
<li>Neural networks: <a href="http://playground.tensorflow.org/">Tensorflow</a>, <a href="http://cs.stanford.edu/people/karpathy/convnetjs/">ConvNetjs</a></li>
<li>Particle simulator: <a href="http://google.github.io/liquidfun/">LiquidFun</a></li>
<li>PDE, ODE solvers: <a href="http://www.numericjs.com">numericjs</a></li>
<li>3d simulations</li>
<li>Many more…</li>
</ul>
<h3 id="potential-web-interface-technologies">Potential web interface technologies</h3>
<p>Now we’ve established what we can do in a browser we can look at some of the
technologies currently available.</p>
<h4 id="shiny">Shiny</h4>
<p><img class="center-block img-responsive" src="https://sempwn.github.io/img/online_tools/image2.png" alt="complex models" /></p>
<p><a href="https://shiny.rstudio.com">Shiny</a> is a way of writing apps in the statistical
language <a href="https://www.r-project.org">R</a>. It’s becoming an increasingly popular
way of creating a modelling dashboard and is under active development so there’ll
be lots of new features being added. If the model’s already written in <code class="language-plaintext highlighter-rouge">R</code> then
it should be easy to implement something quickly.</p>
<h4 id="python-jupyter-notebook">Python jupyter notebook</h4>
<p><img class="center-block img-responsive" src="https://sempwn.github.io/img/online_tools/image3.png" alt="complex models" /></p>
<p><a href="http://jupyter.org">jupyter</a> notebooks are another viable option for
developing a simple dashboard. Again, this doesn’t require explicit knowledge
of web technologies, although that becomes helpful for more complex tasks. They’re
extremely easy to set up and host somewhere like <a href="https://github.com">Github</a>,
and are great for quickly displaying concepts and creating tutorials/blog posts.
As with Shiny, however, there comes a limit where you won’t be able to accomplish everything
you’d want when trying to create tools for more complex models.</p>
<h4 id="native-javascript">Native JavaScript</h4>
<p><img class="center-block img-responsive" src="https://sempwn.github.io/img/online_tools/image4.png" alt="complex models" /></p>
<p>As with all other libraries or APIs that provide a way of generating a website
or tool, at some point you’re going to want to do something that hasn’t been
explicitly accounted for by the framework. It’s therefore worth considering creating something
in more native JavaScript, although this is going to require some
degree of knowledge of web development (CSS, HTML etc.). What you get from this is
access to very powerful libraries for visualisation such as <a href="https://d3js.org">d3.js</a> and
<a href="https://plot.ly">Plotly.js</a>. I’d recommend browsing the <code class="language-plaintext highlighter-rouge">d3</code> gallery to see the range
of interactive graphs and figures it can achieve. Another advantage of going down
the JavaScript route is an easier way of linking to external databases and
other website APIs. As mentioned before, browser js engines are powerful, and there are
plenty of libraries out there for solving ODEs and PDEs, as well as plenty of scope
for writing your own solver libraries.</p>
<p>Although there’s a bit more overhead than with something like <code class="language-plaintext highlighter-rouge">Shiny</code>, you can generate
dashboards using an HTML/CSS library such as <a href="http://getbootstrap.com">Bootstrap</a>.
There also exist lots of example dashboard templates that are often free to use.</p>
<h3 id="some-examples">Some examples</h3>
<p>Let’s look at some specific examples written in <code class="language-plaintext highlighter-rouge">js</code>, <code class="language-plaintext highlighter-rouge">plotly</code> and <code class="language-plaintext highlighter-rouge">d3</code>. For these
examples we’ll use the SIR model. This is one of the simplest epidemic models, where
individuals progress through three disease stages: susceptible (\(S\)), infected (\(I\))
and recovered (\(R\)) as in the diagram below</p>
<p><img class="center-block img-responsive" src="https://sempwn.github.io/img/online_tools/SIR_model.png" alt="complex models" /></p>
<p>We’ll assume a fixed population where everyone is susceptible except for a couple of
infections at the start.</p>
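<p>For concreteness, the deterministic version of the SIR model can be integrated in a few lines. This Python sketch uses a simple Euler scheme with parameter values of my own choosing (it is not the code behind the demos below):</p>

```python
def simulate_sir(beta, gamma, i0=0.01, dt=0.1, days=100):
    """Euler integration of the SIR equations for proportions s, i, r.
    beta is the transmission rate; gamma is the recovery rate, i.e. one
    over the infectious period, so R0 = beta / gamma."""
    s, i, r = 1.0 - i0, i0, 0.0
    history = [(s, i, r)]
    for _ in range(int(days / dt)):
        ds = -beta * s * i          # susceptibles become infected
        di = beta * s * i - gamma * i
        dr = gamma * i              # infected recover
        s, i, r = s + ds * dt, i + di * dt, r + dr * dt
        history.append((s, i, r))
    return history

epidemic = simulate_sir(beta=0.4, gamma=0.2)  # R0 = 2: epidemic takes off
fizzle = simulate_sir(beta=0.1, gamma=0.2)    # R0 = 0.5: infection dies out
peak_epidemic = max(i for _, i, _ in epidemic)
peak_fizzle = max(i for _, i, _ in fizzle)
```

<p>With $R_0 = 2$ the infected proportion peaks at roughly fifteen percent of the population, while with $R_0 = 0.5$ it never rises above its starting value.</p>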
<p>The first example is an individual simulation, with individuals represented as coloured
circles with blue denoting susceptible, red denoting infected and gray denoting recovered.
We can interact with the simulation as it progresses by clicking on individuals to
“vaccinate” them. All other parameters such as the rate of infectivity and the infectious period
can’t be adjusted.</p>
<div class="row" id="lattice-epidemic-tool">
<div class="col-md-8 col-md-offset-2 col-xs-10 col-xs-offset-1">
<div class="text-center">
<div class="btn-group">
<button id="pause" type="button" class="btn btn-default btn-lg"><span class="glyphicon glyphicon-pause" aria-hidden="true"></span>Pause</button>
<button id="reset" type="button" class="btn btn-primary btn-lg"><span class="glyphicon glyphicon-step-backward" aria-hidden="true"></span>Reset</button>
</div>
</div>
<div class="text-center">
<div id="latticeEpidemic">
</div>
</div>
</div>
</div>
<p>Going further into the concept of the SIR model, we might want to explore what impact
the different parameters have on the dynamics of the infection. Below we simulate
a deterministic epidemic (an ODE, or equivalently a very large population) and explore what impact
the basic reproduction number \(R_0\) and the infectious period have. \(R_0\) can be a slightly
tricky concept to understand: it’s defined as the average number of secondary cases arising from
one primary case in a completely susceptible population. Notice that if it’s less than one
then there’s no chance of an epidemic taking off.</p>
<div id="SIRGraphDiv">
</div>
<form class="form">
<div class="form-group">
<label for="inputr0">Basic reproduction number</label>
<input id="inputr0" data-slider-id="ex1Slider" type="text" data-slider-min="0" data-slider-max="10" data-slider-step="0.1" data-slider-value="2" />
<small class="form-text text-muted">Above one leads to epidemic.</small>
</div>
<div class="form-group">
<label for="inputgamma">Infectious period</label>
<input id="inputgamma" data-slider-id="ex1Slider" type="text" data-slider-min="0.1" data-slider-max="30" data-slider-step="1" data-slider-value="3" />
<small class="form-text text-muted">.</small>
</div>
</form>
<p>We can also explore this concept using a stochastic as opposed to a deterministic
model. Here we simulate the epidemic starting with one infected individual and
repeat the simulation multiple times to create a distribution of the epidemic curves.</p>
<div id="StochSIRGraphDiv">
</div>
<form class="form">
<div class="form-group">
<label for="inputr0s">Basic reproduction number</label>
<input id="inputr0s" data-slider-id="ex1Slider" type="text" data-slider-min="0" data-slider-max="10" data-slider-step="0.1" data-slider-value="2" />
<small class="form-text text-muted">Above one leads to epidemic.</small>
</div>
<div class="form-group">
<label for="inputgammas">Infectious period</label>
<input id="inputgammas" data-slider-id="ex1Slider" type="text" data-slider-min="0.1" data-slider-max="30" data-slider-step="1" data-slider-value="3" />
<small class="form-text text-muted">.</small>
</div>
</form>
<p>For a more sophisticated example, although one that uses the same basic principles,
go to <a href="http://www.ntdmodelling.org/transfil/">ntdmodelling.org/transfil</a> for a dashboard to model
intervention strategies for lymphatic filariasis (a <a href="http://www.who.int/neglected_diseases/diseases/en/">neglected tropical disease</a>) that
we developed.</p>
<h3 id="conclusion">Conclusion</h3>
<p>There are many potential technologies out there to build single-page applications and dashboards
and it’s becoming easier to produce some really powerful, user-friendly tools.
One of the big advantages for having interactive plots is when it comes to geographic data.
This is again where a library like <code class="language-plaintext highlighter-rouge">d3</code> really shines and there are some fantastic examples
out there.</p>
<p>All of these examples can run in the browser, which offers a lot of advantages. Users won’t
need to install any software, they can store data locally, the tool can interact with other website APIs to pull
in other data sources, and if the application is updated then the changes are pushed to users immediately.</p>
<p>Even if the model can’t be coded up in <code class="language-plaintext highlighter-rouge">js</code>, having a few interactive plots goes a long way to
explaining complex results and can provide a more succinct way of conveying geographic data.</p>
<p>The original article that inspired this blog is open-access and can be found below.</p>
<p><a href="http://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0005206">Irvine, Michael A., and T. Deirdre Hollingsworth. “Making transmission models accessible to end-users: the example of TRANSFIL.” <em>PLoS Neglected Tropical Diseases</em> 11.2 <strong>(2017)</strong></a></p>
<!-- jQuery -->
<script src="https://sempwn.github.io/js/jquery.min.js"></script>
<!-- Plot.ly js -->
<script src="https://cdn.plot.ly/plotly-latest.min.js"></script>
<!-- Plugin JavaScript -->
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery-easing/1.3/jquery.easing.min.js"></script>
<!-- Bootstrap -->
<script src="https://sempwn.github.io/js/bootstrap-slider.js"></script>
<!-- D3 js -->
<script src="https://sempwn.github.io/js/d3.js"></script>
<script src="https://sempwn.github.io/js/d3.layout.js"></script>
<script src="https://sempwn.github.io/js/d3.geom.js"></script>
<script src="https://sempwn.github.io/js/d3.grid.js"></script>
<script src="https://sempwn.github.io/js/SIR.js"></script>
<p>This blog came about from a recently published article I had in PLoS NTD that I also recently gave as a talk. The main idea is that it’s becoming increasingly easy to create front-ends or dashboards for an epidemic model, or for models in general, and so to lay out some of the tools that we could potentially utilise. Creating a dashboard for a model comes with its own set of challenges and questions: how much can a user play around with the model? What should be output, and in what format? How can you easily show what’s going into the model?</p>
<h2>Introduction to convolutional neural networks</h2>
<p>Published 2017-04-06 at <a href="https://sempwn.github.io/blog/2017/04/06/conv_net_intro">sempwn.github.io/blog/2017/04/06/conv_net_intro</a></p>
<h3 id="preamble">Preamble</h3>
<p>Python notebook for this post can be found at <a href="https://github.com/sempwn/keras-intro">https://github.com/sempwn/keras-intro</a></p>
<p>Before starting we’ll need to make sure tensorflow and keras are installed. Open a terminal and type the following commands:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install</span> <span class="nt">--user</span> tensorflow
pip <span class="nb">install</span> <span class="nt">--user</span> keras <span class="nt">--upgrade</span>
</code></pre></div></div>
<p>The back-end of keras can be either theano or tensorflow. Make sure keras uses tensorflow by switching the back-end with the following command:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sed</span> <span class="nt">-i</span> <span class="s1">'s/theano/tensorflow/g'</span> <span class="nv">$HOME</span>/.keras/keras.json
</code></pre></div></div>
<p>Note that this post was written in keras 2.0, where there have been a number of changes from version 1.0. We begin by loading in the libraries we’ll be using in the notebook:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">%</span><span class="n">pylab</span> <span class="n">inline</span>
<span class="kn">import</span> <span class="nn">keras</span>
<span class="kn">from</span> <span class="nn">keras.datasets</span> <span class="kn">import</span> <span class="n">mnist</span>
<span class="kn">from</span> <span class="nn">keras.models</span> <span class="kn">import</span> <span class="n">Sequential</span>
<span class="kn">from</span> <span class="nn">keras.layers</span> <span class="kn">import</span> <span class="n">Dense</span><span class="p">,</span> <span class="n">Dropout</span><span class="p">,</span> <span class="n">Activation</span><span class="p">,</span> <span class="n">Flatten</span>
<span class="kn">from</span> <span class="nn">keras.layers</span> <span class="kn">import</span> <span class="n">Conv2D</span><span class="p">,</span> <span class="n">MaxPooling2D</span>
<span class="kn">from</span> <span class="nn">keras.utils</span> <span class="kn">import</span> <span class="n">np_utils</span>
<span class="kn">from</span> <span class="nn">keras</span> <span class="kn">import</span> <span class="n">backend</span> <span class="k">as</span> <span class="n">K</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Populating the interactive namespace from numpy and matplotlib
</code></pre></div></div>
<h2 id="convolutional-neural-networks--a-very-brief-introduction">Convolutional neural networks : A very brief introduction</h2>
<p>To quote wikipedia:</p>
<blockquote>
<p>Convolutional neural networks are biologically inspired variants of multilayer perceptrons, designed to emulate the behaviour of a visual cortex. These models mitigate the challenges posed by the MLP architecture by exploiting the strong spatially local correlation present in natural images.</p>
</blockquote>
<p>One principle in machine learning is to create a feature map for the data and then use your favourite classifier on those features. For image data this might be the presence of straight lines, curved lines, the placement of holes, etc. This strategy can be very problem-dependent. Instead of having to engineer features for each specific problem, it would be better to generate the features automatically and combine them with the classifier. CNNs are a way to achieve this.</p>
<h3 id="automatic-feature-engineering">Automatic feature engineering</h3>
<p>Filters, or convolution kernels, can be treated as automatic feature detectors. A number of filters is set beforehand. Each filter is convolved with every local patch of the input image, and the weights of each filter are shared across locations, which reduces location dependence and the number of parameters. The end result is a multi-dimensional array containing a copy of the original data with each filter applied to it.</p>
<p><img class="center-block img-responsive" src="http://cs231n.github.io/assets/cnn/depthcol.jpeg" alt="complex models" /></p>
<p>For a classification task, one or more fully connected layers can be added after the convolutional layers. The final layer has one output per class.</p>
<h3 id="pooling">Pooling</h3>
<p>Once convolutions have been performed across the whole image, we need some way of down-sampling. The easiest and
most common way is to perform max pooling: for a given pool size, the maximum value within each subset of the filtered image is returned as the output. A diagram of this is shown below</p>
<p><img class="center-block img-responsive" src="https://upload.wikimedia.org/wikipedia/commons/e/e9/Max_pooling.png" alt="complex models" /></p>
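<p>To make this concrete, here is a minimal numpy sketch of 2x2 max pooling (the <code>max_pool</code> helper below is illustrative only, not part of the original notebook; keras provides this as the <code>MaxPooling2D</code> layer used later):</p>

```python
import numpy as np

def max_pool(x, pool=(2, 2)):
    """Down-sample by taking the maximum over non-overlapping blocks."""
    ph, pw = pool
    h, w = x.shape
    # trim any ragged edge, then reshape into blocks and reduce each block
    x = x[: h - h % ph, : w - w % pw]
    return x.reshape(x.shape[0] // ph, ph, x.shape[1] // pw, pw).max(axis=(1, 3))

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
print(max_pool(a))
# [[6 8]
#  [3 4]]
```

<p>Each 2x2 block collapses to its maximum, halving both spatial dimensions.</p>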
<h3 id="mnist-data-set">MNIST data set</h3>
<p>We’ll begin by loading in the MNIST data set, which is a standard set of 28x28 grayscale images of handwritten numerical digits. Keras comes with it built in and automatically splits the data into a training and validation set.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># the data, shuffled and split between train and test sets
</span><span class="p">(</span><span class="n">x_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">),</span> <span class="p">(</span><span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">)</span> <span class="o">=</span> <span class="n">mnist</span><span class="p">.</span><span class="n">load_data</span><span class="p">()</span>
</code></pre></div></div>
<h3 id="convolutions-on-image">Convolutions on image</h3>
<p>Let’s get some insight into what a random filter applied to a test image does. We’ll compare this to the trained filters at the end.</p>
<p>Each filtered pixel in the image is defined by \(C_i = \sum_j{I_{i+j-k} W_j}\), where \(W\) is the filter (sometimes known as a kernel), \(j\) is the 2D spatial index over \(W\), \(I\) is the input and \(k\) is the coordinate of the center of \(W\), specified by origin in the input parameters.</p>
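<p>As a sanity check on this formula, a direct double-loop version can be written in a few lines (a hedged sketch: CNN layers actually compute this cross-correlation, whereas a true convolution first flips the kernel; for symmetric kernels such as the edge-detection filter used later, the two agree):</p>

```python
import numpy as np

def cross_correlate_valid(image, kernel):
    """Slide the kernel over the image with no padding ('valid' mode)
    and take the weighted sum at each position."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

img = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
w = np.array([[1, 0],
              [0, 1]])
print(cross_correlate_valid(img, w))
# [[ 6.  8.]
#  [12. 14.]]
```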
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">scipy</span> <span class="kn">import</span> <span class="n">signal</span>
<span class="n">i</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">randint</span><span class="p">(</span><span class="n">x_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="n">c</span> <span class="o">=</span> <span class="n">x_train</span><span class="p">[</span><span class="n">i</span><span class="p">,:,:]</span>
<span class="n">plt</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">c</span><span class="p">,</span><span class="n">cmap</span><span class="o">=</span><span class="s">'gray'</span><span class="p">);</span> <span class="n">plt</span><span class="p">.</span><span class="n">axis</span><span class="p">(</span><span class="s">'off'</span><span class="p">);</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'original image'</span><span class="p">);</span>
<span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">18</span><span class="p">,</span><span class="mi">8</span><span class="p">))</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">):</span>
<span class="n">k</span> <span class="o">=</span> <span class="o">-</span><span class="mf">1.0</span> <span class="o">+</span> <span class="mf">1.0</span><span class="o">*</span><span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">rand</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="mi">3</span><span class="p">)</span>
<span class="n">c_digit</span> <span class="o">=</span> <span class="n">signal</span><span class="p">.</span><span class="n">convolve2d</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">boundary</span><span class="o">=</span><span class="s">'symm'</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s">'same'</span><span class="p">);</span>
<span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="mi">5</span><span class="p">,</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">);</span>
<span class="n">plt</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">c_digit</span><span class="p">,</span><span class="n">cmap</span><span class="o">=</span><span class="s">'gray'</span><span class="p">);</span> <span class="n">plt</span><span class="p">.</span><span class="n">axis</span><span class="p">(</span><span class="s">'off'</span><span class="p">);</span>
</code></pre></div></div>
<p><img class="center-block img-responsive" src="https://sempwn.github.io/img/conv_intro/output_8_0.png" alt="complex models" /></p>
<p><img class="center-block img-responsive" src="https://sempwn.github.io/img/conv_intro/output_8_1.png" alt="complex models" /></p>
<p>As you can see, the random filters aren’t capable of differentiating different parts or features of the image. We do know, however, that non-random filters are very good at things like edge detection. Let’s compare the random filters above to a standard <a href="https://en.wikipedia.org/wiki/Kernel_%28image_processing%29">edge-detection filter</a>. One such filter is used below</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#define edge-detection filter
</span><span class="n">k</span> <span class="o">=</span> <span class="p">[</span>
<span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
<span class="p">[</span><span class="mi">1</span><span class="p">,</span><span class="o">-</span><span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">],</span>
<span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
<span class="p">]</span>
<span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">();</span>
<span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">);</span>
<span class="n">plt</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">c</span><span class="p">,</span><span class="n">cmap</span><span class="o">=</span><span class="s">'gray'</span><span class="p">);</span> <span class="n">plt</span><span class="p">.</span><span class="n">axis</span><span class="p">(</span><span class="s">'off'</span><span class="p">);</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'original image'</span><span class="p">);</span>
<span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">);</span>
<span class="n">c_digit</span> <span class="o">=</span> <span class="n">signal</span><span class="p">.</span><span class="n">convolve2d</span><span class="p">(</span><span class="n">c</span><span class="p">,</span> <span class="n">k</span><span class="p">,</span> <span class="n">boundary</span><span class="o">=</span><span class="s">'symm'</span><span class="p">,</span> <span class="n">mode</span><span class="o">=</span><span class="s">'same'</span><span class="p">);</span>
<span class="n">plt</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">c_digit</span><span class="p">,</span><span class="n">cmap</span><span class="o">=</span><span class="s">'gray'</span><span class="p">);</span> <span class="n">plt</span><span class="p">.</span><span class="n">axis</span><span class="p">(</span><span class="s">'off'</span><span class="p">);</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'edge-detection image'</span><span class="p">);</span>
</code></pre></div></div>
<p><img class="center-block img-responsive" src="https://sempwn.github.io/img/conv_intro/output_10_0.png" alt="png" /></p>
<h3 id="keras-introduction">Keras introduction</h3>
<p>We’re using <a href="https://keras.io">keras</a> to construct and fit the convolutional neural network. Quoting their website:</p>
<blockquote>
<p>Keras is a high-level neural networks API, written in Python and capable of running on top of either <a href="https://www.tensorflow.org">TensorFlow</a> or <a href="http://deeplearning.net/software/theano/">Theano</a>. It was developed with a focus on enabling fast experimentation.
Being able to go from idea to result with the least possible delay is key to doing good research.</p>
</blockquote>
<p>We can rapidly develop a convolutional neural network in order to experiment with our image classification task. The first step will be to pre-process the data into a form that can be fed into a keras model.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">batch_size</span> <span class="o">=</span> <span class="mi">128</span>
<span class="n">nb_classes</span> <span class="o">=</span> <span class="mi">10</span>
<span class="n">nb_epoch</span> <span class="o">=</span> <span class="mi">6</span>
<span class="c1"># input image dimensions
</span><span class="n">img_rows</span><span class="p">,</span> <span class="n">img_cols</span> <span class="o">=</span> <span class="mi">28</span><span class="p">,</span> <span class="mi">28</span>
<span class="c1"># number of convolutional filters to use
</span><span class="n">nb_filters</span> <span class="o">=</span> <span class="mi">32</span>
<span class="c1"># size of pooling area for max pooling
</span><span class="n">pool_size</span> <span class="o">=</span> <span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">)</span>
<span class="c1"># convolution kernel size
</span><span class="n">kernel_size</span> <span class="o">=</span> <span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
<span class="k">if</span> <span class="n">K</span><span class="p">.</span><span class="n">image_data_format</span><span class="p">()</span> <span class="o">==</span> <span class="s">'channels_first'</span><span class="p">:</span>
<span class="n">x_train</span> <span class="o">=</span> <span class="n">x_train</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">x_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="mi">1</span><span class="p">,</span> <span class="n">img_rows</span><span class="p">,</span> <span class="n">img_cols</span><span class="p">)</span>
<span class="n">x_test</span> <span class="o">=</span> <span class="n">x_test</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">x_test</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="mi">1</span><span class="p">,</span> <span class="n">img_rows</span><span class="p">,</span> <span class="n">img_cols</span><span class="p">)</span>
<span class="n">input_shape</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="n">img_rows</span><span class="p">,</span> <span class="n">img_cols</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">x_train</span> <span class="o">=</span> <span class="n">x_train</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">x_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">img_rows</span><span class="p">,</span> <span class="n">img_cols</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">x_test</span> <span class="o">=</span> <span class="n">x_test</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="n">x_test</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">img_rows</span><span class="p">,</span> <span class="n">img_cols</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">input_shape</span> <span class="o">=</span> <span class="p">(</span><span class="n">img_rows</span><span class="p">,</span> <span class="n">img_cols</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="c1">#sub-sample of test data to improve training speed. Comment out
#if you want to train on full dataset.
</span><span class="n">x_train</span> <span class="o">=</span> <span class="n">x_train</span><span class="p">[:</span><span class="mi">20000</span><span class="p">,:,:,:]</span>
<span class="n">y_train</span> <span class="o">=</span> <span class="n">y_train</span><span class="p">[:</span><span class="mi">20000</span><span class="p">]</span>
<span class="c1">#normalise the images and double check the shape and size of the image data
</span><span class="n">x_train</span> <span class="o">=</span> <span class="n">x_train</span><span class="p">.</span><span class="n">astype</span><span class="p">(</span><span class="s">'float32'</span><span class="p">)</span>
<span class="n">x_test</span> <span class="o">=</span> <span class="n">x_test</span><span class="p">.</span><span class="n">astype</span><span class="p">(</span><span class="s">'float32'</span><span class="p">)</span>
<span class="n">x_train</span> <span class="o">/=</span> <span class="mi">255</span>
<span class="n">x_test</span> <span class="o">/=</span> <span class="mi">255</span>
<span class="k">print</span><span class="p">(</span><span class="s">'x_train shape:'</span><span class="p">,</span> <span class="n">x_train</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">x_train</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="s">'train samples'</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="n">x_test</span><span class="p">.</span><span class="n">shape</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="s">'test samples'</span><span class="p">)</span>
<span class="c1"># convert class vectors to binary class matrices
</span><span class="n">y_test_inds</span> <span class="o">=</span> <span class="n">y_test</span><span class="p">.</span><span class="n">copy</span><span class="p">()</span>
<span class="n">y_train_inds</span> <span class="o">=</span> <span class="n">y_train</span><span class="p">.</span><span class="n">copy</span><span class="p">()</span>
<span class="n">y_train</span> <span class="o">=</span> <span class="n">keras</span><span class="p">.</span><span class="n">utils</span><span class="p">.</span><span class="n">to_categorical</span><span class="p">(</span><span class="n">y_train</span><span class="p">,</span> <span class="n">nb_classes</span><span class="p">)</span>
<span class="n">y_test</span> <span class="o">=</span> <span class="n">keras</span><span class="p">.</span><span class="n">utils</span><span class="p">.</span><span class="n">to_categorical</span><span class="p">(</span><span class="n">y_test</span><span class="p">,</span> <span class="n">nb_classes</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>('x_train shape:', (20000, 28, 28, 1))
(20000, 'train samples')
(10000, 'test samples')
</code></pre></div></div>
<h3 id="tricks-to-avoid-overfitting">Tricks to avoid overfitting</h3>
<p>20000 data-points isn’t a huge amount for the size of the models we’re considering.</p>
<ul>
<li>One trick to avoid overfitting is to use <a href="http://jmlr.org/papers/v15/srivastava14a.html">drop-out</a>. This is where activations are randomly set to zero with a given probability during training, to avoid the model becoming too dependent on a small number of weights.</li>
<li>We can also consider <a href="https://en.wikipedia.org/wiki/Tikhonov_regularization">ridge</a> or <a href="https://en.wikipedia.org/wiki/Lasso_%28statistics%29">LASSO</a> regularisation as a way of trimming down the dependency and effective number of parameters.</li>
<li><a href="https://en.wikipedia.org/wiki/Early_stopping">Early stopping</a> and <a href="https://arxiv.org/abs/1502.03167">Batch Normalisation</a> are other strategies to help control over-fitting.</li>
</ul>
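<p>To illustrate the first trick, inverted drop-out can be sketched in plain numpy (illustrative only; the <code>Dropout</code> layer in keras handles all of this for us):</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate=0.5):
    """Inverted drop-out: zero each unit with probability `rate` during
    training, rescaling survivors by 1/(1 - rate) so the expected
    activation is unchanged and no rescaling is needed at test time."""
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

x = np.ones((4, 5))
y = dropout(x, rate=0.5)  # surviving units become 2.0, the rest 0.0
```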
<h3 id="constructing-the-model">Constructing the model</h3>
<p>The model we’ll be using for classification is a convolutional neural network with a single convolutional layer followed by a single fully connected layer. This is probably the simplest convolutional neural network that could be constructed, so it’ll be interesting to see how it performs. We also introduce dropout between the two layers as our preferred method of avoiding overfitting.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#Create sequential convolutional multi-layer perceptron with max pooling and dropout
#uncomment layers below to produce a more accurate score (in the interest of time we use a shallower model)
</span><span class="n">model</span> <span class="o">=</span> <span class="n">Sequential</span><span class="p">()</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Conv2D</span><span class="p">(</span><span class="n">nb_filters</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span>
<span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">,</span>
<span class="n">input_shape</span><span class="o">=</span><span class="n">input_shape</span><span class="p">))</span>
<span class="c1">#model.add(Conv2D(64, (3, 3), activation='relu'))
</span><span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">MaxPooling2D</span><span class="p">(</span><span class="n">pool_size</span><span class="o">=</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span> <span class="mi">2</span><span class="p">)))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dropout</span><span class="p">(</span><span class="mf">0.25</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Flatten</span><span class="p">())</span>
<span class="c1">#model.add(Dense(128, activation='relu'))
</span><span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dropout</span><span class="p">(</span><span class="mf">0.5</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Dense</span><span class="p">(</span><span class="n">nb_classes</span><span class="p">,</span> <span class="n">activation</span><span class="o">=</span><span class="s">'softmax'</span><span class="p">))</span>
<span class="n">model</span><span class="p">.</span><span class="nb">compile</span><span class="p">(</span><span class="n">loss</span><span class="o">=</span><span class="n">keras</span><span class="p">.</span><span class="n">losses</span><span class="p">.</span><span class="n">categorical_crossentropy</span><span class="p">,</span>
<span class="n">optimizer</span><span class="o">=</span><span class="n">keras</span><span class="p">.</span><span class="n">optimizers</span><span class="p">.</span><span class="n">Adam</span><span class="p">(),</span>
<span class="n">metrics</span><span class="o">=</span><span class="p">[</span><span class="s">'accuracy'</span><span class="p">])</span>
</code></pre></div></div>
<p>Let’s see what we’ve constructed layer by layer. This is useful for checking that the shapes of each layer are what you expect.
Note that in the first layer the images are now 26x26. This is because the convolution avoids going over the edge of the
image, chopping off the outer border of the image.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model</span><span class="p">.</span><span class="n">summary</span><span class="p">()</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 26, 26, 32) 320
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 13, 13, 32) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 5408) 0
_________________________________________________________________
dropout_2 (Dropout) (None, 5408) 0
_________________________________________________________________
dense_1 (Dense) (None, 10) 54090
=================================================================
Total params: 54,410.0
Trainable params: 54,410.0
Non-trainable params: 0.0
_________________________________________________________________
</code></pre></div></div>
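<p>The shapes and parameter counts in this summary can be reproduced with some quick arithmetic:</p>

```python
# Arithmetic check of the model summary shown above.
img, k, n_filters, n_classes = 28, 3, 32, 10

conv_out = img - k + 1                     # 'valid' convolution: 26
pool_out = conv_out // 2                   # 2x2 max pooling: 13
conv_params = (k * k * 1 + 1) * n_filters  # 9 weights + 1 bias per filter
flat = pool_out * pool_out * n_filters     # length after Flatten
dense_params = (flat + 1) * n_classes      # weights + biases of final layer

print(conv_params, flat, dense_params, conv_params + dense_params)
# 320 5408 54090 54410
```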
<h3 id="model-fitting">Model fitting</h3>
<p>We can now fit the model to the data. We provide the batch size, number of epochs as well as the validation data. We also want the output to be verbose so we’re able to see how the log-loss and accuracy in both the test and validation set changes at the end of each epoch.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">model</span><span class="p">.</span><span class="n">fit</span><span class="p">(</span><span class="n">x_train</span><span class="p">,</span> <span class="n">y_train</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="n">batch_size</span><span class="p">,</span> <span class="n">epochs</span><span class="o">=</span><span class="n">nb_epoch</span><span class="p">,</span>
<span class="n">verbose</span><span class="o">=</span><span class="mi">1</span><span class="p">,</span> <span class="n">validation_data</span><span class="o">=</span><span class="p">(</span><span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">))</span>
<span class="n">score</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">evaluate</span><span class="p">(</span><span class="n">x_test</span><span class="p">,</span> <span class="n">y_test</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Test loss:'</span><span class="p">,</span> <span class="n">score</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="k">print</span><span class="p">(</span><span class="s">'Test accuracy:'</span><span class="p">,</span> <span class="n">score</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Train on 20000 samples, validate on 10000 samples
Epoch 1/6
20000/20000 [==============================] - 16s - loss: 0.7113 - acc: 0.8046 - val_loss: 0.2958 - val_acc: 0.9171
Epoch 2/6
20000/20000 [==============================] - 15s - loss: 0.3009 - acc: 0.9114 - val_loss: 0.2093 - val_acc: 0.9425
Epoch 3/6
20000/20000 [==============================] - 15s - loss: 0.2325 - acc: 0.9317 - val_loss: 0.1689 - val_acc: 0.9548
Epoch 4/6
20000/20000 [==============================] - 16s - loss: 0.1853 - acc: 0.9460 - val_loss: 0.1385 - val_acc: 0.9620
Epoch 5/6
20000/20000 [==============================] - 15s - loss: 0.1610 - acc: 0.9524 - val_loss: 0.1216 - val_acc: 0.9660
Epoch 6/6
20000/20000 [==============================] - 15s - loss: 0.1451 - acc: 0.9571 - val_loss: 0.1103 - val_acc: 0.9685
('Test loss:', 0.11029711169451475)
('Test accuracy:', 0.96850000000000003)
</code></pre></div></div>
<h2 id="results">Results</h2>
<p>Let’s take a random digit example to find out how confident the model is at classifying the correct category</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#choose a random digit from the test set and show probabilities for each class.
</span><span class="n">i</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="nb">len</span><span class="p">(</span><span class="n">x_test</span><span class="p">))</span>
<span class="n">digit</span> <span class="o">=</span> <span class="n">x_test</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">reshape</span><span class="p">(</span><span class="mi">28</span><span class="p">,</span><span class="mi">28</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">();</span>
<span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">);</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'Example of digit: {}'</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">y_test_inds</span><span class="p">[</span><span class="n">i</span><span class="p">]));</span>
<span class="n">plt</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">digit</span><span class="p">,</span><span class="n">cmap</span><span class="o">=</span><span class="s">'gray'</span><span class="p">);</span> <span class="n">plt</span><span class="p">.</span><span class="n">axis</span><span class="p">(</span><span class="s">'off'</span><span class="p">);</span>
<span class="n">probs</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">predict_proba</span><span class="p">(</span><span class="n">digit</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">28</span><span class="p">,</span><span class="mi">28</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span><span class="n">batch_size</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">);</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'Probabilities for each digit class'</span><span class="p">);</span>
<span class="n">plt</span><span class="p">.</span><span class="n">bar</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">10</span><span class="p">),</span><span class="n">probs</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="mi">10</span><span class="p">),</span><span class="n">align</span><span class="o">=</span><span class="s">'center'</span><span class="p">);</span> <span class="n">plt</span><span class="p">.</span><span class="n">xticks</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">10</span><span class="p">),</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">10</span><span class="p">).</span><span class="n">astype</span><span class="p">(</span><span class="nb">str</span><span class="p">));</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1/1 [==============================] - 0s
</code></pre></div></div>
<p><img class="center-block img-responsive" src="https://sempwn.github.io/img/conv_intro/output_21_1.png" alt="png" /></p>
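The per-class probabilities returned by <code>predict_proba</code> come from the softmax applied to the final dense layer: each class score is exponentiated and normalised so the ten values sum to one. A minimal numpy sketch (the logits below are invented for illustration, not taken from the trained model):

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax: exponentiate shifted logits, then normalise."""
    shifted = logits - np.max(logits)   # shift so exp() cannot overflow
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical class scores for the 10 digit classes (class 3 dominates)
logits = np.array([1.2, -0.5, 0.3, 5.1, 0.0, -1.0, 0.8, 0.2, 2.0, -0.3])
probs = softmax(logits)
```

The bar chart above is simply this vector plotted per class; a confident prediction concentrates almost all mass on one digit.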
<h3 id="wrong-predictions">Wrong predictions</h3>
<p>Let’s look more closely at the predictions on the test data that weren’t correct</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">predictions</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">predict_classes</span><span class="p">(</span><span class="n">x_test</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> 9792/10000 [============================>.] - ETA: 0s
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">inds</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">predictions</span><span class="p">))</span>
<span class="n">wrong_results</span> <span class="o">=</span> <span class="n">inds</span><span class="p">[</span><span class="n">y_test_inds</span><span class="o">!=</span><span class="n">predictions</span><span class="p">]</span>
</code></pre></div></div>
<h3 id="example-of-an-incorrectly-labelled-digit">Example of an incorrectly labelled digit</h3>
<p>We’ll randomly choose an incorrectly labelled digit from the test set and plot the predicted probability
for each class. For an incorrectly labelled digit, the probabilities are generally lower and more spread across
classes than for a correctly labelled one.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#choose a random wrong result from the test set
</span><span class="n">i</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="nb">len</span><span class="p">(</span><span class="n">wrong_results</span><span class="p">))</span>
<span class="n">i</span> <span class="o">=</span> <span class="n">wrong_results</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
<span class="n">digit</span> <span class="o">=</span> <span class="n">x_test</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">reshape</span><span class="p">(</span><span class="mi">28</span><span class="p">,</span><span class="mi">28</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">();</span>
<span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">);</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'Digit {}'</span><span class="p">.</span><span class="nb">format</span><span class="p">(</span><span class="n">y_test_inds</span><span class="p">[</span><span class="n">i</span><span class="p">]));</span>
<span class="n">plt</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">digit</span><span class="p">,</span><span class="n">cmap</span><span class="o">=</span><span class="s">'gray'</span><span class="p">);</span> <span class="n">plt</span><span class="p">.</span><span class="n">axis</span><span class="p">(</span><span class="s">'off'</span><span class="p">);</span>
<span class="n">probs</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">predict_proba</span><span class="p">(</span><span class="n">digit</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">28</span><span class="p">,</span><span class="mi">28</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span><span class="n">batch_size</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">2</span><span class="p">);</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'Digit classification probability'</span><span class="p">);</span>
<span class="n">plt</span><span class="p">.</span><span class="n">bar</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">10</span><span class="p">),</span><span class="n">probs</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="mi">10</span><span class="p">),</span><span class="n">align</span><span class="o">=</span><span class="s">'center'</span><span class="p">);</span> <span class="n">plt</span><span class="p">.</span><span class="n">xticks</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">10</span><span class="p">),</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="mi">10</span><span class="p">).</span><span class="n">astype</span><span class="p">(</span><span class="nb">str</span><span class="p">));</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1/1 [==============================] - 0s
</code></pre></div></div>
<p><img class="center-block img-responsive" src="https://sempwn.github.io/img/conv_intro/output_26_1.png" alt="png" /></p>
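One way to make “more spread between classes” precise is the entropy of the predicted probability vector: a confident prediction has low entropy, a uniform one has the maximum. A short numpy sketch using made-up probability vectors (not outputs of the trained model):

```python
import numpy as np

def entropy(probs):
    """Shannon entropy in bits; higher means mass is spread across classes."""
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]                      # drop zeros: 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

# Hypothetical probability vectors for a confident and an uncertain prediction
confident = np.array([0.96, 0.01, 0.01, 0.005, 0.005, 0.0, 0.0, 0.0, 0.005, 0.005])
spread = np.full(10, 0.1)             # uniform over the 10 digit classes
```

For ten classes the entropy tops out at log2(10) ≈ 3.32 bits for the uniform vector, so incorrectly labelled digits like the one above tend to sit higher on this scale than correctly labelled ones.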
<h3 id="comparison-between-incorrectly-labelled-digits-and-all-digits">Comparison between incorrectly labelled digits and all digits</h3>
<p>For this example digit, the prediction is much less confident when it’s wrong. Is that always the case? Let’s
check by examining the maximum probability in any category for all digits that are incorrectly labelled.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">prediction_probs</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">predict_proba</span><span class="p">(</span><span class="n">x_test</span><span class="p">,</span> <span class="n">batch_size</span><span class="o">=</span><span class="mi">32</span><span class="p">,</span> <span class="n">verbose</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> 9856/10000 [============================>.] - ETA: 0s
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">wrong_probs</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="n">prediction_probs</span><span class="p">[</span><span class="n">ind</span><span class="p">][</span><span class="n">digit</span><span class="p">]</span> <span class="k">for</span> <span class="n">ind</span><span class="p">,</span><span class="n">digit</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">wrong_results</span><span class="p">,</span><span class="n">predictions</span><span class="p">[</span><span class="n">wrong_results</span><span class="p">])])</span>
<span class="n">all_probs</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">([</span><span class="n">prediction_probs</span><span class="p">[</span><span class="n">ind</span><span class="p">][</span><span class="n">digit</span><span class="p">]</span> <span class="k">for</span> <span class="n">ind</span><span class="p">,</span><span class="n">digit</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">arange</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="n">predictions</span><span class="p">)),</span><span class="n">predictions</span><span class="p">)])</span>
<span class="c1">#plot as histogram
</span><span class="n">plt</span><span class="p">.</span><span class="n">hist</span><span class="p">(</span><span class="n">wrong_probs</span><span class="p">,</span><span class="n">alpha</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span><span class="n">normed</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span><span class="n">label</span><span class="o">=</span><span class="s">'wrongly-labeled'</span><span class="p">);</span>
<span class="n">plt</span><span class="p">.</span><span class="n">hist</span><span class="p">(</span><span class="n">all_probs</span><span class="p">,</span><span class="n">alpha</span><span class="o">=</span><span class="mf">0.5</span><span class="p">,</span><span class="n">normed</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span><span class="n">label</span><span class="o">=</span><span class="s">'all labels'</span><span class="p">);</span>
<span class="n">plt</span><span class="p">.</span><span class="n">legend</span><span class="p">();</span>
<span class="n">plt</span><span class="p">.</span><span class="n">title</span><span class="p">(</span><span class="s">'Comparison between wrong and correctly classified labels'</span><span class="p">);</span>
<span class="n">plt</span><span class="p">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s">'highest probability'</span><span class="p">);</span>
</code></pre></div></div>
<p><img class="center-block img-responsive" src="https://sempwn.github.io/img/conv_intro/output_29_0.png" alt="png" /></p>
<p>It appears in general that when a digit is wrongly labelled, the model provides it with a lower probability than when it’s correctly labelled. We would expect these two groups to become more separate as the model accuracy increases.</p>
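That expectation can be quantified with a crude separation measure: the gap between the mean maximum probability of the two groups, which should grow as the model improves. A sketch using synthetic stand-ins for the two distributions plotted above (the ranges are assumptions, not the real model outputs):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins: correctly classified digits cluster near 1.0,
# wrongly classified ones sit lower and vary more (assumed ranges).
correct_max_probs = rng.uniform(0.90, 1.00, size=1000)
wrong_max_probs = rng.uniform(0.30, 0.90, size=50)

# Gap between group means: larger gap = the histograms overlap less
gap = correct_max_probs.mean() - wrong_max_probs.mean()
```

A gap near zero would mean maximum probability carries no information about correctness; the histogram above suggests the real gap is already substantial.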
<h3 id="whats-been-fitted-">What’s been fitted?</h3>
<p>Let’s look at the convolutional layer of the model and the kernels that have been learnt. First we’ll check the dimensions of the first layer to see what we need to extract.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">print</span> <span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">layers</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">get_weights</span><span class="p">()[</span><span class="mi">0</span><span class="p">].</span><span class="n">shape</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(3, 3, 1, 32)
</code></pre></div></div>
<p>Now let’s visualise the learnt filters. Remember that each of these filters is convolved with the image to produce a set of filtered images that can be used for classification.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">weights</span> <span class="o">=</span> <span class="n">model</span><span class="p">.</span><span class="n">layers</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="n">get_weights</span><span class="p">()[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">nb_filters</span><span class="p">):</span>
<span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">6</span><span class="p">,</span><span class="mi">6</span><span class="p">,</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">weights</span><span class="p">[:,:,</span><span class="mi">0</span><span class="p">,</span><span class="n">i</span><span class="p">],</span><span class="n">cmap</span><span class="o">=</span><span class="s">'gray'</span><span class="p">,</span><span class="n">interpolation</span><span class="o">=</span><span class="s">'none'</span><span class="p">);</span> <span class="n">plt</span><span class="p">.</span><span class="n">axis</span><span class="p">(</span><span class="s">'off'</span><span class="p">);</span>
</code></pre></div></div>
<p><img class="center-block img-responsive" src="https://sempwn.github.io/img/conv_intro/output_34_0.png" alt="png" /></p>
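To make “convolved with the image” concrete, here is a minimal numpy sketch of the ‘valid’ sliding-window operation that <code>Conv2D</code> applies with each 3×3 kernel. A 28×28 input and a 3×3 kernel give a 26×26 output (28 − 3 + 1), which is why the intermediate activations later have spatial size 26×26. The kernel below is a hand-picked vertical-edge filter, not one of the learnt weights:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation: slide the kernel over every position
    where it fully overlaps the image and take the elementwise sum."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

image = np.random.rand(28, 28)        # stand-in for an MNIST digit
kernel = np.array([[-1., 0., 1.],
                   [-1., 0., 1.],
                   [-1., 0., 1.]])    # hand-written vertical-edge filter
filtered = conv2d_valid(image, kernel)  # shape (26, 26)
```

The learnt layer does exactly this 32 times, once per filter, before applying the ReLU activation.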
<h3 id="visualising-intermediate-layers-in-the-cnn">Visualising intermediate layers in the CNN</h3>
<p>To visualise the activations half-way through the CNN, and get some sense of what these convolutional kernels do to the input, we create a new model with the same structure as before but with the final layers removed. We then give it the previously trained weights and predict on a given input. This new model outputs the convolved input passed through the activation for each of the 32 learnt filters.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#Create new sequential model, same as before but just keep the convolutional layer.
</span><span class="n">model_new</span> <span class="o">=</span> <span class="n">Sequential</span><span class="p">()</span>
<span class="n">model_new</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">Conv2D</span><span class="p">(</span><span class="n">nb_filters</span><span class="p">,</span> <span class="n">kernel_size</span><span class="o">=</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">3</span><span class="p">),</span>
<span class="n">activation</span><span class="o">=</span><span class="s">'relu'</span><span class="p">,</span>
<span class="n">input_shape</span><span class="o">=</span><span class="n">input_shape</span><span class="p">))</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#set weights for new model from weights trained on MNIST.
</span><span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1</span><span class="p">):</span>
<span class="n">model_new</span><span class="p">.</span><span class="n">layers</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">set_weights</span><span class="p">(</span><span class="n">model</span><span class="p">.</span><span class="n">layers</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">get_weights</span><span class="p">())</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#pick a random digit and "predict" on this digit (output will be first layer of CNN)
</span><span class="n">i</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">random</span><span class="p">.</span><span class="n">randint</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="nb">len</span><span class="p">(</span><span class="n">x_test</span><span class="p">))</span>
<span class="n">digit</span> <span class="o">=</span> <span class="n">x_test</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">reshape</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">28</span><span class="p">,</span><span class="mi">28</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
<span class="n">pred</span> <span class="o">=</span> <span class="n">model_new</span><span class="p">.</span><span class="n">predict</span><span class="p">(</span><span class="n">digit</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#check shape of prediction
</span><span class="k">print</span><span class="p">(</span><span class="n">pred</span><span class="p">.</span><span class="n">shape</span><span class="p">)</span>
</code></pre></div></div>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(1, 26, 26, 32)
</code></pre></div></div>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#For each filter, plot its activation on the input digit
</span><span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">18</span><span class="p">,</span><span class="mi">18</span><span class="p">))</span>
<span class="n">filts</span> <span class="o">=</span> <span class="n">pred</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">nb_filters</span><span class="p">):</span>
<span class="n">filter_digit</span> <span class="o">=</span> <span class="n">filts</span><span class="p">[:,:,</span><span class="n">i</span><span class="p">]</span>
<span class="n">plt</span><span class="p">.</span><span class="n">subplot</span><span class="p">(</span><span class="mi">6</span><span class="p">,</span><span class="mi">6</span><span class="p">,</span><span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">filter_digit</span><span class="p">,</span><span class="n">cmap</span><span class="o">=</span><span class="s">'gray'</span><span class="p">);</span> <span class="n">plt</span><span class="p">.</span><span class="n">axis</span><span class="p">(</span><span class="s">'off'</span><span class="p">);</span>
</code></pre></div></div>
<p><img class="center-block img-responsive" src="https://sempwn.github.io/img/conv_intro/output_40_0.png" alt="png" /></p>
<p>The filters pick out a lot of details from the image including horizontal and vertical lines as well as edges and potentially the terminal points of lines. We’ve only created one convolutional layer in our model. The real power comes when these convolutional layers are stacked together, creating a mechanism by which more general filters can be learnt.</p>
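The payoff of stacking can be read off the shapes alone: two successive ‘valid’ 3×3 convolutions take a 28×28 input to 24×24, and each output pixel then depends on a 5×5 patch of the original image, so two small kernels compose into one larger effective filter. A sketch of the arithmetic:

```python
# Each 'valid' 3x3 convolution shrinks the feature map by 2 and widens
# each output pixel's receptive field by 2.
size, rf = 28, 1                # input width/height, receptive field of a pixel
for layer in range(2):          # two stacked 3x3 convolutional layers
    size = size - 3 + 1         # valid convolution: out = in - kernel + 1
    rf = rf + 2                 # each layer adds (kernel - 1) to the field
# size is now 24, rf is now 5
```

This is the mechanism by which deeper stacks learn more general filters without ever using large kernels.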