Usage
using Random
Random.seed!(0) # seed the random number generator to 0, for a reproducible demonstration
using BayesNets
using TikzGraphs # required to plot tex-formatted graphs (recommended), otherwise GraphPlot.jl is used
using TikzPictures
Representation
Bayesian networks are represented with the BayesNet type. This type contains the directed acyclic graph (a LightGraphs.DiGraph) and a list of conditional probability distributions (CPDs). Here we construct the BayesNet $a \rightarrow b$, with Gaussians $a$ and $b$:
\[a = \mathcal{N}(0,1) \qquad b = \mathcal{N}(2a+3,1)\]
bn = BayesNet()
push!(bn, StaticCPD(:a, Normal(0.0, 1.0))) # a ~ N(0,1)
push!(bn, LinearGaussianCPD(:b, [:a], [2.0], 3.0, 1.0))
plot = BayesNets.plot(bn)
TikzPictures.save(SVG("plot1"), plot)
Conditional Probability Distributions
Conditional Probability Distributions, $P(x_i \mid \text{parents}(x_i))$, are defined in BayesNets.CPDs. Each CPD knows its own name, the names of its parents, and is associated with a distribution from Distributions.jl.
CPDForm | Description |
---|---|
StaticCPD | Any Distributions.Distribution; independent of any parents |
FunctionalCPD | A CPD defined with a custom eval function |
ParentFunctionalCPD | A FunctionalCPD variant whose eval function also receives the parent values |
CategoricalCPD | Categorical distribution; assumes integer parents in $1:N$ |
LinearGaussianCPD | Linear Gaussian; assumes the target and parents are numeric |
ConditionalLinearGaussianCPD | A linear Gaussian for each discrete parent instantiation |
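Most of these appear in examples later in this section; ParentFunctionalCPD does not, so here is a minimal sketch. The names bn3, :x, and :y are hypothetical, and the accessor signature (assignment, parent names) is an assumption based on the description above:
bn3 = BayesNet()
push!(bn3, StaticCPD(:x, Categorical([0.5, 0.5])))
# the evaluation function receives the current assignment and the list of parent names
push!(bn3, ParentFunctionalCPD{Bernoulli}(:y, [:x],
        (a, parents) -> Bernoulli(1 / (1 + sum(a[p] for p in parents)))))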
Each CPD can be learned from data using fit. Here we learn the same network as above:
a = randn(100)
b = randn(100) .+ 2*a .+ 3
data = DataFrame(a=a, b=b)
cpdA = fit(StaticCPD{Normal}, data, :a)
cpdB = fit(LinearGaussianCPD, data, :b, [:a])
bn2 = BayesNet([cpdA, cpdB])
Each CPD implements the following functions:
- name(cpd) - obtain the name of the target variable
- parents(cpd) - obtain the list of parents
- nparams(cpd) - obtain the number of free parameters in the CPD
- cpd(assignment) - call the CPD on an assignment to obtain the conditional distribution
- Distributions.fit(Type{CPD}, data, target, parents) - fit the CPD from data
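The first three can be checked on the CPD fit above. A quick illustration; the parameter count of 3 assumes a linear Gaussian with one parent stores one weight, a bias, and a standard deviation:
name(cpdB)    # :b
parents(cpdB) # [:a]
nparams(cpdB) # 3
Calling the CPD conditions it on an assignment and returns the resulting distribution: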
cpdB(:a=>0.5)
Normal{Float64}(μ=4.066881268839714, σ=2.4092138826029994)
Several functions conveniently condition and then produce their return values:
rand(cpdB, :a=>0.5) # condition and then sample
pdf(cpdB, :a=>1.0, :b=>3.0) # condition and then compute pdf(distribution, 3)
logpdf(cpdB, :a=>1.0, :b=>3.0) # condition and then compute logpdf(distribution, 3);
The NamedCategorical distribution allows for String or Symbol return values. The FunctionalCPD allows for crafting quick and simple CPDs:
bn2 = BayesNet()
push!(bn2, StaticCPD(:sighted, NamedCategorical([:bird, :plane, :superman], [0.40, 0.55, 0.05])))
push!(bn2, FunctionalCPD{Bernoulli}(:happy, [:sighted], a->Bernoulli(a == :superman ? 0.95 : 0.2)))
Variables can be removed by name using delete!. A warning will be issued when removing a CPD with children.
delete!(bn2, :happy)
Likelihood
A Bayesian network represents a joint probability distribution, $P(x_1, x_2, \ldots, x_n)$. Assignments are represented as dictionaries mapping variable names (Symbols) to variable values. We can evaluate probabilities as we would with Distributions.jl:
pdf(bn, :a=>0.5, :b=>2.0) # evaluate the probability density
0.01900834726778591
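Equivalently, the assignment can be constructed explicitly; Assignment is an alias for Dict{Symbol,Any}:
a = Assignment(:a=>0.5, :b=>2.0) # same as Dict{Symbol,Any}(:a=>0.5, :b=>2.0)
pdf(bn, a)                       # identical to the pair syntax above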
We can also evaluate the likelihood of a dataset:
data = DataFrame(a=[0.5,1.0,2.0], b=[4.0,5.0,7.0])
pdf(bn, data) # 0.00215
logpdf(bn, data) # -6.1386;
Or the likelihood under a particular CPD:
pdf(cpdB, data) # 0.006
logpdf(cpdB, data) # -5.201
Sampling
Assignments can be sampled from a BayesNet:
rand(bn)
Dict{Symbol, Any} with 2 entries:
:a => -0.382635
:b => 1.69341
rand(bn, 5)
Row | a | b |
---|---|---|
| Float64 | Float64 |
1 | -0.751558 | 0.393906 |
2 | 3.30904 | 11.6079 |
3 | 1.04392 | 5.41008 |
4 | 1.39831 | 7.27991 |
5 | 3.12936 | 9.62036 |
In general, sampling can be done with rand(BayesNet, BayesNetSampler, nsamples) to produce a table of samples, rand(BayesNet, BayesNetSampler) to produce a single Assignment, or rand!(Assignment, BayesNet, BayesNetSampler) to modify an assignment in place. New samplers need only implement rand!. The functions above default to the DirectSampler, which samples the variables in topological order.
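For example, a single Assignment can be recycled across draws with the in-place form. A small sketch; constructing DirectSampler with no arguments is an assumption:
a = rand(bn)                  # draw a fresh Assignment
rand!(a, bn, DirectSampler()) # overwrite the same Assignment with a new sample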
Rejection sampling can be used to draw samples that are consistent with a provided assignment:
bn = BayesNet()
push!(bn, StaticCPD(:a, Categorical([0.3,0.7])))
push!(bn, StaticCPD(:b, Categorical([0.6,0.4])))
push!(bn, CategoricalCPD{Bernoulli}(:c, [:a, :b], [2,2], [Bernoulli(0.1), Bernoulli(0.2), Bernoulli(1.0), Bernoulli(0.4)]))
rand(bn, RejectionSampler(:c=>1), 5)
One can also use weighted sampling:
rand(bn, LikelihoodWeightedSampler(:c=>1), 5)
Gibbs sampling is also supported. More options are available than are shown in the example below.
bn_gibbs = BayesNet()
push!(bn_gibbs, StaticCPD(:a, Categorical([0.999,0.001])))
push!(bn_gibbs, StaticCPD(:b, Normal(1.0)))
push!(bn_gibbs, LinearGaussianCPD(:c, [:a, :b], [3.0, 1.0], 0.0, 1.0))
evidence = Assignment(:c => 10.0)
initial_sample = Assignment(:a => 1, :b => 1, :c => 10.0)
gsampler = GibbsSampler(evidence, burn_in=500, thinning=1, initial_sample=initial_sample)
rand(bn_gibbs, gsampler, 5)
Parameter Learning
BayesNets.jl supports parameter learning for an entire graph.
fit(BayesNet, data, (:a=>:b), [StaticCPD{Normal}, LinearGaussianCPD]) # one CPD type per variable
fit(BayesNet, data, (:a=>:b), LinearGaussianCPD) # a single CPD type for all variables
Fitting can be done for specific BayesNet types as well:
data = DataFrame(c=[1,1,1,1,2,2,2,2,3,3,3,3],
                 b=[1,1,1,2,2,2,2,1,1,2,1,1],
                 a=[1,1,1,2,1,1,2,1,1,2,1,1])
bn5 = fit(DiscreteBayesNet, data, (:a=>:b, :a=>:c, :b=>:c))
Fitting a DiscreteCPD, which is an alias for CategoricalCPD{Categorical}, can be done with a specified number of categories. This prevents cases where your test data does not provide an example for every category.
cpd = fit(DiscreteCPD, DataFrame(a=[1,2,1,2,2]), :a, ncategories=3);
cpd = fit(DiscreteCPD, data, :b, [:a], parental_ncategories=[3], target_ncategories=3);
Inference
Inference methods for discrete Bayesian networks can be used via the infer method:
bn = DiscreteBayesNet()
push!(bn, DiscreteCPD(:a, [0.3,0.7]))
push!(bn, DiscreteCPD(:b, [0.2,0.8]))
push!(bn, DiscreteCPD(:c, [:a, :b], [2,2],
        [Categorical([0.1,0.9]),
         Categorical([0.2,0.8]),
         Categorical([1.0,0.0]),
         Categorical([0.4,0.6]),
        ]))
ϕ = infer(bn, :c, evidence=Assignment(:b=>1))
Row | c | potential |
---|---|---|
| Int64 | Float64 |
1 | 1 | 0.17 |
2 | 2 | 0.83 |
Several inference methods are available. Exact inference is the default.
Inference Method | Description |
---|---|
ExactInference | Performs exact inference using discrete factors and variable elimination |
LikelihoodWeightingInference | Approximates $p(\text{query} \mid \text{evidence})$ with $N$ weighted samples using likelihood weighted sampling |
LoopyBelief | The loopy belief propagation algorithm |
GibbsSamplingNodewise | Gibbs sampling where each iteration changes one node |
GibbsSamplingFull | Gibbs sampling where each iteration changes all nodes |
ϕ = infer(GibbsSamplingNodewise(), bn, [:a, :b], evidence=Assignment(:c=>2))
Inference produces a Factor type. It can be converted to a DataFrame:
convert(DataFrame, ϕ)
Structure Learning
Structure learning can be done as well.
using Discretizers
using RDatasets
iris = dataset("datasets", "iris")
names(iris)
data = DataFrame(
SepalLength = iris[!,:SepalLength],
SepalWidth = iris[!,:SepalWidth],
PetalLength = iris[!,:PetalLength],
PetalWidth = iris[!,:PetalWidth],
Species = encode(CategoricalDiscretizer(iris[!,:Species]), iris[!,:Species]),
)
data[1:3,:] # only display a subset...
Row | SepalLength | SepalWidth | PetalLength | PetalWidth | Species |
---|---|---|---|---|---|
| Float64 | Float64 | Float64 | Float64 | Int64 |
1 | 5.1 | 3.5 | 1.4 | 0.2 | 1 |
2 | 4.9 | 3.0 | 1.4 | 0.2 | 1 |
3 | 4.7 | 3.2 | 1.3 | 0.2 | 1 |
Here we use the K2 structure learning algorithm which runs in polynomial time but requires that we specify a topological node ordering.
parameters = K2GraphSearch([:Species, :SepalLength, :SepalWidth, :PetalLength, :PetalWidth],
                           ConditionalLinearGaussianCPD,
                           max_n_parents=2)
bn = fit(BayesNet, data, parameters)
CPD types can also be specified per node. Note that complete CPD definitions are required: simply using StaticCPD is insufficient, as you need the target distribution type as well, as in StaticCPD{Categorical}. Changing the ordering will change the learned structure.
CLG = ConditionalLinearGaussianCPD
parameters = K2GraphSearch([:Species, :PetalLength, :PetalWidth, :SepalLength, :SepalWidth],
                           [StaticCPD{Categorical}, CLG, CLG, CLG, CLG],
                           max_n_parents=2)
fit(BayesNet, data, parameters)
A ScoringFunction allows for extracting a scoring metric for a CPD given data. The negative BIC score is implemented in NegativeBayesianInformationCriterion.
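For example, an individual CPD can be scored against a dataset. This sketch assumes the interface takes the form score_component(scoring_function, cpd, data); the dataset df is hypothetical:
df = DataFrame(a=randn(50)) # a one-variable dataset
cpd = fit(StaticCPD{Normal}, df, :a)
score_component(NegativeBayesianInformationCriterion(), cpd, df) # negative BIC of the fit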
A GraphSearchStrategy defines a structure learning algorithm. The K2 algorithm is defined through K2GraphSearch, and GreedyHillClimbing is implemented for discrete Bayesian networks and the Bayesian score:
data = DataFrame(c=[1,1,1,1,2,2,2,2,3,3,3,3],
                 b=[1,1,1,2,2,2,2,1,1,2,1,1],
                 a=[1,1,1,2,1,1,2,1,1,2,1,1])
parameters = GreedyHillClimbing(ScoreComponentCache(data), max_n_parents=3, prior=UniformPrior())
bn = fit(DiscreteBayesNet, data, parameters)
We can specify the number of categories for each variable in case it cannot be correctly inferred:
bn = fit(DiscreteBayesNet, data, parameters, ncategories=[3,3,2])
A whole suite of features is supported for DiscreteBayesNets. Here we illustrate the following:
- Obtain a list of counts for a node
- Obtain sufficient statistics from a discrete dataset
- Obtain the factor table for a node
- Obtain a factor table matching a particular assignment
We also detail obtaining a Bayesian score for a network structure in the next section.
count(bn, :a, data)
Row | a | count |
---|---|---|
| Int64 | Int64 |
1 | 1 | 9 |
2 | 2 | 3 |
statistics(bn.dag, data)
3-element Vector{Matrix{Int64}}:
[4; 4; 4]
[3 1 3; 1 3 1]
[3 1 … 2 0; 0 0 … 1 1]
table(bn, :b)
Row | a | b | potential |
---|---|---|---|
| Int64 | Int64 | Float64 |
1 | 1 | 1 | 0.666667 |
2 | 2 | 1 | 0.166667 |
3 | 1 | 2 | 0.25 |
4 | 2 | 2 | 0.666667 |
5 | 1 | 3 | 0.0833333 |
6 | 2 | 3 | 0.166667 |
table(bn, :c, :a=>1)
Row | b | a | c | potential |
---|---|---|---|---|
| Int64 | Int64 | Int64 | Float64 |
1 | 1 | 1 | 1 | 0.4 |
2 | 2 | 1 | 1 | 0.2 |
3 | 3 | 1 | 1 | 0.333333 |
4 | 1 | 1 | 2 | 0.2 |
5 | 2 | 1 | 2 | 0.6 |
6 | 3 | 1 | 2 | 0.333333 |
7 | 1 | 1 | 3 | 0.4 |
8 | 2 | 1 | 3 | 0.2 |
9 | 3 | 1 | 3 | 0.333333 |
Reading from XDSL
Discrete Bayesian networks can be read from the .XDSL file format.
bn = readxdsl(joinpath(dirname(pathof(BayesNets)), "..", "test", "sample_bn.xdsl"))
Bayesian Score for a Network Structure
The Bayesian score for a discrete-valued BayesNet can be calculated based only on the structure and data (the CPDs do not need to be defined beforehand). This is implemented with a method of bayesian_score that takes in a directed graph, the names of the nodes, and the data.
data = DataFrame(c=[1,1,1,1,2,2,2,2,3,3,3,3],
                 b=[1,1,1,2,2,2,2,1,1,2,1,1],
                 a=[1,1,1,2,1,1,2,1,1,2,1,1])
g = DAG(3)
add_edge!(g,1,2); add_edge!(g,2,3); add_edge!(g,1,3)
bayesian_score(g, [:a,:b,:c], data)
-29.642240688899513