Who learns better Bayesian network structures: Accuracy and speed of structure learning algorithms

Abstract

Three classes of algorithms to learn the structure of Bayesian networks from data are common in the literature: constraint-based algorithms, which use conditional independence tests to learn the dependence structure of the data; score-based algorithms, which use goodness-of-fit scores as objective functions to maximise; and hybrid algorithms that combine both approaches. Constraint-based and score-based algorithms have been shown to learn the same structures when conditional independence and goodness of fit are both assessed using entropy and the topological ordering of the network is known (Cowell, 2001). In this paper, we investigate how these three classes of algorithms perform outside the assumptions above in terms of speed and accuracy of network reconstruction for both discrete and Gaussian Bayesian networks. We approach this question by recognising that structure learning is defined by the combination of a statistical criterion and an algorithm that determines how the criterion is applied to the data. Removing the confounding effect of different choices for the statistical criterion, we find using both simulated and real-world complex data that constraint-based algorithms are often less accurate than score-based algorithms, but are seldom faster (even at large sample sizes); and that hybrid algorithms are neither faster nor more accurate than constraint-based algorithms. This suggests that commonly held beliefs on structure learning in the literature are strongly influenced by the choice of particular statistical criteria rather than just by the properties of the algorithms themselves.


Notes

Background and notation

A set of random variables $\mathbf{X} = \{X_1, \ldots, X_N\}$ is associated with the nodes of a directed acyclic graph (DAG) $\mathcal{G}$. We indicate with $A$ the set of arcs of $\mathcal{G}$. Graphical separation in $\mathcal{G}$ implies conditional independence between the respective variables. As a result, the following factorization holds:

$$P(\mathbf{X} \mid \mathcal{G}, \Theta) = \prod_{i=1}^{N} P(X_i \mid \Pi_{X_i}, \Theta_{X_i}),$$

where $\Theta$ indicates the set of parameters of the global distribution of $\mathbf{X}$. The global distribution decomposes into one local distribution for each $X_i$ (with parameters $\Theta_{X_i}$), conditional on its parents $\Pi_{X_i}$.

The DAG $\mathcal{G}$ does not uniquely identify a Bayesian network (BN): different DAGs can encode the same set of conditional independencies.

A v-structure in a BN is a pattern of arcs of the form $X_j \rightarrow X_i \leftarrow X_k$ in which $X_j$ and $X_k$ are not connected by an arc.

An equivalence class of BNs is defined by the same

  • underlying undirected graph and
  • v-structures

Gaussian Bayesian networks (GBNs) [5] assume that the $X_i$ are univariate normal random variables linked by linear dependencies to their parents,

$$X_i \mid \Pi_{X_i} \sim N\!\left(\mu_{X_i} + \Pi_{X_i}\beta_{X_i},\ \sigma^2_{X_i}\right),$$

in what is essentially a linear regression model of $X_i$ against the $\Pi_{X_i}$ with regression coefficients $\beta_{X_i} = \{\beta_{X_i, X_j} : X_j \in \Pi_{X_i}\}$. $\mathbf{X}$ is then multivariate normal, and we generally assume that its covariance matrix $\Sigma$ is positive definite. Equivalently [6], we can consider the precision matrix $\Omega = \Sigma^{-1}$ and parametrize the $X_i \mid \Pi_{X_i}$ with the partial correlations

$$\rho_{X_i X_j} = \mathrm{cor}\!\left(X_i, X_j \mid \Pi_{X_i} \setminus \{X_j\}\right)$$

between $X_i$ and each parent $X_j$ given the rest, since

$$\beta_{X_i, X_j} = \rho_{X_i X_j}\,\sqrt{\frac{\Omega_{jj}}{\Omega_{ii}}}.$$
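As a quick numerical check of these precision-matrix identities, here is a small numpy sketch. It conditions $X_i$ on all of the remaining variables rather than only on its parents, an assumption made to keep the example self-contained; none of it is code from the paper.

```python
# Numerical check of the precision-matrix identities above, conditioning X_i
# on all remaining variables (numpy only; illustrative, not code from the paper).
import numpy as np

rng = np.random.default_rng(0)

# Random positive-definite covariance matrix for 5 variables.
A = rng.normal(size=(5, 5))
Sigma = A @ A.T + 5 * np.eye(5)
Omega = np.linalg.inv(Sigma)                   # precision matrix

i = 0                                          # "response" variable X_i
others = [j for j in range(5) if j != i]

# Regression coefficients of X_i on the remaining variables:
# beta = Sigma_{-i,-i}^{-1} Sigma_{-i,i}
beta = np.linalg.solve(Sigma[np.ix_(others, others)], Sigma[others, i])

for k, j in enumerate(others):
    # Partial correlation of X_i and X_j given the rest, from the precision matrix.
    rho = -Omega[i, j] / np.sqrt(Omega[i, i] * Omega[j, j])
    # Identity used above: beta_{X_i, X_j} = rho * sqrt(Omega_jj / Omega_ii)
    assert np.isclose(beta[k], rho * np.sqrt(Omega[j, j] / Omega[i, i]))

print("identities verified")
```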


Climate data (skin temperature) network case-study

The climate case study in the paper involves modeling global surface temperature anomalies using Bayesian networks (BNs). The study investigates the dependencies, including both local and long-range (teleconnected) spatial dependencies, within climate data. Below is a breakdown of the procedure and the algorithms used:

Data Preparation

  1. Data Source:

• Monthly surface temperature values on a global 10° grid (approx. 1000 km resolution) from the NCEP/NCAR reanalysis for the period 1981–2010.

  2. Preprocessing:

• Calculated anomalies by removing the mean annual cycle from the raw temperature data, month by month, over the 30-year period.

• Represented each grid point as a node in the BN, resulting in 648 nodes (18 latitudes × 36 longitudes). A minimal sketch of this step follows.
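The following sketch illustrates the preprocessing step, assuming the raw reanalysis temperatures have already been loaded into a numpy array of shape (years, months, latitudes, longitudes). The array names and random placeholder data are illustrative, not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_years, n_months, n_lat, n_lon = 30, 12, 18, 36

# Placeholder for the raw NCEP/NCAR temperatures (Kelvin); in practice this
# would be loaded from the reanalysis files.
raw = rng.normal(loc=280.0, scale=10.0, size=(n_years, n_months, n_lat, n_lon))

# Anomalies: remove the mean annual cycle, month by month, over the 30 years.
monthly_climatology = raw.mean(axis=0, keepdims=True)        # (1, 12, 18, 36)
anomalies = raw - monthly_climatology                         # (30, 12, 18, 36)

# One BN node per grid point: 18 x 36 = 648 nodes, 30 x 12 = 360 monthly samples.
samples = anomalies.reshape(n_years * n_months, n_lat * n_lon)
print(samples.shape)                                          # (360, 648)
```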

Modeling Approach

  1. Assumptions:

• The temperature at each grid point follows a Gaussian distribution.

• BNs model spatial dependencies: nodes represent grid points and arcs encode dependencies between them.

  2. Algorithm Choices:

• Compared constraint-based, score-based, and hybrid structure learning algorithms.

• Used the extended Bayesian Information Criterion (eBIC) for flexibility in enforcing sparsity in the learned networks.

Structure Learning Algorithms Used

  1. Constraint-Based Algorithms:

• PC-Stable

• Grow-Shrink (GS)

  2. Score-Based Algorithms:

• Tabu Search

• Hill Climbing (HC); a minimal hill-climbing sketch is given after this list.

  3. Hybrid Algorithms:

• Max-Min Hill Climbing (MMHC)

• H2PC
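The paper relies on existing implementations of these algorithms. Purely as an illustration of the score-based approach, the sketch below is a minimal, self-contained hill climber for Gaussian BNs that greedily adds the single arc giving the largest BIC improvement; arc deletions, reversals and the tabu list used by Tabu Search are omitted, and the toy data are made up.

```python
# Minimal greedy hill-climbing sketch for Gaussian BNs (arc additions only),
# scored with the node-wise Gaussian BIC. Illustrative only; the paper used
# existing implementations of HC, Tabu, PC-Stable, GS, MMHC and H2PC.
import itertools
import numpy as np

def node_bic(data, i, parents):
    """Gaussian BIC contribution of node i given its parent set."""
    n = data.shape[0]
    y = data[:, i]
    if parents:
        X = np.column_stack([np.ones(n), data[:, list(parents)]])
    else:
        X = np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    n_params = X.shape[1] + 1          # regression coefficients + residual variance
    return loglik - 0.5 * n_params * np.log(n)

def creates_cycle(parents, frm, to):
    """Would adding the arc frm -> to create a directed cycle?"""
    stack, seen = [frm], set()
    while stack:                       # check whether `to` is an ancestor of `frm`
        node = stack.pop()
        if node == to:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(parents[node])
    return False

def hill_climb(data):
    n_vars = data.shape[1]
    parents = {i: set() for i in range(n_vars)}
    score = {i: node_bic(data, i, parents[i]) for i in range(n_vars)}
    improved = True
    while improved:
        improved, best = False, (0.0, None)
        for frm, to in itertools.permutations(range(n_vars), 2):
            if frm in parents[to] or creates_cycle(parents, frm, to):
                continue
            delta = node_bic(data, to, parents[to] | {frm}) - score[to]
            if delta > best[0]:
                best = (delta, (frm, to))
        if best[1] is not None:
            frm, to = best[1]
            parents[to].add(frm)
            score[to] = node_bic(data, to, parents[to])
            improved = True
    return parents

# Toy example: data generated from the chain X0 -> X1 -> X2; the chain is
# expected to be recovered up to equivalent arc directions.
rng = np.random.default_rng(1)
x0 = rng.normal(size=2000)
x1 = 0.8 * x0 + rng.normal(size=2000)
x2 = 0.8 * x1 + rng.normal(size=2000)
print(hill_climb(np.column_stack([x0, x1, x2])))
```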

Adjustments for Complex Data

Recognized that constraint-based algorithms (e.g., PC-Stable, GS) struggle with complex climate data due to:

• High connectivity in locally dense regions.

• Conflicting arc directions, leading to invalid CPDAGs.

• Introduced the extended BIC (eBIC) to enforce sparsity and address these issues:

• Its regularization coefficient γ penalizes the number of arcs.

Evaluation Metrics

  1. Accuracy:

• Log-likelihood of the learned BN.

• Analysis of long-distance arcs (teleconnections) and their suitability for inference.

• Conditional dependence structure, examined through unshielded v-structures.

  2. Speed:

• Measured by the number of calls to the statistical criterion.

  3. Inference Validation:

• Tested propagation of El Niño-like evidence (e.g., high tropical Pacific temperatures) and its effect on regional probabilities; a sketch of evidence propagation in a Gaussian BN follows below.
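Since the joint distribution of a GBN is multivariate normal, propagating evidence amounts to conditioning that normal distribution on the observed nodes. The sketch below shows the standard conditional-Gaussian computation on a toy 4-node covariance matrix; the covariance values, node indices and evidence values are made up for illustration and are not taken from the paper.

```python
# Evidence propagation in a Gaussian BN reduces to conditioning a multivariate
# normal: mu_cond = mu_1 + Sigma_12 Sigma_22^{-1} (x_2 - mu_2). Toy example;
# in the case study the evidence would be high anomalies on tropical Pacific nodes.
import numpy as np

mu = np.zeros(4)
Sigma = np.array([[1.0, 0.6, 0.3, 0.1],
                  [0.6, 1.0, 0.4, 0.2],
                  [0.3, 0.4, 1.0, 0.5],
                  [0.1, 0.2, 0.5, 1.0]])

observed = [0, 1]                  # nodes with evidence ("tropical Pacific")
free = [2, 3]                      # remaining nodes of interest
evidence = np.array([2.0, 2.0])    # strong positive anomalies (standardised units)

S11 = Sigma[np.ix_(free, free)]
S12 = Sigma[np.ix_(free, observed)]
S22 = Sigma[np.ix_(observed, observed)]

mu_cond = mu[free] + S12 @ np.linalg.solve(S22, evidence - mu[observed])
Sigma_cond = S11 - S12 @ np.linalg.solve(S22, S12.T)

print("conditional means:", mu_cond)
print("conditional variances:", np.diag(Sigma_cond))
```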

Key Observations

  1. Constraint-Based Algorithms:

• Best for small, sparse networks.

• Often fail to produce valid CPDAGs in dense, complex data.

  2. Score-Based Algorithms (Tabu Search and HC):

• Learned large, dense networks capturing both local and teleconnected dependencies.

• Performed better in propagating evidence and capturing global climatic phenomena.

  3. Hybrid Algorithms (MMHC, H2PC):

• Balanced speed and accuracy for moderately dense networks.

• Struggled to match the performance of score-based algorithms in larger networks.

Findings

• Only score-based algorithms effectively modeled complex data with teleconnections, which are crucial for understanding global climate variability.

• Constraint-based algorithms performed well in small-scale scenarios but failed to generalize to dense, real-world networks.

• The study underscored the importance of algorithm selection and parameter tuning for complex spatial data modeling.


Background

Bayesian Networks (BNs)

Bayesian networks (BNs) are graphical models representing the joint probability distribution over a set of random variables. The structure is defined by:

  • A directed acyclic graph (DAG) $\mathcal{G}$, in which each node corresponds to a variable $X_i$.
  • Conditional independence relationships between the variables, encoded by the arcs of $\mathcal{G}$.

The joint probability distribution factorizes as:

$$P(X_1, \ldots, X_N) = \prod_{i=1}^{N} P(X_i \mid \Pi_{X_i}, \Theta_{X_i}),$$

where:

  • $\Pi_{X_i}$: set of parent nodes of $X_i$ in $\mathcal{G}$.
  • $\Theta_{X_i}$: parameters associated with $X_i$'s conditional distribution.
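As a concrete (made-up) illustration of this factorization, the snippet below evaluates the joint probability of a three-node binary BN with the v-structure A → C ← B from its local distributions; the probability tables are arbitrary.

```python
# Tiny illustration of P(X) = prod_i P(X_i | Pi_Xi) for a discrete BN with the
# v-structure A -> C <- B (all variables binary). The tables are made-up numbers.
P_A = {0: 0.7, 1: 0.3}
P_B = {0: 0.6, 1: 0.4}
P_C_given_AB = {            # P(C = 1 | A, B)
    (0, 0): 0.1, (0, 1): 0.5,
    (1, 0): 0.6, (1, 1): 0.9,
}

def joint(a, b, c):
    p_c1 = P_C_given_AB[(a, b)]
    p_c = p_c1 if c == 1 else 1.0 - p_c1
    return P_A[a] * P_B[b] * p_c

# The joint probabilities sum to 1 over all configurations.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(joint(1, 0, 1), total)    # P(A=1, B=0, C=1) and the total (= 1.0)
```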


Structure Learning

Learning a BN involves finding the DAG $\mathcal{G}$ and the parameters $\Theta$ that best explain the observed data $\mathcal{D}$. It is typically decomposed into two tasks:

  1. Structure Learning: find the DAG $\mathcal{G}$ encoding the dependencies among the variables.
  2. Parameter Learning: estimate the parameters $\Theta$ given the learned structure $\mathcal{G}$.

Bayes' theorem splits this as:

$$P(\mathcal{G}, \Theta \mid \mathcal{D}) = P(\mathcal{G} \mid \mathcal{D})\, P(\Theta \mid \mathcal{G}, \mathcal{D}),$$

where:

  • $P(\mathcal{G} \mid \mathcal{D})$: posterior probability of the structure.
  • $P(\Theta \mid \mathcal{G}, \mathcal{D})$: posterior probability of the parameters given the structure.


Bayesian Information Criterion (BIC)

The Bayesian Information Criterion (BIC) is a score function commonly used to evaluate a BN structure. It approximates the log-marginal likelihood as:

$$\mathrm{BIC}(\mathcal{G}; \mathcal{D}) = \sum_{i=1}^{N} \left[ \log P(X_i \mid \Pi_{X_i}, \hat\Theta_{X_i}) - \frac{|\Theta_{X_i}|}{2} \log n \right],$$

where:

  • $n$: sample size.
  • $|\Theta_{X_i}|$: number of parameters of $X_i$'s conditional distribution.


Extended BIC (eBIC)

To handle complex data where sparse networks are desirable, an extended version of BIC is used, incorporating an additional regularization term that penalizes the number of parameters:

$$\mathrm{eBIC}_\gamma(\mathcal{G}; \mathcal{D}) = \sum_{i=1}^{N} \left[ \log P(X_i \mid \Pi_{X_i}, \hat\Theta_{X_i}) - \frac{|\Theta_{X_i}|}{2} \log n - \gamma\, |\Theta_{X_i}| \log N \right],$$

where $\gamma \geq 0$ is a regularization coefficient (larger values yield sparser networks) and $N$ is the number of nodes.
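Below is a small numerical sketch of the node-wise Gaussian BIC and its γ-penalized variant for a single node given a candidate parent set. The eBIC penalty follows the form written above, which is reconstructed from these notes rather than copied from the paper, so treat it as an assumption; the data are synthetic.

```python
# Sketch: node-wise Gaussian BIC and the gamma-penalized eBIC for one node
# given a candidate parent set. The eBIC penalty term is a reconstruction
# (treat it as an assumption); the data are synthetic.
import numpy as np

def node_score(y, parent_data, n_nodes, gamma=0.0):
    """Gaussian log-likelihood of y minus the (e)BIC penalties."""
    n = y.shape[0]
    if parent_data is None:
        X = np.ones((n, 1))
    else:
        X = np.column_stack([np.ones(n), parent_data])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.mean((y - X @ beta) ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    n_params = X.shape[1] + 1                  # coefficients + residual variance
    return loglik - 0.5 * n_params * np.log(n) - gamma * n_params * np.log(n_nodes)

rng = np.random.default_rng(2)
z = rng.normal(size=500)
y = 0.5 * z + rng.normal(size=500)

print(node_score(y, None, n_nodes=648))                   # BIC, no parents
print(node_score(y, z[:, None], n_nodes=648))             # BIC, with the true parent
print(node_score(y, z[:, None], n_nodes=648, gamma=1.0))  # sparser eBIC variant
```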


Conditional Independence Tests

For Discrete BNs

The conditional independence of two discrete variables $X$ and $Y$, given a set of conditioning variables $\mathbf{Z}$, is tested using:

  1. G-Test (Log-Likelihood Ratio Test):

$$G^2(X, Y \mid \mathbf{Z}) = 2 \sum_{ijk} n_{ijk} \log \frac{n_{ijk}\, n_{++k}}{n_{i+k}\, n_{+jk}},$$

where $n_{ijk}$ are the observed frequencies of the configurations $X = i$, $Y = j$, $\mathbf{Z} = k$, and a '+' subscript denotes summation over that index.

  2. Pearson's Chi-Square Test:

$$X^2(X, Y \mid \mathbf{Z}) = \sum_{ijk} \frac{(n_{ijk} - m_{ijk})^2}{m_{ijk}}, \qquad \text{with } m_{ijk} = \frac{n_{i+k}\, n_{+jk}}{n_{++k}}.$$
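A brief sketch of both statistics on a toy contingency table, using numpy and scipy only (illustrative, not the implementation used in the paper):

```python
# G^2 and Pearson X^2 statistics for a 2x2xK contingency table n[i, j, k] of
# observed counts of (X, Y, Z) configurations.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
n = rng.integers(5, 50, size=(2, 2, 3)).astype(float)   # toy observed counts

n_ik = n.sum(axis=1, keepdims=True)       # n_{i+k}
n_jk = n.sum(axis=0, keepdims=True)       # n_{+jk}
n_k = n.sum(axis=(0, 1), keepdims=True)   # n_{++k}
m = n_ik * n_jk / n_k                     # expected counts under X _||_ Y | Z

G2 = 2.0 * np.sum(n * np.log(n / m))
X2 = np.sum((n - m) ** 2 / m)

df = (n.shape[0] - 1) * (n.shape[1] - 1) * n.shape[2]
print("G2 =", G2, "p =", chi2.sf(G2, df))
print("X2 =", X2, "p =", chi2.sf(X2, df))
```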

For Gaussian BNs

Tests rely on partial correlation coefficients:

  • Gaussian Mutual Information Test:

$$\mathrm{MI}_g(X, Y \mid \mathbf{Z}) = -\frac{1}{2} \log\!\left(1 - \rho^2_{XY \mid \mathbf{Z}}\right),$$

where $\rho_{XY \mid \mathbf{Z}}$ is the partial correlation of $X$ and $Y$ given $\mathbf{Z}$; $2n\,\mathrm{MI}_g$ is asymptotically $\chi^2_1$-distributed under conditional independence.
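A small sketch of this test on synthetic data, estimating the partial correlation from regression residuals (numpy/scipy only; all data and names are illustrative):

```python
# Gaussian mutual information test: estimate the partial correlation of X and Y
# given Z from the residuals of two linear regressions, then compare 2 n MI_g
# against a chi-square distribution with 1 degree of freedom.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
n = 1000
Z = rng.normal(size=(n, 2))
X = Z @ np.array([0.8, -0.5]) + rng.normal(size=n)
Y = Z @ np.array([0.3, 0.7]) + rng.normal(size=n)    # X _||_ Y | Z by construction

D = np.column_stack([np.ones(n), Z])
rx = X - D @ np.linalg.lstsq(D, X, rcond=None)[0]    # residuals of X ~ Z
ry = Y - D @ np.linalg.lstsq(D, Y, rcond=None)[0]    # residuals of Y ~ Z
rho = np.corrcoef(rx, ry)[0, 1]                      # partial correlation

MI_g = -0.5 * np.log(1.0 - rho ** 2)
stat = 2.0 * n * MI_g                                # ~ chi2(1) under independence
print("rho =", rho, "p =", chi2.sf(stat, df=1))
```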

Structural Hamming Distance (SHD)

The Structural Hamming Distance (SHD) measures the difference between the learned structure and a reference (true) structure. It is the count of the following (see the sketch after this list):

  • Missing edges.
  • Extra edges.
  • Incorrectly oriented edges.
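Here is a minimal sketch of an SHD computation on adjacency matrices. The encoding convention (a 1 in one direction for an arc, 1 in both directions for an undirected edge) and the toy graphs are assumptions made for illustration.

```python
# Structural Hamming Distance between two graphs given as adjacency matrices:
# adj[i, j] = 1 means an arc i -> j, and adj[i, j] = adj[j, i] = 1 means an
# undirected edge. Each pair of nodes whose edge status differs adds 1.
import numpy as np

def shd(adj_a, adj_b):
    p = adj_a.shape[0]
    distance = 0
    for i in range(p):
        for j in range(i + 1, p):
            status_a = (adj_a[i, j], adj_a[j, i])
            status_b = (adj_b[i, j], adj_b[j, i])
            if status_a != status_b:      # missing, extra, or wrongly oriented
                distance += 1
    return distance

# Example: true graph 0 -> 1 -> 2 versus learned graph 0 -> 1 <- 2 plus 0 -> 2.
true_g = np.zeros((3, 3), dtype=int)
true_g[0, 1] = true_g[1, 2] = 1
learned = np.zeros((3, 3), dtype=int)
learned[0, 1] = learned[2, 1] = learned[0, 2] = 1

print(shd(true_g, learned))   # 2: arc 1 -> 2 is reversed, arc 0 -> 2 is extra
```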

Simulation Metrics

  1. Goodness of Fit:

    • Log-likelihood of the data given the learned structure.
  2. Speed:

    • Measured by the number of calls to the scoring or testing criterion (see the call-counting sketch after this list).
  3. Inference Validation:

    • Propagating evidence in the learned BN to validate teleconnections and other dependencies.
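One simple way to obtain that call count is to wrap the statistical criterion in a counter; the sketch below is illustrative, and the function names are hypothetical rather than taken from the paper's code.

```python
# Speed metric sketch: wrap the statistical criterion (a score or a conditional
# independence test) so that every call is counted, then compare algorithms by
# their call counts.
from functools import wraps

def counted(criterion):
    @wraps(criterion)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return criterion(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@counted
def bic_score(node, parents, data):
    ...                      # placeholder for the actual scoring criterion
    return 0.0

# A structure learning run would call bic_score many times; afterwards:
bic_score("X1", ("X2",), data=None)
bic_score("X1", ("X2", "X3"), data=None)
print("calls to the criterion:", bic_score.calls)   # 2
```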

Climate Case Study Adjustments

Problem with Constraint-Based Algorithms

Constraint-based algorithms like PC-Stable struggle with dense, locally connected regions, leading to:

  1. Conflicting arc directions.
  2. Invalid CPDAGs (cannot be converted into valid DAGs).

Introduction of eBIC

To address this, the regularized eBIC is used, and the conditional independence criterion is adjusted to be consistent with it: $X_i$ and $X_j$ are considered dependent given a conditioning set $\mathbf{S}$ if and only if adding $X_j$ improves the score,

$$\mathrm{eBIC}_\gamma(X_i \mid \mathbf{S} \cup \{X_j\}) - \mathrm{eBIC}_\gamma(X_i \mid \mathbf{S}) > 0.$$

This mathematical foundation supports structure learning for climate modeling, enabling better handling of complex data.
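To make the adjusted criterion concrete, here is a sketch of such a score-difference test for Gaussian data: $X$ and $Y$ are declared dependent given $\mathbf{S}$ if adding $Y$ as an extra regressor improves the γ-penalized BIC of $X$. The eBIC form and the default γ are assumptions carried over from the reconstruction above; the data are synthetic.

```python
# Score-based independence criterion: declare X and Y dependent given S iff
# adding Y as an extra regressor improves the gamma-penalized Gaussian BIC of X.
# The eBIC form and the default gamma are assumptions made for illustration.
import numpy as np

def gauss_ebic(y, predictors, n_nodes, gamma):
    n = y.shape[0]
    if predictors is None:
        X = np.ones((n, 1))
    else:
        X = np.column_stack([np.ones(n), predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.mean((y - X @ beta) ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = X.shape[1] + 1                        # coefficients + residual variance
    return loglik - 0.5 * k * np.log(n) - gamma * k * np.log(n_nodes)

def dependent(x, y, s, n_nodes=648, gamma=0.5):
    """True iff the eBIC of x improves when y is added to the conditioning set s."""
    extended = y[:, None] if s is None else np.column_stack([s, y])
    return gauss_ebic(x, extended, n_nodes, gamma) > gauss_ebic(x, s, n_nodes, gamma)

rng = np.random.default_rng(5)
z = rng.normal(size=800)
x = 0.9 * z + rng.normal(size=800)
y = 0.9 * z + rng.normal(size=800)            # x and y are only related through z

print(dependent(x, y, None))                  # True: marginally dependent
print(dependent(x, y, z[:, None]))            # False: independent given z
```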