Epidemiology · Network Analysis · R

Transmission Network & Superspreading Analysis of COVID-19

Network analysis and statistical modeling of KCDC contact tracing data across 15 South Korean regions to identify superspreaders and quantify transmission potential.


R₀ = 0.65

Basic Reproduction No.

k = 0.18

Dispersion Parameter

15

Korean Regions

35 days

Study Period

Background

A small number of cases were responsible for a disproportionate number of secondary infections.

Understanding transmission dynamics and superspreading potential of COVID-19 is critical for effective epidemic intervention and preparedness. This study investigates who spreads the virus the most — and why regional context matters.

Objective

Identify characteristics of individuals who contributed disproportionately to virus spread by investigating superspreading potential using network analysis and statistical modeling.

Data

Korea Centers for Disease Control and Prevention (KCDC) contact tracing records, May 2–June 5, 2020. Key variables: source of infection (SOI), region, onset/quarantine dates, cluster name.

Conclusion

Targeting superspreaders through enhanced contact tracing and quarantine strategies — tailored by regional characteristics — could significantly reduce COVID-19 transmission.

Methods

Four-stage analytical pipeline.

Implemented entirely in R using igraph for network construction and MASS for statistical distribution fitting.

Data Preparation

Linelist Cleaning

Filtered to relevant columns (id, region, vaccination status, onset/quarantine dates, cluster, source of infection). Stripped hashtag prefixes from SOI identifiers to enable numeric matching.

→ tidyverse · readxl
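A minimal sketch of the cleaning step, assuming a linelist with columns named id, region, and soi (the real column names are not given here, and the toy rows below are invented). It keeps only the relevant columns and strips the leading hashtag so that SOI values can be matched against numeric case ids.

```r
library(dplyr)
library(stringr)

# Toy linelist; column names and values are assumptions for illustration only.
linelist <- data.frame(
  id     = c(1001, 1002, 1003, 1004),
  region = c("Incheon", "Incheon", "Busan", "Seoul"),
  soi    = c(NA, "#1001", "#1001", "Itaewon cluster"),
  notes  = c("a", "b", "c", "d")   # an irrelevant column that gets dropped
)

clean <- linelist %>%
  select(id, region, soi) %>%                       # keep relevant columns only
  mutate(
    soi_id = str_remove(soi, "^#"),                 # strip the hashtag prefix
    soi_id = suppressWarnings(as.numeric(soi_id))   # non-numeric sources become NA
  )
```

Rows whose source is missing or is a cluster name rather than a case id end up with an NA soi_id, so they contribute no edge to the transmission graph.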

Network Construction

Directed Transmission Graph

Built a directed igraph network where each node is a case and each edge represents a confirmed transmission event (SOI → case id). Loops and duplicate edges removed via simplification.

→ igraph · graph_from_data_frame
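The construction step could look roughly like the sketch below. The edge list and vertex table are made-up toy data, and the use of igraph's simplify() for the "simplification" step is an assumption based on the description above.

```r
library(igraph)

# Hypothetical edge list: 'from' is the source of infection, 'to' is the infected case.
# It deliberately contains a duplicate edge (1 -> 3) and a self-loop (2 -> 2).
edges <- data.frame(
  from = c(1, 1, 1, 2, 2, 3),
  to   = c(2, 3, 3, 4, 2, 5)
)

# Vertex table: the first column is the vertex id, other columns become attributes.
cases <- data.frame(
  id     = 1:6,
  region = c("Incheon", "Incheon", "Busan", "Seoul", "Busan", "Jeju")
)

g_raw <- graph_from_data_frame(edges, directed = TRUE, vertices = cases)

# Drop self-loops and collapse duplicate edges.
g <- simplify(g_raw, remove.multiple = TRUE, remove.loops = TRUE)
```

Passing the full case table as vertices keeps cases with no known links in the graph as isolated nodes, which matters later because they count as index cases with zero secondary infections.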

Regional Analysis

Subgraph Metrics

Induced subgraphs for each of 15 Korean regions. Per-region metrics: edge density, average path length, clustering coefficient, degree/betweenness/closeness centrality.

→ 15 regions · 6 metrics
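As a rough illustration of the per-region metrics, the sketch below builds a toy graph with a region vertex attribute and summarises each induced subgraph. The helper region_metrics() is hypothetical, and averaging the node-level centralities into one number per region is an assumption; the page does not say how the centralities were aggregated.

```r
library(igraph)

# Toy directed transmission graph; ids, edges, and regions are invented.
edges <- data.frame(from = c(1, 1, 2, 3, 3, 6), to = c(2, 3, 4, 5, 7, 8))
cases <- data.frame(
  id     = 1:8,
  region = c("Busan", "Busan", "Busan", "Busan",
             "Incheon", "Incheon", "Incheon", "Incheon")
)
g <- graph_from_data_frame(edges, directed = TRUE, vertices = cases)

# Hypothetical helper: metrics for the subgraph induced by one region.
region_metrics <- function(g, r) {
  sub <- induced_subgraph(g, which(V(g)$region == r))  # keeps within-region edges only
  data.frame(
    region       = r,
    density      = edge_density(sub),
    avg_path     = mean_distance(sub, directed = TRUE),
    clustering   = transitivity(sub, type = "global"),
    mean_degree  = mean(degree(sub, mode = "out")),
    mean_between = mean(betweenness(sub, directed = TRUE)),
    mean_close   = mean(closeness(sub, mode = "out"), na.rm = TRUE)
  )
}

metrics <- do.call(rbind, lapply(unique(V(g)$region), region_metrics, g = g))
```

Note that an induced subgraph discards edges that cross regional boundaries, so metrics computed this way describe within-region structure only.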

Statistical Modeling

Offspring Distribution

Extracted out-degree distribution (secondary cases per index case). Fitted negative binomial distribution via maximum likelihood to estimate R₀ (μ) and dispersion parameter k. 95% CIs computed analytically.

→ MASS::fitdistr · NegBin
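The fitting step can be sketched as follows. The counts are simulated stand-ins for the real out-degree vector, and the Wald intervals (estimate ± 1.96 × standard error) are an assumption about what "computed analytically" means here.

```r
library(MASS)

set.seed(42)
# Simulated secondary-case counts; in the real pipeline this vector would be
# the out-degree of each case, e.g. igraph::degree(g, mode = "out").
offspring <- rnbinom(500, size = 0.2, mu = 0.7)

fit <- fitdistr(offspring, densfun = "negative binomial")

est <- fit$estimate   # named vector: size (the dispersion k) and mu (mean offspring)
se  <- fit$sd         # asymptotic standard errors of the estimates

# Wald 95% confidence intervals (assumed form of the analytical CIs).
ci <- cbind(lower = est - 1.96 * se, upper = est + 1.96 * se)
```

In this parameterisation, mu corresponds to the reproduction number and size to the dispersion parameter k.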

Results

Significant overdispersion — superspreaders drive transmission.

Basic Reproduction Number

0.65

95% CI: 0.45 — 0.86

R₀ < 1, with the entire 95% CI below 1, indicates that the South Korean outbreak was being suppressed during the May–June 2020 study period. On average, each infected person transmitted the virus to fewer than one other person.

Dispersion Parameter

0.18

95% CI: 0.12 — 0.25

Low k (< 0.2) indicates strong overdispersion — most individuals infect zero others, while a rare few cause large clusters. This is characteristic of superspreading dynamics.
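To make the effect of a low k concrete, the short calculation below plugs the reported point estimates into the negative binomial and compares them with a Poisson distribution of the same mean. It is an illustration using only the two reported estimates, not part of the original analysis.

```r
mu <- 0.65   # reported mean number of secondary cases
k  <- 0.18   # reported dispersion parameter

# Probability that a case infects nobody; equals (1 + mu / k)^(-k).
p_zero_nb <- dnbinom(0, size = k, mu = mu)

# Same probability under a Poisson model with the same mean, for comparison.
p_zero_pois <- dpois(0, lambda = mu)

# Probability that a single case causes five or more secondary infections.
p_five_plus <- pnbinom(4, size = k, mu = mu, lower.tail = FALSE)
```

Under these estimates roughly three quarters of cases infect no one, compared with about half under a Poisson model with the same mean.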

Offspring Distribution

Negative Binomial Fit

The out-degree distribution (secondary cases per index) was fit to a negative binomial using maximum likelihood via MASS::fitdistr(). The negative binomial outperformed Poisson, confirming overdispersion.
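One common way to compare the two models is by AIC, sketched below on simulated counts. The page does not state which criterion was actually used, so treat the choice of AIC as an assumption.

```r
library(MASS)

set.seed(1)
# Simulated, overdispersed stand-in for the real secondary-case counts.
offspring <- rnbinom(500, size = 0.2, mu = 0.7)

fit_nb   <- fitdistr(offspring, "negative binomial")
fit_pois <- fitdistr(offspring, "Poisson")

# Lower AIC means a better trade-off between fit and number of parameters.
aic <- c(negbin = AIC(fit_nb), poisson = AIC(fit_pois))
```

When the data are strongly overdispersed, the negative binomial's extra parameter is easily justified and its AIC falls well below the Poisson's.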

Regional Highlights

Incheon vs Busan — two distinct superspreading patterns

Incheon Cross-regional spread

2.09

Betweenness Centrality

High betweenness centrality indicates Incheon superspreaders acted as network bridges — spreading the virus to multiple other regions. Suggests Incheon as a transmission hub with inter-regional connectivity.

Busan Within-region spread

0.17

Network Density

High local network density indicates Busan had tightly clustered transmission — a large proportion of possible connections within the region were realized, pointing to contained but intense local outbreaks.

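| Region    | Density | Betweenness | Note                  |
|-----------|---------|-------------|-----------------------|
| Gyeonggi  | -       | -           | Largest province      |
| Seoul     | -       | -           | Capital               |
| Incheon   | -       | 2.09        | Cross-regional spread |
| Busan     | 0.17    | -           | Within-region spread  |
| Daegu     | -       | -           |                       |
| Gwangju   | -       | -           |                       |
| Daejeon   | -       | -           |                       |
| Gyeongnam | -       | -           |                       |
| Gyeongbuk | -       | -           |                       |
| Jeonbuk   | -       | -           |                       |
| Jeonnam   | -       | -           |                       |
| Chungnam  | -       | -           |                       |
| Chungju   | -       | -           |                       |
| Gangwon   | -       | -           |                       |
| Jeju      | -       | -           |                       |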

Only the statistically notable values are shown (Incheon betweenness, Busan density). Full metrics are in the R analysis output.

Open Source

Full R code available on GitHub.

Complete analysis pipeline: data cleaning, network construction, regional subgraph analysis, negative binomial fitting, and confidence interval visualization.

