Transmission Network & Superspreading Analysis of COVID-19
Network analysis and statistical modeling of KCDC contact tracing data across 15 South Korean regions to identify superspreaders and quantify transmission potential.
Background
A small number of cases were responsible for a disproportionate number of secondary infections.
Understanding the transmission dynamics and superspreading potential of COVID-19 is critical for effective epidemic intervention and preparedness. This study asks who spreads the virus the most, and why regional context matters.
Objective
Identify characteristics of individuals who contributed disproportionately to virus spread by investigating superspreading potential using network analysis and statistical modeling.
Data
Korea Centers for Disease Control and Prevention (KCDC; renamed the Korea Disease Control and Prevention Agency in September 2020) contact tracing records, May 2–June 5, 2020. Key variables: source of infection (SOI), region, onset/quarantine dates, cluster name.
Conclusion
Targeting superspreaders through enhanced contact tracing and quarantine strategies, tailored to regional characteristics, could substantially reduce COVID-19 transmission.
Methods
Four-stage analytical pipeline.
Implemented entirely in R using igraph for network construction and MASS for statistical distribution fitting.
Data Preparation
Linelist Cleaning
Filtered to relevant columns (id, region, vaccination status, onset/quarantine dates, cluster, source of infection). Stripped hashtag prefixes from SOI identifiers to enable numeric matching.
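A minimal Python sketch of the SOI cleaning step. The actual pipeline is written in R, so this is an illustration, not the author's code. It assumes SOI values look like "#123" and that anything without a plain case number (for example a cluster label) has no known infector. The column names in KEEP are hypothetical stand-ins for the linelist fields listed above.

```python
import re
from typing import Optional

# Hypothetical names for the retained linelist columns; the real KCDC
# field names may differ.
KEEP = ("id", "region", "vaccinated", "onset_date", "quarantine_date",
        "cluster", "soi")

# Optional '#', optional spaces, then digits only (e.g. "#123" or "123").
SOI_PATTERN = re.compile(r"^#?\s*(\d+)$")


def parse_soi(raw: Optional[str]) -> Optional[int]:
    """Strip the '#' prefix from an SOI identifier and return the case number.

    Returns None for missing values or non-numeric labels (for example a
    cluster name), so those cases get no incoming transmission edge.
    """
    if raw is None:
        return None
    match = SOI_PATTERN.match(raw.strip())
    return int(match.group(1)) if match else None


def clean_row(row: dict) -> dict:
    """Keep only the analysis columns and replace the SOI with its integer ID."""
    cleaned = {col: row.get(col) for col in KEEP}
    cleaned["soi"] = parse_soi(row.get("soi"))
    return cleaned
```

For example, parse_soi("#123") returns 123, while parse_soi("unknown") returns None.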
Network Construction
Directed Transmission Graph
Built a directed igraph network where each node is a case and each edge represents a confirmed transmission event (SOI → case id). Loops and duplicate edges removed via simplification.
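The graph construction can be sketched in plain Python with an edge set, which makes the simplification step explicit. This is an illustrative stand-in for the igraph code, not the original. It assumes each case has at most one SOI, and it keeps infectors that appear only as an SOI as nodes.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[int, int]


def build_transmission_graph(
    cases: Iterable[Tuple[int, Optional[int]]],
) -> Tuple[Set[int], List[Edge]]:
    """Build a simplified directed graph from (case_id, soi) pairs.

    Each edge points infector (SOI) -> infectee (case id). Storing edges in a
    set collapses duplicates, and self-loops are skipped, which mirrors
    graph simplification.
    """
    nodes: Set[int] = set()
    edges: Set[Edge] = set()
    for case_id, soi in cases:
        nodes.add(case_id)
        if soi is None:
            continue  # no known infector: the case stays an isolated node
        nodes.add(soi)  # infectors missing from the linelist still become nodes
        if soi != case_id:  # drop self-loops
            edges.add((soi, case_id))
    return nodes, sorted(edges)


def out_degrees(nodes: Set[int], edges: List[Edge]) -> Dict[int, int]:
    """Secondary cases attributed to each case, i.e. its out-degree."""
    degree = {node: 0 for node in nodes}
    for src, _ in edges:
        degree[src] += 1
    return degree
```

With the pairs (1, None), (2, 1), (3, 1), (3, 1), (4, 4), the result has edges 1 → 2 and 1 → 3 only: the repeated pair collapses and the self-loop is dropped. The out-degree values are the per-case secondary-case counts that feed the offspring-distribution fit.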
Regional Analysis
Subgraph Metrics
Induced subgraphs for each of 15 Korean regions. Per-region metrics: edge density, average path length, clustering coefficient, degree/betweenness/closeness centrality.
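Two of these per-region metrics can be sketched in Python, assuming a region is represented as the set of case IDs it contains. Edge density of a directed graph without loops is m / (n(n - 1)). Average path length here averages directed shortest-path lengths over reachable ordered pairs only. This is intended to mirror igraph's default handling of disconnected graphs, though the exact settings used in the R code are an assumption. Clustering and closeness are not sketched.

```python
from collections import deque


def induced_subgraph(edges, keep):
    """Edges whose endpoints both lie in `keep` (e.g. the cases of one region)."""
    return [(u, v) for u, v in edges if u in keep and v in keep]


def edge_density(n_nodes, edges):
    """Directed density without loops: m / (n * (n - 1))."""
    if n_nodes < 2:
        return 0.0
    return len(edges) / (n_nodes * (n_nodes - 1))


def average_path_length(nodes, edges):
    """Mean directed shortest-path length over reachable ordered pairs.

    Unreachable pairs are skipped rather than counted as infinite.
    """
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
    total, pairs = 0, 0
    for source in nodes:
        # Breadth-first search gives unweighted shortest-path distances.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for target, d in dist.items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else float("nan")
```

Per region, one would call induced_subgraph(edges, region_ids) and then pass len(region_ids) and the resulting edges to edge_density, so isolated cases in the region still count toward n.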
Statistical Modeling
Offspring Distribution
Extracted out-degree distribution (secondary cases per index case). Fitted negative binomial distribution via maximum likelihood to estimate R₀ (μ) and dispersion parameter k. 95% CIs computed analytically.
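For reference, here is the negative binomial in the mean/dispersion parameterization (μ, k). This corresponds to the mu and size parameters that MASS::fitdistr() estimates for the "negative binomial" family. Setting ∂ℓ/∂μ = 0 gives μ̂ = x̄, so only k needs numerical optimization. The interval line assumes Wald intervals built from the maximum-likelihood standard errors. The document says only that the CIs were computed analytically, so this is one plausible reading, not the confirmed method.

```latex
P(X = x \mid \mu, k) = \frac{\Gamma(x + k)}{\Gamma(k)\, x!}
  \left( \frac{k}{k + \mu} \right)^{k}
  \left( \frac{\mu}{k + \mu} \right)^{x},
\qquad \mathbb{E}[X] = \mu, \qquad \operatorname{Var}(X) = \mu + \frac{\mu^{2}}{k}

\ell(\mu, k) = \sum_{i=1}^{n} \left[ \log\Gamma(x_i + k) - \log\Gamma(k) - \log x_i!
  + k \log\frac{k}{k + \mu} + x_i \log\frac{\mu}{k + \mu} \right]

\frac{\partial \ell}{\partial \mu} = \sum_{i=1}^{n} \left[ \frac{x_i}{\mu} - \frac{x_i + k}{k + \mu} \right] = 0
  \;\Longrightarrow\; \hat{\mu} = \bar{x}

\widehat{\mathrm{SE}}(\hat{\mu}) = \sqrt{\frac{\hat{\mu} + \hat{\mu}^{2}/\hat{k}}{n}},
\qquad \text{95\% CI: } \hat{\theta} \pm 1.96\,\widehat{\mathrm{SE}}(\hat{\theta})
```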
Results
Significant overdispersion — superspreaders drive transmission.
Basic Reproduction Number
95% CI: 0.45–0.86
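R₀ < 1 during the study period (the entire 95% CI lies below 1), indicating that the South Korean outbreak was being suppressed in May–June 2020: on average, each infected person transmitted the virus to fewer than one other person. Because control measures were already in place, this is strictly an effective reproduction number, not R₀ in a fully susceptible, unmitigated population.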
Dispersion Parameter
95% CI: 0.12–0.25
A low k (< 0.2) indicates strong overdispersion: most individuals infect no one, while a rare few cause large clusters. This pattern is characteristic of superspreading dynamics.
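Two consequences of a small k can be checked directly. The first is the probability that a case infects no one, P(0) = (k / (k + μ))^k. The second is the variance-to-mean ratio, 1 + μ/k. For a Poisson with the same mean, these are e^(-μ) and 1. The μ and k values below are hypothetical, chosen to lie inside the reported CIs; they are not the study's point estimates.

```python
import math


def nb_prob_zero(mu: float, k: float) -> float:
    """P(X = 0) for a negative binomial with mean mu and dispersion k."""
    return (k / (k + mu)) ** k


def poisson_prob_zero(mu: float) -> float:
    """P(X = 0) for a Poisson with the same mean (no overdispersion)."""
    return math.exp(-mu)


def variance_to_mean(mu: float, k: float) -> float:
    """Var(X) / E[X] = 1 + mu / k for the negative binomial (1 for Poisson)."""
    return 1.0 + mu / k


# Hypothetical illustrative values inside the reported CIs, not the study's
# point estimates.
MU, K = 0.65, 0.18
print(f"negative binomial P(0): {nb_prob_zero(MU, K):.2f}")
print(f"Poisson P(0):           {poisson_prob_zero(MU):.2f}")
print(f"variance / mean:        {variance_to_mean(MU, K):.2f}")
```

With these illustrative values, about 76% of cases transmit to nobody under the negative binomial, versus about 52% under a Poisson, and the variance is roughly 4.6 times the mean. As k grows, the negative binomial converges to the Poisson.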
Offspring Distribution
The out-degree distribution (secondary cases per index case) was fit to a negative binomial by maximum likelihood via MASS::fitdistr(). The negative binomial fit the data better than a Poisson distribution, confirming overdispersion.
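Below is a stdlib-only Python sketch of the fit and the Poisson comparison. It illustrates the approach and is not a port of the R code. μ̂ is the sample mean, and k̂ is found by golden-section search on the profile log-likelihood over log k; the search bounds are arbitrary assumptions. Wald intervals use the observed information, with a finite-difference second derivative for k. The document does not say how the two models were compared, so AIC is assumed here. The sketch also assumes at least one nonzero count.

```python
import math
from statistics import fmean


def nb_loglik(xs, mu, k):
    """Negative binomial log-likelihood in the (mu, size = k) parameterization."""
    return sum(
        math.lgamma(x + k) - math.lgamma(k) - math.lgamma(x + 1)
        + k * math.log(k / (k + mu)) + x * math.log(mu / (k + mu))
        for x in xs
    )


def poisson_loglik(xs, mu):
    """Poisson log-likelihood with mean mu."""
    return sum(x * math.log(mu) - mu - math.lgamma(x + 1) for x in xs)


def fit_negbin(xs, k_lo=1e-3, k_hi=1e3, iters=100):
    """ML estimates (mu, k): mu-hat is the sample mean; k-hat maximizes the
    profile log-likelihood via golden-section search over log(k)."""
    mu = fmean(xs)

    def profile(t):
        return nb_loglik(xs, mu, math.exp(t))

    g = (math.sqrt(5) - 1) / 2  # inverse golden ratio
    a, b = math.log(k_lo), math.log(k_hi)
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = profile(c), profile(d)
    for _ in range(iters):
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = profile(c)
        else:  # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = profile(d)
    return mu, math.exp((a + b) / 2)


def wald_ci(xs, mu, k, z=1.96):
    """Wald CIs for mu and k. The mixed second derivative of the
    log-likelihood vanishes at mu = sample mean, so each SE is separate."""
    n = len(xs)
    se_mu = math.sqrt((mu + mu * mu / k) / n)
    ci_mu = (mu - z * se_mu, mu + z * se_mu)
    h = 1e-3 * k  # relative step for the central finite difference
    d2 = (nb_loglik(xs, mu, k + h) - 2 * nb_loglik(xs, mu, k)
          + nb_loglik(xs, mu, k - h)) / (h * h)
    if d2 >= 0:  # not concave: k-hat is likely on the search boundary
        return ci_mu, None
    se_k = 1 / math.sqrt(-d2)
    return ci_mu, (k - z * se_k, k + z * se_k)


def compare_aic(xs):
    """AIC for Poisson (1 parameter) vs negative binomial (2 parameters)."""
    mu, k = fit_negbin(xs)
    aic_poisson = 2 * 1 - 2 * poisson_loglik(xs, mu)
    aic_negbin = 2 * 2 - 2 * nb_loglik(xs, mu, k)
    return {"poisson": aic_poisson, "negbin": aic_negbin, "mu": mu, "k": k}
```

compare_aic(counts) returns both AIC values along with μ̂ and k̂. A lower AIC for the negative binomial means the extra dispersion parameter earns its cost, i.e. the counts are overdispersed relative to a Poisson. If the data are not overdispersed, k̂ drifts to the upper search bound and wald_ci returns None for k.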
Regional Highlights
Incheon vs Busan — two distinct superspreading patterns
Betweenness Centrality (Incheon): 2.09
High betweenness centrality indicates that Incheon superspreaders acted as network bridges, spreading the virus to multiple other regions. This suggests Incheon served as a transmission hub with inter-regional connectivity.
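For each case, betweenness counts the share of shortest transmission paths between other pairs of cases that pass through it. Below is a stdlib-only Python sketch of Brandes' algorithm for unweighted, directed graphs. It returns unnormalized scores, which are intended to correspond to igraph's betweenness() with directed = TRUE and normalized = FALSE. That correspondence is an assumption, and so is any relation to how the 2.09 figure was aggregated or normalized.

```python
from collections import deque


def betweenness(nodes, edges):
    """Brandes' algorithm: unnormalized betweenness on an unweighted directed
    graph, summing over ordered pairs (s, t) the fraction of shortest s->t
    paths that pass through each node."""
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
    score = {n: 0.0 for n in nodes}
    for s in nodes:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        order = []
        preds = {n: [] for n in nodes}
        sigma = {n: 0 for n in nodes}
        dist = {n: -1 for n in nodes}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {n: 0.0 for n in nodes}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score
```

In the toy graph 1 → 2, 2 → 3, 2 → 4, case 2 lies on every shortest path from case 1 to cases 3 and 4. It therefore scores 2, and every other case scores 0, which is the bridging pattern the Incheon result describes.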
Network Density (Busan): 0.17
High local network density indicates that transmission in Busan was tightly clustered: about 17% of all possible directed connections among Busan cases were realized, pointing to contained but intense local outbreaks.
Highlighted rows: statistically notable values. Full metrics in the R analysis output.
Open Source
Full R code available on GitHub.
Complete analysis pipeline: data cleaning, network construction, regional subgraph analysis, negative binomial fitting, and confidence interval visualization.