This page documents the analytical approach behind the COVID-19 transmission network study — from raw KCDC contact tracing data to a fitted negative binomial offspring distribution and regional network metrics.
The Core Question
Across global COVID-19 outbreaks, a recurring observation was that transmission is not uniformly distributed: a small fraction of infected individuals appears responsible for the majority of onward transmission events, the so-called superspreader effect. This study examines that pattern in the early epidemic in South Korea.
The key question: is this overdispersion statistically significant, and does it vary by region in ways that inform targeted public health response?
Network Construction
The KCDC linelist contained a source of infection (SOI) field for each case — the ID of the individual who transmitted the virus. This field directly encodes the transmission graph: each case is a node, each SOI→case pair is a directed edge.
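# R: Build directed transmission graph
library(dplyr)
library(igraph)
# Each linelist row becomes one edge: infector (SOI) -> infected case (id)
# Note: cases with no recorded SOI get NA in `from`; handle them explicitly if needed
prior_graph <- graph_from_data_frame(
  total_cl_linelist |>
    mutate(SOI_n = as.numeric(SOI)) |>
    select(SOI_n, id) |>
    rename(from = SOI_n, to = id),
  directed = TRUE
)
# Remove self-loops and duplicate edges
prior_graph <- igraph::simplify(prior_graph)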
After simplification (removing self-loops and duplicate edges), the network represented a clean directed transmission graph across all study cases.
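The construction can be tried on a small example. The sketch below is illustrative only: `toy_linelist` and its values are hypothetical, and dropping rows without a recorded source while passing all case IDs through `vertices` is one possible way to handle missing sources, not necessarily what the study did.

```r
library(igraph)

# Hypothetical toy linelist: `id` is the case, `SOI` the recorded infector.
# Case 1 has no known source, case 4 is listed twice, case 5 names itself.
toy_linelist <- data.frame(
  id  = c(1, 2, 3, 4, 4, 5),
  SOI = c(NA, 1, 1, 2, 2, 5)
)

# Orient edges infector -> infectee and drop rows without a recorded source
toy_edges <- data.frame(from = toy_linelist$SOI, to = toy_linelist$id)
toy_edges <- toy_edges[!is.na(toy_edges$from), ]

# Pass every case ID as a vertex so cases without edges stay in the graph
toy_vertices <- data.frame(
  name = sort(unique(c(toy_linelist$id, toy_edges$from)))
)
toy_graph <- graph_from_data_frame(
  toy_edges,
  directed = TRUE,
  vertices = toy_vertices
)

# Drop the duplicate 2 -> 4 edge and the 5 -> 5 self-loop
toy_graph <- igraph::simplify(toy_graph)

ecount(toy_graph)                        # 3 transmission pairs remain
toy_out <- degree(toy_graph, mode = "out")
toy_out                                  # offspring count per case
```

Case 5 stays in the graph as an isolated vertex with zero offspring, which matters later because zeros are part of the offspring distribution.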
Regional Subgraph Analysis
South Korean cases spanned 15 provinces and metropolitan cities. For each region, an induced subgraph was constructed from the vertices belonging to that region, and six metrics were computed per region, among them the betweenness centrality and density reported under Key Findings.
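A minimal sketch of the per-region computation, assuming a `region` vertex attribute (a hypothetical name) and covering only density and mean betweenness, the two metrics cited in the findings. The toy edges and region labels are made up for illustration.

```r
library(igraph)

# Hypothetical toy graph; the `region` vertex attribute is an assumed name
toy_edges <- data.frame(
  from = c("a", "a", "b", "d", "e"),
  to   = c("b", "c", "d", "e", "f")
)
toy_vertices <- data.frame(
  name   = c("a", "b", "c", "d", "e", "f"),
  region = c("Incheon", "Incheon", "Incheon", "Busan", "Busan", "Busan")
)
g <- graph_from_data_frame(toy_edges, directed = TRUE, vertices = toy_vertices)

# For each region, keep only its vertices and the edges among them
region_metrics <- lapply(unique(V(g)$region), function(r) {
  sub_g <- induced_subgraph(g, which(V(g)$region == r))
  data.frame(
    region           = r,
    n_cases          = vcount(sub_g),
    n_edges          = ecount(sub_g),
    density          = edge_density(sub_g),
    mean_betweenness = mean(betweenness(sub_g, directed = TRUE))
  )
})
region_metrics <- do.call(rbind, region_metrics)
region_metrics
```

The b -> d edge crosses regions, so it appears in neither induced subgraph. Metrics computed this way describe transmission within a region only.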
Offspring Distribution & NegBin Fit
The out-degrees of the nodes (the number of people each case directly infected) form the empirical offspring distribution. Fitting a negative binomial to this distribution yields two interpretable parameters:
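μ (mu): 0.65 [95% CI 0.45–0.86]. Mean of the distribution; estimates the reproduction number (reported here as R₀, though under the control measures in place it is strictly an effective reproduction number).
k (size): 0.18 [95% CI 0.12–0.25]. Dispersion parameter; low k indicates high overdispersion and stronger superspreading.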
# R: Fit negative binomial, compute 95% CIs
# Out-degree = number of direct secondary cases per node
offspring_dist <- igraph::degree(prior_graph, mode = "out")
fit <- MASS::fitdistr(offspring_dist, "negative binomial")
# CI for mu (R0): normal approximation using the fitted standard error
ci_mu <- fit$estimate["mu"] + c(qnorm(0.025), qnorm(0.975)) * fit$sd["mu"]
# CI for k: chi-squared method
dof <- length(offspring_dist) - 1  # named dof to avoid masking stats::df
ci_k <- dof * fit$estimate["size"] / c(qchisq(0.975, dof), qchisq(0.025, dof))
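To see how the fit behaves without access to the linelist, the same `MASS::fitdistr` call can be run on simulated counts. This is a sketch under stated assumptions: the sample size and seed are arbitrary, the simulation parameters are set to the reported estimates purely for illustration, and it uses Wald (normal-approximation) intervals for both parameters rather than the chi-squared method applied to k above.

```r
library(MASS)

set.seed(42)  # arbitrary seed so the sketch is reproducible

# Simulated offspring counts; size/mu mirror the reported estimates for
# illustration only, and n = 2000 is an arbitrary sample size
sim_offspring <- rnbinom(2000, size = 0.18, mu = 0.65)

sim_fit <- fitdistr(sim_offspring, "negative binomial")

# Wald 95% intervals from the fitted standard errors
z <- qnorm(0.975)
wald_ci <- cbind(
  estimate = sim_fit$estimate,
  lower    = sim_fit$estimate - z * sim_fit$sd,
  upper    = sim_fit$estimate + z * sim_fit$sd
)
wald_ci
```

With a sample this large, the fitted `size` and `mu` should land close to the values used in the simulation. Smaller samples give wider and less symmetric intervals, particularly for k.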
Key Findings
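R₀ below 1: epidemic under control.
The estimated R₀ of 0.65 (95% CI 0.45–0.86, entirely below 1) indicates that during May–June 2020 transmission in South Korea was below the epidemic threshold. Because this estimate was made with control measures in place, it behaves as an effective reproduction number, and it is consistent with the aggressive contact tracing and quarantine policies deployed by KCDC.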
Strong overdispersion (k = 0.18).
A dispersion parameter below 0.2 indicates that most infected individuals do not transmit to anyone, while a small minority generate outsized clusters. This pattern, also observed in SARS-CoV-1 and MERS, supports targeting superspreading events rather than average-case interventions.
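The claim that most cases infect nobody can be checked directly from the fitted parameters. The sketch below plugs the reported point estimates into the negative binomial formulas. The Poisson comparison is added here for contrast and is not part of the original analysis.

```r
mu <- 0.65  # reported mean offspring
k  <- 0.18  # reported dispersion parameter

# Probability that a case infects nobody: (1 + mu / k)^(-k)
p_zero_nb <- dnbinom(0, size = k, mu = mu)

# Same probability if offspring counts were Poisson with the same mean
p_zero_pois <- dpois(0, lambda = mu)

# Negative binomial variance: mu + mu^2 / k (a Poisson would have variance mu)
nb_var <- mu + mu^2 / k

round(c(p_zero_nb = p_zero_nb, p_zero_pois = p_zero_pois, nb_var = nb_var), 2)
# p_zero_nb 0.76, p_zero_pois 0.52, nb_var 3.00
```

Under these estimates roughly three quarters of cases produce no secondary infections, against about half under a Poisson model with the same mean, and the variance is several times the mean.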
Regional heterogeneity in transmission mode.
Incheon's high betweenness centrality (2.09) suggests inter-regional bridge transmission: infected individuals who spread the virus to geographically distinct clusters. Busan's high density (0.17) suggests concentrated local outbreaks. These patterns call for different intervention strategies.
Hosung Kim
MSBA student at USC Marshall School of Business, focused on AI systems and Data Science. This analysis was conducted as part of epidemiological research at the One Health Lab.
@HosungKim48