COVID-19 Network Analysis
Epidemiology Network Analysis R · igraph · MASS

Technical Notes: Transmission Network & Superspreading Analysis

Hosung Kim

USC Marshall MSBA · AI Systems & Data Science


This page documents the analytical approach behind the COVID-19 transmission network study — from raw KCDC contact tracing data to a fitted negative binomial offspring distribution and regional network metrics.

The Core Question

A recurring observation across COVID-19 outbreaks worldwide, including the early epidemic in South Korea, was that transmission was not uniformly distributed. A small fraction of infected individuals appeared responsible for the majority of onward transmission events, the so-called superspreader effect.

The key question: is this overdispersion statistically significant, and does it vary by region in ways that inform targeted public health response?

Network Construction

The KCDC linelist contained a source of infection (SOI) field for each case — the ID of the individual who transmitted the virus. This field directly encodes the transmission graph: each case is a node, each SOI→case pair is a directed edge.

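# R: Build directed transmission graph
library(dplyr)   # mutate(), filter(), select(), rename()
library(igraph)  # graph_from_data_frame(), simplify()

prior_graph <- graph_from_data_frame(
  total_cl_linelist |>
    mutate(SOI_n = as.numeric(SOI)) |>
    # Drop cases with no recorded infector; igraph would otherwise turn
    # the missing values into a single vertex named "NA". Cases with no
    # known source and no onward transmission then leave the graph; pass
    # a `vertices` data frame to graph_from_data_frame() to keep them.
    filter(!is.na(SOI_n)) |>
    select(SOI_n, id) |>
    rename(from = SOI_n, to = id),
  directed = TRUE
)

# Remove self-loops and duplicate edges
prior_graph <- igraph::simplify(prior_graph)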

After simplification, which removes self-loops and duplicate edges, each remaining edge in the directed graph represents a single infector-to-infectee link among the study cases.
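The effect of simplification can be checked on a toy example. The sketch below uses a made-up linelist (the `toy_linelist` data frame and its values are hypothetical, not KCDC data) that contains one duplicated link and one self-loop, and counts edges before and after.

```r
library(igraph)

# Hypothetical toy linelist: `id` is the case, `SOI` the recorded infector
# (NA when no source was identified). Illustrative values only.
toy_linelist <- data.frame(
  id  = c(1, 2, 3, 3, 4, 5),
  SOI = c(NA, 1, 1, 1, 4, 3)  # 1 -> 3 appears twice; 4 -> 4 is a self-loop
)

# Keep only rows with a recorded infector and order columns as from, to
edges <- toy_linelist[!is.na(toy_linelist$SOI), c("SOI", "id")]
names(edges) <- c("from", "to")

toy_graph <- graph_from_data_frame(edges, directed = TRUE)
toy_clean <- igraph::simplify(toy_graph)  # drops the duplicate and the loop

ecount(toy_graph)  # 5 edges before simplification
ecount(toy_clean)  # 3 edges after: 1 -> 2, 1 -> 3, 3 -> 5
```

Case 4 stays in the graph as an isolated vertex once its self-loop is removed, so it still counts as a case with zero offspring.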

Regional Subgraph Analysis

South Korean cases spanned 15 provinces and metropolitan cities. For each region, an induced subgraph was constructed from the vertices belonging to that region. Six metrics were computed per region:

Edge Density: proportion of possible edges that exist; measures how tightly connected local transmission clusters are
Avg Path Length: mean shortest path between all reachable node pairs; indicates transmission chain depth
Clustering Coefficient: probability that two contacts of a node are also connected; measures local clustering
Degree Centrality: average number of direct transmission links per node
Betweenness Centrality: how often a node lies on the shortest path between others; identifies transmission bridges
Closeness Centrality: how quickly a node can reach all others; a proxy for epidemic potential
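The per-region computation is not shown on this page, so the following is only a sketch of how the six metrics could be obtained with igraph. It assumes each vertex carries a `region` attribute, and it averages the node-level centralities within each region; the helper name `region_metrics`, the attribute name, and the averaging choice are all assumptions rather than the original implementation.

```r
library(igraph)

# Hypothetical helper: metrics for the subgraph induced by one region.
# Assumes a vertex attribute `region` (not shown in the original code).
region_metrics <- function(graph, region_name) {
  vids <- which(V(graph)$region == region_name)
  sub  <- induced_subgraph(graph, vids)
  data.frame(
    region       = region_name,
    n_cases      = vcount(sub),
    edge_density = edge_density(sub),
    avg_path_len = mean_distance(sub, directed = TRUE),
    clustering   = transitivity(sub, type = "global"),
    mean_degree  = mean(degree(sub, mode = "all")),
    mean_betw    = mean(betweenness(sub, directed = TRUE)),
    # Closeness is undefined for nodes that reach nobody, hence na.rm
    mean_close   = mean(closeness(sub, mode = "out"), na.rm = TRUE)
  )
}

# Toy graph with two regions (illustrative values only)
toy_edges <- data.frame(from = c("a", "a", "b", "d"),
                        to   = c("b", "c", "c", "e"))
toy_nodes <- data.frame(name   = c("a", "b", "c", "d", "e"),
                        region = c("Busan", "Busan", "Busan",
                                   "Incheon", "Incheon"))
g <- graph_from_data_frame(toy_edges, directed = TRUE, vertices = toy_nodes)

metrics <- do.call(rbind,
                   lapply(unique(V(g)$region), region_metrics, graph = g))
metrics
```

Because each subgraph is induced on one region's vertices, links that cross regional boundaries are dropped before the metrics are computed. Whether that is desirable depends on the question being asked.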

Offspring Distribution & NegBin Fit

The out-degree of each node (the number of people a case directly infected) gives the empirical offspring distribution. Fitting a negative binomial to this distribution yields two interpretable parameters:

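μ (mu): 0.65 [95% CI 0.45–0.86]

Mean of the distribution. It estimates the reproduction number, reported here as R₀; because control measures were already in place during the study period, it is strictly an effective reproduction number.

k (size): 0.18 [95% CI 0.12–0.25]

Dispersion parameter. A low k indicates high overdispersion and stronger superspreading.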
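For reference, the mean/dispersion form of the negative binomial, which is the parameterization R uses in `dnbinom(size = k, mu = μ)` and which `MASS::fitdistr` reports, can be written as follows. The variance expression shows why a small k translates into heavy overdispersion.

```latex
P(X = x) = \frac{\Gamma(k + x)}{\Gamma(k)\, x!}
           \left(\frac{k}{k + \mu}\right)^{k}
           \left(\frac{\mu}{k + \mu}\right)^{x},
\qquad
\operatorname{Var}(X) = \mu + \frac{\mu^{2}}{k}
```

With the point estimates above, the variance is about 0.65 + 0.65²/0.18 ≈ 3.0, roughly 4.6 times the mean. A Poisson offspring distribution would have a variance equal to its mean.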

# R: Fit negative binomial, compute 95% CIs
# Out-degree = number of people each case directly infected
offspring_dist <- igraph::degree(prior_graph, mode = "out")
fit <- MASS::fitdistr(offspring_dist, "negative binomial")

# CI for mu (R0): normal (Wald) approximation
ci_mu <- fit$estimate["mu"] + qnorm(c(0.025, 0.975)) * fit$sd["mu"]

# CI for k: chi-squared method, returned as (lower, upper)
df <- length(offspring_dist) - 1
ci_k <- df * fit$estimate["size"] / qchisq(c(0.975, 0.025), df)
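The chi-squared interval for k has the same form as the classical interval for a normal variance, so it may be worth cross-checking against an interval built from the standard errors that `fitdistr` already returns. The sketch below does this on simulated counts with known parameters; the simulated data, the seed, and the names `sim_offspring`, `sim_fit`, and `wald_ci` are illustrative assumptions, and a Wald interval is only one of several reasonable alternatives (a profile-likelihood interval is another).

```r
library(MASS)

set.seed(42)
# Simulated offspring counts with known parameters (illustrative only)
sim_offspring <- rnbinom(2000, size = 0.2, mu = 0.7)

sim_fit <- fitdistr(sim_offspring, "negative binomial")

# Wald 95% intervals for both parameters from the fitted standard errors
z <- qnorm(0.975)
wald_ci <- cbind(
  estimate = sim_fit$estimate,
  lower    = sim_fit$estimate - z * sim_fit$sd,
  upper    = sim_fit$estimate + z * sim_fit$sd
)
wald_ci[c("size", "mu"), ]
```

Running the same steps on `offspring_dist` would show how far the two interval methods for k agree on the real data.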

Key Findings

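1. R₀ below 1: epidemic under control.

The estimated R₀ of 0.65 (95% CI 0.45–0.86) indicates that during May–June 2020, South Korea had reduced transmission below the epidemic threshold of 1; the entire interval lies below that threshold. Because control measures were already in place, this figure is best read as an effective reproduction number. It is consistent with the aggressive contact tracing and quarantine policies deployed by KCDC.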

2. Strong overdispersion (k = 0.18).

A dispersion parameter below 0.2 implies that most infected individuals do not transmit to anyone, while a small minority generate outsized clusters. With μ = 0.65 and k = 0.18, the fitted model puts the probability of zero onward infections at roughly 76%. This pattern, also observed in SARS-CoV-1 and MERS, supports targeting superspreading events rather than average-case interventions.

3. Regional heterogeneity in transmission mode.

Incheon's high betweenness centrality (2.09) suggests inter-regional bridge transmission: infected individuals who spread the virus to geographically distinct clusters. Busan's high edge density (0.17) suggests concentrated local outbreaks. The two patterns call for different intervention strategies.
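To make the overdispersion finding concrete, the fitted parameters can be turned into two summary quantities: the probability that a case infects nobody, and the share of all transmission caused by the most infectious 20% of cases. The sketch below uses only base R and the point estimates reported above; the simulation size and seed are arbitrary choices, and the result reflects the fitted model rather than the raw data.

```r
# Point estimates reported in this analysis
mu <- 0.65
k  <- 0.18

# Probability that a case infects nobody under NB(mu, k)
p_zero <- dnbinom(0, size = k, mu = mu)
p_zero  # about 0.76

# Share of all transmission caused by the most infectious 20% of cases,
# approximated by simulating offspring counts from the fitted model
set.seed(1)
sim <- sort(rnbinom(1e5, size = k, mu = mu), decreasing = TRUE)
n_top <- length(sim) %/% 5
top20_share <- sum(sim[seq_len(n_top)]) / sum(sim)
top20_share
```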


Hosung Kim

MSBA student at USC Marshall School of Business, focused on AI systems and Data Science. This analysis was conducted as part of epidemiological research at the One Health Lab.

@HosungKim48