Mining Local Priorities: What German County Websites Reveal
Moritz Schütz, Lukas Kriesch, and Sebastian Losacker propose a fresh way to see how local governments set priorities: read what they publish online, at scale. In an open-access study for Regional Science Policy & Practice, they scrape and analyze German county and municipal websites, then use modern NLP to map the themes those institutions emphasize. The result is a replicable, data-rich window into place-based strategy.
The team assembled start URLs for all German counties and most municipalities from official statistical lists, then ran an exhaustive crawl in November 2023. From the HTML, they extracted and cleaned paragraphs to build a large corpus suitable for topic modeling across the full breadth of local functions and services. Their transformer-based approach (BERTopic with sentence embeddings, UMAP for dimensionality reduction, and HDBSCAN for clustering) identified 231 initial clusters and yielded a final set of 205 coherent topics. The paper showcases 30 prominent themes to illustrate coverage, ranging from nature conservation and water regulation to integration services and digital infrastructure.
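To make the "extract and clean paragraphs" step concrete, here is a minimal sketch using only Python's standard library. It is not the authors' code: the class name, the `min_words` threshold, and the deduplication rule are illustrative assumptions, and a real crawl would need more robust handling of navigation text and boilerplate.

```python
import re
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the visible text of <p> elements, ignoring script/style content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._buffer = None      # None while we are outside a <p> element
        self._skip_depth = 0     # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "p":
            self._flush()        # an unclosed <p> ends when the next one starts
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._buffer is not None and not self._skip_depth:
            self._buffer.append(data)

    def _flush(self):
        if self._buffer is not None:
            # Collapse runs of whitespace left over from the HTML layout.
            text = re.sub(r"\s+", " ", "".join(self._buffer)).strip()
            if text:
                self.paragraphs.append(text)
        self._buffer = None

    def close(self):
        super().close()
        self._flush()


def extract_paragraphs(html, min_words=5):
    """Return cleaned, de-duplicated paragraphs with at least `min_words` words.

    `min_words` is a hypothetical threshold for dropping menu items and
    similar fragments; the paper's actual cleaning rules may differ.
    """
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    seen, result = set(), []
    for paragraph in parser.paragraphs:
        if len(paragraph.split()) < min_words or paragraph in seen:
            continue
        seen.add(paragraph)
        result.append(paragraph)
    return result
```

Paragraphs produced this way could then be passed to an embedding and clustering pipeline such as the one the paper describes.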
To demonstrate what this reveals in practice, the paper examines three cross-cutting themes that are frequently central to regional development: Urban Development and Planning, Climate Protection Initiatives, and Business Development and Support. Each shows a distinct spatial signature. Urban development content is most concentrated in denser areas and rises with population density. Climate protection is more evenly distributed, with notable strength in southwestern Germany and little correlation with density. Business support content appears broadly across counties, is the least spatially concentrated of the three, and likewise shows no link to population density.
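One way to check whether a topic "rises with population density" is a rank correlation between each county's topic share and its density. The sketch below implements Spearman's coefficient with the standard library. Spearman is an assumption here, since this summary does not say which measure the authors used, and the numbers are invented toy values, not results from the paper.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: inhabitants per square kilometre and the share of a
# county's paragraphs assigned to an urban-planning topic.
density = [100, 250, 900, 2400, 4100]
urban_share = [0.01, 0.02, 0.03, 0.05, 0.08]
print(round(spearman(density, urban_share), 3))
```

A coefficient near zero would match the pattern reported for climate protection and business support, while a clearly positive one would match urban development.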
Beyond counts, the authors show that counties frame identical topics differently. Some emphasize long-term formal planning or regulatory compliance, while others lean into participation, social capital, or service delivery narratives. That discursive variation matters: what counts is not only which priorities local governments set but also how they present them to residents and partners. The dataset thus supports comparative research on the styles and strategies of local governance, whether compliance-driven, participatory, sustainability-oriented, or growth-focused.
Methodologically, the study argues that “text as data” can complement interviews and surveys by offering scalable, timely indicators of public priorities. Still, topic models are maps, not ground truth: paragraphs can be multi-thematic, websites reflect curated communication rather than implementation, and deeper inference requires close reading and triangulation. To enable such work, the authors release an aggregated county-level dataset tied to URLs for transparency and reuse.
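As an illustration of what aggregating to the county level can mean, the following sketch turns paragraph-level topic assignments into per-county topic shares. The input format, the function name, and the county and topic labels are hypothetical, and the released dataset may be structured differently. Each paragraph receives exactly one label here, which is precisely the simplification that the multi-thematic caveat warns about.

```python
from collections import Counter, defaultdict


def county_topic_shares(assignments):
    """Aggregate (county_id, topic_label) pairs into per-county topic shares.

    `assignments` holds one pair per paragraph. The result maps each county
    to a dict of topic -> fraction of that county's paragraphs.
    """
    counts = defaultdict(Counter)
    for county_id, topic in assignments:
        counts[county_id][topic] += 1

    shares = {}
    for county_id, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[county_id] = {
            topic: n / total for topic, n in topic_counts.items()
        }
    return shares


# Hypothetical assignments for two made-up counties.
example = [
    ("county_a", "climate_protection"),
    ("county_a", "climate_protection"),
    ("county_a", "business_support"),
    ("county_b", "urban_planning"),
]
shares = county_topic_shares(example)
print(shares["county_a"]["climate_protection"])
```

Shares rather than raw counts keep counties with large websites from dominating comparisons, though that normalization is a design choice of this sketch, not a documented property of the authors' data.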
In sum, Schütz, Kriesch, and Losacker expand the regional researcher's toolkit with a reproducible pipeline that transforms government web text into structured insights about local strategy and its variation across space. It offers a promising base for longitudinal tracking and for targeted, theory-driven analyses in future work.
Disclaimer: This summary is provided for general information only and reflects the cited paper at a high level. It may omit nuances and updates. Please consult the original publication for complete methods, findings, and limitations.