AIAny - LOCUS v1.0

Local ordinances shape everyday regulation (zoning, housing, licensing, nuisance), yet they are fragmented and hard to analyze at scale. LOCUS v1.0 makes that layer of law machine-observable by assembling a county-harmonized, chunk-level corpus of U.S. municipal and county ordinance text, annotated for downstream legal-NLP tasks.

Key Findings

Scale and scope: About 2.21 million text chunks (~2,211,516) derived from municipal and county codes, each carrying coverage metadata that links it to a jurisdiction, state, city, and county.
Annotation schema: Each chunk is assigned a function label (Context, Rules, Process, Enforcement) and a binary is_substantive flag. Substantive chunks also receive a coarse topic (Buildings, Business, Nuisance, Zoning, Other). The release further includes continuous scores for opacity, paternalism, enforcement discretion, and problem salience, which support analytic tasks beyond simple classification.
Practical tradeoffs: The dataset relies on OCR and automated classifiers to scale labeling across thousands of jurisdictions. This yields broad geographic reach but introduces label noise, a coarse taxonomy, and uneven digitization across jurisdictions.
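
The annotation schema above can be sketched as a small record type. This is a minimal illustration, not the release's actual file layout: apart from is_substantive, every field name here (text, state, county, city, jurisdiction, function, topic, and the four score fields) is an assumption, as is the choice to reject a topic on non-substantive chunks. No value range is assumed for the continuous scores.

```python
from dataclasses import dataclass
from typing import List, Optional

# Label sets as listed in the schema description.
FUNCTIONS = {"Context", "Rules", "Process", "Enforcement"}
TOPICS = {"Buildings", "Business", "Nuisance", "Zoning", "Other"}


@dataclass
class Chunk:
    # Field names are hypothetical; only is_substantive appears in the source.
    text: str
    state: str
    county: str
    city: str
    jurisdiction: str
    function: str
    is_substantive: bool
    topic: Optional[str] = None  # expected only on substantive chunks
    opacity: Optional[float] = None
    paternalism: Optional[float] = None
    enforcement_discretion: Optional[float] = None
    problem_salience: Optional[float] = None

    def __post_init__(self) -> None:
        if self.function not in FUNCTIONS:
            raise ValueError(f"unknown function label: {self.function!r}")
        if self.is_substantive:
            if self.topic not in TOPICS:
                raise ValueError(f"substantive chunk needs a topic, got {self.topic!r}")
        elif self.topic is not None:
            # Assumption: non-substantive chunks carry no topic.
            raise ValueError("non-substantive chunk should not carry a topic")


def substantive_only(chunks: List[Chunk]) -> List[Chunk]:
    """Keep only chunks flagged as substantive."""
    return [c for c in chunks if c.is_substantive]
```

Validating at construction time is one way to surface schema violations early, which may help when labels are model-assigned and noisy.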

Who it's for and tradeoffs

Great fit if you need a large, jurisdiction-linked corpus to prototype legal-text classifiers, build substantive vs. non-substantive filters, or run comparative studies of municipal regulation. Look elsewhere if you require a fully audited legal source, exhaustive coverage of every U.S. locality, or fine-grained legal subject-matter labeling. LOCUS v1.0 is a snapshot, and its function and topic labels are model-assigned and not fully human-validated.
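
A comparative study of the kind mentioned above might start from topic shares per state. The sketch below assumes each chunk is available as a dict with hypothetical keys state, topic, and is_substantive; the example rows are made up for illustration and are not drawn from the dataset.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping


def topic_shares(rows: Iterable[Mapping]) -> Dict[str, Dict[str, float]]:
    """Return, per state, the fraction of substantive chunks in each topic."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        # Only substantive chunks carry a topic label.
        if row["is_substantive"]:
            counts[row["state"]][row["topic"]] += 1

    shares: Dict[str, Dict[str, float]] = {}
    for state, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[state] = {t: n / total for t, n in topic_counts.items()}
    return shares


# Toy rows (invented), using the assumed key names.
example_rows = [
    {"state": "OH", "topic": "Zoning", "is_substantive": True},
    {"state": "OH", "topic": "Nuisance", "is_substantive": True},
    {"state": "OH", "topic": None, "is_substantive": False},
    {"state": "TX", "topic": "Business", "is_substantive": True},
]
example_shares = topic_shares(example_rows)
```

Because the labels are model-assigned, shares computed this way are better read as rough indicators than as audited counts.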

Where it fits

LOCUS is best used as research infrastructure for legal NLP pipelines, weakly supervised workflows, and empirical policy analysis that benefit from reproducible coverage metadata and scalable annotations. For production legal advice or litigation-grade proof, human legal review and up-to-date jurisdictional checks remain necessary.
