Crowdsourcing park accessibility info

BC Parks runs a dedicated accessibility microsite for visitors with mobility, vision, and other access needs. The data behind it was collected carefully: student staff went out, photographed facilities, logged what they found, and published the results. The problem is that parks change and staff capacity is limited. There was no mechanism to keep the information current, so it gradually became unreliable. By the time we looked at it, visitors couldn’t trust what they were reading.

This is a common shape for government data problems. A one-time effort produces something useful, followed by years of slow decay. The question we wanted to answer: could AI and crowdsourcing replace the periodic manual update cycle with something that stays current on its own?


Visitors who experience park accessibility firsthand are the best possible source of current information. If they could submit feedback in plain language, and an AI model could extract structured accessibility features from that text, the data could stay fresh without requiring a content team to maintain it. A semantic search layer on top would let future visitors describe what they need in their own words and still get relevant results, even when their phrasing doesn’t match how a feature was tagged.


We built a mobile-first web app with two user flows: one for searching accessible parks and one for submitting new accessibility observations.

The core pipeline worked like this. A visitor submits a plain-language review using their email address as the only identifier. The OpenAI API processes that text, extracts relevant accessibility tags, and sends them to a staff review queue before anything is published. Once approved, tags are stored in PostgreSQL via a Django REST backend and become searchable. Future visitors query in plain language and get matched results through semantic search, even when their exact words don’t appear in any review.
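The extraction step can be sketched as a prompt-and-validate loop. This is a minimal illustration, not the production code: the tag vocabulary, function names, and prompt wording here are all hypothetical, and the actual OpenAI call is omitted so the validation logic stands on its own. The key idea is that the model is only ever asked to choose from a controlled vocabulary, and anything outside it is dropped before reaching the staff review queue.

```python
import json

# Hypothetical controlled vocabulary; the real tag set lived in PostgreSQL.
KNOWN_TAGS = {
    "accessible-parking", "accessible-toilet", "paved-trail",
    "braille-signage", "service-animal-friendly",
}

def build_extraction_prompt(review_text: str) -> str:
    """Ask the model to return only tags from the controlled vocabulary, as JSON."""
    return (
        "Extract accessibility features mentioned in the visitor review below. "
        f"Respond with a JSON array drawn only from: {sorted(KNOWN_TAGS)}.\n\n"
        f"Review: {review_text}"
    )

def parse_model_response(raw: str) -> list[str]:
    """Validate the model's JSON and drop anything outside the vocabulary,
    so a malformed reply or hallucinated tag never reaches the review queue."""
    try:
        tags = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(tags, list):
        return []
    return [t for t in tags if isinstance(t, str) and t in KNOWN_TAGS]
```

The filter on the way out is the important part: even if the model invents a tag, it cannot enter the system, which keeps the failure mode boring.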

The frontend ran on Vue.js with Quasar. Everything was containerized with Docker and deployed to AWS through a CI/CD pipeline that pushed updates automatically on merge to main.

The technically interesting part was figuring out how to generate useful embeddings from short, unstructured user comments. The central question was whether to embed the full comment as a single vector or split it into sentences and embed each one separately. We tested both. Sentence-level embeddings felt like the right instinct but performed worse in practice, partly because short comments don’t split cleanly and partly because the overhead of embedding every sentence added latency. We also experimented with how to represent the accessibility features themselves, and found that embedding a concise feature name outperformed embedding a longer description.

The approach we settled on: one embedding per comment, compared against embeddings for each feature name, returning the top five matches. That number wasn’t arbitrary. Looking at historical comment data, most submissions ran around five sentences, meaning a single comment typically contained around five distinct observations. Matching that with five extracted tags hit the right balance between coverage and accuracy without introducing the performance cost of sentence-level processing. It’s a good example of where the simpler approach won, but only after testing the more complex one first.
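The matching step reduces to a cosine-similarity ranking. A sketch, with the caveat that in the real system the vectors came from an OpenAI embedding model and the feature names from the database; the toy function and variable names below are assumptions for illustration.

```python
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_matches(comment_vec: list[float],
                feature_vecs: dict[str, list[float]],
                k: int = 5) -> list[str]:
    """Rank feature names by similarity to the single comment embedding
    and return the k best. k=5 mirrors the ~5 observations per comment
    seen in the historical submission data."""
    ranked = sorted(feature_vecs.items(),
                    key=lambda item: cosine(comment_vec, item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]
```

One embedding per comment means one similarity pass per submission, which is where the latency win over sentence-level embedding comes from.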


A staff review step is what makes crowdsourced government data defensible. We didn’t auto-publish submissions. Every review went through a staff approval queue before surfacing publicly. That decision matters more than it might seem. It means the AI is doing triage and extraction, not making publishing decisions, which is the right division of labor for a government information service where bad data has real consequences for people planning trips around accessibility needs. A full role-based permissions system was out of scope for this prototype but is a straightforward next integration for a production deployment.
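The invariant behind that queue is simple enough to state in code: nothing is searchable until a staff member flips its status. The real implementation used Django models behind a REST endpoint; this plain-Python sketch (class and field names are my own) just captures the state machine.

```python
from dataclasses import dataclass, field
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"

@dataclass
class Submission:
    email: str                      # the only identifier collected
    text: str                       # the visitor's plain-language review
    tags: list                      # AI-extracted accessibility tags
    status: Status = Status.PENDING # nothing publishes by default

class ReviewQueue:
    def __init__(self):
        self._items: list[Submission] = []

    def enqueue(self, sub: Submission) -> None:
        self._items.append(sub)

    def pending(self) -> list[Submission]:
        return [s for s in self._items if s.status is Status.PENDING]

    def approve(self, sub: Submission) -> None:
        sub.status = Status.APPROVED

    def published_tags(self) -> set:
        # Only approved submissions contribute to the searchable index.
        return {t for s in self._items
                if s.status is Status.APPROVED for t in s.tags}
```

The AI populates `tags`; a human flips `status`. That division of labor is the whole design.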

Simpler embedding strategies outperformed more complex ones. Our initial instinct was to split user comments into individual sentences and embed each one separately, for finer-grained matching against accessibility features. It performed worse: the overhead was real, short comments don’t split cleanly, and the accuracy didn’t justify the latency. A single embedding per comment, compared against concise feature-name embeddings, won. The lesson isn’t that sentence-level embeddings are always wrong. It’s that you need to test the sophisticated approach before assuming it wins.

Semantic search is the right layer for citizen-generated content. When the public contributes information and the public also searches for it, there’s always a vocabulary gap. One person writes “wheelchair accessible,” another writes “easy for my dad with his walker,” another writes “completely flat.” A keyword search treats those as three different things. Semantic search treats them as variations on the same thing. For any system where you can’t control how contributors write, that gap is a real problem and AI closes it without requiring standardized input.
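The keyword half of that gap is easy to demonstrate: those three phrasings share almost no tokens, so any exact-match search treats them as unrelated. A toy illustration (the embedding side is omitted because it needs a real model; the claim there rests on the system described above):

```python
def keyword_overlap(query: str, doc: str) -> set:
    """Naive keyword matching: the set of shared lowercase tokens."""
    return set(query.lower().split()) & set(doc.lower().split())

phrasings = [
    "wheelchair accessible",
    "easy for my dad with his walker",
    "completely flat",
]
# No pair of these phrasings shares a single token, so a keyword
# index sees three unrelated strings. An embedding model, by
# contrast, places all three near the same accessibility feature.
```

This is why standardized input forms are the usual workaround, and why semantic search lets you skip them.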

Government data problems are often maintenance problems in disguise. The information existed, was collected carefully, and was published. What was missing was any ongoing mechanism to keep it current. This pattern shows up constantly in public sector digital services: a well-executed initial build followed by years of slow data decay. AI-assisted crowdsourcing shifts the model from periodic manual updates to continuous passive ones. That’s a meaningfully different operational posture, and worth taking seriously when scoping any long-running information service. WCAG compliance was out of scope for this prototype but would be a baseline requirement for any production release, not an afterthought.


Want to try it yourself? The full source code is on GitHub.