Your codebase is a graph

Static analysis tools already detect dead code, god classes, and cyclic dependencies. They’re good at it. So when we started looking at knowledge graphs as a way to analyze code quality, the question wasn’t whether we could detect these things. It was whether modeling a codebase as a graph gives you a different, and potentially more useful, way to get there.

A knowledge graph captures the relational structure of code natively. Every module, class, function, and method becomes a node. Every import, call, and inheritance relationship becomes an edge. That structure is already implicit in your codebase. The question is whether making it explicit, and queryable, changes what you can do with it.
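As a toy illustration of that node-and-edge model, a code graph is just entities plus typed relationships. The names and relationship types below are invented for the sketch, not Code-Graph-RAG's actual schema:

```python
from collections import defaultdict

# Minimal in-memory sketch of a code graph. Node and edge types are
# illustrative only; the real graph lives in Memgraph.
nodes = {
    "app.models": "Module",
    "app.models.User": "Class",
    "app.models.User.save": "Method",
    "app.utils.slugify": "Function",
}

edges = [
    ("app.models", "app.models.User", "DEFINES"),
    ("app.models.User", "app.models.User.save", "DEFINES"),
    ("app.models.User.save", "app.utils.slugify", "CALLS"),
]

# Adjacency view: what does each node point at, and via which relationship?
out_edges = defaultdict(list)
for src, dst, rel in edges:
    out_edges[src].append((rel, dst))

print(out_edges["app.models.User.save"])  # [('CALLS', 'app.utils.slugify')]
```

Once the structure is explicit like this, "find every method that calls into `app.utils`" stops being a grep problem and becomes a traversal.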

Code-Graph-RAG is an open source tool that does exactly this: it parses a codebase with Tree-sitter and stores the result in Memgraph, a graph database. Its original purpose is retrieval-augmented generation, letting you ask natural language questions about your code. But we wanted to know whether the underlying graph could also be a source of meaningful quality metrics.


If a codebase is modeled as a graph, structural problems should be detectable through graph queries. Things like cyclic dependencies (cycles in the graph), god classes (nodes with too many outgoing edges), and dead code (nodes with no incoming edges) are all graph properties. The question was whether we could build a set of metrics from those queries that were actually useful to developers, not just technically correct.
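In graph terms, each of those checks is a few lines of traversal. Here is a pure-Python sketch over a toy adjacency dict; node names and the fan-out threshold are made up, and the real checks run as queries against Memgraph rather than in Python:

```python
# Toy graph of call/import edges as an adjacency dict.
graph = {
    "main":     ["GodClass"],
    "GodClass": ["A", "B", "C", "X", "Y", "Z"],   # fan-out of 6
    "A": ["B"], "B": ["C"], "C": ["A"],           # dependency cycle
    "X": [], "Y": [], "Z": [],
    "orphan":   [],                               # nothing calls it
}

def find_cycle_nodes(graph):
    """Nodes sitting on at least one cycle, via DFS back-edge detection."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}
    on_cycle = set()

    def dfs(n, stack):
        color[n] = GRAY
        stack.append(n)
        for m in graph.get(n, []):
            if color[m] == GRAY:                  # back edge closes a cycle
                on_cycle.update(stack[stack.index(m):])
            elif color[m] == WHITE:
                dfs(m, stack)
        stack.pop()
        color[n] = BLACK

    for n in graph:
        if color[n] == WHITE:
            dfs(n, [])
    return on_cycle

# God class: too many outgoing edges (the threshold of 5 is invented).
god_classes = {n for n, out in graph.items() if len(out) >= 5}

# Dead code, naive version: no incoming edges. Note that entry points
# like `main` get flagged too, which is exactly the noise problem below.
in_degree = {n: 0 for n in graph}
for out in graph.values():
    for m in out:
        in_degree[m] += 1
dead_candidates = {n for n, d in in_degree.items() if d == 0}

print(sorted(find_cycle_nodes(graph)))  # ['A', 'B', 'C']
print(sorted(god_classes))              # ['GodClass']
print(sorted(dead_candidates))          # ['main', 'orphan']
```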


We built a Python analyzer that runs graph queries against the Memgraph instance and outputs timestamped JSON. That feeds a React dashboard. We ended up tracking 21 metrics across 8 categories: cyclic dependencies, god classes, inheritance quality, dead code, coupling and cohesion, size distribution, documentation, and graph connectivity. Each metric has severity thresholds that roll up into an overall quality score.
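To make the roll-up concrete, here is a hypothetical sketch of the scoring step. The threshold values, penalty weights, and metric names are invented for illustration, not the analyzer's real configuration:

```python
import json
from datetime import datetime, timezone

# Illustrative thresholds: each (limit, severity) pair fires once the
# metric's count reaches the limit. Real values would be tuned per team.
THRESHOLDS = {
    "cyclic_dependencies": [(1, "warning"), (5, "critical")],
    "god_classes":         [(1, "warning"), (3, "critical")],
    "dead_functions":      [(5, "warning"), (20, "critical")],
}
PENALTY = {"ok": 0, "warning": 5, "critical": 15}

def severity(metric, count):
    level = "ok"
    for limit, name in THRESHOLDS[metric]:
        if count >= limit:
            level = name
    return level

def report(counts):
    """Roll per-metric counts into a timestamped, scored JSON payload."""
    findings = {m: {"count": c, "severity": severity(m, c)}
                for m, c in counts.items()}
    score = 100 - sum(PENALTY[f["severity"]] for f in findings.values())
    return {
        "generated_at": datetime.now(timezone.utc).isoformat(),
        "metrics": findings,
        "quality_score": max(score, 0),
    }

print(json.dumps(report({"cyclic_dependencies": 2,
                         "god_classes": 0,
                         "dead_functions": 7}), indent=2))
```

The timestamped JSON is what makes trend lines possible: the dashboard can diff today's payload against last sprint's instead of showing a single snapshot.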

The technical setup was straightforward. The interesting work was figuring out what to measure and how to avoid drowning developers in false positives.

Dead code detection is a good example. Our first approach flagged any function with no incoming CALLS relationships. Simple query, massive false positive rate. Decorators, closures, and factory functions all looked “dead” because they’re passed or returned rather than called directly. We tried adding regex exclusion patterns for naming conventions like `callback_` and `handler_`, but that required constant tuning. We tried flagging functions only if their parent was also unused, but that still failed for nested functions whose parent is active.

The solution that actually worked was conservative: exclude all nested functions from dead code detection entirely. A function defined inside another function gets a pass. Yes, this means we miss some genuinely dead nested functions. But it’s better to under-report than to flood developers with noise about their decorator patterns. We added a separate metric tracking how many nested functions were excluded so users know what’s being filtered.
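The filter itself is simple once the graph knows which functions are nested. A sketch with invented field names:

```python
# Each record is a function node plus two facts the graph already has:
# incoming CALLS count, and whether it is defined inside another function.
functions = [
    {"name": "unused_helper",  "callers": 0, "nested": False},
    {"name": "make_handler",   "callers": 2, "nested": False},
    {"name": "handler",        "callers": 0, "nested": True},   # returned by a factory
    {"name": "old_entrypoint", "callers": 0, "nested": False},
]

# Conservative rule: flag only top-level functions with no callers.
dead = [f["name"] for f in functions
        if f["callers"] == 0 and not f["nested"]]

# Separate metric: how many uncalled nested functions were excluded,
# so users can see what the filter is hiding.
excluded_nested = sum(1 for f in functions
                      if f["callers"] == 0 and f["nested"])

print(dead)             # ['unused_helper', 'old_entrypoint']
print(excluded_nested)  # 1
```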

We also added a `Project` root node early on, which turned out to be important. It let multiple codebases share a single Memgraph instance without metrics bleeding across projects. Every query filters by project name. And rather than reporting raw counts, each metric includes the actual names involved, so the dashboard can show expandable drill-downs instead of just numbers.
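A project-scoped query looks roughly like the sketch below. The labels, relationship names, and properties are assumptions for illustration, not the tool's exact schema:

```python
# Illustrative Cypher: every metric query anchors on the Project node,
# so two codebases sharing one Memgraph instance never mix results.
DEAD_FUNCTION_QUERY = """
MATCH (p:Project {name: $project})-[:CONTAINS*]->(f:Function)
WHERE NOT ()-[:CALLS]->(f)
RETURN f.qualified_name AS name
"""

def dead_functions(session, project_name):
    # `session` would be a Memgraph-compatible driver session;
    # the parameter binding keeps the project filter out of string literals.
    return [record["name"]
            for record in session.run(DEAD_FUNCTION_QUERY, project=project_name)]
```

Returning names rather than counts is what feeds the drill-downs: the dashboard lists the actual offending functions, not just "7 dead functions."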


The graph isn’t the only way to find these problems, but it may be the most legible. Static analysis tools can flag god classes and dead code. What the graph adds is a visual, queryable representation of why those problems exist and how they connect to each other. A god class in a static analysis report is a number. In a graph, you can see exactly what it depends on, what depends on it, and how removing it would change the structure. That context is what turns a finding into a decision.

The privacy story is stronger than you might expect from an LLM-adjacent tool. Your source code never leaves your infrastructure. The parser runs locally, updates a Memgraph instance you control, and the only thing that touches an LLM is the query layer, which converts natural language into graph queries. We ran Memgraph locally for development and in our own AWS environment for cloud deployment. For teams worried about sensitive codebases, that’s a meaningful distinction from tools that send your code to an external API. Nothing proprietary moves.

This is most useful when you don’t already know the codebase. The graph makes dependency structure visible without requiring someone to already understand the system. For a team inheriting legacy code, the visual structure does orientation work that would otherwise fall on whoever wrote the code, if they’re still around. You can see the shape of the problem before you understand the details.

It fits inside an agile sprint, not just a one-time audit. The metrics update continuously as the codebase changes. A development team can track whether the structural problems they identified in sprint planning are actually getting better, rather than running a code quality check once and filing the report. The dashboard becomes part of the sprint rhythm rather than a separate exercise.

Being conservative with metrics is a feature, not a limitation. We deliberately chose to report less rather than flood developers with false positives. A dashboard full of noise gets ignored. The goal was a small set of findings the team would actually act on. That’s a harder design problem than it sounds, and it’s where a lot of automated tooling falls down in practice.

The 21 metrics are a starting point, not a ceiling. The real bet here is that a queryable graph of your codebase opens up questions that traditional static analysis wasn’t designed to answer. Things like how knowledge is distributed across a team based on who owns which nodes, how a proposed refactor would ripple through the dependency structure before you write a line of code, or how two codebases compare structurally before a merger or integration. We haven’t built all of that yet. But the graph makes those questions answerable in a way that scanning text files doesn’t.