AI Doesn't Have a Data Problem. It Has a Geography Problem.

Why foundation models struggle with physical-world decisions, and what spatial intelligence and digital twins reveal about AI's next bottleneck

For a decade, the AI industry has run on one assumption: every limitation is a scale problem. Hallucinations, brittleness, narrow domain performance. The fix has always been the same prescription at higher doses. More parameters. More tokens. More synthetic data to fill the gaps the internet didn't cover.

That prescription worked remarkably well for language. It is starting to fail, quietly and specifically, for anything that touches the physical world.

The failure doesn't look like a crash. It looks like a model that is statistically correct and operationally wrong. An output that passes every benchmark the team built, then fails the moment it meets a road network, a flood plain, or a warehouse floor.

The industry keeps treating this as a data gap. It is closer to a missing primitive. Geography itself never made it into the stack, and more data fixes a thin dataset, not a missing primitive.

The Scaling Assumption Doesn't Travel Well

Scaling laws are real. But they describe a relationship between compute, data, and loss on a specific class of problems: prediction over sequences, where the right answer is a statistical regularity buried in enough examples. Text is forgiving in this way. So, increasingly, are images and video, where the underlying signal is dense and homogeneous enough that more examples reliably mean better coverage.

Geography doesn't behave like that. Spatial relationships aren't just another feature column to add to a bigger training set. They encode constraints (physical, regulatory, infrastructural) that don't average out across more data points.

Two ZIP codes can carry nearly identical income, density, and demographic profiles and still produce wildly different business outcomes. One sits on a highway off-ramp. The other sits behind a one-way grid with no left-turn access. No amount of additional demographic data resolves that gap, because the variable that mattered was never demographic. It was topological.
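The ZIP-code contrast can be sketched in a few lines. The site names, attribute values, and road segments below are hypothetical, invented only to show two identical attribute rows sitting on different directed road graphs; this is an illustration of the idea, not a site-selection method.

```python
from collections import deque

# Hypothetical attribute rows, identical by construction.
site_attributes = {
    "site_a": {"median_income": 82000, "density_per_sq_mi": 4100},
    "site_b": {"median_income": 82000, "density_per_sq_mi": 4100},
}

# Hypothetical directed road graph: u -> v means a legal movement from u to v.
road_edges = {
    "highway_northbound": ["off_ramp"],
    "off_ramp": ["site_a"],
    "arterial_eastbound": ["one_way_grid"],
    "one_way_grid": ["arterial_eastbound"],  # no legal turn into site_b
    "back_street": ["site_b"],
    "site_a": [],
    "site_b": [],
}


def reachable(graph, start):
    """Return every node reachable from start by following directed edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def entry_points_reaching(graph, site, entry_points):
    """List the entry points from which the site can legally be reached."""
    return sorted(e for e in entry_points if site in reachable(graph, e))


main_approaches = ["highway_northbound", "arterial_eastbound"]
access_a = entry_points_reaching(road_edges, "site_a", main_approaches)
access_b = entry_points_reaching(road_edges, "site_b", main_approaches)
print(access_a, access_b)
```

The attribute rows compare equal, yet only one site is reachable from a main approach. That difference lives entirely in the edges, which no extra demographic column would reveal.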

A fair objection here: isn't geography just another kind of data? Yes, in the trivial sense that coordinates and road networks are data too. But the title's claim is narrower and more useful than "AI lacks information." The actual failure is structural: most architectures treat location as an attribute sitting in a row next to income and square footage, when it functions as a relationship between rows. A model can have abundant geographic data and still misuse it, the same way a spreadsheet full of phone numbers isn't a phone network. The deeper problem is representation, not volume, and geography is simply where that representation gap shows up first and most expensively.

This is the part most AI commentary skips. The conversation about model limitations runs almost entirely on language, images, video, and structured tabular data. Location, distance, connectivity, and accessibility rarely appear as first-class subjects, even though they govern outcomes in logistics, retail, healthcare, utilities, insurance, and construction. These sectors collectively account for a large share of global GDP yet attract almost none of the model-architecture investment.

Statistically Right, Geographically Wrong

The cleanest way to see the gap is to look at cases where a model's output is technically defensible and still wrong in a way that only shows up in the field. Call it the pattern most AI commentary misses: a system can be statistically right and geographically wrong at the same time, and nothing in standard evaluation catches the difference.

Retail site selection. A model trained on demographic and foot-traffic data can correctly identify a location with strong household income, dense population, and high category spend.

It can still miss that the site has no left-turn access from the dominant traffic direction. Or that the nearest competitor controls the only signed intersection within half a mile. Or that parking turns over once every four hours instead of the twelve times a comparable site achieves.

The demographic model isn't wrong. It's answering a question that was never the binding constraint.

Last-mile logistics. This is the best-documented version of the problem, because a major retailer has published the data. In Amazon's last-mile routing research, the company found that drivers frequently deviate from planned delivery routes because of their tacit knowledge of the road and curbside infrastructure, customer availability, and other characteristics of the respective service areas.

The dataset behind this finding spans over six thousand actual driver trajectories from five major US cities. The headline result: models trained to predict what experienced human drivers actually do significantly outperform traditional optimization-based approaches that only minimize distance or time.

The optimizer was solving the wrong problem. The driver's mental map of curb cuts, loading zones, and one-way exceptions encoded information the routing algorithm never had access to.

Healthcare access. Demand-forecasting models can correctly predict where patient volume will rise. They routinely miss that the relevant variable isn't distance to the nearest facility but travel time under realistic conditions, which can differ by an order of magnitude between a community with a direct arterial connection and one separated from the same facility by a river crossing that floods in heavy rain.
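A minimal sketch of the distance-versus-travel-time gap, assuming a toy road network. The coordinates, place names, and minute values are hypothetical, chosen so that both communities sit the same straight-line distance from the clinic; the shortest-path routine is a standard Dijkstra search.

```python
import heapq
import math

# Hypothetical planar coordinates (arbitrary units).
coords = {"clinic": (0.0, 0.0), "community_a": (3.0, 0.0), "community_b": (0.0, 3.0)}

# Hypothetical undirected road segments with travel time in minutes.
roads = [
    ("community_a", "clinic", 6),
    ("community_b", "bridge", 3),
    ("bridge", "clinic", 3),
    ("community_b", "upstream_crossing", 25),
    ("upstream_crossing", "clinic", 35),
]


def travel_minutes(edges, source, target, closed=frozenset()):
    """Shortest travel time via Dijkstra, skipping any closed nodes."""
    graph = {}
    for u, v, minutes in edges:
        if u in closed or v in closed:
            continue
        graph.setdefault(u, []).append((v, minutes))
        graph.setdefault(v, []).append((u, minutes))
    best = {source: 0}
    heap = [(0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return cost
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt, minutes in graph.get(node, []):
            new_cost = cost + minutes
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return math.inf


straight_a = math.dist(coords["community_a"], coords["clinic"])
straight_b = math.dist(coords["community_b"], coords["clinic"])
dry_b = travel_minutes(roads, "community_b", "clinic")
flooded_b = travel_minutes(roads, "community_b", "clinic", closed={"bridge"})
flooded_a = travel_minutes(roads, "community_a", "clinic", closed={"bridge"})
print(straight_a, straight_b, dry_b, flooded_b, flooded_a)
```

In this toy setup the straight-line distances match, but closing the bridge multiplies one community's travel time tenfold while leaving the other unchanged.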

In each case, the error doesn't show up in cross-validation. It shows up in deployment, which is exactly why it's underweighted in how the industry evaluates its own progress.
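One partial mitigation from spatial statistics is to hold out whole geographic blocks rather than random rows, so that test points are not immediate neighbors of training points. The sketch below assumes planar coordinates and a hypothetical block size; it only builds the folds, and whether blocked evaluation surfaces the failures described above depends on the data.

```python
def block_id(x, y, block_size):
    """Map a point to the grid cell (block) that contains it."""
    return (int(x // block_size), int(y // block_size))


def spatial_block_folds(points, block_size, n_folds):
    """Assign every point in a block to the same fold (round-robin by block)."""
    blocks = sorted({block_id(x, y, block_size) for x, y in points})
    fold_of_block = {b: i % n_folds for i, b in enumerate(blocks)}
    folds = [[] for _ in range(n_folds)]
    for idx, (x, y) in enumerate(points):
        folds[fold_of_block[block_id(x, y, block_size)]].append(idx)
    return folds


# Hypothetical sample locations: three tight pairs and two isolated points.
points = [(0.1, 0.1), (0.2, 0.3), (5.1, 0.2), (5.3, 0.4), (0.2, 5.5), (5.2, 5.1)]
folds = spatial_block_folds(points, block_size=1.0, n_folds=2)
print(folds)
```

Each fold then serves as a held-out region in turn. Points that share a block never end up on opposite sides of the train/test split, which is the property a random shuffle does not give.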

The Hidden Constraint: Geographic Relationships Don't Average

Here is the underlying reason more data doesn't fix this. Most statistical learning assumes observations are exchangeable, or close enough to it, that adding more of them improves the estimate. Spatial data routinely violates that assumption.

Geographers have a name for why. Tobler's First Law of Geography, formulated in 1970, states it plainly: everything is related to everything else, but near things are more related than distant things. A location's outcome depends on its relationship to its neighbors, not just its own attributes. That relationship is mediated by physical structure a feature vector at the location level simply doesn't capture.
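Tobler's law has a standard numerical counterpart in Moran's I, a measure of spatial autocorrelation. The sketch below computes it for two hypothetical value patterns on a four-location chain with binary adjacency weights; the values and the chain layout are assumptions chosen to keep the arithmetic easy to check.

```python
def morans_i(values, weights):
    """Moran's I: (n / W) * sum_ij w_ij * d_i * d_j / sum_i d_i^2, d = deviation from mean."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(weights.values())
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    variance_sum = sum(d * d for d in dev)
    return (n / total_weight) * (cross / variance_sum)


def chain_weights(n):
    """Binary adjacency weights for n locations arranged in a line."""
    weights = {}
    for i in range(n - 1):
        weights[(i, i + 1)] = 1.0
        weights[(i + 1, i)] = 1.0
    return weights


w = chain_weights(4)
clustered = [1.0, 2.0, 3.0, 4.0]    # neighbors have similar values
alternating = [1.0, 4.0, 1.0, 4.0]  # neighbors have opposite values
print(morans_i(clustered, w), morans_i(alternating, w))
```

A positive value means neighbors resemble each other and a negative value means they differ. Either way, a value far from zero signals that the observations are not exchangeable draws.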

This is also an architecture mismatch, not just a missing variable. Transformers, the backbone of most modern AI, are built to model sequences: token follows token, sentence follows sentence. Physical systems aren't sequences. They're graphs. A road network, a utility grid, a watershed, a supply chain: each is a set of nodes and edges where what matters is adjacency and flow, not order. Feeding graph-shaped problems into sequence-shaped architectures is part of why the failures above keep recurring. It's not that the model saw too little data. It's that the data was poured into a container built for a different geometry.

Two markets with near-identical input statistics can diverge sharply for exactly this reason: an urban infill site with through-traffic, and a suburban outparcel with the same demographics but a single ingress point. Flood-prone and stable parcels with similar elevation profiles can carry very different insurance loss histories, since watershed topology, not just ground elevation, determines where water actually moves. The missing variable in each pair isn't more data about the location. It's the structure of the location's relationship to everything around it: a graph, not a row.
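The watershed point can be illustrated with a toy drainage graph. The parcel names, elevations, and flow directions below are hypothetical; the sketch simply counts how many cells drain through each parcel, a crude stand-in for the upstream contributing area that hydrology tools compute far more carefully.

```python
# Hypothetical elevations in meters: the two parcels are identical on this attribute.
elevation_m = {"parcel_x": 12.0, "parcel_y": 12.0}

# Hypothetical drainage graph: each cell points to the cell its runoff flows into.
flows_to = {
    "ridge_1": "slope_1",
    "ridge_2": "slope_1",
    "slope_1": "parcel_x",
    "ridge_3": "parcel_y",
    "parcel_x": "river",
    "parcel_y": "river",
    "river": None,
}


def drains_through(graph, start, target):
    """True if runoff starting at start passes through target on its way downstream."""
    seen = set()
    node = graph.get(start)
    while node is not None and node not in seen:
        if node == target:
            return True
        seen.add(node)  # guards against accidental cycles in the input
        node = graph.get(node)
    return False


def upstream_count(graph, target):
    """Number of cells whose runoff passes through target."""
    return sum(1 for cell in graph if cell != target and drains_through(graph, cell, target))


print(upstream_count(flows_to, "parcel_x"), upstream_count(flows_to, "parcel_y"))
```

Both parcels share an elevation, but one collects runoff from three upstream cells and the other from one. The difference is a property of the graph, not of either row.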

Zoom out and a pattern emerges across the entire history of the field. AI's progress so far has come in three overlapping eras, each defined by what kind of structure the dominant architecture could represent.

Era one was language intelligence. Transformers learned to model sequences, and the sequence turned out to be a remarkably general container: sentences, code, even protein chains.

Era two was multimodal intelligence. The same sequence-modeling trick got extended to images and video by treating pixels and frames as sequences too, with enough success that it looked, briefly, like sequences could absorb almost anything.

Era three, the one just beginning, is spatial intelligence. Place, topology, and physical structure don't compress into sequences without real loss, which is exactly why digital twins, geo-foundation models, and graph-based architectures are emerging now rather than five years ago. They are the field building a new container for a kind of structure the old one couldn't hold.

A Second-Order Effect Few People Talk About: Digital Twins Are Quietly Exposing This

The clearest evidence that the industry is bumping into this constraint, even if it hasn't named it yet, is the maturation of digital twin technology.

NASEM's working definition of a digital twin describes it as a collection of virtual information constructs that replicate the structure, context, and behavior of a natural, engineered, or social system, continuously updated with data from its physical counterpart, with predictive capabilities that support decision-making. Strip the formality from that and the load-bearing word is "structure." A digital twin that doesn't model spatial structure isn't a digital twin. It's a 3D model with a dashboard attached.

A real deployment makes the distinction obvious. Germany's peer-reviewed Herrenberg urban digital twin case study combines a 3D model of the built environment with a street network model, an urban mobility simulation, and a wind flow simulation to manage a historic core that's strongly affected by car traffic and emissions. None of that is a data-volume problem. It's a problem where the model has to know what's next to what, what flows where, and what physically constrains what, and no amount of additional non-spatial data substitutes for that knowledge.

Digital twins are forcing the industry to confront, project by project, what pure data-scaling roadmaps have been able to ignore: you cannot simulate a physical system without first representing its geography correctly, and most AI tooling still treats geography as metadata rather than structure.

Why This Trend Is Easier to Predict Than It Appears

Skeptics will point out that "spatial AI" has existed for decades inside GIS software, and that's correct. The difference now is that the foundation-model paradigm, the thing that made language and vision tractable at scale, is being applied to location itself for the first time, and the early results are a useful signal of where the binding constraint actually sits.

Google Research's Population Dynamics Foundation Model is a useful case study. It generates compact location embeddings by ingesting regional data including search trends, points of interest, mobility patterns, weather, and air quality, then processes these features through a graph neural network to produce a rich embedding for each location.
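To make the graph step concrete, here is a minimal sketch of one round of mean-neighbor aggregation, the basic building block of many graph neural networks. It is not the Population Dynamics Foundation Model's actual architecture; the location names, feature values, and the self_weight parameter are all hypothetical.

```python
def message_pass(features, neighbors, self_weight=0.5):
    """Blend each node's features with the mean of its neighbors' features."""
    updated = {}
    for node, own in features.items():
        nbrs = neighbors.get(node, [])
        if not nbrs:
            updated[node] = list(own)  # isolated node keeps its own features
            continue
        mean_nbr = [
            sum(features[n][k] for n in nbrs) / len(nbrs) for k in range(len(own))
        ]
        updated[node] = [
            self_weight * o + (1 - self_weight) * m for o, m in zip(own, mean_nbr)
        ]
    return updated


# Hypothetical features: [local_score, mobility_index].
features = {
    "loc_a": [1.0, 0.0],
    "loc_b": [1.0, 0.0],  # same attributes as loc_a
    "busy_hub": [0.0, 10.0],
    "quiet_area": [0.0, 0.0],
}
neighbors = {"loc_a": ["busy_hub"], "loc_b": ["quiet_area"]}

embeddings = message_pass(features, neighbors)
print(embeddings["loc_a"], embeddings["loc_b"])
```

Two locations that start with identical attributes end up with different representations once their neighbors are mixed in, which is the sense in which a graph model treats relationships rather than isolated rows.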

The notable design choice isn't the data volume. It's the graph structure, because relationships between locations are the thing being modeled, not just the attributes of each location in isolation. Since release, over two hundred organizations have tested the embeddings, with the dataset expanding from the US to the UK, Australia, Japan, Canada, and beyond.

A second example, Google's AlphaEarth Foundations, takes a parallel approach for the physical landscape itself. Researchers describe it as a "virtual satellite" capable of characterizing the Earth's surface and its dynamics in unprecedented detail, integrating optical imagery, radar, LiDAR, and climate variables into a unified representation. The architecture choice that matters here is the same one: spatial relationship is structural to the model, not bolted on as a coordinate pair.

Neither of these is a market-sizing story. They're an architecture story: the field is quietly rebuilding the foundation-model approach around graphs and relationships instead of sequences, because that's what physical systems actually are.

Where This Argument Could Be Wrong

Three pushbacks are worth taking seriously rather than waving off.

Geo-foundation models could simply absorb this problem the way scale absorbed earlier ones. If graph-structured location embeddings become a commodity API layer, as the trajectory of Google's Population Dynamics Foundation Model (PDFM) suggests, the distinction this piece draws between "data" and "structure" could collapse back into a data problem, just at a different layer of the stack. That's a real possibility, and it would be the optimistic outcome: the gap closes because the industry builds the missing primitive, not because the argument was wrong about the primitive being missing.

Plenty of physical-world AI already works fine without explicit spatial structure. Demand forecasting, fraud detection, and dynamic pricing often perform well using tabular features that happen to correlate with location, without ever modeling adjacency directly. The cases in this piece are real, but they're drawn from domains where spatial topology is the dominant constraint. They shouldn't be read as a universal claim that every physical-world model is secretly broken.

The "data versus geography" framing is, admittedly, a simplification. Geography is data. The more precise claim is that AI has a representation problem, and physical-world deployments are where that representation problem becomes visible and expensive first, because spatial relationships violate the exchangeability assumption baked into most architectures more obviously than other relational domains do. The title is a hook, not a literal taxonomy, and it's worth saying so plainly.

The Next AI Race Won't Be Decided by Model Size

The first decade of the modern AI race was a parameters-and-compute contest, and it's largely been won by whoever could afford the largest training runs. Compute still matters, and foundation models aren't going away. But the next contest is best understood as orthogonal to model scale rather than a replacement for it: spatial intelligence layers on top of large models rather than competing with them for the same budget line. The bottleneck for physical-world applications increasingly isn't a bigger model. It's whether a system can represent where things are, how they relate, and what physically constrains them, which requires a different kind of infrastructure: spatial graphs, geo-foundation models, and digital twins that are accurate about structure, not just rich in data.

Organizations sitting on physical-asset data (utilities, logistics networks, construction portfolios, telecom infrastructure, insurance books tied to real property) hold a quietly underpriced advantage here. Their data was never going to be the bottleneck. Their spatial modeling capability is.

Three Forecasts, Held to Account

Prediction 1: Geo-foundation models become a standard layer in enterprise GeoAI stacks by 2028.

Probability estimate: 65%, a reasoned judgment based on current deployment pace, not a market forecast.

Why it may happen: Early movers already have multi-country pilots and hundreds of testing organizations within roughly a year of release, and cloud providers have an incentive to commoditize spatial embeddings the way they commoditized language embeddings.

Headwinds: Spatial data is fragmented across incompatible regional standards and licensing regimes, and enterprise procurement cycles for infrastructure-adjacent tooling are slow.

Leading indicator to watch: Whether a major cloud provider ships geo-embeddings as a default, billed-by-the-call API rather than a research preview.

Prediction 2: "Spatial blindness" becomes a named, budgeted risk category in enterprise AI governance by 2029.

Probability estimate: 45%, lower confidence since this depends on a few high-visibility operational failures becoming public, which is harder to forecast than adoption curves.

Why it may happen: As digital twins and autonomous systems get deployed in higher-stakes physical contexts, a model being statistically correct but geographically wrong becomes a liability event, not just an inefficiency. The mechanism would most likely run through insurance underwriters first, since they already price geographic risk and have the clearest incentive to formalize a new failure category, with audit frameworks and standards bodies like NIST or ISO following once a category exists to standardize.

Headwinds: Most organizations lack the internal vocabulary to even name this failure mode yet, and no existing standard currently defines what "spatial validity testing" would even mean, which slows the path to a formal governance line item.

Leading indicator to watch: Whether a major insurer or a body like NIST publishes the first framework that explicitly requires spatial-validity testing for AI-driven physical-world decisions.

Prediction 3: Construction, utilities, and logistics outpace consumer tech in operational spatial-AI maturity by 2030.

Probability estimate: 70%, the most defensible forecast because it extends a pattern already visible rather than projecting a new one.

Why it may happen: These industries have spent decades managing physical-asset data out of necessity and already have the GIS, BIM, and asset-management infrastructure that consumer AI companies are only now building from scratch.

Headwinds: Capital intensity and slower software-adoption cycles in these sectors could offset their structural data advantage.

Leading indicator to watch: Whether GeoAI vendor revenue, currently led by defense and government, shifts decisively toward infrastructure verticals over the next two reporting cycles.

What Engineering Leaders Should Do Next

The strategic implication isn't "buy a GIS platform." It's narrower and more useful than that: before scaling any model that touches physical-world decisions, ask whether the model's features encode spatial relationship as structure, or merely as a latitude-longitude pair sitting alongside fifty other unrelated columns. Those are different architectures with different failure modes, and the second one is the one quietly producing confident, wrong answers in production right now.

For teams building in logistics, construction, utilities, insurance, or anything tied to physical assets, the questions worth tracking over the next three to five years are concrete. Which cloud provider's geo-embeddings become the de facto standard? Does digital twin tooling consolidate around shared spatial-data standards, or stay fragmented by vendor? And does the industry ever develop a shared name for this failure mode, the way it eventually did for "hallucination"?

The AI industry spent the last decade teaching machines to understand language. The next test is whether it can teach them to understand place. The organizations that take that seriously now will have a real head start over the ones still waiting for a bigger model to fix a structural gap.




Written by

Akhil Krishnan B

Engineering Content Specialist

Akhil is an Engineering Content Specialist with 7+ years of experience writing for the engineering, BIM, and CAD industries. His work is grounded in primary research and direct engagement with engineers, delivering clear, technically sound content for a professional audience.
