Most guest intelligence conversations start with the experience: a returning guest recognized, a preference met without being asked, a mishap caught early. That's the surface.
This post is about what's underneath.
We don't usually write technically. The people running hotels don't need to read our engineering docs, and we don't think buying decisions should hinge on infrastructure minutiae. But there's a real question beneath the demos: is there durable infrastructure here, or is this a thin AI layer stitched onto off-the-shelf tooling?
For the teams that want to pressure-test that question (hotel IT, group CIOs, integration partners, technical advisors to ownership groups), here's how Abra actually works.
At the end, we flip the frame: three questions any serious reader should put to any vendor in this space, Abra included. If the architecture below is doing its job, those questions should be easy to answer.
The design problem
A hotel's guest intelligence lives in a dozen systems that weren't designed to talk to each other. Oracle OPERA, Mews, Maestro, and Infor HMS each model a guest differently. A loyalty program knows a member number the PMS has never heard of. SevenRooms has F&B preferences the front desk can't see. Revinate has survey sentiment the housekeeping team will never read. And much of what staff actually know about a guest isn't in any system at all. It's on a clipboard, in a shift huddle, or in the head of a bellhop who just clocked out.
Any platform that claims to unify this has to solve four different problems at once:
- Ingest from vendors with wildly different data shapes, without getting locked into any of them.
- Serve two workloads from the same source of truth: fast reads for staff applications at the front desk, and long-history analytics for ownership.
- Model relationships between guests, stays, rooms, staff, and departments in a way that matches how people actually think about service.
- Extract structured knowledge from free-text notes without turning it into a black box hotels can't audit.
Each of those is a different engineering problem with a different right answer. So we built four layers, one per problem, and connected them with a single canonical definition of what a guest, a stay, and a room are.
Layer 1: Ingestion and canonical entities
At the boundary with source systems, every record lands in raw form first (the original PMS payload, unchanged) and is then promoted through a staging step into a canonical entity: one vendor-neutral record per guest, stay, room, note, or preference.
Canonical records are identified by a stable composite key of the form property × source system × source identifier. That sounds like a plumbing detail, but it's the thing that lets Abra reconcile a guest who appears in a PMS and a loyalty program without collapsing them prematurely. Two sources become two records with a known relationship, not one guess.
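The composite key idea can be sketched in a few lines. This is an illustrative shape, not Abra's actual schema; the field names and example identifiers are invented for the sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CanonicalKey:
    """Stable composite identity: property x source system x source id.

    Two sources describing the same human produce two distinct keys, so
    the relationship between them is recorded explicitly, not guessed.
    (Illustrative names; not Abra's actual schema.)
    """
    property_id: str
    source_system: str
    source_id: str

# The same guest as seen by the PMS and by the loyalty program:
pms_key = CanonicalKey("kauai-01", "opera", "G-48213")
loyalty_key = CanonicalKey("kauai-01", "loyalty", "M-99100")

# Two records with a known relationship, not one premature merge:
assert pms_key != loyalty_key
# Frozen dataclasses are hashable, so keys work directly as identifiers:
linked = {pms_key: {"same_person_as": loyalty_key}}
```

Because the key is frozen, it can serve as a dictionary or set identifier anywhere in the pipeline without an extra surrogate ID.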
Reservation statuses follow the same pattern. Every source status maps to a canonical set (confirmed, tentative, checked-in, checked-out, cancelled, no-show, waitlist), while the original source status is preserved verbatim alongside for auditability. A new PMS can be onboarded by writing a connector that produces canonical records. Nothing downstream has to change.
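A status connector reduces to a small mapping plus a rule that unknown codes fail loudly. The vendor codes below are invented for illustration, not actual OPERA values; only the canonical set comes from the text above.

```python
# Map each vendor's reservation statuses onto the canonical set, keeping
# the verbatim source status alongside for audit. Vendor codes here are
# illustrative, not real OPERA values.
CANONICAL_STATUSES = {
    "confirmed", "tentative", "checked-in", "checked-out",
    "cancelled", "no-show", "waitlist",
}

OPERA_STATUS_MAP = {
    "RESERVED": "confirmed",
    "DUE IN": "confirmed",
    "CHECKED IN": "checked-in",
    "CHECKED OUT": "checked-out",
    "CANCELLED": "cancelled",
    "NO SHOW": "no-show",
}

def canonicalize_status(source_status: str, status_map: dict) -> dict:
    canonical = status_map.get(source_status.upper())
    if canonical is None:
        # Unknown vendor codes are surfaced, never silently coerced.
        raise ValueError(f"unmapped source status: {source_status!r}")
    assert canonical in CANONICAL_STATUSES
    return {"status": canonical, "source_status": source_status}

canonicalize_status("Checked In", OPERA_STATUS_MAP)
# -> {"status": "checked-in", "source_status": "Checked In"}
```

Onboarding a new PMS means writing one such map and connector; nothing downstream changes.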
Every canonical record is change-tracked, soft-deletable, and replayable. Translation: any downstream system can be rebuilt from scratch, backfilled from history, or restated after a correction without losing lineage. When a hotel tells us "this guest was misidentified three months ago, can you fix it?", the answer is yes, and the fix propagates cleanly through every system that reads from the canonical layer.
Layer 2: The warehouse and the operational database
One canonical change stream fans out in parallel to three independent downstream stores: the analytical warehouse, the operational database, and the context graph in Layer 3. Each is purpose-built for a different shape of question, and each can be rebuilt from canonical history on its own schedule. That last point matters more than it sounds: if any downstream store gets corrupted, drifts, or needs a schema change, it can be rebuilt from scratch without touching the others, and without replaying anything through the upstream source systems.
The analytical warehouse holds the full history. It's a columnar store with a star-schema model of guests, reservations, and room nights, and it's what feeds the dashboards ownership looks at, the segmentation work revenue management runs, and the feature engineering for the ML models that score guests and stays. Historical depth matters here. An ownership group asking "how does the wellness-oriented guest segment compare across our Kauai and Aspen properties over the last three years?" needs a store that was built for that kind of query.
The operational database holds the current state of the same entities in a relational store serving the application surface, and is tuned for the opposite profile: fast reads from the staff applications that run daily hotel operations. Front-office tools, CRM consoles, housekeeping dashboards, the Abra mobile app. When a front-desk agent pulls up a guest profile, the question isn't "show me three years of segmentation analysis." It's "who is standing in front of me right now, and what do we know about them." Different store, different index strategy, different performance contract.
Those contracts aren't vague. We hold ourselves to concrete latency targets: a p95 of under 500 ms for CRM-shaped queries against the analytical layer, and under 50 ms for a direct guest lookup against the operational store. These are commitments, not advertised benchmarks. A front-desk agent checking in a guest shouldn't wait on a spinner, and a GM opening a morning dashboard shouldn't either.
On top of the warehouse, a thin semantic metrics layer exposes the same metrics to apps, agents, and BI: arrivals today, in-house, departures, revenue daily, guest preferences. One canonical metrics layer means the number on a GM's dashboard matches the number in a manager's morning briefing matches the number an AI agent cites in a summary. Different surfaces, same truth.
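The "define once, serve everywhere" idea behind the metrics layer can be shown with a single metric. The function name, record shape, and data are illustrative; the point is that every surface calls the same definition.

```python
import datetime as dt

# A metric is defined exactly once; dashboard, briefing, and AI agent
# all call this same definition. Names and shapes are illustrative.
def arrivals_today(reservations: list[dict], today: dt.date) -> int:
    """Count of canonically-confirmed reservations arriving today."""
    return sum(
        1 for r in reservations
        if r["arrival"] == today and r["status"] == "confirmed"
    )

stays = [
    {"arrival": dt.date(2024, 6, 12), "status": "confirmed"},
    {"arrival": dt.date(2024, 6, 12), "status": "cancelled"},
    {"arrival": dt.date(2024, 6, 13), "status": "confirmed"},
]

# GM dashboard, morning briefing, and agent summary all get the same number:
assert arrivals_today(stays, dt.date(2024, 6, 12)) == 1
```

With one definition, a discrepancy between two surfaces is impossible by construction rather than prevented by diligence.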
Layer 3: The context graph
Above the tabular layers sits the part most new to hospitality tech: a context graph that represents guests, stays, rooms, notes, staff, departments, and extracted facts as nodes connected by explicit relationships.
Tabular stores are the wrong shape for the questions staff actually ask about the guest in front of them. "Who has stayed with us before, what do they prefer, who travels with them, what notes do we have, and which of our departments is responsible for each of those preferences?" That's not one query. It's a traversal across guest → stays → notes → facts → departments, with conditional filtering at every hop. Graphs are built for exactly this.
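A toy adjacency-list graph makes the traversal shape concrete. Node IDs and edge names below are invented for the sketch; the hop sequence is the one described above.

```python
# Why the staff question is a traversal, not a join: each hop follows a
# named edge from one set of nodes to the next. All IDs are illustrative.
EDGES = {
    ("guest:ana", "HAS_STAY"): ["stay:2023-07", "stay:2024-06"],
    ("stay:2024-06", "HAS_NOTE"): ["note:77"],
    ("note:77", "PRODUCED"): ["fact:quiet-room"],
    ("fact:quiet-room", "ROUTED_TO"): ["dept:housekeeping", "dept:front-office"],
}

def hop(nodes: list[str], edge: str) -> list[str]:
    """Follow one edge type out of a set of nodes."""
    return [t for n in nodes for t in EDGES.get((n, edge), [])]

# guest -> stays -> notes -> facts -> departments, one hop at a time:
stays = hop(["guest:ana"], "HAS_STAY")
notes = hop(stays, "HAS_NOTE")
facts = hop(notes, "PRODUCED")
depts = hop(facts, "ROUTED_TO")
assert depts == ["dept:housekeeping", "dept:front-office"]
```

In a real graph store this whole chain is a single pattern-match query; the sketch just shows why row-oriented stores fight this access pattern at every hop.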
Staff are first-class entities in the graph, not a column on a note record. A housekeeping lead, a concierge, a bartender: each has their own node, and the observations and confirmations they contribute attach as attributed edges back to them. When a pattern turns out to matter ("the last three guests this concierge flagged as anniversary stays all came back within a year"), the graph can actually surface it. It also means a hotel can always see which staff member stood behind a given piece of guest knowledge.
Some of the edge types are genuinely hospitality-specific and worth naming. TRAVELED_WITH, for instance, is a first-class relationship: companion travel (a spouse, a child, a frequent colleague, an assistant) is modeled as its own edge rather than inferred from shared reservations. That's the sort of thing a concierge has always tracked mentally, and a flat-record system tends to lose. A handful of other edge types work the same way, attributing observations to the staff member who made them, linking a preference to the stay where it first surfaced, connecting a note to the fact it produced.
Embeddings live on every node that matters: every guest, preference, nugget, trace, celebration, and source note. This isn't a vector index bolted on top of unstructured text. It's a first-class attribute on the structured nodes themselves, which is what lets structured and similarity-based lookup compose cleanly in one layer. Structured queries (all guests who stayed in suite 412 in the last 18 months) and similarity queries (guests whose preference profile looks like this one) work side by side. "Find guests whose profile is similar to this one, who also have an anniversary stay on the books in the next 90 days" is one query, not three.
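Here is a minimal sketch of structured and similarity filters composing in one pass. The tiny two- and three-dimensional vectors stand in for real embeddings, and the field names are invented.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Guests as structured nodes that also carry an embedding attribute.
# (Toy 3-d vectors; real embeddings have hundreds of dimensions.)
guests = [
    {"id": "g1", "anniversary_within_90d": True,  "emb": [0.9, 0.1, 0.0]},
    {"id": "g2", "anniversary_within_90d": True,  "emb": [0.0, 0.1, 0.9]},
    {"id": "g3", "anniversary_within_90d": False, "emb": [0.9, 0.2, 0.0]},
]

target = [1.0, 0.0, 0.0]  # the profile we want to match

# One query: structured filter AND similarity ranking, composed cleanly.
matches = sorted(
    (g for g in guests if g["anniversary_within_90d"]),
    key=lambda g: cosine(g["emb"], target),
    reverse=True,
)
assert [g["id"] for g in matches] == ["g1", "g2"]
```

Note that g3 is the closest vector but fails the structured filter; that interplay is exactly what a bolted-on vector index can't express in one query.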
This is the layer staff-facing AI actually reads from. When a concierge opens an arrival briefing, when a housekeeping lead sees a pre-arrival suite setup checklist, when an F&B manager gets a VIP dietary note, they're all reading projections of the graph.
Layer 4: The agent layer
On top of the graph, AI agents consume unstructured guest notes and emit structured facts back into it. Free text goes in: "VIP, prefers quiet rooms, allergic to shellfish, anniversary June 12, first time she called down for extra pillows around 11pm." Four distinct facts come out, one of each type:
- A preference (quiet rooms) routed to front office and housekeeping.
- A celebration (anniversary, June 12) routed to F&B and guest experience.
- A notable detail (VIP status) flagged to the arrivals team.
- A trace, in the form of an observed action (late-night request for extra pillows), logged against the guest so the next turndown team sees the pattern before the guest has to ask again.
The shellfish allergy in that same note gets special handling on top of the preference track: regulated-domain facts like allergies carry elevated confidence requirements and are routed to F&B and kitchen with their own review path.
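As a rough sketch, each extracted fact can be represented as a plain record carrying its kind, routing, provenance, and confidence. The routing table, field names, and the review rule for regulated facts are illustrative, not Abra's actual schema.

```python
# Fact types as described in the text; routing targets are illustrative.
ROUTING = {
    "preference":  ["front-office", "housekeeping"],
    "celebration": ["f&b", "guest-experience"],
    "notable":     ["arrivals"],
    "trace":       ["housekeeping"],
    "allergy":     ["f&b", "kitchen"],  # regulated domain, own review path
}

def make_fact(kind: str, value: str, source_note: str, confidence: float) -> dict:
    return {
        "kind": kind,
        "value": value,
        "route_to": ROUTING[kind],
        "source_note": source_note,   # provenance link back to the note
        "confidence": confidence,
        # Hypothetical rule: regulated facts below an elevated bar get review.
        "needs_review": kind == "allergy" and confidence < 0.95,
    }

facts = [
    make_fact("preference",  "quiet rooms",                "note:77", 0.92),
    make_fact("celebration", "anniversary June 12",        "note:77", 0.88),
    make_fact("notable",     "VIP status",                 "note:77", 0.97),
    make_fact("trace",       "late-night pillow request",  "note:77", 0.90),
    make_fact("allergy",     "shellfish",                  "note:77", 0.93),
]
assert facts[-1]["needs_review"] is True  # allergy at 0.93 < elevated bar
```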
Crucially, the agent doesn't just extract facts. It reconciles them against what's already in the graph. Before writing anything new, it checks whether the same preference already lives on the guest (either verbatim or semantically close, via the embeddings sitting on existing nodes). If "prefers quiet rooms" is already there, the new observation reinforces the existing node rather than creating a duplicate. Over a year of notes that's the difference between a guest with one clean preference record and a guest with seventeen near-copies of the same fact.
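The reinforce-or-append decision can be sketched with a similarity check against existing nodes. The threshold value and record shape are illustrative assumptions.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

SIMILARITY_THRESHOLD = 0.90  # illustrative cutoff, not Abra's actual value

def upsert_preference(existing: list[dict], new: dict) -> list[dict]:
    """Reinforce a semantically close existing preference instead of
    writing a near-duplicate; otherwise append. A toy sketch."""
    for pref in existing:
        if cosine(pref["emb"], new["emb"]) >= SIMILARITY_THRESHOLD:
            pref["observations"] += 1  # reinforce, don't duplicate
            return existing
    new["observations"] = 1
    existing.append(new)
    return existing

prefs = [{"text": "prefers quiet rooms", "emb": [1.0, 0.0], "observations": 3}]
prefs = upsert_preference(prefs, {"text": "likes a quiet room",
                                  "emb": [0.98, 0.05]})
assert len(prefs) == 1 and prefs[0]["observations"] == 4
```

Run over a year of notes, this is precisely the mechanism that keeps one clean preference record instead of seventeen near-copies.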
Every fact carries three things: a confidence score at extraction time, a provenance link back to the note it came from, and a validity window describing when it became true and whether it's been superseded by newer information. That validity window is tracked along two separate time dimensions: when a fact became true in the world, and when we learned it. A guest who switched to a gluten-free diet last January but only mentioned it to the concierge in April has two honest timestamps on that fact, and both matter. The world-time tells staff when the change actually happened; the learn-time tells an auditor when the hotel could reasonably have known.
Confidence drives routing. High-confidence extractions are written straight to production. Mid-confidence extractions go to a human review queue before they're visible to staff. Low-confidence extractions are suppressed: the agent saw something, but not clearly enough to act on. A hotel can audit any fact in the system by following its provenance link back to the original note and, if needed, the original PMS payload it was ingested from.
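The three-band routing reduces to two thresholds. The post specifies the bands but not the cutoffs, so the numeric values below are placeholders.

```python
# Threshold values are illustrative; the text defines only the three
# bands, not where the cutoffs actually sit.
HIGH, LOW = 0.85, 0.50

def route_extraction(confidence: float) -> str:
    if confidence >= HIGH:
        return "production"    # written straight through
    if confidence >= LOW:
        return "review-queue"  # human review before staff can see it
    return "suppressed"        # seen, but not clearly enough to act on

assert route_extraction(0.92) == "production"
assert route_extraction(0.70) == "review-queue"
assert route_extraction(0.30) == "suppressed"
```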
Allergies are the obvious case for why this matters. A hotel should never have to take a platform's word for it that a guest is allergic to shellfish. They should be able to see exactly which note that claim came from, when, at what confidence, and whether anyone has reviewed it since. The architecture makes that trivially auditable.
What holds across all four layers
Three properties apply everywhere, and they're the ones that matter most for any team evaluating the platform for real deployment.
Multi-tenancy by construction. Every record at every layer is qualified by property. Queries, indexes, access controls, and AI prompts are scoped on property by design; there is no path that implicitly crosses properties. A portfolio customer with ten properties gets ten cleanly isolated tenants with an explicit cross-property contract on top, not a single pooled dataset with filters that someone has to remember to apply.
Lineage and provenance end-to-end. Every canonical record preserves its raw source payload. Every agent-produced fact preserves a link back to the note it was extracted from, its confidence at extraction time, and any subsequent human-review attribution. No separate lineage infrastructure required: it's built into the data model.
Versioning at every boundary. The context graph exposes a formally versioned contract. The canonical layer tracks content hashes and source timestamps on every entity. The agent layer tracks the two time dimensions described above (world-time and learn-time) on every fact, so both the operator's view and the auditor's view are always recoverable. New concepts can be added to the model without breaking existing consumers, and historical state at any past point in time can be reconstructed on demand.
Why we did this and not something simpler
A reasonable question from a technical reader is: couldn't you do most of this with a CDP, a vector database, and a prompt template?
You could build something that way. You couldn't build this.
A CDP assumes the identity problem is the hard part. In hospitality it's the tip of the iceberg. The note structure, the provenance requirements, the property-level isolation, the need to project the same truth into analytical and operational stores with different performance profiles, and the need to serve the graph-shaped questions staff actually ask are all problems CDPs don't model. Similarly, a vector database gives you semantic search but not the structured relationships that make service routing correct. And a prompt template gives you plausible text but not a durable, auditable fact record.
The reason guest intelligence is hard is that all four of these problems are real at the same time. Solving one of them well, and hand-waving the other three, produces the kind of AI pilot that demos beautifully and then fails quietly at scale.
What this unlocks for hotels
For a hotel, the architecture is mostly invisible. What they see is:
- Guest profiles that stitch cleanly across PMS, POS, loyalty, and messaging, without the usual duplicate-record mess.
- Arrival briefings that include facts a human could audit back to their source.
- Preferences and allergies that travel with a guest across properties in a portfolio.
- Staff observations that enter the system in natural language and get routed to the right department without anyone writing a rule.
- Analytics that tie cleanly to the same numbers the operational app is showing.
- An AI layer that gets more accurate over time because staff corrections retrain the classifier. The flywheel depends on having a clean graph and clean provenance, which is why we built those first.
The experience on the surface is the point. The architecture underneath is what makes the experience trustworthy enough for a hotel to actually stake a guest relationship on.
If you're evaluating platforms
Three questions any team evaluating a guest intelligence platform should ask, regardless of vendor:
- Where does an AI-extracted fact come from, and can you show me? If the answer isn't "here's the source note, here's the confidence, here's the reviewer," the platform is a black box.
- How is property isolation enforced: by query filters, or by the data model itself? The difference is the blast radius of a single bug.
- Can the system be rebuilt from canonical history, or is the current database the only source of truth? Replayability is the difference between "we fixed the bug" and "we fixed the bug and restated every downstream system correctly."
If the answers are reassuring, great. If they aren't, you've learned something important before signing anything.
This is what we built Abra to be able to answer without flinching. If you want to see the platform in context for your property or group, get in touch.


