Build an AI Voice Assistant for Restaurants (Vapi / Retell + POS Integration)
At 7:14 PM, your kitchen is busy, your cashier is handling walk-ins, and the phone rings three times in under a minute.
One customer wants to order a chicken burrito with extra guacamole and no onions. Another wants to ask if you are still serving dinner. A third wants to place a large family order for pickup in 25 minutes.
In many restaurants, at least one of those calls gets missed, rushed, or written down incorrectly.
That is exactly where an AI voice assistant for restaurants creates real value. A well-built system can answer every incoming call, take food orders conversationally, confirm them clearly, collect the customer name and phone number, push the order into your POS workflow, and send the order summary by SMS and/or email.
This guide shows how to build that system using Vapi or Retell for voice AI, Twilio or SIP telephony where needed, and Square, Toast, or Clover integration through APIs or webhooks.
Planning restaurant voice AI beyond a demo?
The production challenge is not speech-to-text. It is orchestration: menu structure, modifiers, POS syncing, human handoff, concurrent calls, order logging, and reliable confirmations.
Tirnav Solutions designs restaurant automation systems that connect voice AI + telephony + POS + SMS/email + analytics without creating operational chaos.
Why restaurants are a perfect use case for voice AI
Restaurants live in a high-interruption environment:
- Staff are busy during lunch and dinner rush.
- Customers still prefer calling for pickup, special instructions, or large family orders.
- Missed calls often mean missed revenue.
- Manual phone ordering creates errors around modifiers, sides, sauces, and timing.
An AI restaurant phone ordering assistant solves a very specific operational problem:
- It answers immediately.
- It understands the menu and modifiers.
- It confirms the order before submission.
- It hands off to a human when confidence is low or the customer asks for staff.
- It leaves behind structured data instead of a messy handwritten note.
If implemented correctly, this is not just a chatbot on a phone line. It becomes a 24/7 front-of-house ordering layer.
Core requirements for a restaurant AI voice assistant
Your system should satisfy these requirements from day one:
1) AI answers incoming calls
The assistant must answer every inbound call with a natural greeting, identify the restaurant or branch, and determine the caller's intent:
- Place a new food order
- Ask menu questions
- Check operating hours
- Ask for staff
- Track or modify an existing order
2) Takes food orders conversationally
The assistant should handle natural speech such as:
"I want two paneer wraps, one with extra cheese, one without onions, and a Coke."
That means it must support:
- quantities
- combos
- size choices
- spice level
- add-ons
- exclusions
- item-level notes
3) Confirms the order clearly before submission
This is non-negotiable. The assistant should read the order back in plain language before it reaches the kitchen or POS:
"Confirming your pickup order: two paneer wraps, one with extra cheese, one with no onions, and one Coke. Pickup in 25 minutes. Is that correct?"
This single step lets the caller catch misheard items and modifiers before they turn into kitchen errors.
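One way to keep the read-back reliable is to generate it from the structured draft order rather than letting the model improvise it. The sketch below is illustrative only: the `DraftItem` shape and the `buildReadback` helper are assumptions introduced here, not part of Vapi or Retell, and pluralization is deliberately left out.

```typescript
// Hypothetical shape for one line of a draft order; field names are illustrative.
type DraftItem = {
  name: string;
  quantity: number;
  modifiers: string[]; // spoken form, e.g. ["extra cheese"] or ["no onions"]
};

const NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten"];

// Spell out small quantities so text-to-speech output sounds natural.
function quantityWord(n: number): string {
  return NUMBER_WORDS[n] ?? String(n);
}

function describeItem(item: DraftItem): string {
  const base = `${quantityWord(item.quantity)} ${item.name}`;
  return item.modifiers.length > 0 ? `${base} with ${item.modifiers.join(" and ")}` : base;
}

// Join phrases as "a", "a and b", or "a, b, and c".
function joinList(parts: string[]): string {
  if (parts.length <= 2) return parts.join(" and ");
  return `${parts.slice(0, -1).join(", ")}, and ${parts[parts.length - 1]}`;
}

function buildReadback(mode: "pickup" | "delivery", items: DraftItem[], readyMinutes: number): string {
  const label = mode === "pickup" ? "Pickup" : "Delivery";
  return `Confirming your ${mode} order: ${joinList(items.map(describeItem))}. ` +
    `${label} in ${readyMinutes} minutes. Is that correct?`;
}

// Usage: listing each wrap separately keeps its modifiers unambiguous.
console.log(buildReadback("pickup", [
  { name: "paneer wrap", quantity: 1, modifiers: ["extra cheese"] },
  { name: "paneer wrap", quantity: 1, modifiers: ["no onions"] },
  { name: "Coke", quantity: 1, modifiers: [] },
], 25));
```

Because the sentence is built from the same data that will be submitted, what the caller hears and what the kitchen receives cannot drift apart.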
4) Collects customer name and phone number
At minimum, capture:
- customer full name
- phone number in normalized format
- pickup or delivery preference
- optional email
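"Normalized format" usually means E.164 (a plus sign, country code, and at most 15 digits). A minimal sketch, assuming callers are in the US or Canada unless the number already carries a country code; the `toE164` name, the default country code, and the 8-digit lower bound are assumptions made for illustration:

```typescript
// Assumption: numbers without a leading "+" belong to the default country (here +1).
function toE164(raw: string, defaultCountryCode = "1"): string | null {
  const trimmed = raw.trim();
  const digits = trimmed.replace(/\D/g, "");

  if (trimmed.startsWith("+")) {
    // E.164 allows at most 15 digits; the lower bound is only a loose sanity check.
    return digits.length >= 8 && digits.length <= 15 ? `+${digits}` : null;
  }
  if (digits.length === 10) return `+${defaultCountryCode}${digits}`;
  if (digits.length === 11 && digits.startsWith(defaultCountryCode)) return `+${digits}`;

  return null; // Ambiguous input: have the assistant ask the caller to repeat the number.
}

console.log(toE164("(555) 123-4567"));
console.log(toE164("12345"));
```

Returning `null` instead of a best guess gives the conversation layer a clear signal to re-ask, which is cheaper than sending a confirmation SMS to the wrong number.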
5) Sends order via SMS and/or email
After confirmation, the customer should receive:
- order summary
- pickup or delivery estimate
- branch/contact details
- link to call back or reply if something is wrong
Advanced requirements that matter in production
These are the features that separate a working prototype from a system a restaurant can trust on Friday night.
Ability to handle multiple simultaneous calls
Keep per-call state out of process memory so that any backend instance can serve any request, and scale the backend horizontally. Voice providers such as Vapi or Retell can manage concurrent call sessions, but your own orchestration layer must also support concurrency for:
- tool calls from many active calls
- simultaneous menu lookups
- concurrent POS submissions
- parallel SMS/email notifications
- logging and transcript persistence
If your design relies on a single shared in-memory order object, concurrent callers will overwrite each other's orders as soon as two calls overlap.
Call transfer to human staff if needed
The assistant should transfer the call when:
- the caller explicitly asks for a human
- order confidence is low
- the menu item is ambiguous
- the customer is angry or confused
- the request is outside policy, such as catering or refund disputes
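The transfer rules above can be kept in one small, testable function instead of being scattered through prompts. This is a sketch under assumptions: the `TurnSignals` fields, the keyword list, and the 0.7 threshold are hypothetical, and how each signal is computed depends on your voice provider and extraction step.

```typescript
type TransferReason =
  | "caller_requested"
  | "low_confidence"
  | "ambiguous_item"
  | "negative_sentiment"
  | "out_of_policy";

// Hypothetical per-turn signals supplied by your own orchestration layer.
type TurnSignals = {
  utterance: string;
  orderConfidence: number; // 0..1, assumed to come from the order extraction step
  ambiguousItem: boolean;
  callerFrustrated: boolean;
  topic?: "order" | "hours" | "catering" | "refund";
};

const HUMAN_KEYWORDS = /\b(human|representative|manager|staff)\b/i;
const CONFIDENCE_THRESHOLD = 0.7; // illustrative value; tune against real calls

// Returns why the call should be transferred, or null to keep it with the assistant.
function transferReason(signals: TurnSignals): TransferReason | null {
  if (HUMAN_KEYWORDS.test(signals.utterance)) return "caller_requested";
  if (signals.topic === "catering" || signals.topic === "refund") return "out_of_policy";
  if (signals.callerFrustrated) return "negative_sentiment";
  if (signals.ambiguousItem) return "ambiguous_item";
  if (signals.orderConfidence < CONFIDENCE_THRESHOLD) return "low_confidence";
  return null;
}

console.log(transferReason({
  utterance: "Can I talk to a manager?",
  orderConfidence: 0.95,
  ambiguousItem: false,
  callerFrustrated: false,
}));
```

Returning a reason rather than a boolean means the same value can be written to the handoff log.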
Integration with Square, Toast, and Clover
A production system needs a POS adapter layer, not a pile of one-off API calls. Each provider has its own object model, permissions, and workflow assumptions. Your AI should submit a normalized internal order to an adapter, and that adapter should translate it for:
- Square
- Toast
- Clover
If direct POS access is not available for a specific merchant setup, the same adapter can send a webhook to a middleware service or order dashboard.
Structured menu with modifiers
Restaurants do not sell flat products. They sell nested choices:
- burger
- burger size
- cheese add-on
- no onions
- sauce on side
- combo drink selection
Your assistant must reason over menu groups, required modifiers, optional modifiers, max selections, and price deltas.
Logging of calls and orders
You need complete logs for:
- call id
- transcript
- extracted items
- final confirmed order
- handoff reason
- POS submission status
- SMS/email delivery status
Without logging, you cannot debug bad orders or measure business impact.
Nice-to-have features that quickly become high-value
Twilio or telephony API experience
Even if Vapi or Retell handles the voice layer, telephony still matters for:
- number provisioning
- transfers
- failover routing
- voicemail handling
- SMS delivery
Twilio is especially useful when you want a reliable path for call transfer + confirmation SMS + operational fallback in the same stack.
Restaurant systems experience
Restaurant logic has edge cases that generic voice bots often miss:
- out-of-stock items
- time-based menu changes
- delivery radius rules
- kitchen prep time spikes during rush
- branch-specific pricing
- modifiers with operational constraints
Multi-language support
English should come first, but the architecture should support additional languages later. This is especially useful for high-volume local businesses where callers may switch languages mid-call.
Reference architecture: Vapi or Retell + telephony + POS + notifications
Design rule: the voice provider should manage the conversation, but your restaurant order orchestrator should own business truth:
- menu validation
- modifier rules
- idempotent order submission
- audit logs
- transfer decisions
This keeps your logic portable if you switch from Vapi to Retell or change POS providers later.
End-to-end call flow for AI restaurant ordering
Here is what a reliable call flow looks like:
1) Greet and identify intent
"Thanks for calling Spice Garden. I can help you place a pickup order, answer menu questions, or connect you to our staff. What would you like to do today?"
2) Determine order context
The assistant should identify:
- location or branch
- pickup vs delivery
- requested time
- language
3) Take the order item by item
The AI should ask only the questions that are actually required:
- What size would you like?
- Would you like to add extra cheese?
- Which drink comes with the combo?
It should not ask unnecessary questions for optional modifiers unless the customer signals interest.
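A simple way to enforce "ask only what is required" is to derive the next question from the menu data. The sketch below assumes a modifier group shape similar to the menu model used later in this guide; `nextRequiredGroup` and `Selections` are hypothetical names.

```typescript
type ModifierGroupDef = {
  id: string;
  name: string;
  required: boolean;
  minSelections: number;
};

// groupId -> option ids the caller has chosen so far
type Selections = Record<string, string[]>;

// Returns the first required group that is still unsatisfied, or null when
// there is nothing left that the assistant must ask about.
function nextRequiredGroup(groups: ModifierGroupDef[], selected: Selections): ModifierGroupDef | null {
  for (const group of groups) {
    const chosenCount = selected[group.id]?.length ?? 0;
    if (group.required && chosenCount < Math.max(1, group.minSelections)) {
      return group;
    }
  }
  return null;
}

const burgerGroups: ModifierGroupDef[] = [
  { id: "size", name: "Size", required: true, minSelections: 1 },
  { id: "cheese", name: "Cheese", required: false, minSelections: 0 },
];

console.log(nextRequiredGroup(burgerGroups, {})?.name);
console.log(nextRequiredGroup(burgerGroups, { size: ["double"] }));
```

Optional groups such as cheese are never returned, so the assistant only raises them when the caller mentions them.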
4) Validate against the menu
Before final confirmation, the system should check:
- item exists
- branch carries it
- required modifiers are selected
- excluded combinations are blocked
- item is in stock
5) Read back the final order clearly
This is the moment to correct mistakes before money or kitchen effort is wasted.
6) Collect customer details
Capture name, phone, optional email, and any necessary fulfillment notes.
7) Submit to POS or middleware
Once the caller says yes, submit the normalized order to the POS adapter layer.
8) Send SMS/email confirmation
The post-call message should include the exact confirmed items, not an approximate summary.
9) Log everything
Store transcript, order payload, provider response, transfer status, and notification outcomes.
How to model restaurant menus with modifiers
Menu modeling is where many voice AI projects fail. If your menu is not structured cleanly, the conversation will sound smart but generate bad orders.
Use a menu model like this:
```json
{
  "itemId": "burger_classic",
  "name": "Classic Burger",
  "basePrice": 8.99,
  "fulfillmentModes": ["pickup", "delivery"],
  "modifierGroups": [
    {
      "id": "size",
      "name": "Size",
      "required": true,
      "minSelections": 1,
      "maxSelections": 1,
      "options": [
        { "id": "single", "name": "Single", "priceDelta": 0 },
        { "id": "double", "name": "Double", "priceDelta": 2.5 }
      ]
    },
    {
      "id": "cheese",
      "name": "Cheese",
      "required": false,
      "minSelections": 0,
      "maxSelections": 2,
      "options": [
        { "id": "american", "name": "American Cheese", "priceDelta": 1.0 },
        { "id": "cheddar", "name": "Cheddar", "priceDelta": 1.2 }
      ]
    },
    {
      "id": "remove_ingredients",
      "name": "Remove Ingredients",
      "required": false,
      "minSelections": 0,
      "maxSelections": 5,
      "options": [
        { "id": "no_onions", "name": "No Onions", "priceDelta": 0 },
        { "id": "no_pickle", "name": "No Pickle", "priceDelta": 0 }
      ]
    }
  ]
}
```
This structure lets the AI ask correct follow-up questions and lets your POS adapter map choices reliably.
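To show how a backend might use this structure, here is a minimal validation and pricing sketch. The types mirror the JSON fields above; `validateSelections` and `ValidationResult` are hypothetical names, and prices are summed in integer cents to avoid floating-point drift.

```typescript
type MenuOption = { id: string; name: string; priceDelta: number };
type MenuGroup = {
  id: string;
  name: string;
  required: boolean;
  minSelections: number;
  maxSelections: number;
  options: MenuOption[];
};
type MenuItem = { itemId: string; name: string; basePrice: number; modifierGroups: MenuGroup[] };

type ValidationResult = { ok: true; unitPrice: number } | { ok: false; errors: string[] };

// Checks min/max selections and unknown ids, then prices the configured item.
function validateSelections(item: MenuItem, selected: Record<string, string[]>): ValidationResult {
  const errors: string[] = [];
  let cents = Math.round(item.basePrice * 100);

  const knownGroups = new Set(item.modifierGroups.map((g) => g.id));
  for (const groupId of Object.keys(selected)) {
    if (!knownGroups.has(groupId)) errors.push(`Unknown modifier group: ${groupId}`);
  }

  for (const group of item.modifierGroups) {
    const chosen = selected[group.id] ?? [];
    if (chosen.length < group.minSelections) errors.push(`${group.name}: choose at least ${group.minSelections}`);
    if (chosen.length > group.maxSelections) errors.push(`${group.name}: choose at most ${group.maxSelections}`);
    for (const optionId of chosen) {
      const option = group.options.find((o) => o.id === optionId);
      if (!option) errors.push(`${group.name}: unknown option ${optionId}`);
      else cents += Math.round(option.priceDelta * 100);
    }
  }

  return errors.length > 0 ? { ok: false, errors } : { ok: true, unitPrice: cents / 100 };
}

const classicBurger: MenuItem = {
  itemId: "burger_classic",
  name: "Classic Burger",
  basePrice: 8.99,
  modifierGroups: [
    { id: "size", name: "Size", required: true, minSelections: 1, maxSelections: 1, options: [
      { id: "single", name: "Single", priceDelta: 0 },
      { id: "double", name: "Double", priceDelta: 2.5 },
    ] },
    { id: "cheese", name: "Cheese", required: false, minSelections: 0, maxSelections: 2, options: [
      { id: "american", name: "American Cheese", priceDelta: 1.0 },
      { id: "cheddar", name: "Cheddar", priceDelta: 1.2 },
    ] },
  ],
};

console.log(validateSelections(classicBurger, { size: ["double"], cheese: ["cheddar"] }));
console.log(validateSelections(classicBurger, { cheese: ["cheddar"] }));
```

The error strings double as prompts: a missing required group tells the assistant exactly which follow-up question to ask.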
TypeScript example: commit a confirmed order from Vapi or Retell
The following TypeScript example shows a backend endpoint that receives a confirmed order from a voice session, validates it, submits it through a POS adapter, logs the result, and sends notifications.
```typescript
import express from "express";

type ModifierSelection = {
  groupId: string;
  optionId: string;
  name: string;
  priceDelta: number;
};

type OrderItem = {
  itemId: string;
  name: string;
  quantity: number;
  unitPrice: number;
  modifiers: ModifierSelection[];
  note?: string;
};

type CustomerInfo = {
  name: string;
  phone: string;
  email?: string;
};

type ConfirmedOrderRequest = {
  callId: string;
  provider: "vapi" | "retell";
  branchId: string;
  language: string;
  fulfillmentMode: "pickup" | "delivery";
  requestedAt?: string;
  customer: CustomerInfo;
  items: OrderItem[];
  spokenConfirmationAccepted: boolean;
};

type PosOrderResult = {
  externalOrderId: string;
  status: "accepted" | "queued";
  estimatedReadyMinutes?: number;
};

interface PosAdapter {
  submitOrder(order: ConfirmedOrderRequest): Promise<PosOrderResult>;
}

const app = express();
app.use(express.json());

// In-memory idempotency state for the example only; see the note after the code.
const processedCallIds = new Set<string>();

const posAdapter: PosAdapter = {
  async submitOrder(order) {
    const response = await fetch("https://pos-middleware.internal/orders", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Idempotency-Key": order.callId
      },
      body: JSON.stringify(order)
    });

    if (!response.ok) {
      throw new Error(`POS submission failed with status ${response.status}`);
    }

    return response.json() as Promise<PosOrderResult>;
  }
};

// Strips formatting characters, keeping digits and a leading plus sign.
function normalizePhone(phone: string): string {
  return phone.replace(/[^\d+]/g, "");
}

function calculateGrandTotal(items: OrderItem[]): number {
  return items.reduce((sum, item) => {
    const modifierTotal = item.modifiers.reduce((m, mod) => m + mod.priceDelta, 0);
    return sum + (item.unitPrice + modifierTotal) * item.quantity;
  }, 0);
}

async function logOrderEvent(payload: Record<string, unknown>): Promise<void> {
  await fetch("https://ops.internal/logs/order-events", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload)
  });
}

async function sendSms(phone: string, message: string): Promise<void> {
  await fetch("https://messaging.internal/sms", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ to: phone, body: message })
  });
}

async function sendEmail(email: string, subject: string, body: string): Promise<void> {
  await fetch("https://messaging.internal/email", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ to: email, subject, body })
  });
}

app.post("/voice/order/commit", async (req, res) => {
  const order = req.body as ConfirmedOrderRequest;

  if (!order.spokenConfirmationAccepted) {
    return res.status(400).json({ error: "Order must be confirmed verbally before submission." });
  }

  if (
    !order.callId ||
    !order.customer?.name ||
    !order.customer?.phone ||
    !Array.isArray(order.items) ||
    order.items.length === 0
  ) {
    return res.status(400).json({ error: "Missing call id, customer details, or order items." });
  }

  if (processedCallIds.has(order.callId)) {
    return res.status(200).json({ ok: true, deduplicated: true });
  }

  // Claim the callId before any async work so concurrent retries are deduplicated.
  processedCallIds.add(order.callId);

  try {
    order.customer.phone = normalizePhone(order.customer.phone);

    const posResult = await posAdapter.submitOrder(order);
    const total = calculateGrandTotal(order.items).toFixed(2);

    // Only say "confirmed" when the POS has actually accepted the order.
    const statusText = posResult.status === "accepted" ? "is confirmed" : "has been received";

    const summary =
      `Hi ${order.customer.name}, your order ${statusText}. ` +
      `Order #${posResult.externalOrderId}, total $${total}. ` +
      `Status: ${posResult.status}.` +
      (posResult.estimatedReadyMinutes
        ? ` Estimated ready in ${posResult.estimatedReadyMinutes} minutes.`
        : "");

    const notificationTasks: Promise<unknown>[] = [
      sendSms(order.customer.phone, summary),
      logOrderEvent({
        eventType: "order_committed",
        callId: order.callId,
        provider: order.provider,
        branchId: order.branchId,
        externalOrderId: posResult.externalOrderId,
        total,
        itemCount: order.items.length
      })
    ];

    if (order.customer.email) {
      notificationTasks.push(sendEmail(order.customer.email, "Your restaurant order update", summary));
    }

    // The POS already has the order, so a failed SMS, email, or log write
    // must not turn this request into an error that triggers a resubmission.
    await Promise.allSettled(notificationTasks);

    res.status(200).json({
      ok: true,
      externalOrderId: posResult.externalOrderId,
      estimatedReadyMinutes: posResult.estimatedReadyMinutes
    });
  } catch (error) {
    // Release the key only on failure so the same call can be retried.
    // On success the key stays in place, which is what makes retries idempotent.
    processedCallIds.delete(order.callId);

    await logOrderEvent({
      eventType: "order_commit_failed",
      callId: order.callId,
      message: error instanceof Error ? error.message : "Unknown error"
    }).catch(() => undefined);

    res.status(502).json({ error: "Could not submit restaurant order." });
  }
});
```
What this TypeScript example gets right
- it requires explicit verbal confirmation
- it normalizes phone numbers
- it uses the callId as an idempotency key
- it submits to a POS adapter instead of hardcoding one provider
- it sends SMS and optional email
- it logs both success and failure paths
In production, move idempotency state from memory into Redis or a database, because restaurant systems must survive restarts and scale across multiple backend instances.
Java example: POS adapter service with concurrency for high call volume
If your backend is built in Java, a clean pattern is to keep one normalized internal order model and dispatch downstream work concurrently: POS submission, audit logging, and notifications.
```java
import java.net.http.HttpClient;
import java.time.Duration;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RestaurantOrderService {

    // Virtual threads require Java 21 or later.
    private final ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
    private final HttpClient httpClient = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .build();

    private final Map<String, PosClient> posClients = Map.of(
            "square", new SquarePosClient(httpClient),
            "toast", new ToastPosClient(httpClient),
            "clover", new CloverPosClient(httpClient)
    );

    public OrderResult commitConfirmedOrder(String posProvider, RestaurantOrder order) {
        PosClient posClient = posClients.get(posProvider);
        if (posClient == null) {
            throw new IllegalArgumentException("Unsupported POS provider: " + posProvider);
        }

        // Submit to the POS first; follow-up work only runs for accepted orders.
        OrderResult result = posClient.submit(order);

        CompletableFuture<Void> logFuture = CompletableFuture.runAsync(
                () -> auditLog("order_committed", order, result),
                executor
        );

        CompletableFuture<Void> smsFuture = CompletableFuture.runAsync(
                () -> sendSms(order.customerPhone(), buildConfirmationMessage(order, result)),
                executor
        );

        CompletableFuture<Void> emailFuture = order.customerEmail() == null
                ? CompletableFuture.completedFuture(null)
                : CompletableFuture.runAsync(
                        () -> sendEmail(order.customerEmail(), "Your order is confirmed", buildConfirmationMessage(order, result)),
                        executor
                );

        // Waits for all three tasks; join() rethrows if any of them failed.
        CompletableFuture.allOf(logFuture, smsFuture, emailFuture).join();
        return result;
    }

    private void auditLog(String eventType, RestaurantOrder order, OrderResult result) {
        // Persist transcript reference, order JSON, provider, timestamps, and final status.
    }

    private void sendSms(String phone, String body) {
        // Call Twilio or your messaging service here.
    }

    private void sendEmail(String email, String subject, String body) {
        // Call SES, SendGrid, or internal mail service here.
    }

    private String buildConfirmationMessage(RestaurantOrder order, OrderResult result) {
        return "Hi " + order.customerName() + ", your order #" + result.externalOrderId()
                + " is confirmed. Estimated ready in " + result.readyMinutes() + " minutes.";
    }

    interface PosClient {
        OrderResult submit(RestaurantOrder order);
    }

    record RestaurantOrder(String customerName, String customerPhone, String customerEmail) {}
    record OrderResult(String externalOrderId, int readyMinutes) {}

    // The three clients below are stubs that return fixed results;
    // real implementations would call each provider's API through httpClient.
    static class SquarePosClient implements PosClient {
        private final HttpClient httpClient;
        SquarePosClient(HttpClient httpClient) { this.httpClient = httpClient; }
        public OrderResult submit(RestaurantOrder order) { return new OrderResult("SQ-1001", 20); }
    }

    static class ToastPosClient implements PosClient {
        private final HttpClient httpClient;
        ToastPosClient(HttpClient httpClient) { this.httpClient = httpClient; }
        public OrderResult submit(RestaurantOrder order) { return new OrderResult("TS-2001", 25); }
    }

    static class CloverPosClient implements PosClient {
        private final HttpClient httpClient;
        CloverPosClient(HttpClient httpClient) { this.httpClient = httpClient; }
        public OrderResult submit(RestaurantOrder order) { return new OrderResult("CL-3001", 18); }
    }
}
```
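This pattern suits restaurants with heavy evening traffic: with virtual threads (Java 21 or later), audit logging and notifications run concurrently without dedicating a platform thread to each notification or log write. Note that the caller still waits at `join()` until all three follow-up tasks have finished.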
POS integration strategy for Square, Toast, and Clover
Do not make your AI talk directly in POS-native language. Instead, define an internal order contract like:
```json
{
  "branchId": "downtown",
  "fulfillmentMode": "pickup",
  "customer": {
    "name": "Ava Patel",
    "phone": "+15551234567",
    "email": "[email protected]"
  },
  "items": [
    {
      "itemId": "burger_classic",
      "quantity": 2,
      "modifiers": ["double", "no_onions", "cheddar"]
    }
  ],
  "notes": "Pickup in 20 minutes",
  "source": "voice_ai"
}
```
Then build adapters that translate that payload into the target provider's required format.
Best practice for restaurant POS integration
- keep your own canonical order schema
- keep menu ids mapped to POS catalog ids
- use idempotency keys for all order submissions
- log raw request and raw response payloads
- support fallback to webhook or operator dashboard
This is especially important when merchants have partial POS access, custom plugins, or branch-specific setups.
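As an illustration of the id-mapping practice, the sketch below translates canonical items into a provider-style line list. The `ProviderLine` shape and the catalog ids are invented for the example; the real Square, Toast, and Clover schemas differ and must be taken from each provider's documentation.

```typescript
type CanonicalItem = { itemId: string; quantity: number; modifiers: string[] };
type CanonicalOrder = {
  branchId: string;
  fulfillmentMode: "pickup" | "delivery";
  items: CanonicalItem[];
  source: string;
};

// Hypothetical target shape, standing in for a real POS payload.
type ProviderLine = { catalogId: string; quantity: number; modifierCatalogIds: string[] };

// internal menu or modifier id -> POS catalog id
type CatalogMap = Record<string, string>;

function toProviderLines(order: CanonicalOrder, catalog: CatalogMap): ProviderLine[] {
  const lookup = (id: string): string => {
    const mapped = catalog[id];
    // Fail loudly: submitting an unmapped item would produce a wrong ticket.
    if (!mapped) throw new Error(`No POS catalog mapping for "${id}"`);
    return mapped;
  };

  return order.items.map((item) => ({
    catalogId: lookup(item.itemId),
    quantity: item.quantity,
    modifierCatalogIds: item.modifiers.map((m) => lookup(m)),
  }));
}

// Made-up catalog ids for demonstration.
const demoCatalog: CatalogMap = {
  burger_classic: "CAT-100",
  double: "MOD-201",
  no_onions: "MOD-305",
  cheddar: "MOD-410",
};

const demoOrder: CanonicalOrder = {
  branchId: "downtown",
  fulfillmentMode: "pickup",
  items: [{ itemId: "burger_classic", quantity: 2, modifiers: ["double", "no_onions", "cheddar"] }],
  source: "voice_ai",
};

console.log(toProviderLines(demoOrder, demoCatalog));
```

An unmapped id throws before anything is sent, which is a natural point to route the order to the webhook or operator-dashboard fallback.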
Handling multiple simultaneous calls without breaking operations
Concurrency is not just a scaling feature. It directly affects order quality.
If ten callers speak to the assistant at once, your system must isolate each session cleanly:
- one session state per call
- no shared mutable in-memory order object
- durable state in Redis, Postgres, or another shared store
- per-call idempotency token
- retry-safe POS submission
For high-volume restaurants, add:
- queue-based event processing for non-blocking logs and analytics
- branch-level rate limiting for downstream POS endpoints
- circuit breakers when the POS is degraded
- operational fallback to staff transfer or SMS-only confirmation
A practical rule: treat voice AI as real-time, but treat POS, logging, and notifications as resilient distributed systems work.
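The circuit-breaker idea from the list above can be sketched as a small state holder. This is a simplified illustration, not a production library: `PosCircuitBreaker`, its thresholds, and `submitWithBreaker` are hypothetical, and after the cooldown it lets trial requests through without limiting them to one.

```typescript
class PosCircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold: number, cooldownMs: number) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  // True when a POS call may be attempted; false means use the fallback path.
  allowRequest(nowMs: number): boolean {
    if (this.openedAt === null) return true;
    // Once the cooldown has elapsed, allow a trial request to probe the POS.
    return nowMs - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(nowMs: number): void {
    this.failures += 1;
    // Opens at the threshold, and re-opens if a trial request fails.
    if (this.failures >= this.failureThreshold) this.openedAt = nowMs;
  }
}

// Wraps a POS submission; the fallback might transfer to staff or queue the order.
async function submitWithBreaker<T>(
  breaker: PosCircuitBreaker,
  submit: () => Promise<T>,
  fallback: () => Promise<T>,
): Promise<T> {
  if (!breaker.allowRequest(Date.now())) return fallback();
  try {
    const result = await submit();
    breaker.recordSuccess();
    return result;
  } catch {
    breaker.recordFailure(Date.now());
    return fallback();
  }
}
```

With several backend instances, the breaker state would need to live in a shared store such as Redis, for the same reason as the idempotency keys.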
Human handoff and call transfer design
Not every restaurant call should stay with AI.
Transfer the call when:
- customer says "representative," "manager," or "staff"
- confidence score is below threshold
- order exceeds complexity threshold
- customer mentions allergy concerns that require staff confirmation
- catering, refunds, complaints, or store-specific exceptions are requested
Your transfer workflow should include:
- warm summary for staff
- transcript snippet
- captured customer details
- current draft order
- reason for transfer
That way, the customer does not have to repeat the entire conversation.
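A handoff summary of this kind could be assembled from session state just before the transfer. The sketch below is an assumption about shape and wording: `HandoffContext` and `buildStaffSummary` are hypothetical, and how the text reaches staff (screen pop, SMS, whisper message) depends on your telephony setup.

```typescript
type HandoffContext = {
  callId: string;
  reason: string;
  customerName?: string;
  customerPhone?: string;
  draftItems: { name: string; quantity: number; modifiers: string[] }[];
  lastUtterances: string[]; // most recent caller turns, oldest first
};

function buildStaffSummary(ctx: HandoffContext, maxSnippetLines = 3): string {
  const items = ctx.draftItems.length === 0
    ? "none yet"
    : ctx.draftItems
        .map((i) => `${i.quantity}x ${i.name}` + (i.modifiers.length > 0 ? ` (${i.modifiers.join(", ")})` : ""))
        .join("; ");

  // Keep only the last few turns so staff can read the summary at a glance.
  const snippet = ctx.lastUtterances.slice(-maxSnippetLines).join(" | ");

  return [
    `Transfer reason: ${ctx.reason}`,
    `Customer: ${ctx.customerName ?? "unknown"} ${ctx.customerPhone ?? ""}`.trim(),
    `Draft order: ${items}`,
    `Recent transcript: ${snippet || "n/a"}`,
  ].join("\n");
}

console.log(buildStaffSummary({
  callId: "call_123",
  reason: "allergy_question",
  customerName: "Ava",
  customerPhone: "+15551234567",
  draftItems: [{ name: "Classic Burger", quantity: 2, modifiers: ["double", "no onions"] }],
  lastUtterances: ["Two classic burgers", "Make them doubles", "No onions", "Is the bun nut free?"],
}));
```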
If you use Twilio, this is typically implemented with a controlled transfer to a staff number, queue, or SIP endpoint while preserving call metadata in your backend.
Logging, analytics, and auditability
For restaurant voice ordering, logs are operational assets, not a developer luxury.
Track at least:
- call_id
- voice provider session id
- caller phone
- branch id
- transcript
- recognized language
- order extraction steps
- final confirmed items
- POS provider and response code
- SMS/email delivery result
- transfer outcome
- total order value
- call duration
These logs let you answer the questions that restaurant owners actually care about:
- How many calls did AI answer?
- How many orders were successfully completed?
- Which modifier combinations create errors?
- Which branches need more human fallback?
- How much revenue was recovered from missed calls?
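Several of these questions reduce to a per-branch aggregation over call records. A minimal sketch, assuming a simplified `CallRecord` shape (a hypothetical subset of the fields listed above) with order totals stored in integer cents:

```typescript
type CallRecord = {
  callId: string;
  branchId: string;
  orderSubmitted: boolean;
  transferred: boolean;
  orderTotalCents: number; // 0 when no order was placed
};

type BranchStats = { calls: number; orders: number; transfers: number; revenueCents: number };

function summarizeByBranch(records: CallRecord[]): Map<string, BranchStats> {
  const stats = new Map<string, BranchStats>();
  for (const record of records) {
    const current = stats.get(record.branchId) ?? { calls: 0, orders: 0, transfers: 0, revenueCents: 0 };
    current.calls += 1;
    if (record.orderSubmitted) {
      current.orders += 1;
      current.revenueCents += record.orderTotalCents;
    }
    if (record.transferred) current.transfers += 1;
    stats.set(record.branchId, current);
  }
  return stats;
}

const sampleRecords: CallRecord[] = [
  { callId: "c1", branchId: "downtown", orderSubmitted: true, transferred: false, orderTotalCents: 2599 },
  { callId: "c2", branchId: "downtown", orderSubmitted: false, transferred: true, orderTotalCents: 0 },
  { callId: "c3", branchId: "uptown", orderSubmitted: true, transferred: false, orderTotalCents: 1200 },
];

console.log(summarizeByBranch(sampleRecords));
```

A branch with a high ratio of transfers to calls is a candidate for menu cleanup or more human fallback.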
SMS and email confirmation content
A good confirmation message should be short, precise, and operationally useful.
SMS example:
```text
Spice Garden: Hi Ava, your pickup order #SQ-1001 is confirmed.
2x Classic Burger (double, cheddar, no onions)
Estimated ready in 20 min.
Questions? Call us at (555) 010-1000.
```
An email confirmation should include:
- order number
- timestamp
- branch
- itemized summary
- taxes/fees if applicable
- pickup or delivery notes
- support contact details
If the order is only queued and not fully accepted by the POS yet, say that clearly. Never send a misleading "confirmed" message.
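One way to enforce that rule is to make the message wording depend on the POS status. The sketch below assumes the two statuses used in the TypeScript example earlier ("accepted" and "queued"); the `confirmationText` helper and its exact wording are illustrative.

```typescript
type PosStatus = "accepted" | "queued";

function confirmationText(
  restaurant: string,
  customerName: string,
  orderId: string,
  status: PosStatus,
  readyMinutes?: number,
): string {
  if (status === "queued") {
    // Do not promise a ready time until the POS has accepted the order.
    return `${restaurant}: Hi ${customerName}, we received your order #${orderId} ` +
      `and are waiting for the kitchen to accept it. We will text you once it is confirmed.`;
  }
  const eta = readyMinutes !== undefined ? ` Estimated ready in ${readyMinutes} min.` : "";
  return `${restaurant}: Hi ${customerName}, your order #${orderId} is confirmed.${eta}`;
}

console.log(confirmationText("Spice Garden", "Ava", "SQ-1001", "accepted", 20));
console.log(confirmationText("Spice Garden", "Ava", "SQ-1002", "queued"));
```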
Multi-language support for restaurant voice AI
English-first is the right rollout strategy, but multi-language support should be built into the data model and prompt design.
That means:
- store menu aliases by language
- localize modifier labels
- support bilingual confirmation prompts
- store transcript language per call
- allow human transfer when speech recognition or language detection confidence drops
This matters because food ordering often contains slang, local dish names, and code-switching in the same sentence.
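Storing aliases by language can be as simple as a nested lookup table. In the sketch below, the alias entries are made-up examples and `resolveItem` is a hypothetical helper; it searches every language rather than only the call's detected language, so a mid-sentence switch still resolves.

```typescript
// itemId -> language code -> spoken aliases
type MenuAliases = Record<string, Record<string, string[]>>;

// Illustrative entries only; real aliases should come from staff and call transcripts.
const ALIASES: MenuAliases = {
  paneer_wrap: { en: ["paneer wrap", "cottage cheese wrap"], hi: ["paneer roll"] },
  coke: { en: ["coke", "coca cola"], es: ["coca"] },
};

function resolveItem(phrase: string, aliases: MenuAliases): { itemId: string; language: string } | null {
  const needle = phrase.trim().toLowerCase();
  for (const [itemId, byLanguage] of Object.entries(aliases)) {
    for (const [language, names] of Object.entries(byLanguage)) {
      if (names.some((name) => name.toLowerCase() === needle)) {
        return { itemId, language };
      }
    }
  }
  return null; // Unknown phrase: ask a clarifying question or hand off.
}

console.log(resolveItem("Paneer Roll", ALIASES));
console.log(resolveItem("sushi", ALIASES));
```

Exact matching is intentionally strict here; fuzzy matching raises recall but also the chance of confidently picking the wrong dish, which the read-back step then has to catch.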
Security and operational guardrails
There is one critical rule for restaurant phone ordering:
Do not casually collect or process card numbers over voice AI unless you are deliberately handling PCI scope.
A safer design is:
- AI takes the order
- POS or payment link handles payment separately
- confirmation SMS/email includes the secure next step if needed
Additional safeguards:
- verify branch availability before promising prep time
- expire stale draft orders
- mask customer PII in analytics views
- encrypt transcripts and order logs at rest
- add monitoring for failed POS submissions and failed SMS delivery
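Masking PII for analytics views can be done at the point where records leave the operational store. A small sketch, assuming that the last four digits of a phone number and the first letter of a name are acceptable to display; both helper names are hypothetical and the right masking policy depends on your privacy requirements.

```typescript
// Keeps only the last four digits, e.g. for matching a callback request.
function maskPhone(phone: string): string {
  const digits = phone.replace(/\D/g, "");
  if (digits.length <= 4) return "*".repeat(digits.length);
  return "*".repeat(digits.length - 4) + digits.slice(-4);
}

// Keeps only the first character of the name.
function maskName(name: string): string {
  const trimmed = name.trim();
  return trimmed.length === 0 ? "" : `${trimmed[0]}***`;
}

console.log(maskPhone("+15551234567"));
console.log(maskName("Ava Patel"));
```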
FAQ:
Can an AI voice assistant really take restaurant orders accurately?
Yes, if the menu is structured well, the assistant is required to confirm orders before submission, and ambiguous cases transfer to human staff. Accuracy comes from menu modeling + confirmation + logging, not from voice AI alone.
Can I connect restaurant voice AI to Square, Toast, or Clover?
Yes. The safest pattern is to connect through a POS adapter layer or webhook middleware so your voice logic stays independent of one POS provider.
How does the assistant handle add-ons like extra cheese or no onions?
Those should be represented as modifier groups and validated before order submission. This is essential for real restaurant ordering.
Can the system manage multiple calls at the same time?
Yes, but only if call state is isolated per session and your backend is built for concurrency, idempotency, and retry-safe downstream integrations.
What happens if the caller wants a human?
The AI should transfer the call immediately, along with a summary of what has already been captured.
Is Twilio required?
Not always. Vapi or Retell may cover much of the voice stack, but Twilio is still useful for telephony control, SMS confirmations, and fallback transfer flows.
Final implementation advice
If you are building an AI voice assistant for restaurants, start with a narrow but production-safe scope:
- pickup orders only
- one branch
- English only
- 20 to 40 best-selling menu items
- SMS confirmation
- human transfer for low-confidence calls
Then add:
- delivery
- multiple branches
- deeper POS automation
- multilingual support
- analytics and optimization
The fastest path to value is not "make the AI do everything." It is "make the AI do the repetitive 80% reliably, and escalate the risky 20% cleanly."
That is how restaurant voice AI moves from interesting demo to measurable revenue system.




