Scalable bot architecture

A bot that scales is not “a bigger script”: it is events → queue → workers → persistence → integrations, with platform limits and observability from day one.

Minimum layers

Ingestion: webhooks (Discord/Telegram) or polling, deduplicated on the platform's event ID (update_id on Telegram).
Queue: Redis, SQS or similar to absorb spikes without dropping messages.
Workers: isolated business logic—never block the webhook thread.
State: PostgreSQL (users, subscriptions, tickets)—not process memory alone.
Outbound: messaging APIs + CRM/ERP with retries and dead-letter queues.
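To make the outbound layer concrete, here is a minimal sketch of retries with a dead-letter queue. It uses an in-memory deque in place of a real broker; `MAX_ATTEMPTS`, `drain` and `flaky_crm_call` are hypothetical names chosen for illustration, not part of any library.

```python
from collections import deque
from typing import Callable

MAX_ATTEMPTS = 3  # assumed retry budget; tune per integration


def drain(events: list, handler: Callable[[dict], None]) -> list:
    """Run every event through handler; return the ones that exhausted retries."""
    pending = deque((event, 1) for event in events)
    dead_letter = []
    while pending:
        event, attempt = pending.popleft()
        try:
            handler(event)
        except Exception as exc:
            if attempt >= MAX_ATTEMPTS:
                # Park the event with its last error for inspection or replay.
                dead_letter.append(
                    {"event": event, "error": str(exc), "attempts": attempt}
                )
            else:
                # Re-queue at the back; a real worker would also delay the retry.
                pending.append((event, attempt + 1))
    return dead_letter


def flaky_crm_call(event: dict) -> None:
    """Hypothetical outbound call that always fails for one event."""
    if event["id"] == 2:
        raise RuntimeError("CRM timeout")


dlq = drain([{"id": 1}, {"id": 2}, {"id": 3}], flaky_crm_call)
```

Events 1 and 3 succeed on the first attempt, while event 2 lands in `dlq` after three attempts, where it can be alerted on and replayed once the CRM recovers.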

Mistakes we often fix

Mixing Discord and WhatsApp in one deployment without credential isolation; no structured logs; no human handoff; ignoring rate limits. We keep a messaging bots hub and one main service landing instead of ten thin clones.

1. Diagnosing why bots crash in production

The pattern is almost always the same. A single-file Python or Node script works fine in the demo and launches on one channel. Traffic grows, and within a month you start seeing duplicate messages, ghost replies, conversations stuck mid-flow, hosting bills that creep up, and a customer support backlog the bot was supposed to remove. The root cause is rarely the messaging library. It is the absence of the architectural primitives that any production system needs: idempotency, queues, persistent state and observability.

Once you cross ~50 active conversations per hour, "bot" stops being a script and starts being a distributed system. Scaling means treating the webhook as untrusted input, the queue as the system's heartbeat, and the database as the only source of truth. Below is the minimum architecture we ship for every bot project, regardless of channel.

2. The nine architectural decisions that matter

Stateless workers, stateful persistence. Webhook handlers should validate signatures, enqueue the event and return 200 OK in under one second. All conversation state (current step, collected fields, tenant, locale) lives in PostgreSQL or Redis keyed by (platform, chat_id, user_id). This lets you autoscale workers horizontally without sticky sessions and survive deploys without losing context.
Queues between the edge and your logic. Redis Streams, RabbitMQ, SQS or BullMQ. They absorb spikes (a CRM webhook storm, a marketing blast), decouple ingestion from processing and provide a dead-letter queue (DLQ) for events that keep failing. Without a queue, a 30-second LLM call blocks the webhook and Telegram retries the same update — duplicating actions downstream.
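Idempotent webhooks. Store a hash of (platform, update_id or message_id, tenant_id) in a unique-key table, or in Redis with SET plus the NX and EX options (an atomic set-if-absent with TTL; plain SETNX cannot set an expiry). If the key exists, ACK and skip. This turns inevitable network retries from a duplicate-message bug into a no-op. Critical for WhatsApp and Stripe webhooks reaching the same bot, since both providers redeliver a webhook that was not acknowledged, repeatedly and with backoff.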
Platform rate limits as a first-class concern. Telegram: 30 msg/s global, 1 msg/s per chat. Discord: 50 req/s global per bot, plus per-route buckets reported in the X-RateLimit-* response headers (in practice roughly 5 msg/5s per channel). WhatsApp Business: 80 msg/s per number plus tiered daily limits. Implement a per-route token bucket, respect the Retry-After header, and serialise outbound messages per chat with exponential backoff.
Observability from day one. Structured JSON logs with conversation_id correlation; metrics for queue depth, worker p50/p95/p99 latency, error rate per platform, rate-limit hits and human-handoff conversion; OpenTelemetry tracing across webhook → queue → worker → CRM. Alerts on DLQ size, p95 latency and failed deliveries — never on raw CPU.
Multi-tenant from the first commit. Even if you start with one client, design the database with tenant_id on every row, scoped tokens per tenant, isolated credentials and per-tenant rate limits. Retrofitting multi-tenancy after launch costs 5x more than building it in week one.
WhatsApp template discipline. Outside the 24-hour conversation window you can only send pre-approved Meta templates (HSM). Plan your template library before coding: utility, marketing, authentication. Submit templates early — Meta approval takes 1–7 days and rejections cascade into missed launch dates.
Human handoff with full context. When the bot escalates, the agent must receive a CRM ticket with conversation transcript, collected fields, intent classification and the reason for handoff. Without this, automation becomes a wall the customer has to climb twice — once with the bot, once with the human.
GDPR & data residency. EU bots should store data in EU regions, expose per-user export and deletion endpoints, retain conversations no longer than the documented policy (typically 90–365 days) and log access for audit. Discord/Telegram message content is yours to manage; WhatsApp adds Meta's own DPA on top.
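The WhatsApp template rule in the list above reduces to a small check before every outbound send. The sketch below assumes you persist the timestamp of the last inbound user message per conversation; `outbound_mode` and its return values are hypothetical names for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

SERVICE_WINDOW = timedelta(hours=24)  # the 24-hour conversation window


def outbound_mode(last_user_message_at: Optional[datetime], now: datetime) -> str:
    """Return "free_form" inside the window, otherwise "template"."""
    if last_user_message_at is None:
        # The user has never written to us: only an approved template is allowed.
        return "template"
    if now - last_user_message_at < SERVICE_WINDOW:
        return "free_form"
    return "template"


now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
recent = outbound_mode(now - timedelta(hours=2), now)   # inside the window
stale = outbound_mode(now - timedelta(hours=30), now)   # window expired
```

Routing every send through one function like this keeps the rule in a single place, so a worker cannot accidentally send free-form text after the window has closed.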

3. Reference stack we ship

Layer	Light (≤500 conv/day)	Mid (≤20k conv/day)	Heavy (100k+ conv/day)
Edge / webhook	Single Node/Python service on Render or Railway	Fastify/FastAPI behind a load balancer, 2–4 instances	Edge runtime (Cloudflare Workers) or Kubernetes ingress
Queue	BullMQ + Redis (managed)	RabbitMQ or SQS + DLQ	Kafka or Redis Streams with consumer groups
Database	Managed Postgres (Neon, Supabase)	RDS Postgres + read replica	Postgres + Redis cache + partitioning by tenant
Observability	Logtail / Better Stack	Grafana Cloud + Sentry	Datadog + OpenTelemetry collector
Monthly infra cost	$40–$120	$300–$900	$2,500–$8,000

4. Frequently asked questions

Stateless or stateful bot — which one should I build?

Build a stateless bot worker on top of a stateful persistence layer (PostgreSQL, Redis). The webhook handler should be pure (validate, enqueue, ACK in <1s). Conversation state lives in the database keyed by tenant + user. This pattern lets you autoscale workers horizontally without sticky sessions and survives restarts without losing context.
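A minimal sketch of that split, using an in-memory SQLite database as a stand-in for PostgreSQL. The table layout, the two-step flow and the function names are assumptions made for illustration.

```python
import json
import sqlite3

state_db = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
state_db.execute(
    """CREATE TABLE conversation_state (
           tenant_id TEXT, platform TEXT, chat_id TEXT, user_id TEXT,
           state TEXT NOT NULL,
           PRIMARY KEY (tenant_id, platform, chat_id, user_id))"""
)


def load_state(key: tuple) -> dict:
    row = state_db.execute(
        "SELECT state FROM conversation_state "
        "WHERE tenant_id=? AND platform=? AND chat_id=? AND user_id=?",
        key,
    ).fetchone()
    return json.loads(row[0]) if row else {"step": "start", "fields": {}}


def save_state(key: tuple, state: dict) -> None:
    # SQLite upsert; PostgreSQL would use INSERT ... ON CONFLICT DO UPDATE.
    state_db.execute(
        "INSERT OR REPLACE INTO conversation_state VALUES (?, ?, ?, ?, ?)",
        (*key, json.dumps(state)),
    )
    state_db.commit()


def handle_message(key: tuple, text: str) -> str:
    """Worker step: load state, advance the flow, persist, reply."""
    state = load_state(key)
    if state["step"] == "start":
        state["step"] = "ask_email"
        reply = "What is your email?"
    else:
        state["fields"]["email"] = text
        state["step"] = "done"
        reply = "Thanks, we will be in touch."
    save_state(key, state)
    return reply


key = ("tenant-1", "telegram", "42", "7")
first_reply = handle_message(key, "hi")
second_reply = handle_message(key, "ana@example.com")
```

Because `handle_message` keeps no conversation data in process memory, any worker instance can pick up the next message for the same key, including one started after a deploy.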

Do I really need a queue between the webhook and my logic?

Yes, once you serve more than a handful of conversations per minute. Telegram and WhatsApp retry any webhook that is not acknowledged promptly with a 200 OK, and Discord expires an interaction that gets no initial response within 3 seconds. A retry storm without idempotency duplicates messages. A queue (Redis Streams, RabbitMQ, SQS) decouples ingestion from processing, absorbs traffic spikes and lets you replay failed events safely.
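As a rough sketch, the webhook side of that split can be as small as the function below. It uses `queue.Queue` as a stand-in for the broker and a generic HMAC-SHA256 check; each platform defines its own verification scheme, so `SECRET`, `sign` and `handle_webhook` are illustrative assumptions rather than any platform's API.

```python
import hashlib
import hmac
import json
import queue

SECRET = b"demo-secret"  # assumption: a shared secret agreed with the platform
events: queue.Queue = queue.Queue()  # stand-in for Redis Streams, RabbitMQ or SQS


def sign(body: bytes) -> str:
    """HMAC-SHA256 of the raw body; a generic stand-in for the platform's scheme."""
    return hmac.new(SECRET, body, hashlib.sha256).hexdigest()


def handle_webhook(body: bytes, signature: str) -> int:
    """Validate, enqueue, ACK. Returns the HTTP status code to send back."""
    if not hmac.compare_digest(sign(body), signature):
        return 401  # reject unsigned or tampered payloads
    try:
        event = json.loads(body)
    except ValueError:
        return 400  # malformed JSON or invalid encoding
    events.put(event)  # slow work (LLM calls, CRM writes) happens in a worker
    return 200


body = b'{"update_id": 1001, "text": "hi"}'
status = handle_webhook(body, sign(body))
```

The handler does no business logic at all, which is what keeps the acknowledgement fast regardless of how long the downstream processing takes.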

What are the real rate limits per platform?

Telegram: 30 messages/sec global, 1 message/sec per chat, ~20 messages/min per group. Discord: 50 requests/sec global per bot, plus per-route buckets reported in the X-RateLimit-* response headers (in practice roughly 5 messages/5s per channel). WhatsApp Business API: tier-based (1K → 100K conversations/day), 80 messages/sec per phone number, plus Meta's conversation-based pricing. Always implement exponential backoff and respect the Retry-After header.
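One way to enforce these limits on the sending side is a token bucket per chat or route, combined with a retry delay that defers to Retry-After. The sketch below passes time in explicitly so it stays deterministic; in production you would pass `time.monotonic()`. `TokenBucket` and `retry_delay` are hypothetical helpers, and the base and cap values are assumptions.

```python
import random
from typing import Optional


class TokenBucket:
    """Allows `rate` sends per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = now

    def try_acquire(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def retry_delay(attempt: int, retry_after: Optional[float] = None,
                base: float = 0.5, cap: float = 60.0) -> float:
    """Prefer the server's Retry-After; otherwise exponential backoff with jitter."""
    if retry_after is not None:
        return retry_after
    return random.uniform(0, min(cap, base * 2 ** attempt))


per_chat = TokenBucket(rate=1, capacity=1)  # 1 message/sec per chat
sent_first = per_chat.try_acquire(now=0.0)
sent_too_soon = per_chat.try_acquire(now=0.5)
sent_later = per_chat.try_acquire(now=1.0)
```

Keeping one bucket per chat (and a separate global one) mirrors how the platforms count, and the jitter keeps many workers from retrying at the same instant.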

How do I make webhooks idempotent?

Persist a hash of (platform, update_id or message_id, tenant_id) before processing. If the key already exists, ACK and skip. Use a unique constraint in the database, or Redis SET with the NX and EX options (an atomic set-if-absent with TTL; plain SETNX cannot set an expiry). Idempotency turns network retries from a duplicate-message bug into a no-op, which is essential because WhatsApp and Stripe redeliver a webhook that was not acknowledged, repeatedly and with backoff.
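Here is a minimal sketch of the database variant, using SQLite's primary-key constraint in place of PostgreSQL. The table name and the `event_key` and `claim` helpers are assumptions made for illustration.

```python
import hashlib
import sqlite3

dedup_db = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
dedup_db.execute("CREATE TABLE processed_events (event_key TEXT PRIMARY KEY)")


def event_key(platform: str, event_id: str, tenant_id: str) -> str:
    """Stable hash of the fields that identify one delivery of one event."""
    raw = f"{platform}:{event_id}:{tenant_id}".encode()
    return hashlib.sha256(raw).hexdigest()


def claim(platform: str, event_id: str, tenant_id: str) -> bool:
    """Return True only for the first caller that sees this event."""
    cursor = dedup_db.execute(
        "INSERT OR IGNORE INTO processed_events VALUES (?)",
        (event_key(platform, event_id, tenant_id),),
    )
    dedup_db.commit()
    # rowcount is 1 when the row was inserted, 0 when the key already existed.
    return cursor.rowcount == 1


first_delivery = claim("telegram", "1001", "tenant-1")
retried_delivery = claim("telegram", "1001", "tenant-1")
```

The insert itself is the check, so two workers racing on the same retry cannot both win. With Redis, the equivalent single step is a SET with NX and an expiry, which also gives you automatic cleanup of old keys.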

What does observability look like for a serious bot?

Structured logs with correlation IDs per conversation; metrics for queue depth, worker latency p50/p95/p99, error rate per platform, rate-limit hits, and human-handoff conversion; tracing across webhook → queue → worker → CRM call (OpenTelemetry). Alert on p95 latency, dead-letter queue size and failed deliveries — not on raw CPU.
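A small sketch of the first two pieces, structured logs and latency percentiles, using only the standard library. The field names, the `telegram:42:7` correlation ID format and the in-memory log sink are assumptions for the demo; tracing with OpenTelemetry is out of scope here.

```python
import io
import json
import logging
import statistics

stream = io.StringIO()  # demo sink; production would log to stdout for the collector
logger = logging.getLogger("bot")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))


def log_event(event: str, conversation_id: str, **fields) -> None:
    """Emit one JSON object per line, always carrying the correlation ID."""
    payload = {"event": event, "conversation_id": conversation_id, **fields}
    logger.info(json.dumps(payload))


def latency_percentiles(samples_ms: list) -> dict:
    """p50/p95/p99 from raw worker latencies (needs at least two samples)."""
    cuts = statistics.quantiles(samples_ms, n=100)  # 99 cut points
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


log_event("message_processed", "telegram:42:7", platform="telegram", latency_ms=183)
summary = latency_percentiles([float(ms) for ms in range(1, 101)])
```

Logging the same `conversation_id` at the webhook, the queue consumer and the CRM call is what lets you reconstruct one conversation end to end from the logs alone.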

Need a bot that scales past the demo?

We design and operate production bot infrastructure for Discord, Telegram and WhatsApp Business API — with queues, idempotency, CRM integrations and observability built in from day one. Tell us about your channel mix and traffic, and we will reply with a phased plan and fixed-price range within 24 hours.

Related resources

Discord vs Telegram vs WhatsApp: how to choose — channel selection before architecture.
Multi-tenant SaaS backend patterns — when your bot serves multiple client organisations.
Automate customer support without breaking your CRM — the integration layer behind the bot.
How to hire custom software development — budgets, contracts and red flags.

Last updated: June 2026 · Written by the RoviDev studio.