
Dual-Model Discord AI With Real Memory
ServerMate Discord AI
A Discord bot that decides when to be lightning-fast and when to go deep, while remembering every user it meets.
Servers Online: 10+ (multi-guild)
Memories Stored: 50k+ (PostgreSQL)
Image/Video Jobs: 350+ (Imagen 3.0 + Vertex)
API Savings: 90% (fast model routing)
Overview
ServerMate feels alive in Discord servers. It inspects every incoming message, decides whether it needs deep reasoning or lightweight banter, and swaps between Gemini 2.0 Flash and Gemini 2.5 Pro automatically. Beyond chat, it generates images/videos, browses the web (respecting CAPTCHAs), and maintains per-user memory so conversations stay personal.
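To make the per-user memory idea concrete, here is a minimal sketch of a store that saves and recalls facts about a user. It is not ServerMate's actual schema: the table name, column names, the choice to key memories by guild as well as user, and the helper names `open_store`, `remember`, and `recall` are all assumptions. The standard-library `sqlite3` module stands in for PostgreSQL so the example runs anywhere.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS user_memory (
    guild_id   INTEGER NOT NULL,
    user_id    INTEGER NOT NULL,
    fact       TEXT    NOT NULL,
    created_at TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store (SQLite here, standing in for PostgreSQL)."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def remember(conn: sqlite3.Connection, guild_id: int, user_id: int, fact: str) -> None:
    """Persist one fact about a user; the `with` block commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO user_memory (guild_id, user_id, fact) VALUES (?, ?, ?)",
            (guild_id, user_id, fact),
        )


def recall(conn: sqlite3.Connection, guild_id: int, user_id: int, limit: int = 5) -> list[str]:
    """Return the most recently stored facts for this user, newest first."""
    rows = conn.execute(
        "SELECT fact FROM user_memory "
        "WHERE guild_id = ? AND user_id = ? "
        "ORDER BY rowid DESC LIMIT ?",
        (guild_id, user_id, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

Recalled facts could then be prepended to the model prompt so replies stay personal; how ServerMate actually injects them is not described here.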
Web Browsing & Safety
ServerMate records what it sees when browsing, including any CAPTCHA that stops a request. Transparency comes first.
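One way to record a blocked page is sketched below. It assumes the bot already has the page's HTML and status code in hand. The `BrowseRecord` type, the `record_page` helper, and the marker strings are hypothetical, and a real detector would need a much broader set of signals.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical marker strings; illustrative, not an exhaustive detector.
CAPTCHA_MARKERS = ("g-recaptcha", "h-captcha", "cf-challenge")


@dataclass
class BrowseRecord:
    url: str
    status: int
    blocked_by_captcha: bool
    note: str
    fetched_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def record_page(url: str, status: int, html: str) -> BrowseRecord:
    """Build a log entry that says plainly whether a CAPTCHA stopped the fetch."""
    lowered = html.lower()
    hit = next((m for m in CAPTCHA_MARKERS if m in lowered), None)
    if hit:
        note = f"Stopped by CAPTCHA (marker: {hit}); page not read."
        return BrowseRecord(url, status, True, note)
    return BrowseRecord(url, status, False, f"Read {len(html)} characters.")
```

The bot can show the `note` field to the user as-is, so a blocked request is reported rather than silently skipped.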
AI-Driven Model Selection
Routes casual talk to Gemini Flash and tough prompts to Pro with guardrails for image queries.
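    async def decide_model(message, message_meta: dict) -> bool:
        """Return True when the message needs the deep (Pro) model."""
        # Small talk and image searches always take the fast path.
        if message_meta.get("wants_image_search") or message_meta.get("small_talk"):
            return False  # fast model
        # Otherwise the fast model itself classifies the message.
        decision_model = get_fast_model()
        prompt = (
            f"User message: '{message.content}'\n"
            "Does this need deep reasoning? Answer 'deep' or 'fast'."
        )
        # generate_content_async is the awaitable call in the Gemini Python SDKs.
        decision = await decision_model.generate_content_async(prompt)
        return "deep" in decision.text.lower()

- Special-casing small talk and image searches
- Fast model used unless the classifier flags deep reasoning
- Logging choices for debugging cost/perf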
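The logging of routing choices could look roughly like the sketch below. The logger name, the constant names, the log fields, and the helpers `pick_model_name` and `log_decision` are assumptions for illustration. The model identifiers are taken from the Gemini models named in the overview.

```python
import logging
import time

logger = logging.getLogger("servermate.routing")  # hypothetical logger name

# Identifiers for the two models named in the overview; constant names are illustrative.
FAST_MODEL = "gemini-2.0-flash"
DEEP_MODEL = "gemini-2.5-pro"


def pick_model_name(needs_deep: bool) -> str:
    """Map the router's boolean decision to a model identifier."""
    return DEEP_MODEL if needs_deep else FAST_MODEL


def log_decision(message_id: int, needs_deep: bool, reason: str, started: float) -> dict:
    """Record which model was chosen, why, and how long the decision took."""
    entry = {
        "message_id": message_id,
        "model": pick_model_name(needs_deep),
        "reason": reason,
        "decision_ms": round((time.perf_counter() - started) * 1000, 1),
    }
    logger.info("routing %s", entry)
    return entry
```

Counting these entries per model over time is one simple way to check how much traffic actually stays on the fast path.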
Challenges tackled
- Optimizing API costs while keeping responses instantaneous
- Designing a normalized Postgres schema for memory, interactions, learned behaviours, and multimedia logs
- Handling multi-server concurrency with asyncio and rate-limit-friendly queues
- Building consistent personality while still answering technical prompts accurately
- Blending multimedia generation and browsing without blocking Discord event loops
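The rate-limit-friendly queue idea from the list above can be sketched as one worker per guild that runs jobs one at a time with a minimum gap between them. This is an illustration under assumptions, not ServerMate's code: `GuildWorker`, `min_interval`, and `demo` are hypothetical names, and a production bot would also need retries and backoff.

```python
import asyncio


class GuildWorker:
    """One queue per guild; jobs run sequentially with a gap between them."""

    def __init__(self, min_interval: float = 0.5):
        self.queue: asyncio.Queue = asyncio.Queue()
        self.min_interval = min_interval

    async def submit(self, job):
        """Enqueue a zero-argument coroutine function and wait for its result."""
        future = asyncio.get_running_loop().create_future()
        await self.queue.put((job, future))
        return await future

    async def run(self):
        """Drain the queue forever; cancel the task to stop it."""
        while True:
            job, future = await self.queue.get()
            try:
                result = await job()
            except Exception as exc:
                if not future.done():
                    future.set_exception(exc)
            else:
                if not future.done():
                    future.set_result(result)
            finally:
                self.queue.task_done()
            # Spacing out jobs keeps bursts from tripping API rate limits.
            await asyncio.sleep(self.min_interval)


async def demo():
    worker = GuildWorker(min_interval=0.01)
    runner = asyncio.create_task(worker.run())

    async def job(n):
        return n * 2

    results = await asyncio.gather(
        *(worker.submit(lambda n=n: job(n)) for n in range(3))
    )
    runner.cancel()
    await asyncio.gather(runner, return_exceptions=True)
    return results
```

Because callers only await a future, slow image or video jobs wait in the queue without blocking the event loop that handles other Discord messages.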
Impact
ServerMate runs across multiple Discord servers today, generating thousands of interactions. Users notice the personality, the recall, and the fact that it can swap from joking to debugging in one turn.