Dual-Model Discord AI With Real Memory

ServerMate Discord AI

A Discord bot that decides when to be lightning-fast and when to go deep, while remembering every user it meets.

Dual-model intelligence keeps costs low
Memory + personality tuned per server
Imagen, video, browsing, and research in one bot

Servers Online: 10+ (multi-guild)

Memories Stored: 50k+ (PostgreSQL)

Image/Video Jobs: 350+ (Imagen 3.0 + Vertex)

API Savings: 90% (fast model routing)

Overview

ServerMate feels alive in Discord servers. It inspects every incoming message, decides whether it needs deep reasoning or lightweight banter, and swaps between Gemini 2.0 Flash and Gemini 2.5 Pro automatically. Beyond chat, it generates images/videos, browses the web (respecting CAPTCHAs), and maintains per-user memory so conversations stay personal.

Stack

Python, Discord.py, Google Gemini, Gemini 2.0 Flash, Gemini 2.5 Pro, Imagen 3.0, Vertex AI, PostgreSQL, Serper API, AsyncIO

Visual tour

Web Browsing & Safety

ServerMate records what it sees when browsing, including the CAPTCHA that stopped this request. Transparency comes first.
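To make the "explain, don't bypass" behaviour concrete, here is a minimal sketch of how a fetched page could be checked for a CAPTCHA and reported back to the user. The marker strings, the `BrowseResult` type, and `inspect_page` are all hypothetical names chosen for illustration; the bot's actual detection logic is not shown on this page.

```python
from dataclasses import dataclass

# Hypothetical marker strings; real pages may signal a CAPTCHA differently.
CAPTCHA_MARKERS = ("g-recaptcha", "h-captcha", "cf-challenge", "captcha")


@dataclass
class BrowseResult:
    url: str
    blocked_by_captcha: bool
    summary: str


def inspect_page(url: str, html: str) -> BrowseResult:
    """Classify fetched HTML and report a CAPTCHA instead of trying to bypass it."""
    lowered = html.lower()
    if any(marker in lowered for marker in CAPTCHA_MARKERS):
        # Stop here and tell the user what happened.
        return BrowseResult(url, True, f"{url} showed a CAPTCHA, so I stopped there.")
    # Keep only a short excerpt of the page for the browsing log.
    return BrowseResult(url, False, html[:200])
```

A substring check like this is deliberately crude; it errs on the side of stopping, which matches the transparency-first stance described above.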

Code highlights

AI-Driven Model Selection

Routes casual talk to Gemini Flash and tough prompts to Pro with guardrails for image queries.

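async def decide_model(message_meta: dict) -> bool:
    """Return True when the message should go to the deep (Pro) model."""
    # Cheap guardrails first: image searches and small talk stay on the fast model.
    if message_meta.get("wants_image_search") or message_meta.get("small_talk"):
        return False  # fast model

    # Otherwise let the fast model classify the message itself.
    # Assumes get_fast_model() returns a google-generativeai GenerativeModel,
    # whose awaitable call is generate_content_async.
    decision_model = get_fast_model()
    prompt = (
        f"User message: {message_meta.get('content', '')!r}\n"
        "Does this need deep reasoning? Answer with one word: 'deep' or 'fast'."
    )
    decision = await decision_model.generate_content_async(prompt)
    return "deep" in decision.text.lower()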

Special-casing small talk and image searches
Fast model used unless the classifier flags deep reasoning
Logging choices for debugging cost/perf
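The excerpt above does not show the logging step, so here is a minimal sketch of what recording each routing choice could look like. The logger name, the `route_counts` counter, and the `route` helper are hypothetical; the classifier verdict is passed in as a plain boolean so the example runs without any API call.

```python
import logging
from collections import Counter

logger = logging.getLogger("servermate.routing")  # hypothetical logger name
route_counts: Counter = Counter()  # (model, reason) -> number of messages


def route(message_meta: dict, classifier_says_deep: bool) -> tuple[str, str]:
    """Pick a model tier and record why, so cost and latency can be audited later."""
    if message_meta.get("wants_image_search"):
        model, reason = "fast", "image_search"
    elif message_meta.get("small_talk"):
        model, reason = "fast", "small_talk"
    elif classifier_says_deep:
        model, reason = "deep", "classifier"
    else:
        model, reason = "fast", "default"
    route_counts[(model, reason)] += 1
    logger.info("route model=%s reason=%s", model, reason)
    return model, reason
```

Comparing `route_counts` for the two tiers over a day gives a rough view of how much traffic the fast model absorbs.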

Key outcomes

AI-driven model selection that routes casual chat to the fast model and complex prompts to the smart model

Long-term memory per user and per server with transparency commands (!memory and !forget)

Imagen 3.0-powered image generation plus video generation flows

Vision analysis so users can drop screenshots, diagrams, or equations

Web browsing with guardrails that explain CAPTCHAs instead of bypassing them

Serper-powered search for real-time research

Command suite covering stats, reminders, and creative tooling
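As an illustration of the per-user, per-server memory behind `!memory` and `!forget`, the sketch below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs anywhere. The table layout and the `remember`, `recall`, and `forget` helpers are assumptions for the example, not the bot's actual schema.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create an in-memory store; a stand-in for the real PostgreSQL database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE memories (
               id INTEGER PRIMARY KEY,
               guild_id INTEGER NOT NULL,
               user_id INTEGER NOT NULL,
               fact TEXT NOT NULL
           )"""
    )
    return conn


def remember(conn: sqlite3.Connection, guild_id: int, user_id: int, fact: str) -> None:
    """Store one fact about a user, scoped to a single server."""
    conn.execute(
        "INSERT INTO memories (guild_id, user_id, fact) VALUES (?, ?, ?)",
        (guild_id, user_id, fact),
    )


def recall(conn: sqlite3.Connection, guild_id: int, user_id: int) -> list[str]:
    """What a !memory command could show: the user's facts in insertion order."""
    rows = conn.execute(
        "SELECT fact FROM memories WHERE guild_id = ? AND user_id = ? ORDER BY id",
        (guild_id, user_id),
    )
    return [row[0] for row in rows]


def forget(conn: sqlite3.Connection, guild_id: int, user_id: int) -> int:
    """What a !forget command could do: delete the user's facts, return the count."""
    cur = conn.execute(
        "DELETE FROM memories WHERE guild_id = ? AND user_id = ?",
        (guild_id, user_id),
    )
    return cur.rowcount
```

Keying every row on both `guild_id` and `user_id` means forgetting a user in one server leaves their memories in other servers untouched.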

Challenges tackled

Optimizing API costs while keeping responses instantaneous
Designing a normalized Postgres schema for memory, interactions, learned behaviours, and multimedia logs
Handling multi-server concurrency with AsyncIO and rate-limit friendly queues
Building consistent personality while still answering technical prompts accurately
Blending multimedia generation and browsing without blocking Discord event loops
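One common way to address the concurrency and blocking concerns listed above is to cap in-flight jobs with a semaphore and push blocking calls onto worker threads. The sketch below shows that pattern under stated assumptions: `render_image` is a hypothetical blocking function standing in for a generation request, and the limit of two concurrent jobs is arbitrary.

```python
import asyncio
import time


def render_image(prompt: str) -> str:
    """Hypothetical blocking call standing in for an image-generation request."""
    time.sleep(0.05)
    return f"image for {prompt!r}"


async def run_jobs(prompts: list[str], max_concurrent: int = 2) -> list[str]:
    """Run blocking jobs off the event loop, at most max_concurrent at a time."""
    gate = asyncio.Semaphore(max_concurrent)

    async def run_one(prompt: str) -> str:
        async with gate:
            # to_thread keeps the event loop free to handle other Discord events.
            return await asyncio.to_thread(render_image, prompt)

    # gather returns results in the same order as the input prompts.
    return await asyncio.gather(*(run_one(p) for p in prompts))
```

Calling `asyncio.run(run_jobs(["cat", "dog", "fox"]))` runs the three jobs two at a time while the loop stays responsive.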

Impact

ServerMate runs across multiple Discord servers today, generating thousands of interactions. Users notice the personality, the recall, and the fact that it can swap from joking to debugging in one turn.