AI Platform for Architecture & Engineering Talent

Bespoke

Next.js 15 platform where clients chat with an AI RFP partner while resumes stream through a two-stage parsing pipeline.

  • Live at bespoke-ae.com (Railway + Cloudflare + GoDaddy + Google Search Console)
  • Streaming RFP assistant with multi-model routing
  • Resume automation with strict anti-hallucination guardrails

  • Resume Parsing: ≤ 2 min (two-stage AI)
  • Services Catalog: 100+ architecture & engineering services
  • Streaming Latency: < 500 ms (fast mode)
  • AI Models: 4 (Flash + Pro routing)

Overview

Bespoke connects clients with vetted architecture and engineering talent. I built two signature AI systems: a streaming RFP copilot that switches between fast and smart Gemini models, and a resume pipeline that first extracts structured data, then performs deep profiling to match services, set rates, and craft narratives.
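
A minimal sketch of that two-stage flow, assuming Gemini is reached through its OpenAI-compatible endpoint; the model ids, prompts, and callJson helper are illustrative rather than the production code:

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.GEMINI_API_KEY,
  baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
});

// Small helper: one strict-JSON completion per stage.
async function callJson(model: string, system: string, input: string) {
  const res = await openai.chat.completions.create({
    model,
    response_format: { type: "json_object" },
    messages: [
      { role: "system", content: system },
      { role: "user", content: input },
    ],
  });
  return JSON.parse(res.choices[0]?.message?.content ?? "{}");
}

export async function parseResume(resumeText: string) {
  // Stage 1: fast structured extraction from the raw text.
  const extracted = await callJson(
    "gemini-2.0-flash",
    "Extract name, roles, skills, and job history as strict JSON.",
    resumeText,
  );
  // Stage 2: deep profiling over the structured fields only, so service
  // matches, rates, and narratives come from verified data, not raw text.
  return callJson(
    "gemini-pro", // stand-in id for the smart model
    "Given this structured resume, match services, recommend an hourly rate, and draft a narrative as strict JSON.",
    JSON.stringify(extracted),
  );
}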

Stack

Next.js 15 · React 19 · TypeScript · Tailwind CSS v4 · Prisma · PostgreSQL · Google Gemini · AWS S3 · SSE · JWT
Visual tour

Guided Landing Experience

Two clear entry paths (hire talent or apply as talent) and a transparent roadmap build trust immediately.

Code highlights

TypeScript

Streaming AI Agent with Multi-Model Selection

Routes prompts to the right Gemini model and streams tokens back over SSE.

// `init`, `step`, `system`, and `history` come from the request context.
const fastMode = init || ["start", "services"].includes(step);
const miniModel = getFastModel();
const chatModel = fastMode ? miniModel : (env.GEMINI_MODEL || SMART_MODEL);

let assistantText = ""; // accumulated reply, persisted once the stream ends

const stream = new ReadableStream<Uint8Array>({
  start: async (controller) => {
    const enc = new TextEncoder();
    // Frame each chunk as an SSE `data:` event.
    const write = (obj: unknown) =>
      controller.enqueue(enc.encode(`data: ${JSON.stringify(obj)}\n\n`));

    const baseReq = {
      model: chatModel,
      messages: [{ role: "system", content: system }, ...history],
      stream: true,
      temperature: fastMode ? 0.2 : 0.4, // tighter sampling for quick replies
      max_tokens: fastMode ? 512 : 1000, // smaller output budget in fast mode
    };

    const completion = await openai.chat.completions.create(baseReq);
    for await (const part of completion as any) {
      const delta = part?.choices?.[0]?.delta;
      if (delta?.content) {
        assistantText += delta.content;
        write({ type: "text", content: delta.content });
      }
    }
    controller.close(); // end the SSE response once the model finishes
  },
});
  • Fast-mode detection for greetings and the start/services steps
  • Dynamic temperature and token limits per mode
  • Automatic fallback to backup models (sketched below)
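
A minimal sketch of what that fallback path could look like; BACKUP_MODELS, the client wiring, and the retry policy are assumptions, not the production configuration:

import type OpenAI from "openai";

type ChatParams = OpenAI.Chat.Completions.ChatCompletionCreateParamsStreaming;

// Hypothetical ordered fallback list; the real model ids may differ.
const BACKUP_MODELS = ["gemini-2.0-flash", "gemini-1.5-flash"];

async function createWithFallback(openai: OpenAI, req: ChatParams) {
  const models = [req.model, ...BACKUP_MODELS.filter((m) => m !== req.model)];
  let lastError: unknown;
  for (const model of models) {
    try {
      return await openai.chat.completions.create({ ...req, model });
    } catch (err) {
      lastError = err; // e.g. rate limit or outage: try the next model
    }
  }
  throw lastError; // every model failed; surface the last error
}
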
Key outcomes

  • Streaming RFP assistant with Server-Sent Events and sub-500ms first-token latency in fast mode
  • Two-stage resume parsing: Gemini 2.0 Flash extracts structured data, Gemini 3 Pro performs deep analysis
  • Automatic model routing that chooses between Gemini 2.0 Flash and Gemini 3 Pro per query complexity
  • Fuzzy service catalog mapping that aligns 100+ engineering/architecture services to user inputs (first sketch below)
  • Rate engine that blends region, seniority, and job history to output realistic hourly recommendations (second sketch below)
  • Memory system with importance scoring, fast-mode context truncation, and fallback protections
  • S3-backed resume ingestion with presigned URLs and DOCX/PDF support (third sketch below)
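
For the catalog mapping, a minimal sketch of an alias-aware fuzzy lookup; the Service shape, token-overlap scoring, and 0.5 threshold are assumptions, not the production matcher:

type Service = { id: string; name: string; aliases: string[] };

const normalize = (s: string) =>
  s.toLowerCase().replace(/[^a-z0-9\s]/g, "").split(/\s+/).filter(Boolean);

// Dice-style token overlap: 1.0 when query and candidate share every token.
function tokenScore(query: string, candidate: string): number {
  const q = new Set(normalize(query));
  const c = new Set(normalize(candidate));
  if (q.size === 0 || c.size === 0) return 0;
  let hits = 0;
  for (const t of q) if (c.has(t)) hits++;
  return (2 * hits) / (q.size + c.size);
}

export function matchService(input: string, catalog: Service[]): Service | null {
  let best: { service: Service; score: number } | null = null;
  for (const service of catalog) {
    // Score against the canonical name and every alias, keep the best.
    const score = Math.max(
      tokenScore(input, service.name),
      ...service.aliases.map((a) => tokenScore(input, a)),
    );
    if (!best || score > best.score) best = { service, score };
  }
  return best && best.score >= 0.5 ? best.service : null; // threshold is a guess
}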
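
For the rate engine, a sketch of how such a blend can work; every base rate and multiplier here is an invented placeholder, not Bespoke's pricing data:

type RateInput = {
  region: string;
  seniority: "junior" | "mid" | "senior";
  yearsInRole: number;
};

// Placeholder tables purely for illustration.
const REGION_MULTIPLIER: Record<string, number> = { US: 1.0, EU: 0.9, LATAM: 0.7 };
const SENIORITY_BASE: Record<RateInput["seniority"], number> = { junior: 45, mid: 70, senior: 100 };

export function recommendHourlyRate({ region, seniority, yearsInRole }: RateInput): number {
  const base = SENIORITY_BASE[seniority];
  const regional = base * (REGION_MULTIPLIER[region] ?? 0.8);
  // Job history nudges the rate up to +20%, capped so outliers stay realistic.
  const experienceBoost = Math.min(yearsInRole * 0.02, 0.2);
  return Math.round(regional * (1 + experienceBoost));
}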
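
And for ingestion, a sketch of issuing a short-lived presigned upload URL with AWS SDK v3; the bucket, key scheme, and allowed types are assumptions:

import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({ region: process.env.AWS_REGION });

const ALLOWED = new Set([
  "application/pdf",
  "application/vnd.openxmlformats-officedocument.wordprocessingml.document", // DOCX
]);

// Hypothetical bucket/key layout for illustration.
export async function presignResumeUpload(userId: string, fileName: string, contentType: string) {
  if (!ALLOWED.has(contentType)) throw new Error("Only PDF and DOCX resumes are accepted");
  const key = `resumes/${userId}/${Date.now()}-${fileName}`;
  const url = await getSignedUrl(
    s3,
    new PutObjectCommand({ Bucket: process.env.RESUME_BUCKET, Key: key, ContentType: contentType }),
    { expiresIn: 300 }, // browser uploads directly; the URL expires in 5 minutes
  );
  return { url, key };
}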

Challenges tackled

  • Delivering a real-time streaming UX while juggling context loading and error handling
  • Preventing hallucinations by forcing strict JSON mode and defensive parsing (see the sketch after this list)
  • Balancing cost and latency with a multi-model strategy and automatic fallbacks
  • Designing an ergonomic services catalog that still supports fuzzy lookups and aliases
  • Keeping database queries snappy across memory, knowledge, chat history, and profiles
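
A minimal sketch of that guardrail, assuming the OpenAI-compatible Gemini endpoint and a hypothetical zod schema; the prompt and field list are illustrative:

import OpenAI from "openai";
import { z } from "zod";

const openai = new OpenAI({
  apiKey: process.env.GEMINI_API_KEY,
  baseURL: "https://generativelanguage.googleapis.com/v1beta/openai/",
});

// Hypothetical stage-one schema; unknown fields must be null, never guessed.
const ResumeSchema = z.object({
  name: z.string(),
  titles: z.array(z.string()),
  yearsExperience: z.number().nullable(),
});

export async function extractResume(text: string) {
  const res = await openai.chat.completions.create({
    model: "gemini-2.0-flash",
    response_format: { type: "json_object" }, // refuse free-form prose
    temperature: 0,
    messages: [
      { role: "system", content: "Return ONLY JSON with name, titles, yearsExperience. Use null when a field is not in the resume." },
      { role: "user", content: text },
    ],
  });

  // Defensive parsing: strip stray code fences, then validate against the
  // schema so malformed or invented fields throw instead of reaching the DB.
  const raw = res.choices[0]?.message?.content ?? "";
  const cleaned = raw.replace(/^```(?:json)?\s*|\s*```$/g, "").trim();
  return ResumeSchema.parse(JSON.parse(cleaned));
}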

Impact

Resume ingestion now takes minutes, not hours. Clients can talk through complex requirements as they would with a real PM, while the AI suggests services, talent matches, and pricing in one flow. The platform is live at bespoke-ae.com, hosted on Railway behind Cloudflare, with a GoDaddy-managed domain and indexing handled via Google Search Console.