
AI Platform for Architecture & Engineering Talent
Bespoke
Next.js 15 platform where clients chat with an AI RFP partner while resumes stream through a two-stage parsing pipeline.
- Resume Parsing: ≤ 2 min (two-stage AI)
- Services Catalog: 100+ (architecture & engineering)
- Streaming Latency: < 500 ms (fast mode)
- AI Models: 4 (Flash + Pro routing)
Overview
Bespoke connects clients with vetted architecture and engineering talent. I built two signature AI systems: a streaming RFP copilot that switches between fast and smart Gemini models, and a resume pipeline that first extracts structured data, then performs deep profiling to match services, set rates, and craft narratives.
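The two-stage pipeline can be sketched as two pure functions: stage one extracts structured fields, stage two enriches them into a full profile. This is a hypothetical illustration, not the production code; in production each stage is an LLM call with strict JSON output, mocked here with simple string parsing, and the names (`extractStructured`, `deepProfile`, the rate formula) are assumptions.

```typescript
// Illustrative shapes for the two pipeline stages (hypothetical, not production types).
interface ExtractedResume {
  name: string;
  skills: string[];
  yearsExperience: number;
}

interface TalentProfile extends ExtractedResume {
  matchedServices: string[];
  suggestedRate: number; // USD/hour
  narrative: string;
}

// Stage 1: extract structured fields from raw resume text.
// (In production this is an LLM call; mocked here with regex parsing.)
function extractStructured(raw: string): ExtractedResume {
  const name = raw.split("\n")[0].trim();
  const skills = raw.match(/Skills: (.+)/)?.[1].split(", ") ?? [];
  const years = Number(raw.match(/(\d+) years/)?.[1] ?? 0);
  return { name, skills, yearsExperience: years };
}

// Stage 2: deep profiling — match catalog services, set a rate, craft a narrative.
function deepProfile(r: ExtractedResume, catalog: string[]): TalentProfile {
  const matchedServices = catalog.filter((s) =>
    r.skills.some((skill) => s.toLowerCase().includes(skill.toLowerCase()))
  );
  const suggestedRate = 60 + r.yearsExperience * 10; // illustrative formula
  const narrative = `${r.name} brings ${r.yearsExperience} years across ${r.skills.join(", ")}.`;
  return { ...r, matchedServices, suggestedRate, narrative };
}

const raw = "Jane Doe\nSkills: Revit, HVAC\n8 years of experience";
const profile = deepProfile(extractStructured(raw), ["Revit Modeling", "HVAC Design"]);
console.log(profile.matchedServices, profile.suggestedRate);
```

Keeping the stages separate means the cheap extraction pass can run on every upload, while the heavier profiling pass only runs once the structured data validates.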
Stack

Guided Landing Experience
Two clear entry paths (hire talent or apply as talent) plus a transparent roadmap build trust immediately.
Streaming AI Agent with Multi-Model Selection
Routes prompts to the right Gemini model and streams tokens back over SSE.
```typescript
// Route lightweight steps to the fast Flash model; everything else gets the smarter Pro model.
const fastMode = init || ["start", "services"].includes(step);
const miniModel = getFastModel();
const chatModel = fastMode ? miniModel : (env.GEMINI_MODEL || SMART_MODEL);

const stream = new ReadableStream<Uint8Array>({
  start: async (controller) => {
    const enc = new TextEncoder();
    const write = (obj: unknown) =>
      controller.enqueue(enc.encode(`data: ${JSON.stringify(obj)}\n\n`));

    const baseReq = {
      model: chatModel,
      messages: [{ role: "system", content: system }, ...history],
      stream: true,
      // Tighter sampling and shorter replies in fast mode keep latency low.
      temperature: fastMode ? 0.2 : 0.4,
      max_tokens: fastMode ? 512 : 1000,
    };

    const completion = await openai.chat.completions.create(baseReq);
    for await (const part of completion as any) {
      const delta = part?.choices?.[0]?.delta;
      if (delta?.content) {
        assistantText += delta.content;
        write({ type: "text", content: delta.content });
      }
    }
    controller.close();
  },
});
```

- Fast mode detection for greetings/startup steps
- Dynamic temperature + token limits
- Automatic fallback to backup models
Challenges tackled
- Delivering real-time streaming UX while juggling context loading and error handling
- Preventing hallucinations by forcing strict JSON mode and defensive parsing
- Balancing cost and latency with a multi-model strategy and automatic fallbacks
- Designing an ergonomic services catalog that still supports fuzzy lookups and aliases
- Keeping database queries snappy across memory, knowledge, chat history, and profiles
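The automatic-fallback behavior mentioned above can be reduced to a small wrapper that walks a ranked list of models until one answers. This is a sketch under assumptions: the model names and the `call` shape are illustrative, not the production API.

```typescript
// Try each model in order; return the first successful answer, or rethrow
// the last error if every model fails. Hypothetical sketch.
type ModelCall = (model: string) => Promise<string>;

async function withFallback(models: string[], call: ModelCall): Promise<string> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await call(model); // first model that answers wins
    } catch (err) {
      lastError = err; // record the failure and try the next model
    }
  }
  throw lastError;
}

// Usage with a mock caller: the "pro" model fails, the "flash" backup answers.
(async () => {
  const reply = await withFallback(["gemini-pro", "gemini-flash"], async (m) => {
    if (m === "gemini-pro") throw new Error("quota exceeded");
    return `answered by ${m}`;
  });
  console.log(reply); // → "answered by gemini-flash"
})();
```

Because the wrapper is model-agnostic, the same loop serves both the cost/latency routing (try Flash, escalate to Pro) and resilience (retry on quota or transient errors).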
Impact
Resume ingestion now takes minutes, not hours. Clients can talk through complex requirements like a real PM, while the AI suggests services, talent matches, and pricing in one flow. The platform is live at bespoke-ae.com, hosted on Railway behind Cloudflare, with a GoDaddy-managed domain and indexing handled via Google Search Console.