
AI Platform for Architecture & Engineering Talent
Bespoke
Next.js 15 platform where clients chat with an AI RFP partner while resumes stream through a two-stage parsing pipeline.
- Resume Parsing: ≤ 2 min (two-stage AI)
- Services Catalog: 100+ (architecture & engineering)
- Streaming Latency: < 500 ms (fast mode)
- AI Models: 4 (Flash + Pro routing)
Overview
Bespoke connects clients with vetted architecture and engineering talent. I built two signature AI systems: a streaming RFP copilot that switches between fast and smart Gemini models, and a resume pipeline that first extracts structured data, then performs deep profiling to match services, set rates, and craft narratives.
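The two-stage pipeline can be sketched as two pure functions: stage one extracts structured fields, stage two enriches them into a full profile. This is a hypothetical illustration, not the production code; in production each stage is an LLM call with strict JSON output, mocked here with simple string parsing, and the names (`extractStructured`, `deepProfile`, the rate formula) are assumptions.

```typescript
// Illustrative shapes for the two pipeline stages (hypothetical, not production types).
interface ExtractedResume {
  name: string;
  skills: string[];
  yearsExperience: number;
}

interface TalentProfile extends ExtractedResume {
  matchedServices: string[];
  suggestedRate: number; // USD/hour
  narrative: string;
}

// Stage 1: extract structured fields from raw resume text.
// (In production this is an LLM call; mocked here with regex parsing.)
function extractStructured(raw: string): ExtractedResume {
  const name = raw.split("\n")[0].trim();
  const skills = raw.match(/Skills: (.+)/)?.[1].split(", ") ?? [];
  const years = Number(raw.match(/(\d+) years/)?.[1] ?? 0);
  return { name, skills, yearsExperience: years };
}

// Stage 2: deep profiling — match catalog services, set a rate, craft a narrative.
function deepProfile(r: ExtractedResume, catalog: string[]): TalentProfile {
  const matchedServices = catalog.filter((s) =>
    r.skills.some((skill) => s.toLowerCase().includes(skill.toLowerCase()))
  );
  const suggestedRate = 60 + r.yearsExperience * 10; // illustrative formula
  const narrative = `${r.name} brings ${r.yearsExperience} years across ${r.skills.join(", ")}.`;
  return { ...r, matchedServices, suggestedRate, narrative };
}

const raw = "Jane Doe\nSkills: Revit, HVAC\n8 years of experience";
const profile = deepProfile(extractStructured(raw), ["Revit Modeling", "HVAC Design"]);
console.log(profile.matchedServices, profile.suggestedRate);
```

Keeping the stages separate means the cheap extraction pass can run on every upload, while the heavier profiling pass only runs once the structured data validates.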
Stack

Guided Landing Experience
Two clear entry paths (hire talent or apply as talent) plus a transparent roadmap build trust immediately.
Streaming AI Agent with Multi-Model Selection
Routes prompts to the right Gemini model and streams tokens back over SSE.
```typescript
// Route lightweight steps to the fast Flash model; everything else gets the smarter Pro model.
const fastMode = init || ["start", "services"].includes(step);
const miniModel = getFastModel();
const chatModel = fastMode ? miniModel : (env.GEMINI_MODEL || SMART_MODEL);

const stream = new ReadableStream<Uint8Array>({
  start: async (controller) => {
    const enc = new TextEncoder();
    const write = (obj: unknown) =>
      controller.enqueue(enc.encode(`data: ${JSON.stringify(obj)}\n\n`));

    const baseReq = {
      model: chatModel,
      messages: [{ role: "system", content: system }, ...history],
      stream: true,
      // Tighter sampling and shorter replies in fast mode keep latency low.
      temperature: fastMode ? 0.2 : 0.4,
      max_tokens: fastMode ? 512 : 1000,
    };

    const completion = await openai.chat.completions.create(baseReq);
    for await (const part of completion as any) {
      const delta = part?.choices?.[0]?.delta;
      if (delta?.content) {
        assistantText += delta.content;
        write({ type: "text", content: delta.content });
      }
    }
    controller.close();
  },
});
```

- Fast mode detection for greetings/startup steps
- Dynamic temperature + token limits
- Automatic fallback to backup models
Challenges tackled
- Delivering real-time streaming UX while juggling context loading and error handling
- Preventing hallucinations by forcing strict JSON mode and defensive parsing
- Balancing cost and latency with a multi-model strategy and automatic fallbacks
- Designing an ergonomic services catalog that still supports fuzzy lookups and aliases
- Keeping database queries snappy across memory, knowledge, chat history, and profiles
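The automatic-fallback behavior mentioned above can be reduced to a small wrapper that walks a ranked list of models until one answers. This is a sketch under assumptions: the model names and the `call` shape are illustrative, not the production API.

```typescript
// Try each model in order; return the first successful answer, or rethrow
// the last error if every model fails. Hypothetical sketch.
type ModelCall = (model: string) => Promise<string>;

async function withFallback(models: string[], call: ModelCall): Promise<string> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await call(model); // first model that answers wins
    } catch (err) {
      lastError = err; // record the failure and try the next model
    }
  }
  throw lastError;
}

// Usage with a mock caller: the "pro" model fails, the "flash" backup answers.
(async () => {
  const reply = await withFallback(["gemini-pro", "gemini-flash"], async (m) => {
    if (m === "gemini-pro") throw new Error("quota exceeded");
    return `answered by ${m}`;
  });
  console.log(reply); // → "answered by gemini-flash"
})();
```

Because the wrapper is model-agnostic, the same loop serves both the cost/latency routing (try Flash, escalate to Pro) and resilience (retry on quota or transient errors).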
Impact
Resume ingestion now takes minutes, not hours. Clients can talk through complex requirements like a real PM, while the AI suggests services, talent matches, and pricing in one flow. The platform is live at bespoke-ae.com, hosted on Railway behind Cloudflare, with a GoDaddy-managed domain and indexing handled via Google Search Console.