th3chris
AI Engineering

AI & Automation

AI where it creates measurable business value — not as an end in itself, not as pilot theatre.

You've been hearing about AI transformation for two years. You see impressive demos. And you suspect that most initiatives are stuck in pilot stage — without anyone saying why. What you're looking for: someone who doesn't treat AI as the next chatbot prototype, but as an integral part of your systems — with clear boundaries, measurable benefit, and the 'who carries accountability' question settled in the right places.

Why AI projects get stuck in pilot limbo

The typical pattern: a department experiments with ChatGPT, builds an impressive prototype, presents it internally, and everyone is excited. Three months later nothing is in production, because nobody clarified who is liable when the AI hallucinates, how token costs per request add up, what happens when the LLM provider swaps the model, and who writes eval tests against regressions.

The demo-to-production gap

An impressive prototype without an architecture plan lands in pilot limbo. Token costs, hallucination rate, model migration and evals decide between production and demo; the wow effect does not.

Or, even more common: there are two parallel AI initiatives, one in marketing with a chatbot, one in IT with a RAG prototype. They use different LLMs, different vector databases and different embedding models. Six months later you have two isolated solutions, three API invoices and no coherent AI strategy.

This has a shared root cause: AI gets treated as a tool topic instead of an architecture topic. Demo speed gets rewarded, production readiness rarely gets credit, and eval pipelines almost never get visible attention. Multi-agent architectures sound modern, until you try to debug an agentic pipeline in which three LLMs have been talking to each other.

Multi-agent magic

Architectures with three chattering LLMs look futuristic, but they can't be debugged when results drift. Conservative before magic is architectural discipline here, not a brake on innovation.

My approach: treat AI as an integrated layer in your systems, not as an isolated prototype. Modular, so LLM swaps don't cause architectural breaks. Observable, so token costs per request, latency and hallucination rate are transparent. Conservative: RAG before fine-tuning, tool-use before multi-agent, deterministic wherever possible. And with clear human accountability where decisions have consequences.

My approach

Four principles I anchor AI architecture on, drawn from 25 years of building distributed systems and from running current AI architectures in production:

  1. Modular instead of monolithic. LLM, retrieval, tooling and eval layer as exchangeable building blocks. Models change every few months; an architecture that can't survive that turns into technical debt within 18 months. Built modular, the system stays load-bearing even when the model provider changes.
  2. Observable instead of black box. Tracing on every inference, token costs per request, latency profile, an eval pipeline against regressions. Running AI in production without this layer means flying blind through every incident and noticing model drift only when customers call.
  3. Conservative before magic. RAG before fine-tuning, tool-use before multi-agent, deterministic before LLM wherever possible. A majority of 'AI problems' can be solved more cleanly with classical software; recognising that is part of a consultant's responsibility, not a refusal to sell.
  4. Human in the accountability chain. AI takes over the routine: research, pre-qualification, code reviews, documentation. Decisions with consequences (compliance approvals, financial commitments, medical assessments) stay with humans. Eval pipelines and guardrails are the technical implementation of that separation; they are not optional extras but a prerequisite for production operation.
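The first two principles can be illustrated with a minimal sketch, assuming Python 3.9+: a provider-agnostic `LLMClient` interface plus a wrapper that records latency and token counts for every request. All names (`LLMClient`, `EchoClient`, `TracedClient`) are hypothetical, and `EchoClient` is a stand-in that counts words as tokens instead of calling a real model.

```python
import time
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Completion:
    text: str
    input_tokens: int
    output_tokens: int


class LLMClient(Protocol):
    """Provider-agnostic interface; each vendor SDK would get its own adapter."""

    def complete(self, prompt: str) -> Completion: ...


class EchoClient:
    """Hypothetical stand-in adapter: no network call, words counted as tokens."""

    def complete(self, prompt: str) -> Completion:
        words = prompt.split()
        return Completion(text=" ".join(reversed(words)),
                          input_tokens=len(words), output_tokens=len(words))


@dataclass
class TraceRecord:
    model: str
    latency_s: float
    input_tokens: int
    output_tokens: int


@dataclass
class TracedClient:
    """Wraps any LLMClient and appends one trace record per request."""

    inner: LLMClient
    model: str
    traces: list[TraceRecord] = field(default_factory=list)

    def complete(self, prompt: str) -> Completion:
        start = time.perf_counter()
        result = self.inner.complete(prompt)
        self.traces.append(TraceRecord(
            model=self.model,
            latency_s=time.perf_counter() - start,
            input_tokens=result.input_tokens,
            output_tokens=result.output_tokens,
        ))
        return result


client = TracedClient(inner=EchoClient(), model="echo-test")
print(client.complete("hello modular world").text)
print(client.traces[0])
```

Because `TracedClient` exposes the same `complete` method as any adapter, swapping the model provider means writing a new adapter, not touching the callers or the tracing.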

What you actually get

AI strategy & use-case assessment

Honest assessment of your AI initiatives: where does AI demonstrably bring value, where is classical software the clean answer, where does a RAG system pay off over a chatbot. You get a prioritised list with expected value, effort and risk per use case — no blanket 'AI does everything'.

RAG pipelines in production

Document ingestion, chunking strategies, hybrid search (vector + keyword), re-ranking. Proven in production with Dify and Weaviate on self-hosted infrastructure; this website is the live demo. Also hands-on with pgvector and Qdrant for existing environments.
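One common way to merge vector and keyword results in hybrid search is reciprocal rank fusion (RRF). The sketch below is generic Python, not the API of Dify or Weaviate (Weaviate has its own built-in hybrid search); the document IDs and the two input rankings are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with 1-based ranks; k = 60 is the constant commonly used for RRF.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. nearest neighbours by embedding
keyword_hits = ["doc_c", "doc_a", "doc_d"]  # e.g. BM25 keyword matches
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

RRF only looks at ranks, so it needs no calibration between cosine similarities and keyword scores; a re-ranker can then reorder the fused top results.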

MCP servers & tool integration

Several MCP servers built and running in production, including one for Enterprise DataHub queries at Hoffmann Group. Claude, GPT-4 and local models connected to real tools in a controlled way, with boundaries, an auth layer and an audit trail.
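The idea of boundaries, auth and audit trail around tool access can be sketched framework-neutrally: a hypothetical `ToolGateway` checks a per-role allowlist before any tool runs and logs every attempt, allowed or denied, as a JSON line. This does not use the MCP SDK; the role names, tool names and log format are assumptions for illustration.

```python
import json
import time
from typing import Any, Callable

ToolFn = Callable[..., Any]


class ToolGateway:
    """Hypothetical gate in front of tools exposed to an LLM (not the MCP SDK)."""

    def __init__(self, permissions: dict[str, set[str]]):
        self._tools: dict[str, ToolFn] = {}
        self._permissions = permissions  # role -> names of tools it may call
        self.audit_log: list[str] = []   # one JSON line per call attempt

    def register(self, name: str, fn: ToolFn) -> None:
        self._tools[name] = fn

    def call(self, role: str, name: str, **kwargs: Any) -> Any:
        allowed = name in self._tools and name in self._permissions.get(role, set())
        # Log before executing, so denied attempts are audited as well.
        self.audit_log.append(json.dumps({
            "ts": time.time(), "role": role, "tool": name,
            "args": kwargs, "allowed": allowed,
        }))
        if not allowed:
            raise PermissionError(f"role {role!r} may not call tool {name!r}")
        return self._tools[name](**kwargs)


gateway = ToolGateway({"analyst": {"lookup_product"}})
gateway.register("lookup_product", lambda sku: {"sku": sku, "in_stock": True})
gateway.register("delete_product", lambda sku: None)  # registered, but not allowed
print(gateway.call("analyst", "lookup_product", sku="A-100"))
```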

AI agents in the workflow

Custom AI agents that autonomously take over development tasks and communicate with the team via the ticketing system, as a tool amplifier rather than a replacement. Automated code reviews in GitLab, GitHub and Azure DevOps. The architect defines boundaries and quality gates.

Eval, cost control & guardrails

Eval pipelines against model drift, cost monitoring per request, rate limiting, input validation, prompt-injection protection. Fallback chains for model outages. All of this as an architectural layer, not bolted on after the first production incident.
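A minimal sketch of a fallback chain with basic input validation, assuming each provider is wrapped as a plain function that raises on failure. `primary` and `secondary` are hypothetical stand-ins simulating an outage and a healthy model. A length check alone does not stop prompt injection; that needs its own layer.

```python
from typing import Callable, Sequence

Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    pass


def complete_with_fallback(prompt: str,
                           providers: Sequence[tuple[str, Provider]],
                           max_prompt_chars: int = 8000) -> tuple[str, str]:
    """Try each provider in order and return (provider_name, answer)."""
    # Basic validation before any tokens are spent.
    if not prompt.strip():
        raise ValueError("empty prompt")
    if len(prompt) > max_prompt_chars:
        raise ValueError("prompt exceeds configured length limit")
    errors: list[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # in practice: timeouts, 5xx, rate limits
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def primary(prompt: str) -> str:
    raise TimeoutError("primary model unavailable")  # simulated outage


def secondary(prompt: str) -> str:
    return f"answer to: {prompt}"


print(complete_with_fallback("What is RAG?", [("primary", primary), ("secondary", secondary)]))
```

Returning the provider name alongside the answer keeps the fallback visible in traces, so a silent switch to a weaker model shows up in monitoring.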

From the trenches

Hoffmann Group — AI integration in Enterprise DataHubs

An MCP server made the DataHub backend directly accessible to AI tools for the first time. Developers now formulate queries in natural language instead of memorising GraphQL schemas. AI assistance in the playground noticeably speeds up onboarding of new use cases and consumers, without the data model or the existing GraphQL Federation needing to know anything about it.

th3chris.com — RAG system as a live demo

This website isn't just a portfolio; it is itself a production RAG architecture: Dify for orchestration, Weaviate as vector database, a custom knowledge base, and GPT-4 for answer generation with streaming. Self-hosted on a Kubernetes cluster with GitOps. If you want to see how a production RAG pipeline feels, try the chat at the bottom right.

AI agents in the dev workflow — human-machine integration

Custom AI agents autonomously take over development tasks and communicate with human team members via GitLab tickets. In addition, automated code reviews run in GitLab, GitHub and Azure DevOps. This is bridge technology between classical microservice landscapes and LLM-backed tools: the architect defines tasks, the AI executes, a human reviews.

"... outstanding mind with excellent skills in development; great software architect. Highly recommended if you need to find a professional fast and scalable solution. We've been working together on a project that was rated by Microsoft professionals as "not possible". Together with Christian our team managed to deliver a great working solution/product!"
Oscar Angress, Cyber Security Consultant, Bosch Engineering GmbH

Questions you might be asking

When is AI NOT the answer?

When determinism counts (accounting, compliance, billing), rule-based systems beat LLMs in every audit. When the corpus fits in the context window, putting it straight into the prompt is more deterministic and easier to verify than RAG. When the output can't be checked, LLM answers without an eval pipeline are a compliance risk, not a feature. When consequences are irreversible (medical, financial, safety-critical decisions), the human belongs in the loop.

Self-hosted or cloud LLM?

That's decided by the use case, not by vendor preference. Cloud LLMs (OpenAI, Anthropic) are often the faster answer and for many workloads the more economical one. Self-hosted (Llama, Mistral, own models) pays off with data sovereignty requirements, regulated industries or high volume with cost control. Often hybrid: sensitive paths self-hosted, the rest in the cloud.

How do you measure AI quality?

Eval pipelines with defined test sets per use case. Hallucination rate, factual consistency, token cost per request, latency profile. For RAG additionally: retrieval precision (are the right documents found?) and answer faithfulness (does the answer actually appear in the retrieved documents?). Without this layer 'AI quality' is a claim, not a measurement.
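Retrieval precision can be measured against a hand-labelled test set. The sketch below computes precision@k and recall@k in plain Python; the questions, document IDs and retriever output are hypothetical. Answer faithfulness usually needs a human or LLM judge and is not covered here.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k slots filled with relevant documents."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / k


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of all relevant documents that show up in the top k."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


# Hypothetical labelled test set: question -> IDs a reviewer marked as relevant.
test_set = {
    "How do I reset my password?": {"kb_12", "kb_40"},
    "What is the refund period?": {"kb_7"},
}
# Hypothetical retriever output for the same questions, best match first.
retriever_output = {
    "How do I reset my password?": ["kb_12", "kb_3", "kb_40"],
    "What is the refund period?": ["kb_9", "kb_7", "kb_1"],
}

for question, relevant in test_set.items():
    hits = retriever_output[question]
    print(f"{question}  P@3={precision_at_k(hits, relevant, 3):.2f}"
          f"  R@3={recall_at_k(hits, relevant, 3):.2f}")

mean_p = sum(precision_at_k(retriever_output[q], rel, 3)
             for q, rel in test_set.items()) / len(test_set)
print(f"mean P@3 = {mean_p:.2f}")
```

Run against the same test set after every model, prompt or chunking change, a drop in these numbers flags a regression before users notice it.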

What does AI cost in production?

Token costs plus infrastructure (vector DB, compute, monitoring). At low volumes token costs dominate, at high volumes infrastructure does. Concretely: a productive RAG system with ~10k requests/month typically sits in the low three- to four-digit euro range per month — depending on model choice, chunk size and whether caching is used. Cost monitoring per request belongs in the architecture from the start, otherwise there are surprises.
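A back-of-the-envelope cost model, as a sketch: every price and token count below is a placeholder assumption, not a quote from any provider, and cache hits are simplistically treated as free. With 10k requests per month, about 4,000 input tokens (question plus retrieved chunks) and 500 output tokens per request, placeholder prices of 2.50 / 10.00 EUR per million input/output tokens and 150 EUR fixed infrastructure, the total comes to about 300 EUR per month, or 270 EUR with a 20% cache hit rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Placeholder prices; check the provider's current price list before use."""

    eur_per_1m_input_tokens: float
    eur_per_1m_output_tokens: float
    fixed_infra_eur_per_month: float  # vector DB, compute, monitoring

    def per_request(self, input_tokens: int, output_tokens: int) -> float:
        return (input_tokens * self.eur_per_1m_input_tokens
                + output_tokens * self.eur_per_1m_output_tokens) / 1_000_000

    def per_month(self, requests: int, input_tokens: int, output_tokens: int,
                  cache_hit_rate: float = 0.0) -> float:
        # Simplification: cache hits cost nothing; only misses reach the model.
        billable_requests = requests * (1.0 - cache_hit_rate)
        token_cost = billable_requests * self.per_request(input_tokens, output_tokens)
        return token_cost + self.fixed_infra_eur_per_month


model = CostModel(eur_per_1m_input_tokens=2.50, eur_per_1m_output_tokens=10.00,
                  fixed_infra_eur_per_month=150.0)
print(round(model.per_month(10_000, 4_000, 500), 2))
print(round(model.per_month(10_000, 4_000, 500, cache_hit_rate=0.2), 2))
```

In this example input tokens dominate the per-request cost, because retrieved chunks land in the prompt; that is why chunk size shows up directly on the invoice.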

How do you handle AI compliance?

Data flow diagrams per use case (which data leaves your system, where to, under which contract), logging of all inferences for audit trails, input validation against prompt injection, and a clear assignment of who reviews what. For GDPR-sensitive data that often means self-hosting or a cloud provider with a DPA and an EU region. AI compliance isn't a retrofitted patch but an architectural property.

Sounds like your project?

A few targeted questions instead of a rigid form — I'll understand what it's about in 2–3 minutes and get back to you personally.