94% Response Accuracy (verified by senior partners)
1.8s Avg Response Time
80% Lawyer Adoption (32 of 40 lawyers)
3.5h/week Time Saved (per lawyer, self-reported)
Challenge
Critical knowledge locked inside 12,000+ documents across scattered systems. Off-the-shelf tools failed GDPR compliance and lacked access to internal firm knowledge. Lawyers could not verify AI answers against source documents.
Outcome
A GDPR-compliant AI chat agent with RAG, EU-hosted infrastructure, and source citations. 94% response accuracy, 80% lawyer adoption, 3.5 hours saved per lawyer per week. Architecture to production in 8 weeks.
A 40-lawyer firm in the DACH region had a problem most legal teams know well: critical knowledge locked inside documents nobody could search properly. Internal memos, case summaries, contract templates, regulatory updates. Scattered across shared drives and the heads of senior partners.
They wanted an internal AI assistant their lawyers could actually use. Not a generic chatbot. Something that understood their documents, answered in German, cited its sources, and kept client data inside the EU.
I built it for them. Frontend, backend, RAG pipeline, deployment. This is what the project looked like.
The starting point
The firm had tried two off-the-shelf legal AI tools before contacting me. Both failed for the same reasons.
Their compliance officer flagged that queries were being routed through US-based APIs. For a firm handling client-privileged information, that killed both tools immediately. GDPR plus the firm's own data processing agreements left zero room for interpretation.
The tools could also only summarize public legal databases. They had no access to the firm's internal knowledge: its own precedents, client intake templates, and internal policy documents. Lawyers tried them twice and stopped.
And without source citations, nobody could verify whether the AI was pulling from actual firm documents or generating plausible-sounding fiction. In legal work, "probably correct" does not cut it.
The brief was clear: build something that works with our documents, runs inside the EU, and shows exactly where every answer comes from.
What I built
The system has four layers. Each one had to work within the firm's security and compliance requirements.
Knowledge ingestion pipeline
The firm had roughly 12,000 documents across three sources: a document management system (DMS), a shared network drive, and an internal wiki. Formats included PDF, DOCX, and HTML.
I built an ingestion pipeline that extracts text from all three sources on a nightly schedule, chunks documents using a semantic strategy (not fixed-size splits), generates embeddings using a multilingual model hosted on EU infrastructure, and stores vectors in a PostgreSQL database with pgvector running on a German cloud provider.
The chunking strategy matters more than people think. Fixed 512-token chunks break legal clauses mid-sentence. I used a combination of heading detection, paragraph boundaries, and overlap windows to keep legal context intact. This alone improved retrieval accuracy by about 20% compared to the naive approach.
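To make the idea concrete, here is a minimal TypeScript sketch of structure-aware chunking, assuming documents have already been converted to a markdown-like text form. All names and size values are illustrative, not the production code: it starts a new chunk at each heading, merges paragraphs toward a target size, and carries an overlap window so clauses keep their surrounding context.

```typescript
// Structure-aware chunker sketch: split on headings and paragraph
// boundaries, merge paragraphs toward a target chunk size, and keep an
// overlap tail so legal clauses retain context across chunk borders.
// Sizes are in characters for simplicity; a real pipeline counts tokens.

interface Chunk {
  heading: string;
  text: string;
}

function chunkDocument(
  text: string,
  targetSize = 1200,
  overlap = 200
): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = "";
  let buffer = "";

  const flush = () => {
    if (buffer.trim()) chunks.push({ heading, text: buffer.trim() });
    // Keep the tail of the previous chunk as overlap for the next one.
    buffer = buffer.slice(-overlap);
  };

  for (const block of text.split(/\n{2,}/)) {
    const h = block.match(/^#+\s+(.*)/);
    if (h) {
      // A new heading starts a fresh chunk and resets the overlap.
      flush();
      buffer = "";
      heading = h[1];
      continue;
    }
    if (buffer.length + block.length > targetSize) flush();
    buffer += (buffer ? "\n\n" : "") + block;
  }
  flush();
  return chunks;
}
```

Paragraphs stay intact and each chunk carries its section heading, which is what keeps a retrieved clause interpretable on its own.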
RAG retrieval and response generation
When a lawyer asks a question, the system converts the query into an embedding, retrieves the 8 most relevant document chunks via vector similarity search, re-ranks results using a cross-encoder model to filter out false positives, then passes the top 5 chunks plus the original question to the LLM. The LLM generates a response with inline citations pointing to specific documents and page numbers.
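The retrieve, re-rank, and cite steps can be sketched roughly like this in TypeScript. This is an in-memory stand-in: the real system runs the similarity search in PostgreSQL/pgvector and the re-ranking with a cross-encoder model, and every name here is illustrative.

```typescript
// In-memory sketch of the retrieve -> re-rank -> prompt step.
// pgvector and the cross-encoder are stubbed; names are illustrative.

interface DocChunk {
  id: string;
  source: string; // e.g. "DMS/contracts/nda-template.docx, p. 4"
  text: string;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function retrieve(
  query: number[],
  index: DocChunk[],
  rerank: (q: number[], c: DocChunk) => number, // cross-encoder stand-in
  kRetrieve = 8,
  kKeep = 5
): DocChunk[] {
  // Stage 1: vector similarity, top kRetrieve candidates.
  const candidates = index
    .map((c) => ({ c, score: cosine(query, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, kRetrieve);
  // Stage 2: re-rank candidates and keep the top kKeep.
  return candidates
    .map(({ c }) => ({ c, score: rerank(query, c) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, kKeep)
    .map(({ c }) => c);
}

// Number the sources in the prompt so the model can emit inline
// citations like [1] that map back to a document and page.
function buildPrompt(question: string, chunks: DocChunk[]): string {
  const sources = chunks
    .map((c, i) => `[${i + 1}] (${c.source})\n${c.text}`)
    .join("\n\n");
  return `Answer using only the sources below and cite them inline as [n].\n\n${sources}\n\nQuestion: ${question}`;
}
```

The two-stage shape is the point: cheap vector search casts a wide net, the more expensive re-ranker filters false positives, and the numbered sources make citations mechanical rather than something the model has to invent.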
The LLM runs on EU-hosted infrastructure. No data leaves German data centers at any point. I used a self-hosted open-source model fine-tuned for German legal language, running on dedicated GPU instances.
Every response includes clickable source references. Lawyers can verify any claim by opening the original document at the exact paragraph the AI used. This was the first requirement the firm stated, and the last thing I tested before launch.
The frontend
I built the chat interface as a React application embedded into the firm's existing intranet portal. It needed to feel like a tool lawyers would actually open at 7 AM, not something they demo once and forget.
The agent remembers context within a session, so lawyers can ask follow-up questions without repeating themselves. A side panel shows the retrieved documents with highlighted passages. When retrieval confidence is low (few matching documents, low similarity scores), the interface says so explicitly rather than generating a confident-sounding guess. Lawyers can also mark answers as helpful or flag inaccuracies, which feeds back into the retrieval tuning.
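The "say so when unsure" behavior might come down to a check like the following sketch, where the thresholds are illustrative assumptions rather than the production values: if too few retrieved chunks clear a similarity bar, the backend tells the UI to show an explicit notice instead of generating an answer.

```typescript
// Low-confidence gate sketch: refuse to answer when retrieval support
// is weak, instead of letting the LLM produce a confident-sounding guess.
// Threshold values are illustrative assumptions.

interface RetrievalResult {
  score: number; // similarity score for a retrieved chunk
}

function confidenceGate(
  results: RetrievalResult[],
  minScore = 0.75,
  minMatches = 2
): { answerable: boolean; reason?: string } {
  const strong = results.filter((r) => r.score >= minScore);
  if (strong.length < minMatches) {
    return {
      answerable: false,
      reason: `Only ${strong.length} source(s) matched this question well. Try rephrasing or narrowing the question.`,
    };
  }
  return { answerable: true };
}
```

Running the gate before generation, not after, is what keeps a weakly supported answer from ever reaching the lawyer.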
The interface is responsive but optimized for desktop. Lawyers use it at their workstations, not on mobile.
Deployment and infrastructure
Everything runs on Hetzner Cloud in Germany with data residency guarantees. The Node.js backend sits on a dedicated VM. PostgreSQL with pgvector runs on a managed instance with daily encrypted backups. The self-hosted LLM runs on GPU instances, load-balanced for the 40-user concurrency target. A separate embedding service handles document processing and query embedding. Grafana dashboards track response times, retrieval quality metrics, and usage patterns. Authentication is integrated with the firm's existing Active Directory via SAML SSO.
No external API calls. No data leaving the infrastructure. The compliance officer reviewed the architecture before I wrote the first line of code.
Results after 3 months
The system went live after 8 weeks of development and 2 weeks of testing with a pilot group of 6 lawyers.
| Metric | Result |
|---|---|
| Response accuracy (verified by senior partners) | 94% |
| Average response time | 1.8 seconds |
| Weekly active users | 32 of 40 lawyers (80%) |
| Most common use case | Checking internal precedents before drafting |
| Time saved per lawyer per week | ~3.5 hours (self-reported) |
| Documents indexed | 12,000+ across 3 sources |
| Uptime (first 90 days) | 99.7% |
The 6% inaccuracy rate comes mostly from ambiguous queries where the system retrieves correct documents but the LLM misinterprets the question. The feedback loop catches these, and retrieval quality improves over time as lawyers provide more training signals.
The time savings are real but modest. 3.5 hours per week per lawyer. The firm calculates that at their average billable rate, this pays for the entire system within the first quarter. Junior lawyers also reported feeling more confident in their research because they could cross-check against the firm's own knowledge base rather than relying only on external databases.
A similar approach worked well for a RAG system I built for an insurance broker, where research time dropped by 75%.
What I would do differently
Two things I learned the hard way on this project.
I integrated all three document sources simultaneously during development. In hindsight, starting with just the DMS (the highest-quality source) and adding the others incrementally would have made testing faster and initial accuracy higher. Lesson: start with one clean source, prove it works, then expand.
I also underestimated how much time German prompt engineering would take. The LLM's default German outputs were grammatically correct but too casual for legal professionals. I spent about two additional weeks refining the system prompt and response formatting to match the tone lawyers expect. This should have been in the original timeline from the start.
Tech stack
| Layer | Technology |
|---|---|
| Frontend | React, TypeScript, Tailwind CSS |
| Backend | Node.js, Express, TypeScript |
| Database | PostgreSQL + pgvector |
| Embeddings | Multilingual E5-large (self-hosted) |
| LLM | Open-source model, EU-hosted GPU inference |
| Infrastructure | Hetzner Cloud (Germany) |
| Auth | SAML SSO via Active Directory |
| Monitoring | Grafana + Prometheus |
| CI/CD | GitHub Actions, Docker |
Who this is for
If you run a law firm or legal department in the DACH region and you are considering an internal AI assistant, here is what matters.
EU hosting is not optional. Client-privileged data cannot leave the EU. Any vendor telling you their US-hosted API is "GDPR-compliant" is asking you to take a risk with your clients' data.
RAG with your own documents is the only approach that works for internal knowledge. Generic legal AI tools are useful for public case law. They are useless for your own precedents and templates. The value is in connecting AI to the documents your firm actually works with every day.
Source citations cannot be an afterthought. Lawyers need to verify every answer. If the AI cannot show exactly which document and which paragraph it used, it is a liability, not a tool.
And you need someone who can build the whole thing. AI backend, frontend interface, deployment, security review. This is not a project you can split across five vendors and hope it integrates cleanly.
I build end-to-end AI systems for regulated industries in the DACH and Nordic regions. If this sounds like what your firm needs, book a call and we can talk specifics.

AI Agent & RAG Developer with 10+ years of software engineering experience. Specialized in intelligent AI solutions for enterprises in the DACH & Nordic region.