CodBi / AI
Setup Guide & Configuration Reference
LLAMA · Whisper · Tesseract
1 GDPR, EU AI Act & Data Sovereignty
CodBi is designed from the ground up for use in privacy-sensitive environments. All three AI engines — LLAMA, Whisper, and Tesseract — process data exclusively on your own Formcycle server. Not a single byte of sensitive data leaves your infrastructure.
In contrast to cloud AI services (ChatGPT, Google Gemini, Microsoft Copilot, etc.), CodBi eliminates almost all data protection hurdles that have previously complicated the use of AI in public administration.
1.1 GDPR — General Data Protection Regulation
Art. 5 — Data Minimization & Purpose Limitation
Personal data is not trained into the AI models, not transmitted to third parties, and not persisted. Processing is volatile (in-memory) and scoped to the specific request. After the response, the input data is discarded — there is no training and no logging of personal content.
Art. 25 — Privacy by Design & Default
CodBi is privacy-compliant out of the box: local processing as the standard, no cloud connection without explicit configuration, no telemetry, no external tracking services. External APIs (OpenAI, Claude, etc.) must be configured deliberately and individually — they are disabled by default.
Art. 13/14 — Information Obligations
With cloud AI services, data subjects must be extensively informed in accordance with Art. 13 GDPR about the transfer of their data to external AI providers, the risk of third-country transfers (e.g., to the USA), and unclear storage periods. With CodBi, all these complex third-party and cloud clauses are completely eliminated. Since there is no outward data transmission, the information obligation is reduced to an absolute minimum: The privacy policy merely needs to transparently state that a purely locally hosted AI assistant is used for processing (e.g., text recognition) – without data leakage and without third-country risks.
Art. 28 — Data Processing Agreement (DPA)
Cloud AI providers are considered data processors under Art. 28 GDPR. Authorities must conclude, review, and maintain complex Data Processing Agreements (DPAs). With CodBi, no DPA with an AI provider is required, as all processing takes place on your own server. This significantly simplifies procurement and contracting.
Art. 44–49 — Third-Country Transfer
Cloud AI providers like OpenAI (USA) or Google (USA) process data in third countries outside the EU. This requires adequacy decisions, Standard Contractual Clauses (SCCs), or Binding Corporate Rules — with the fallback risk of judicial invalidations (cf. Schrems I & II). CodBi processes everything within your own infrastructure — no third-country transfer, no dependence on international agreements.
⚖️ Art. 35 — Data Protection Impact Assessment (DPIA)
Cloud AI services regularly require an extensive DPIA evaluating the risks of third-country transfers, potential model training with authority data, and unclear deletion periods. CodBi's purely local architecture significantly reduces the scope of a DPIA: no third-country risk, no training, deterministic deletion (request data is discarded after the response).
1.2 EU AI Act (Regulation (EU) 2024/1689)
The EU AI Act entered into force in August 2024; its obligations are phased in through 2027. It classifies AI systems by risk level and establishes transparency and documentation obligations. CodBi addresses the relevant requirements as follows:
Risk Classification (Art. 6)
CodBi use cases (form assistance, document analysis, OCR, speech recognition) typically fall into the "limited risk" category (not "high risk"). There is no obligation for CE marking, conformity assessment, or registration in the EU database. Exception: If CodBi is used for automated decision-making regarding citizen applications, a high-risk classification could apply — check this on a case-by-case basis.
✨ Transparency Obligation (Art. 50)
The EU AI Act requires that users can recognize when they are interacting with AI-generated content. CodBi implements this automatically: All AI responses are marked with ✨ AI-Generated. The notice can be configured via data-cb-AIHint. Important: If the labeling is disabled, CodBi outputs a compliance warning in the server log.
Documentation Obligation (Art. 11–13)
CodBi uses exclusively open-source models with known architectures and documented training data (Qwen3 from Alibaba Cloud, Whisper from OpenAI, Tesseract from Google). The utilized model versions, release tags, and download URLs are documented and reproducible in the plugin properties. Model cards can be viewed on HuggingFace.
Human Oversight (Art. 14)
CodBi is designed as an assistance system, not as an autonomous decision-maker. AI answers are presented to the user for review — automatic insertion into form fields is optional and can be secured via data-cb-Mode="verify" with a manual override checkbox. The human retains decision-making authority.
1.3 Comparison: Local AI (CodBi) vs. Cloud AI
| Criterion | Cloud AI (ChatGPT, Gemini etc.) | Local (CodBi) |
| --- | --- | --- |
| Data Processing | In the provider's data centers (mostly USA) | On your own server |
| Third-Country Transfer | Yes — SCCs, adequacy decisions required | No — no transfer |
| DPA required | Yes — with the AI provider | No |
| Training with your data | Possible (opt-out needed, implementation unclear) | Impossible — model is read-only |
| Privacy Policy | Must be expanded with AI cloud clauses | Minimal basic notice (local assistant) |
| DPIA Scope | Extensive (third country, training, deletion) | Minimal |
| AI Labeling | Must be implemented independently | Automatic (✨ AI-Generated) |
| Availability | Dependent on provider, mandatory internet | Offline-capable after initial setup |
| Costs | Ongoing API costs (per token/minute) | One-time (hardware), no subscription |
| Audit Logging | At the provider, limited visibility | Fully under your own control |
1.4 Comparison with Alternative AI Solutions
There are various approaches to operating AI in a privacy-compliant manner. The following table shows how CodBi positions itself against these other solutions.
| Criterion | Aleph Alpha (PhariaAI · DE) | Mistral AI (Le Chat · FR) | StackIT AI (Schwarz IT · DE) | Ollama / llama.cpp (Self-Hosted) | CodBi (Formcycle Plugin) |
| --- | --- | --- | --- | --- | --- |
| Operating Model | Sovereign Cloud (Delos/T-Systems) or On-Prem | Cloud (EU data centers) or Self-Hosted | German Cloud (Schwarz data centers) | Local (own server) | Local within Formcycle server |
| Data leaves server? | Yes (to sovereign cloud) | Cloud: Yes · Self-Hosted: No | Yes (to German cloud) | No | No |
| DPA required? | Yes (with cloud operator) | Cloud: Yes · Self-Hosted: No | Yes (with Schwarz IT) | No | No |
| Setup Effort | High (enterprise sales, contract negotiation, pilot project) | Self-Hosted: High (own GPU infrastructure, containers, configuration) | Medium (cloud onboarding, API integration) | Medium (manual: binary, model, configuration, API connection) | Minimal — set 1 property, not even a server restart required |
| LLM / Chat | ✅ Luminous / PhariaAI | ✅ Mixtral, Mistral Large | ✅ Various models | ✅ Arbitrary GGUF models | ✅ LLAMA (Qwen3-VL, arbitrary GGUF models) |
| Speech-to-Text | ❌ | ❌ | ❌ | ⚠️ Set up separately (whisper.cpp) | ✅ Whisper integrated |
| OCR | ❌ | ❌ | ❌ | ❌ Not included | ✅ Tesseract integrated |
| Form Integration | ❌ API only | ❌ API only | ❌ API only | ❌ API / CLI only | ✅ Native (data-cb-* attributes, QA, Verify) |
| EU AI Act Labeling | ⚠️ User responsibility | ⚠️ User responsibility | ⚠️ User responsibility | ❌ Not available | ✅ Automatic (✨ AI-Generated) |
| Costs | Enterprise license (6 figures+/year) | API: pay-as-you-go · Self-Hosted: GPU hardware | Pay-as-you-go (API costs) | Free (open source), hardware only | Free (open source), hardware only |
| Target Audience | Large authorities, federal administration, enterprises | Developers, companies with GPU infrastructure | Companies (Schwarz ecosystem) | Developers, tech-savvy admins | Any organization with Formcycle |
CodBi features: LLM, speech-to-text, and OCR as an integrated solution with native form integration (data-cb-* attributes, automatic QA workflows, Verify mode), zero-config deployment, offline capability, and built-in EU AI Act compliance.
1.5 GDPR Benefits per AI Module
LLAMA — Text Processing & Chat
Citizen messages, application details, uploaded ID documents, and official notices are processed exclusively in the server's RAM. The GGUF model is read-only — feeding data back into the model weights is technically impossible. PII filters protect against unintentional disclosure of personal data when web search (Brave) is enabled.
Whisper — Speech Recognition
With the Browser Speech API (Chrome, Edge), audio data is sent to Google or Apple — informing data subjects under Art. 13 GDPR and obtaining consent is mandatory. CodBi Whisper, by contrast, processes audio locally on the server: the WAV file is sent via HTTP to the local Whisper server (127.0.0.1:8393), transcribed, and immediately discarded. No cloud, no storage.
Tesseract — Document OCR
ID documents, medical certificates, proof of income — sensitive citizen documents are processed in-process via JNI within the Formcycle server. No external network traffic, no temporary files on third-party systems. The OCR results are transferred directly into the form fields.
1.6 Audit Logging & Traceability
For compliance purposes, CodBi — especially in the AI Proxy — logs accesses anonymously:
- Username: SHA-256 hash (irreversible)
- Client IP: only the first two octets (e.g., 192.168.*.*)
- Timestamp and endpoint
- Persistence: JPA/Liquibase → table codbi_ai_proxy
IP detection follows the chain: X-Forwarded-For → X-Real-IP → Remote-Addr (Reverse-Proxy compatible).
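The anonymization and IP-detection steps above can be sketched in a few lines. This is a simplified Python illustration; function names are ours, not the plugin's actual Kotlin internals.

```python
import hashlib


def anonymize_username(username: str) -> str:
    """Irreversible SHA-256 hash of the username (hex-encoded)."""
    return hashlib.sha256(username.encode("utf-8")).hexdigest()


def truncate_ip(ip: str) -> str:
    """Keep only the first two octets of an IPv4 address."""
    octets = ip.split(".")
    return f"{octets[0]}.{octets[1]}.*.*"


def resolve_client_ip(headers: dict, remote_addr: str) -> str:
    """Detection chain: X-Forwarded-For -> X-Real-IP -> Remote-Addr."""
    forwarded = headers.get("X-Forwarded-For")
    if forwarded:
        # X-Forwarded-For may contain a proxy chain; the first entry is the client
        return forwarded.split(",")[0].strip()
    return headers.get("X-Real-IP") or remote_addr
```

Stored this way, a log entry can still be correlated per user (same hash) without ever revealing the username or full IP.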
⚠️ Note — External APIs: If you run CodBi in Hybrid Mode with external APIs (OpenAI, Claude, etc.), the privacy policies of the respective provider apply to these connections. In this case, a separate GDPR assessment (possibly including DPA and DPIA) is required. CodBi clearly marks in the log which requests were routed externally.
✅ Recommendation: For maximum data sovereignty, use CodBi exclusively with local models. Hybrid mode (Section 8) is useful when specific use cases require higher model quality — e.g., a Specialist for non-sensitive, public domain inquiries.
2 Overview
Zero-Config Deployment: Binaries, models, and native libraries are automatically fetched on the first start. The only requirements: Java 11+, one-time network access to GitHub / HuggingFace, and sufficient RAM and disk space.
Firewall Whitelist: The following domains must be reachable during the first start: github.com, objects.githubusercontent.com, api.github.com, huggingface.co, raw.githubusercontent.com, repo1.maven.org (only for Tesseract DLLs).
3 System Architecture
The architecture (originally shown as a diagram) comprises:

- LLAMA Server — separate OS process · port 8392
- Whisper Server — separate OS process · port 8393
- Tesseract — in-process (JNI)
- ⚙️ CodBi Plugin (Kotlin) — runs in the Formcycle JVM (Tomcat); reaches LLAMA/Whisper via an OpenAI-compatible REST API and Tesseract via JNI
- AI Proxy (IP whitelist + Basic Auth) and External APIs (OpenAI, Claude, etc.) — connected to the plugin via HTTP / WebSocket
LLAMA and Whisper run as separate operating system processes — a crash of the AI engine does not affect the Tomcat server. Communication occurs via OpenAI-compatible REST endpoints (/v1/chat/completions, /v1/audio/transcriptions). Tesseract runs in-process via JNI.
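Because the endpoints are OpenAI-compatible, any standard HTTP client can talk to the local llama-server. A minimal sketch using only the Python standard library; the payload fields follow the OpenAI chat format, and port and path are the defaults documented here:

```python
import json
import urllib.request


def build_chat_request(prompt: str, base: str = "http://127.0.0.1:8392") -> urllib.request.Request:
    """Build a POST request against the local llama-server's chat endpoint."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("Hello")
# urllib.request.urlopen(req)  # only works once the LLAMA server is healthy
```

The same payload shape works against the Whisper server's /v1/audio/transcriptions endpoint with audio content instead of chat messages.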
File Storage Location: All downloaded files (binaries, models, native libraries) are stored under <pluginFolder>/ai/. LLAMA and Whisper each use the subfolders bin/ (server binary) and models/ (model files). Tesseract stores its data under <pluginFolder>/Resources/AI/Tesseract/.
4 Step-by-Step: Activation
Activation is done via the plugin property Active_AI in the Formcycle administration interface under System → Plugins → CodBi → Properties. The property accepts a space-separated list of the desired modules:
| Token | What is activated? | What is downloaded? |
| --- | --- | --- |
| llama_engine | LLAMA server infrastructure (llama-server binary) | llama-server binary (~80 MB, platform-specific) + possibly CUDA runtime DLLs |
| llama_std | Standard LLAMA model + vision projector | Qwen3-VL-2B-Instruct Q4_K_M (~1.1 GB) + mmproj (~819 MB) |
| whisper | Whisper server infrastructure + model | whisper-server binary + ggml-small (~466 MB) + ffmpeg |
| ocr | Tesseract OCR (JNI) | Tess4J DLLs + language models (deu.traineddata, osd.traineddata) |
Example: Activating All Modules
Active_AI = llama_engine llama_std whisper ocr
After setting the properties and clicking "Save", the system initialization is automatically triggered. The required AI models and binaries are then downloaded asynchronously in the background.
A server restart is not necessary. Once the download is complete, the system is seamlessly available. The readiness of the models can be verified at any time: the top line of the chat window (AI.LLAMA.Chat) shows which models are still loading and which are available for chat.
After saving the plugin with the appropriate "Active_AI" settings, messages like the following will appear in the server log:
[CodBi / AI / LLAMA] Starting LLAMA-Server:
[CodBi / AI / LLAMA] Binary: .../ai/llama_engine/bin/llama-server.exe
[CodBi / AI / LLAMA] Model: .../ai/llama_engine/models/Qwen3VL-2B-Instruct-Q4_K_M.gguf (1100 MB)
[CodBi / AI / LLAMA] mmproj: .../ai/llama_engine/models/mmproj-Qwen3VL-2B-Instruct-F16.gguf
[CodBi / AI / LLAMA] Port: 8392 · Threads: 8 · Context: 32768 tokens
[CodBi / AI / LLAMA] GPU layers: 999 (detected: CUDA)
[CodBi / AI / LLAMA] LLAMA-Server is healthy and ready on port 8392
[CodBi / AI / Whisper] Whisper infrastructure initialized
[CodBi / AI / Whisper] Port: 8393 · Release: v1.7.6 · GPU: enabled (auto)
Download Resume: If a download is interrupted (network error, server restart), it will automatically resume on the next start — it does not have to start from 0% again.
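The resume mechanism boils down to an HTTP Range request starting at the size of the partially downloaded file. A sketch of the idea (illustrative only, not the plugin's actual downloader):

```python
import os
import urllib.request


def range_header(existing_bytes: int) -> dict:
    """Range header that requests everything from the current file size onward."""
    return {"Range": f"bytes={existing_bytes}-"} if existing_bytes > 0 else {}


def resume_download(url: str, dest: str) -> None:
    """Append the missing bytes to a partially downloaded file."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    req = urllib.request.Request(url, headers=range_header(offset))
    with urllib.request.urlopen(req) as resp, open(dest, "ab") as out:
        while chunk := resp.read(1 << 16):
            out.write(chunk)
```

A server that honors Range replies with 206 Partial Content, so a 1.1 GB model interrupted at 80% only transfers the remaining 20%.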
Removing Modules
To delete downloaded files, set the property AI_Remove with the same token values (e.g., AI_Remove = llama_engine, whisper), click "Save", and restart the server. The files are deleted during startup; afterward the property can be removed again. Warning: Tesseract DLLs are locked in the JVM process — to remove them, the plugin must first be disabled and the server restarted.
5 AI Modules in Detail
5.1 LLAMA — Large Language Model
CodBi operates a local large language model via llama.cpp. The standard model Qwen3-VL-2B-Instruct is multimodal — it understands text as well as images and PDFs. On the first start, the llama-server binary (from GitHub Releases) and the GGUF model file (from HuggingFace) are downloaded automatically. If end users are to use AI.LLAMA.Chat in German, it is recommended to use at least a Qwen3 model with 8 billion parameters (8B), or a comparably sized model from another vendor.
What happens on startup?
1. Platform Detection: OS (os.name) and architecture (os.arch) are detected.
2. GPU Detection: CUDA 12 → Vulkan → CPU fallback. On Apple Silicon, Metal is used. The appropriate binary variant is chosen automatically.
3. Download: Server binary + GGUF model + vision projector (mmproj) are saved in <pluginFolder>/ai/llama_engine/.
4. Server Start: An independent OS process is started via ProcessBuilder.
5. Health Check: CodBi waits until the server responds on http://127.0.0.1:8392.
Using Your Own Model
You can use any GGUF-compatible model from HuggingFace — Qwen, Mistral, LLaMA 3, Phi-3, Gemma, etc. To do this, set the following plugin properties:
AI_LLAMA_STD_ModelUrl = https://huggingface.co/Your/Model.gguf
AI_LLAMA_STD_MmprojUrl = https://huggingface.co/Your/Model-mmproj.gguf
Thinking Mode (Chain-of-Thought)
The Thinking Mode starts a separate LLAMA server on port Main Port + 100 (Default: 8492) with a double context window. If no separate Thinking Model is configured, the standard model is used in Hybrid Mode.
# Optional: Separate, more powerful model for reasoning
AI_LLAMA_STD_ThinkingModelUrl = https://huggingface.co/.../thinking-model.gguf
AI_LLAMA_STD_ThinkingMmprojUrl = https://huggingface.co/.../thinking-mmproj.gguf
In the chat frontend, the thought process is streamed live during generation and afterwards collapsed into an expandable bubble. The end user sees only the final answer, but can expand the thought process if interested.
Specialist Models
You can register any number of additional models — each gets its own LLAMA server process starting on port Main Port + 200 and up. Routing to the correct model is done in the frontend via the data-cb-Specialist parameter.
# Local specialist "Extractor" (own GGUF model)
AI_LLAMA_STD_SPECIALIST_Extractor = https://huggingface.co/.../extractor.gguf
AI_LLAMA_STD_SPECIALIST_MMProj_Extractor = https://huggingface.co/.../extractor-mmproj.gguf

# External specialist "GPT4" (OpenAI API)
AI_LLAMA_STD_EXT_SPECIALIST_GPT4 = https://api.openai.com
AI_LLAMA_STD_EXT_SPECIALIST_Key_GPT4 = sk-...
AI_LLAMA_STD_EXT_SPECIALIST_Model_GPT4 = gpt-4o
In the frontend element: data-cb-Specialist="Extractor" or data-cb-Specialist="GPT4". The name is case-insensitive.
LLAMA Plugin Properties (Reference)
| Property | Default | Description |
| --- | --- | --- |
| AI_LLAMA_ENGINE_Port | 8392 | Port for the local LLAMA server |
| AI_LLAMA_ENGINE_Threads | Auto (phys. cores) | Number of CPU threads for inference |
| AI_LLAMA_ENGINE_CtxSize | 32768 | Context window in tokens |
| AI_LLAMA_ENGINE_GpuLayers | -1 (auto) | GPU offload layers. -1 = auto, 0 = CPU only |
| AI_LLAMA_ENGINE_Release | b8175 | llama.cpp release tag (GitHub) |
| AI_LLAMA_ENGINE_ServerArgs | — | Additional CLI arguments for llama-server (e.g., --mlock to lock models in RAM) |
| AI_LLAMA_ENGINE_MaxConcurrent | 2 | Max. concurrent inference requests (shared semaphore) |
| AI_LLAMA_STD_ModelUrl | Qwen3-VL-2B Q4_K_M | Download URL for the GGUF model |
| AI_LLAMA_STD_MmprojUrl | Qwen3-VL-2B mmproj | Vision projector for multimodal input (images/PDFs) |
| AI_LLAMA_STD_MaxTokens | 2048 | Max. tokens per response |
| AI_LLAMA_STD_MaxPixels | 3211264 | Max. pixel budget for images (~1792 × 1792) |
| AI_LLAMA_STD_MaxUploadBytes | 52428800 (50 MB) | Max. upload size for images/PDFs |
| AI_LLAMA_STD_Language | Auto (Geo/Browser) | Force language (ISO 639-1, e.g., en); otherwise the model answers in the language of the preceding prompt |
| AI_LLAMA_STD_UpdateCheckHours | 24 | Hours between release checks (0 = disabled) |
| AI_LLAMA_STD_ExternalUrl | — | External OpenAI-compatible API URL (→ Section 8) |
5.2 Whisper — Speech-to-Text
Whisper is a local Speech-to-Text engine based on whisper.cpp. The standard model ggml-small (~466 MB) recognizes 99 languages. All audio processing is done locally — GDPR-compliant with no cloud transmission.
Workflow in the browser: The user presses the microphone button (or the hotkey Alt+A) → the browser records audio and converts it to WAV or WebM/Opus (depending on whether ffmpeg is available on the server) → the WAV/WebM/Opus file is sent to plugin?name=CodBi_AI_Whisper as a Base64 data URL in a FormData field → the transcribed text appears in the input field.
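The upload format described above can be illustrated as follows. The minimal WAV header is a synthetic example for demonstration; only the data-URL shape and the target endpoint name are taken from the text.

```python
import base64
import struct


def wav_bytes(samples: bytes, rate: int = 16000) -> bytes:
    """Wrap raw 16-bit mono PCM samples in a minimal WAV (RIFF) header."""
    header = struct.pack(
        "<4sI4s4sIHHIIHH4sI",
        b"RIFF", 36 + len(samples), b"WAVE",
        b"fmt ", 16, 1, 1, rate, rate * 2, 2, 16,
        b"data", len(samples),
    )
    return header + samples


def to_data_url(wav: bytes) -> str:
    """Base64 data URL as sent in the FormData field to plugin?name=CodBi_AI_Whisper."""
    return "data:audio/wav;base64," + base64.b64encode(wav).decode("ascii")


# 160 silent samples (~10 ms at 16 kHz) as a stand-in for a real recording
url = to_data_url(wav_bytes(b"\x00\x00" * 160))
```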
GDPR Comparison: Browser API vs. Whisper
| Feature | Browser Speech API | CodBi Whisper |
| --- | --- | --- |
| Processing | Cloud (Google/Apple) | Local on the server |
| Data leaves server? | Yes | No |
| GDPR consent required? | Yes (Art. 13) | No (only for processing, not for transmission to third parties) |
| Browser Support | Natively supported primarily by Chromium-based browsers (Chrome, Edge) and Safari; highly fragmented compatibility elsewhere (e.g., Firefox) | All modern browsers |
Whisper Plugin Properties (Reference)
| Property | Default | Description |
| --- | --- | --- |
| AI_Whisper_Port | 8393 | Port for the local Whisper server |
| AI_Whisper_ModelUrl | ggml-small | GGML model URL (alternatives: ggml-base, ggml-medium, ggml-large-v3-turbo-q5_0) |
| AI_Whisper_Release | v1.7.6 | whisper.cpp release tag |
| AI_Whisper_AutoDetectLanguage | false | true = Whisper detects the language automatically |
| AI_Whisper_NoGpu | false | true = forces CPU-only mode |
| AI_Whisper_ExternalUrl | — | External STT API URL — local server is not started (→ Section 8) |
5.3 Tesseract — OCR
Tesseract is an OCR engine (Tesseract 4.x via Tess4J/JNI) that runs directly in the JVM process — no separate server. The native DLLs and language models are automatically downloaded on the first start.
Configuring Languages
By default, only German (deu) is loaded. For multiple languages, use the + separator with three-letter ISO 639-3 codes:
AI_Tesseract_Languages = deu+eng+ita
Each language model requires approx. 100 MB of RAM. Recommended pool size: CPU cores / 2. Example: 8-core server → AI_Tesseract_PoolSize = 4.
Warning — Windows-only: Tesseract is currently only available on Windows (win32-x86-64). Additionally, the DLLs are locked in the JVM process — to remove them, the plugin must be disabled and the server restarted.
To ensure the best possible inference quality, CodBi performs automatic preprocessing when the OCR module is activated: images are first analyzed and correctly aligned using Tesseract. This is particularly important for smartphone photos, where a tilted device often produces rotated image files that go unnoticed and would degrade the analysis results without prior correction. It is therefore recommended to activate ocr alongside llama_engine and llama_std.
Tesseract Plugin Properties (Reference)
| Property | Default | Description |
| --- | --- | --- |
| AI_Tesseract_Languages | deu | Languages (3-letter ISO 639-3 codes, separated by +) |
| AI_Tesseract_PoolSize | 2 | Number of parallel Tesseract instances |
6 Setting Up Frontend Functionalities
CodBi functionalities are configured via CSS classes and data-cb-* attributes on HTML elements in the Formcycle form. The central attribute data-cb-func determines which functionality is applied to an element. All other parameters are specified as data-cb-<ParameterName>.
6.1 AI.LLAMA.Chat — AI Chat
Creates a full multi-turn chat with the local LLM. Supports image/PDF uploads, voice input (Whisper), Thinking Mode, web search, email sending, and specialist routing.
All AI functions support JPG and PNG images as well as PDF documents. For PDFs, the system can fully analyze both native, text-based documents and image-based PDFs (e.g., scans or embedded photos).
Target Element: textarea (becomes the chat display, read-only)
data-cb-func: AI.LLAMA.Chat
Required Auxiliary Elements (via CSS class in the same container):
| CSS Class | Element | Function |
| --- | --- | --- |
| AI_LLAMA_CHAT_Input | input[text] / textarea | Input field for user messages |
| AI_LLAMA_CHAT_Send | button | Send button |
| AI_LLAMA_CHAT_Stop | button | Cancel button |
| AI_LLAMA_CHAT_Upload | input[file] | File upload (images, PDFs) — optional |
| AI_LLAMA_CHAT_Thinking | input[checkbox] | Thinking Mode on/off — optional |
| AI_LLAMA_CHAT_Internet | input[checkbox] | Brave web search on/off — optional |
| AI_LLAMA_CHAT_Location | input[checkbox] | Geolocation on/off — optional |
| AI_LLAMA_CHAT_MailForward | input[checkbox] | Email forwarding on/off — optional. The model can also be asked in the prompt to send an email (e.g., "send me the answer as an email to ..."); AI_Mail_AllowedRecipients is respected here as well |
| AI_LLAMA_CHAT_MailAddress | input[email] | Target email address — optional |
Important data-cb Parameters:
| Parameter | Default | Description |
| --- | --- | --- |
| data-cb-Specialist | — | Specialist model (name as defined in the property) |
| data-cb-ResponseLanguage | Auto | Response language (ISO 639-1, e.g., en) |
| data-cb-WelcomeText | — | Welcome text in the chat |
| data-cb-LLAMABubble | #e5e5ea | Color of the AI speech bubble (hex) |
| data-cb-UserBubble | #0b93f6 | Color of the user speech bubble (hex) |
| data-cb-VoiceHotkey | Alt+A | Keyboard shortcut for voice recording |
| data-cb-MaxPages | 5 | Max. PDF pages per upload |
| data-cb-ShowUncertainTokens | true | Highlight uncertain tokens in color |
| data-cb-QueueBadge | false | Display wait position in the queue |
6.2 AI.Llama.Standard.QA — Image/PDF Question-Answering
Automatically analyzes uploaded images or PDFs and answers predefined questions. Triggers on file upload — the end user does not need to perform any action.
Target Element: input[type="file"]
data-cb-func: AI.LLAMA.STANDARD.QA
Question Elements: Input fields with the CSS class AI_LLAMA_STANDARD_QA_Question in the same container are recognized as target fields. Each needs an id attribute and a data-cb-Question attribute containing the question text. The question text supports placeholders like <[FieldName]> for token resolution.
Verify Mode: With data-cb-Mode="verify", the AI only checks whether the document fulfills a specific condition (Yes/No). If No, an error message with a manual override checkbox is displayed.
To ensure accurate classification, the prompt must be optimized for a binary answer (Example: 'Is this document a purchase agreement for a vehicle? Answer exclusively with Yes or No, without a trailing period.').
If data-cb-PositiveResponse="Yes" is set accordingly, only vehicle purchase agreements are accepted for the upload field. If an upload does not meet the defined criteria — whether because it is not a purchase agreement at all or because no vehicle is being sold — the AI validation fails. In that case, the system shows an error message directly at the upload field and displays a checkbox for manual override by the user. The recognition quality is, of course, largely determined by the parameter count of the model.
EU AI Act: All AI-generated answers are marked with ✨ AI-Generated. The notice text can be customized via data-cb-AIHint.
6.3 AI.LLAMA.Standard.TXTQA — Text Question-Answering
Text-based question-answering — triggers delayed upon field change (Default: 5 seconds debounce). The AI reads the content of marked source fields and answers questions.
Target Element: input[type="text"] or textarea
data-cb-func: AI.LLAMA.Standard.TXTQA
Source Fields: Elements with the CSS class AI_LLAMA_TXTQA_Source. Question Elements: CSS class AI_LLAMA_STANDARD_TXTQA_Question plus data-cb-Question; the notation <[ QuerySelector ]> can be used to insert multiple placeholders into the question text. Further parameters: data-cb-inferencedelay (default: 5 s), data-cb-debounce (default: 500 ms), data-cb-Thinking, data-cb-useinternet.
6.4 AI.OCR — Text Recognition
Tesseract-based OCR with various modes:
Target Element: input[type="file"]
data-cb-func: AI.OCR
- data-cb-Mode="Print" — returns the complete recognized text
- data-cb-Mode="Verify" — checks the text against a regex pattern (data-cb-Pattern)
- data-cb-Mode="Extract Fields" — extracts values via regex into target fields
Receivers: Elements with CSS class AI_OCR_Receiver and data-cb-Field attribute receive the extracted text. Supports PDFs (text extraction for digital PDFs, rendering for scanned ones), automatic orientation detection (OSD), and optional preprocessing (data-cb-Preprocess="true").
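The "Extract Fields" idea can be sketched as follows. The document text, patterns, and field names below are invented examples, not CodBi defaults:

```python
import re

# Hypothetical OCR output from a scanned ID document
ocr_text = "Name: Maria Muster\nID-No: X1234567\nValid until: 31.12.2030"

# One regex per target field; capture group 1 is what the receiver element gets
patterns = {
    "name": r"Name:\s*(.+)",
    "id_number": r"ID-No:\s*([A-Z0-9]+)",
    "valid_until": r"Valid until:\s*([\d.]+)",
}

fields = {}
for field, rx in patterns.items():
    m = re.search(rx, ocr_text)
    fields[field] = m.group(1).strip() if m else ""
```

A field whose pattern does not match stays empty, so downstream form validation can still flag it for the user.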
6.5 Media.Input.Speech.Whisper — Voice Input
Adds a microphone button to text fields. Can be activated either explicitly or automatically:
- Explicit: data-cb-func="MEDIA.INPUT.SPEECH.WHISPER" on an input[text] or textarea
- Automatic (default configuration): all input[text] and textarea elements automatically receive a microphone button — except elements with the CSS classes CodBi_XCL or CodBi_XCL_Speech_Whisper
Parameters: data-cb-VoiceHotkey (default: Alt+A), data-cb-Language (2-letter code), data-cb-ShowHint (hotkey hint). A health check automatically verifies whether the Whisper server is reachable — if it is not, the button is hidden and the plugin periodically retries contacting the server.
7 Setting Up Advanced Features
7.1 Brave Web Search
Enables the LLM to retrieve current information from the internet.
Setup:
-
- Register for an API key at
https://api.search.brave.com
-
- Set plugin property:
AI_BraveSearch_ApiKey = BSA-xxxxxxxxxxxx
-
- Whitelist domain
api.search.brave.com in the firewall
Workflow: The LLM automatically recognizes when current information is needed and triggers a search. Search queries are PII-filtered (no personal data). At most 5 results per search (configurable via AI_BraveSearch_MaxResults, up to 20) and at most 2 search round trips per request (configurable via AI_LLAMA_STD_MaxSearchRoundTrips, up to 10). Web pages can additionally be fetched — with SSRF protection (blocking of localhost, private IPs, and non-HTTP schemes).
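An SSRF guard of the kind described can be sketched like this (a simplified illustration, not CodBi's actual implementation):

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_safe_fetch_url(url: str) -> bool:
    """Reject non-HTTP(S) schemes, localhost, and private/reserved addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    if parsed.hostname == "localhost":
        return False
    try:
        # Resolve the host; IP literals work without DNS
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

Resolving before checking matters: a public hostname can point to a private IP, so the decision must be made on the resolved addresses, not the URL string.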
Frontend Activation: In chat, enable the Internet checkbox; for QA, set data-cb-InternetAccess="true"; for TxtQA, set data-cb-useinternet="true".
7.2 E-Mail Bridge
Enables the LLM to send emails via the Formcycle mail server. The email bridge uses Formcycle's system mail context and automatically adds an AI disclaimer.
AI_Mail_Enabled = true
AI_Mail_AllowedRecipients = .*@my-authority\.gov
AI_Mail_MaxPerHour = 10
Regex examples for allowed recipients:

- .*@my-authority\.gov — own domain only
- .*@(authority1|authority2)\.gov — multiple domains
- (admin|support)@example\.com — specific addresses only
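Checking a recipient against AI_Mail_AllowedRecipients amounts to matching the whole address against the configured pattern. A sketch (whether CodBi uses exactly these full-match semantics is an assumption on our part):

```python
import re

# Value of AI_Mail_AllowedRecipients from the example above
ALLOWED = r".*@my-authority\.gov"


def recipient_allowed(address: str, pattern: str = ALLOWED) -> bool:
    """Full match, so suffix tricks like 'x@my-authority.gov.evil.com' fail."""
    return re.fullmatch(pattern, address) is not None
```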
Rate Limit: globally configurable (Default 10/hour) + max. 3 emails per chat session. In frontend: enable the MailForward checkbox in chat + activate the email address field.
7.3 Date and Location Awareness
Date: The system prompt automatically injects the current date, day of the week, and day of the month, together with a specific instruction so that small local models can correctly calculate dates (e.g., "the Tuesday after next").
Location: Reverse Geocoding via Nominatim (OpenStreetMap) + IP Geolocation (ipwho.is). Automatically detects the user's language and adjusts the system prompt. In chat: enable the Location checkbox.
Configurable: AI_LLAMA_STD_NominatimDomain (own instance), AI_LLAMA_STD_IpGeolocationDomain, AI_LLAMA_STD_FallbackLocation (if geo-services are unreachable).
8 Hybrid Mode & External APIs
Flexibility: CodBi can simultaneously use local and external AI models. Using the specialist parameter in the frontend, arbitrary local GGUF models and external API endpoints can be mixed in the same workflow.
8.1 External LLM (OpenAI, Claude, etc.)
Connection to any OpenAI-compatible API. If an external URL is set, the local model is not started — no download, no server process.
# Example: OpenAI GPT-4o
AI_LLAMA_STD_ExternalUrl = https://api.openai.com
AI_LLAMA_STD_ExternalApiKey = sk-...
AI_LLAMA_STD_ExternalModel = gpt-4o

# Example: Anthropic Claude
AI_LLAMA_STD_ExternalUrl = https://api.anthropic.com
AI_LLAMA_STD_ExternalApiKey = sk-ant-...
AI_LLAMA_STD_ExternalModel = claude-3-opus

# Optional: Disable CodBi system prompt (API brings its own)
AI_LLAMA_STD_ExternalNoPrompt = true
Both modes (local + external) can run in parallel: The standard model works locally, while a specialist points to an external API. Or, external models can be defined as specialists (using AI_LLAMA_STD_EXT_SPECIALIST_XXX, AI_LLAMA_STD_EXT_SPECIALIST_Model_XXX & AI_LLAMA_STD_EXT_SPECIALIST_Key_XXX).
8.2 External Whisper API
AI_Whisper_ExternalUrl = https://api.openai.com
AI_Whisper_ExternalApiKey = sk-...
AI_Whisper_ExternalModel = whisper-1
If an external URL is set, the local whisper-server is not started. The audio files are sent to the external API.
8.3 AI Proxy — Provide Local AI for External Systems
The AI Proxy makes CodBi's local AI engines usable for third-party systems via OpenAI-compatible endpoints. It is secured via IP whitelist and HTTP Basic Auth.
# IP whitelist (comma-separated, CIDR notation supported)
AI_Proxy_AllowedIPs = 192.168.1.0/24, 10.0.0.5

# Basic Auth users (comma-separated, format: user:password)
AI_Proxy_Users = alice:secret1, bob:secret2
Available Endpoints:
| Endpoint | Engine |
| --- | --- |
| POST <fc>/plugin?name=CodBi_AI_Proxy&endpoint=/v1/chat/completions | LLAMA |
| POST <fc>/plugin?name=CodBi_AI_Proxy&endpoint=/v1/audio/transcriptions | Whisper |
| POST <fc>/plugin?name=CodBi_AI_Proxy&endpoint=/v1/ocr | Tesseract |
All access is logged anonymously: SHA-256 hash of the username, first two octets of the client IP. Client IP detection: X-Forwarded-For → X-Real-IP → Remote-Addr.
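A client request against the proxy can be sketched as follows. The base URL and the credentials are placeholders (the <fc> prefix stands for your Formcycle base URL); the endpoint path and Basic Auth scheme match the description above:

```python
import base64
import json
import urllib.request


def proxy_request(base: str, user: str, password: str, body: dict) -> urllib.request.Request:
    """Build a chat-completions request against the CodBi AI Proxy with Basic Auth."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base}/plugin?name=CodBi_AI_Proxy&endpoint=/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Placeholder host and the example credentials from the property above
req = proxy_request("https://formcycle.example", "alice", "secret1",
                    {"messages": [{"role": "user", "content": "Hi"}]})
```

Remember that the caller's IP must also fall inside AI_Proxy_AllowedIPs; Basic Auth alone is not sufficient.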
⚡ 9 Hardware & Platforms
| Module | Process Model | GPU Support | Platforms | RAM (approx.) |
| --- | --- | --- | --- | --- |
| LLAMA (Qwen3-VL-2B Q4_K_M) | Separate OS process | CUDA 12 · Vulkan · Metal · CPU | Win (verified), Linux (compatible), macOS x64 + arm64 (compatible) | 2 – 4 GB |
| Whisper (ggml-small) | Separate OS process | GPU or CPU-only | Win (verified), Linux (compatible), macOS x64 + arm64 (compatible) | ~1 GB |
| Tesseract (Tess4J / JNI) | In-process (JVM) | CPU only | Windows (win32-x86-64, verified) | ~100 MB / language |
GPU Detection: CUDA 12 → Vulkan → CPU Fallback (automatic). On Apple Silicon (M1–M4), Metal is used. To force CPU-only: AI_LLAMA_ENGINE_GpuLayers = 0 or AI_Whisper_NoGpu = true.
Port Overview: LLAMA Standard: 8392, Thinking Server: 8492 (Main+100), Specialists: from 8592 (Main+200), Whisper: 8393.
10 Troubleshooting
Port Conflicts
If port 8392 / 8393 is occupied, change the port via AI_LLAMA_ENGINE_Port or AI_Whisper_Port. Keep in mind that Thinking (+100) and Specialists (+200) automatically use derived ports.
Download Errors
Downloads support HTTP Range Resume — interrupted downloads are resumed. Check the firewall whitelist (see Section 2). For proxy servers, set JVM proxy settings if necessary (-Dhttps.proxyHost=...).
GPU Not Recognized
The log shows GPU layers: 0 (CPU only). Check: Is the CUDA 12 driver installed? Vulkan Runtime present? CUDA runtime DLLs are automatically downloaded for CUDA builds. Manual control: AI_LLAMA_ENGINE_GpuLayers = 0 (Force CPU).
Queue / Overload
During overload, requests receive a queue position instead of an error message. AI_LLAMA_ENGINE_MaxConcurrent controls the max. number of simultaneous inferences (Default: 2). In the frontend, data-cb-QueueBadge="true" can display the waiting position.
Tesseract DLLs Locked
Once loaded, DLLs are locked in the JVM process. To remove: Disable plugin → restart server → delete files.
Update Check
LLAMA checks for new llama.cpp releases every 24 hours. Configurable: AI_LLAMA_STD_UpdateCheckHours = 0 (disabled). Email notification: AI_LLAMA_STD_NotifyEmail. For this check, the following domains must be permanently reachable: github.com & api.github.com.