# Core Concepts
VOIX enables AI assistants to interact with websites through a simple yet powerful architecture. This guide explains how the different parts work together.
## Overview
VOIX consists of three main components working together:
- Your Website - Declares what the AI can do and provides current state
- Chrome Extension - Bridges the gap between your website and AI
- User + AI - Natural language interface for interacting with your site
## How It Works
### 1. Website Declaration
Your website declares capabilities using HTML elements:
- Tools - Actions the AI can perform
- Context - Current state information
```html
<!-- Declare an action -->
<tool name="create_task" description="Create a new task">
  <prop name="title" type="string" required/>
</tool>

<!-- Provide current state -->
<context name="user">
  Name: John Doe
  Role: Admin
</context>
```
### 2. Extension Discovery
When a user opens VOIX on your page:
- The extension scans for all `<tool>` and `<context>` elements
- It builds a catalog of available actions and current state
- This information is presented to the AI assistant
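The catalog-building step above can be sketched as a pure function. The element shapes here are simplified stand-ins for the real DOM nodes the extension reads, not VOIX's actual internal representation:

```javascript
// Simplified stand-ins for parsed <tool> and <context> elements.
// The real extension reads these from the DOM; here they are plain objects.
function buildCatalog(toolElements, contextElements) {
  const tools = toolElements.map((el) => ({
    name: el.name,
    description: el.description,
    props: el.props.map((p) => ({
      name: p.name,
      type: p.type,
      required: Boolean(p.required),
    })),
  }));
  const context = contextElements.map((el) => ({
    name: el.name,
    text: el.text,
  }));
  return { tools, context };
}

// Example: the create_task tool and user context from above.
const catalog = buildCatalog(
  [{ name: 'create_task', description: 'Create a new task',
     props: [{ name: 'title', type: 'string', required: true }] }],
  [{ name: 'user', text: 'Name: John Doe\nRole: Admin' }],
);
console.log(catalog.tools[0].name); // create_task
```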
### 3. User Interaction
The user types or speaks naturally:
"Create a task called 'Review pull requests'"
### 4. AI Understanding
The AI:
- Reads the available tools and their descriptions
- Understands the current context
- Determines which tool to use and with what parameters
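Most chat APIs receive these declarations as function/tool definitions. A sketch of that translation, assuming an OpenAI-compatible function-calling format (the actual payload depends on the provider the user configured):

```javascript
// Convert a discovered tool into an OpenAI-style function definition.
// Field names ("type", "function", "parameters") follow the OpenAI
// chat-completions tool format; other providers differ.
function toFunctionDefinition(tool) {
  const properties = {};
  const required = [];
  for (const prop of tool.props) {
    properties[prop.name] = { type: prop.type };
    if (prop.required) required.push(prop.name);
  }
  return {
    type: 'function',
    function: {
      name: tool.name,
      description: tool.description,
      parameters: { type: 'object', properties, required },
    },
  };
}

const def = toFunctionDefinition({
  name: 'create_task',
  description: 'Create a new task',
  props: [{ name: 'title', type: 'string', required: true }],
});
console.log(def.function.parameters.required); // [ 'title' ]
```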
### 5. Tool Execution
When the AI decides to use a tool:
- VOIX triggers a `call` event on the tool element
- Your JavaScript handler receives the parameters
- Your code performs the action
```js
document.querySelector('[name=create_task]').addEventListener('call', (e) => {
  const { title } = e.detail;
  // Create the task in your application
  createTask(title);
});
```
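Handlers like this can be exercised without the extension by dispatching a synthetic `call` event. A minimal simulation using the standard `EventTarget` API, runnable outside the browser; `CallEvent` is a hypothetical stand-in for the event VOIX dispatches, and in a real page you would dispatch on the `<tool>` element itself:

```javascript
// CallEvent carries a detail payload the way the extension's
// 'call' event does. EventTarget and Event are standard globals.
class CallEvent extends Event {
  constructor(detail) {
    super('call');
    this.detail = detail;
  }
}

const tool = new EventTarget(); // stands in for the <tool> element
const created = [];

tool.addEventListener('call', (e) => {
  const { title } = e.detail;
  created.push(title); // your createTask(title) would run here
});

tool.dispatchEvent(new CallEvent({ title: 'Review pull requests' }));
console.log(created); // [ 'Review pull requests' ]
```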
## Architecture Benefits
### For Developers
VOIX uses HTML elements to define AI capabilities. You add `<tool>` and `<context>` tags to your pages and attach event listeners to handle tool calls. No API integration or SDK is required. The approach works with any JavaScript framework or vanilla JavaScript, and you can add these elements to existing pages without modifying other code.
You control what data the AI can access by choosing what to include in your tool and context elements. Your website receives only the tool execution requests with their parameters. The conversation between the user and AI remains private - you never see what the user typed or how they phrased their request.
### For Users
You interact with websites through natural language. The AI reads the available tools and current context to understand what actions it can perform. You can make requests like "delete the third item" or "show only active tasks" and the AI will execute the appropriate tools with the correct parameters.
You configure your own AI provider in the extension settings. This can be OpenAI, Anthropic, Ollama running locally, or any OpenAI-compatible endpoint. Your conversation data goes directly from the extension to your chosen provider. The website never receives your messages, only the resulting tool calls that need to be executed.
## The Role of Each Part
### Your Website's Role
- Declare Tools - Define what actions are possible
- Provide Context - Share current state information
- Handle Events - Execute actions when tools are called
- Update UI - Reflect changes in your interface
### Extension's Role
- Discovery - Find tools and context on the page
- Communication - Connect your site with the AI
- Event Dispatch - Trigger tool calls based on AI decisions
- Privacy - Keep all data local in the browser
### User's Role
- Natural Input - Describe what they want to accomplish
- Conversation - Clarify or refine requests as needed
- Verification - Confirm actions when necessary
## Data Flow Example
Let's trace through a complete interaction to see how data flows through the system:
```
User Input → AI Processing → Tool Selection → Event Dispatch → Your Handler → UI Update
```
**1. User visits your task management app**

```html
<tool name="mark_complete" description="Mark a task as complete">
  <prop name="taskId" type="string" required/>
</tool>

<context name="tasks">
  Active tasks: 5
  Task IDs: task-1, task-2, task-3, task-4, task-5
</context>
```
**2. User opens VOIX and types**

"Mark task-3 as complete"
**3. AI processes the request**

- Sees the `mark_complete` tool
- Reads the context showing task-3 exists
- Decides to call the tool with `taskId: "task-3"`
**4. Your code handles the event**

```js
const tool = document.querySelector('[name=mark_complete]');

tool.addEventListener('call', (e) => {
  const { taskId } = e.detail;
  markTaskComplete(taskId);
  updateTaskList();
});
```
**5. User sees the result**
- Task marked as complete in the UI
- Context updates to reflect the new state
- Ready for the next interaction
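Keeping the `<context>` element current is what makes the next request work, since the AI re-reads it on each turn. A sketch of that update step; `updateTasksContext` is a hypothetical helper, and the element is passed in as a parameter so the logic stays testable outside the browser:

```javascript
// Rewrite the tasks <context> text so the AI sees fresh state on its
// next read. contextEl is any object with a textContent property
// (in the browser: the real <context name="tasks"> element).
function updateTasksContext(contextEl, tasks) {
  const active = tasks.filter((t) => !t.complete);
  contextEl.textContent =
    `Active tasks: ${active.length}\n` +
    `Task IDs: ${active.map((t) => t.id).join(', ')}`;
}

const contextEl = { textContent: '' }; // stub; use document.querySelector in the browser
updateTasksContext(contextEl, [
  { id: 'task-1', complete: false },
  { id: 'task-3', complete: true },
]);
console.log(contextEl.textContent);
// Active tasks: 1
// Task IDs: task-1
```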
Each step happens in the browser. The only external dependency is the AI provider, which is configured and trusted by the user. The website acts purely as a capability provider, never seeing the conversation between the user and their AI assistant.
## Next Steps
- Learn about Tools to create interactive capabilities
- Understand Context for sharing application state