webwhiz

WebWhiz allows you to create an AI chatbot that knows everything about your product and can instantly respond to your customer's queries.

Stars: 886

Visit

WebWhiz is an open-source tool that allows users to train ChatGPT on website data to build AI chatbots for customer queries. It offers easy integration, data-specific responses, regular data updates, no-code builder, chatbot customization, fine-tuning, and offline messaging. Users can create and train chatbots in a few simple steps by entering their website URL, automatically fetching and preparing training data, training ChatGPT, and embedding the chatbot on their website. WebWhiz can crawl websites monthly, collect text data and metadata, and process text data using tokens. Users can train custom data, but bringing custom open AI keys is not yet supported. The tool has no limitations on context size but may limit the number of pages based on the chosen plan. WebWhiz SDK is available on NPM, CDNs, and GitHub, and users can self-host it using Docker or manual setup involving MongoDB, Redis, Node, Python, and environment variables setup. For any issues, users can contact [email protected].

README:

WebWhiz

Train ChatGPT on Your Website Data and Build an AI Chatbot that can instantly answer your customers queries.

🔥 Core features

Easy Integration
Data-Specific Responses
Regular Data Updates
No code builder
Customise chatbot
Fine tuning
Offline message

🤔 How it works ❓

Create and train a chatbot for your website in just a few simple steps.

Just enter your website URL to get started. We'll automatically fetch and prepare training data.
We’ll automatically train ChatGPT on your website based on the selected parameters and create the chatbot for you.
To embed the chatbot to your website, simply add the tiny script tag to your website.

🙋‍♂️ Frequently Asked Question ❓

What is WebWhiz?

WebWhiz allows you to train ChatGPT on your website data and build a chatbot that you can add to your website. No coding required.

How frequently do you crawl my website?

Currently we crawl your website once every month. Please contact us if you need your website to be scanned more frequently

What data do you collect from my website?

WebWhiz collects data from your website pages to train your chatbot. This includes text data from the pages as well as any metadata such as page titles or descriptions. We do not collect any personally identifiable information (PII) or sensitive data from your website. We scan only public data available to search engines

What happens if I exceed my plan's limits?

If you exceed your plan's limits for projects or pages, we will notify you. However, if you exceed the token limit for your plan, your chatbots will stop generating AI responses and will instead respond with a predefined message.

What are tokens?

Tokens are a unit of measurement used to calculate the amount of text data that is processed by your chatbot. Each token corresponds to a variable number of characters, depending on the complexity of the language used in the message. Each message your chatbot sends uses a certain number of tokens based on the length and complexity of the input and the AI response. You can view the current token usage of your account on the dashboard.

Can I train custom data?

Yes, you can train custom data by simply pasting content to WebWhiz

Can I bring my own open ai Key

Not at the moment, but, it will be possible in a couple of days.

What is the maximum size of context?

WebWhiz have any limitations on the size of context. However, please note that the number of pages you can crawl may be limited based on the plan you choose. Please refer to our plans page to learn more about the specific limitations of each plan.

📑 License

Webwhiz is open-source under the GNU Affero General Public License Version 3 (AGPLv3)

WebWhiz SDK

WebWhiz SDK is available on NPM, CDNs, and GitHub.

NPM - NPM is a package manager for the JavaScript programming language. You can install webwhiz using the following command
```
npm install webwhiz
```

CDN Use directly from CDN

  https://www.unpkg.com/[email protected]/dist/sdk.js

☁️ Self Hosting

Docker - Easy
Manual Setup - Involved (but provides more flexibility)

📦 Docker

Prerequisites

Docker & docker-compose

Running Webwhiz with docker

Clone the repo
Edit the .env.docker file present in the root of the repo and add your OPENAI_KEY & OPENAI_KEY_2
Use docker-compose to start the stack

# Bring up webwhiz
# Once the building is done and webwhiz starts the UI will be available at
# http://localhost:3030, backend is available at http://localhost:3000
# To exit Press Ctrl-C
docker-compose up

# Alternatively Run webwhiz as a daemon
docker-compose up -d

# Stop Webwhiz
docker-compose down

# Force rebuild all containers (required only if some change is not picked up)
sudo docker-compose up --build --force-recreate

🛃 Manual

WebWhiz is designed to be used as a production grade Chatbot that can be scaled up or down to handle any volume of data.

WebWhiz consists of mainly 3 components

The API server - This is the main webwhiz backend web server using NestJS
JS Celery Worker - Handles crawling, embeddings generation
Python Celery Worker - Container Cosine similarity calculator & HTML / PDF content extractor

For Database and Caching Webwhiz uses

MongoDB
Redis

The backend server uses third part services (including OpenAI) for powering the chatbot, as well as for error monitoring etc. Only OpenAI key is mandatory and you can ignore the others if you prefer to.

NOTE: WebWhiz keeps embeddings in Redis to improve the performance of chatbot responses. For most organisations the chatbots created would be conatins data for a few hundren or thousands of pages, and Redis should work well while providing better performance. If you would like to use a dedicated vector database for searching relavant chunks please reach out to us.

Prerequisites

MongoDB v6
Redis v7
Node v18 + Yarn
Python v3.6+

Setting Enviornment Variables

Create a copy of the .env.sample file and rename as .env

The following variables as mandatory

HOST - IP to which the web server should bind to typically 0.0.0.0
PORT - Port on which web server should listen to (Default 3000)
SECRET_KEY - Secret used for encryption (JWT, etc).
MONGO_URI - MongoDB uri to use
MONGO_DBNAME - Name of Database inside MongoDB
OPENAI_KEY - OpenAI API key
OPENAI_KEY_2 - Alternate OpenAI API key, used when the primary one raises error. You can use the same API key for both if you don't want to provide two separate API keys

Inside the workers folder create a copy of the .env.sample and rename as .env.

Set the value for the following variables - MONGO_URI, MONGO_DBNAME, REDIS_HOST, REDIS_PORT

Installing dependencies and running app

From the root folder run the following commands

# Install node dependencies
yarn install

# Install python worker dependencies
cd workers
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Run application with pm2
cd ..
yarn run build
npm install -g pm2 # Use sudo if required
pm2 start ecosystem.config.js

This will start the backend http server, the js worker and the python worker

Frontend

Create .env file in the frontend folder and add the following variables

REACT_APP_BASE_URL='https://api.website.com'
GOOGLE_AUTH_ID='Only if you need google login'

From the frontend folder run the following commands to start the server

# Install dependencies
npm install

# Run front end app
npm run start

Run npm run build to package the frontend app

If you face any issues, reach out to [email protected]

For Tasks:

Click tags to check more tools for each tasks

train chatbot customize responses embed chatbot monitor token usage self-host with docker

For Jobs:

web developer customer support agent digital marketer content creator ai engineer

Alternative AI tools for webwhiz

Similar Open Source Tools

webwhiz

github

: 886

raggenie

RAGGENIE is a low-code RAG builder tool designed to simplify the creation of conversational AI applications. It offers out-of-the-box plugins for connecting to various data sources and building conversational AI on top of them, including integration with pre-built agents for actions. The tool is open-source under the MIT license, with a current focus on making it easy to build RAG applications and future plans for maintenance, monitoring, and transitioning applications from pilots to production.

github

: 161

AIOStreams

AIOStreams is a versatile tool that combines streams from various addons into one platform, offering extensive customization options. Users can change result formats, filter results by various criteria, remove duplicates, prioritize services, sort results, specify size limits, and more. The tool scrapes results from selected addons, applies user configurations, and presents the results in a unified manner. It simplifies the process of finding and accessing desired content from multiple sources, enhancing user experience and efficiency.

github

: 174

n8n-docs

n8n is an extendable workflow automation tool that enables you to connect anything to everything. It is open-source and can be self-hosted or used as a service. n8n provides a visual interface for creating workflows, which can be used to automate tasks such as data integration, data transformation, and data analysis. n8n also includes a library of pre-built nodes that can be used to connect to a variety of applications and services. This makes it easy to create complex workflows without having to write any code.

github

: 352

aiCoder

aiCoder is an AI-powered tool designed to streamline the coding process by automating repetitive tasks, providing intelligent code suggestions, and facilitating the integration of new features into existing codebases. It offers a chat interface for natural language interactions, methods and stubs lists for code modification, and settings customization for project-specific prompts. Users can leverage aiCoder to enhance code quality, focus on higher-level design, and save time during development.

github

: 157

openui

OpenUI is a tool designed to simplify the process of building UI components by allowing users to describe UI using their imagination and see it rendered live. It supports converting HTML to React, Svelte, Web Components, etc. The tool is open source and aims to make UI development fun, fast, and flexible. It integrates with various AI services like OpenAI, Groq, Gemini, Anthropic, Cohere, and Mistral, providing users with the flexibility to use different models. OpenUI also supports LiteLLM for connecting to various LLM services and allows users to create custom proxy configs. The tool can be run locally using Docker or Python, and it offers a development environment for quick setup and testing.

github

: 16.7k

Perplexica

Perplexica is an open-source AI-powered search engine that utilizes advanced machine learning algorithms to provide clear answers with sources cited. It offers various modes like Copilot Mode, Normal Mode, and Focus Modes for specific types of questions. Perplexica ensures up-to-date information by using SearxNG metasearch engine. It also features image and video search capabilities and upcoming features include finalizing Copilot Mode and adding Discover and History Saving features.

github

: 21.0k

azure-search-openai-demo

This sample demonstrates a few approaches for creating ChatGPT-like experiences over your own data using the Retrieval Augmented Generation pattern. It uses Azure OpenAI Service to access a GPT model (gpt-35-turbo), and Azure AI Search for data indexing and retrieval. The repo includes sample data so it's ready to try end to end. In this sample application we use a fictitious company called Contoso Electronics, and the experience allows its employees to ask questions about the benefits, internal policies, as well as job descriptions and roles.

github

: 6.0k

SeaGOAT

SeaGOAT is a local search tool that leverages vector embeddings to enable you to search your codebase semantically. It is designed to work on Linux, macOS, and Windows and can process files in various formats, including text, Markdown, Python, C, C++, TypeScript, JavaScript, HTML, Go, Java, PHP, and Ruby. SeaGOAT uses a vector database called ChromaDB and a local vector embedding engine to provide fast and accurate search results. It also supports regular expression/keyword-based matches. SeaGOAT is open-source and licensed under an open-source license, and users are welcome to examine the source code, raise concerns, or create pull requests to fix problems.

github

: 1.1k

devika

Devika is an advanced AI software engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective. Devika utilizes large language models, planning and reasoning algorithms, and web browsing abilities to intelligently develop software. Devika aims to revolutionize the way we build software by providing an AI pair programmer who can take on complex coding tasks with minimal human guidance. Whether you need to create a new feature, fix a bug, or develop an entire project from scratch, Devika is here to assist you.

github

: 18.1k

LLMStack

LLMStack is a no-code platform for building generative AI agents, workflows, and chatbots. It allows users to connect their own data, internal tools, and GPT-powered models without any coding experience. LLMStack can be deployed to the cloud or on-premise and can be accessed via HTTP API or triggered from Slack or Discord.

github

: 1.5k

ai-voice-cloning

This repository provides a tool for AI voice cloning, allowing users to generate synthetic speech that closely resembles a target speaker's voice. The tool is designed to be user-friendly and accessible, with a graphical user interface that guides users through the process of training a voice model and generating synthetic speech. The tool also includes a variety of features that allow users to customize the generated speech, such as the pitch, volume, and speaking rate. Overall, this tool is a valuable resource for anyone interested in creating realistic and engaging synthetic speech.

github

: 268

ai-clone-whatsapp

This repository provides a tool to create an AI chatbot clone of yourself using your WhatsApp chats as training data. It utilizes the Torchtune library for finetuning and inference. The code includes preprocessing of WhatsApp chats, finetuning models, and chatting with the AI clone via a command-line interface. Supported models are Llama3-8B-Instruct and Mistral-7B-Instruct-v0.2. Hardware requirements include approximately 16 GB vRAM for QLoRa Llama3 finetuning with a 4k context length. The repository addresses common issues like adjusting parameters for training and preprocessing non-English chats.

github

: 270

MiniSearch

MiniSearch is a minimalist search engine with integrated browser-based AI. It is privacy-focused, easy to use, cross-platform, integrated, time-saving, efficient, optimized, and open-source. MiniSearch can be used for a variety of tasks, including searching the web, finding files on your computer, and getting answers to questions. It is a great tool for anyone who wants a fast, private, and easy-to-use search engine.

github

: 384

gpt-pilot

GPT Pilot is a core technology for the Pythagora VS Code extension, aiming to provide the first real AI developer companion. It goes beyond autocomplete, helping with writing full features, debugging, issue discussions, and reviews. The tool utilizes LLMs to generate production-ready apps, with developers overseeing the implementation. GPT Pilot works step by step like a developer, debugging issues as they arise. It can work at any scale, filtering out code to show only relevant parts to the AI during tasks. Contributions are welcome, with debugging and telemetry being key areas of focus for improvement.

github

: 32.2k

WebCraftifyAI

WebCraftifyAI is a software aid that makes it easy to create and build web pages and content. It is designed to be user-friendly and accessible to people of all skill levels. With WebCraftifyAI, you can quickly and easily create professional-looking websites without having to learn complex coding or design skills.

github

: 90

For similar tasks

wechat-bot

WeChat Bot is a simple and easy-to-use WeChat robot based on chatgpt and wechaty. It can help you automatically reply to WeChat messages or manage WeChat groups/friends. The tool requires configuration of AI services such as Xunfei, Kimi, or ChatGPT. Users can customize the tool to automatically reply to group or private chat messages based on predefined conditions. The tool supports running in Docker for easy deployment and provides a convenient way to interact with various AI services for WeChat automation.

github

: 5.9k

webwhiz

github

: 886

parlant

Parlant is a structured approach to building and guiding customer-facing AI agents. It allows developers to create and manage robust AI agents, providing specific feedback on agent behavior and helping understand user intentions better. With features like guidelines, glossary, coherence checks, dynamic context, and guided tool use, Parlant offers control over agent responses and behavior. Developer-friendly aspects include instant changes, Git integration, clean architecture, and type safety. It enables confident deployment with scalability, effective debugging, and validation before deployment. Parlant works with major LLM providers and offers client SDKs for Python and TypeScript. The tool facilitates natural customer interactions through asynchronous communication and provides a chat UI for testing new behaviors before deployment.

github

: 1.9k

douyin-chatgpt-bot

Douyin ChatGPT Bot is an AI-driven system for automatic replies on Douyin, including comment and private message replies. It offers features such as comment filtering, customizable robot responses, and automated account management. The system aims to enhance user engagement and brand image on the Douyin platform, providing a seamless experience for managing interactions with followers and potential customers.

github

: 166

Claude-Usage-Extension

The Claude usage tracker extension helps users monitor their token usage on Claude, calculating usage from various sources like files, projects, preferences, message history, and AI output. It also synchronizes usage amounts across devices via firebase. The extension provides a user-friendly UI for easy tracking.

github

: 54

secret-llama

Entirely-in-browser, fully private LLM chatbot supporting Llama 3, Mistral and other open source models. Fully private = No conversation data ever leaves your computer. Runs in the browser = No server needed and no install needed! Works offline. Easy-to-use interface on par with ChatGPT, but for open source LLMs. System requirements include a modern browser with WebGPU support. Supported models include TinyLlama-1.1B-Chat-v0.4-q4f32_1-1k, Llama-3-8B-Instruct-q4f16_1, Phi1.5-q4f16_1-1k, and Mistral-7B-Instruct-v0.2-q4f16_1. Looking for contributors to improve the interface, support more models, speed up initial model loading time, and fix bugs.

github

: 2.0k

LLM_AppDev-HandsOn

This repository showcases how to build a simple LLM-based chatbot for answering questions based on documents using retrieval augmented generation (RAG) technique. It also provides guidance on deploying the chatbot using Podman or on the OpenShift Container Platform. The workshop associated with this repository introduces participants to LLMs & RAG concepts and demonstrates how to customize the chatbot for specific purposes. The software stack relies on open-source tools like streamlit, LlamaIndex, and local open LLMs via Ollama, making it accessible for GPU-constrained environments.

github

: 383

For similar jobs

sweep

Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.

github

: 7.1k

teams-ai

The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.

github

: 502

ai-guide

This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.

github

: 159

classifai

Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.

github

: 620

chatbot-ui

Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.

github

: 27.7k

BricksLLM

BricksLLM is a cloud native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise level infrastructure that can power any LLM production use cases. Here are some use cases for BricksLLM: * Set LLM usage limits for users on different pricing tiers * Track LLM usage on a per user and per organization basis * Block or redact requests containing PIIs * Improve LLM reliability with failovers, retries and caching * Distribute API keys with rate limits and cost limits for internal development/production use cases * Distribute API keys with rate limits and cost limits for students

github

: 953

uAgents

uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.

github

: 1.3k

griptape

Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.

github

: 2.2k