CyberScraper-2077
A powerful web scraper powered by LLMs | OpenAI, Gemini & Ollama
Stars: 951
CyberScraper 2077 is an AI-powered web scraping tool designed to extract data from websites with precision and style. It offers a user-friendly Streamlit interface, supports multiple export formats, includes a stealth mode to reduce bot detection, and performs fast asynchronous scraping. The tool encourages ethical scraping practices, including respect for robots.txt and site policies. With upcoming features such as proxy support and multi-page navigation, CyberScraper 2077 aims to be a complete solution for data extraction.
README:
Rip data from the net, leaving no trace. Welcome to the future of web scraping.
CyberScraper 2077 is not just another web scraping tool: it's a glimpse into the future of data extraction. Born from the neon-lit streets of a cyberpunk world, this AI-powered scraper uses OpenAI, Gemini, and local LLM models to slice through the web's defenses, extracting the data you need with unparalleled precision and style.
Whether you're a corpo data analyst, a street-smart netrunner, or just someone looking to pull information from the digital realm, CyberScraper 2077 has got you covered.
- AI-Powered Extraction: uses cutting-edge AI models to understand and parse web content intelligently.
- Sleek Streamlit Interface: a user-friendly GUI that even a chrome-armed street samurai could navigate.
- Multi-Format Support: export your data as JSON, CSV, HTML, SQL, or Excel, whatever fits your cyberdeck.
- Stealth Mode: stealth-mode parameters that help the scraper avoid detection as a bot.
- Ollama Support: use a huge library of open-source LLMs.
- Async Operations: lightning-fast scraping that would make a Trauma Team jealous.
- Smart Parsing: structures scraped content as if it were extracted straight from the engram of a master netrunner.
- Caching: content-based and query-based caching using an LRU cache and a custom dictionary to reduce redundant API calls (a minimal sketch follows this list).
- Upload to Google Sheets: upload your extracted CSV data to Google Sheets with one click.
- Bypass Captcha: bypass captchas by appending `-captcha` to the end of the URL. (Currently only works natively, not in Docker.)
- Current Browser: uses your local browser instance, which helps bypass most bot detection. (Only use when necessary.)
- Proxy Mode (Coming Soon): built-in proxy support to keep you ghosting through the net.
- Navigate through the Pages (BETA): navigate through a website and scrape data from multiple pages.
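To make the caching feature concrete, here is a minimal sketch of what content- and query-based caching can look like. The names (`fetch_page`, `extract`, `llm_call`) are illustrative assumptions, not CyberScraper's actual internals:

```python
# Minimal illustrative sketch of content- and query-based caching.
# An LRU cache keyed on URL avoids re-fetching pages; a dictionary keyed on
# (content hash, query) avoids re-sending identical requests to the LLM.
import hashlib
import urllib.request
from functools import lru_cache

# Query cache: (content hash, query) -> extracted result.
_query_cache: dict[tuple[str, str], str] = {}

@lru_cache(maxsize=128)
def fetch_page(url: str) -> str:
    """Fetch a page once; repeated calls with the same URL hit the LRU cache."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

def extract(url: str, query: str, llm_call) -> str:
    """Run an LLM extraction query, skipping the call on a cache hit."""
    html = fetch_page(url)
    key = (hashlib.sha256(html.encode()).hexdigest(), query)
    if key not in _query_cache:  # only call the LLM on a cache miss
        _query_cache[key] = llm_call(html, query)
    return _query_cache[key]
```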
Check out the YouTube video of the redesigned and improved version of CyberScraper-2077, with more functionality, for a full walkthrough of its capabilities.
Check out the YouTube video of our first build (old video).
On Windows, please follow the Docker container guide given below, as I won't be able to maintain a separate Windows version.
Note: CyberScraper 2077 requires Python 3.10 or higher.
1. Clone this repository:

   ```bash
   git clone https://github.com/itsOwen/CyberScraper-2077.git
   cd CyberScraper-2077
   ```

2. Create and activate a virtual environment (optional):

   ```bash
   virtualenv venv
   source venv/bin/activate
   ```

3. Install the required packages:

   ```bash
   pip install -r requirements.txt
   ```

4. Install the Playwright browsers:

   ```bash
   playwright install
   ```

5. Set your OpenAI and Gemini keys in your environment (Linux/Mac):

   ```bash
   export OPENAI_API_KEY="your-api-key-here"
   export GOOGLE_API_KEY="your-api-key-here"
   ```
If you want to use Ollama:

Note: I recommend the OpenAI and Gemini APIs, as these models are very good at following instructions. If you use open-source LLMs, make sure you have a capable system: the speed of data generation and presentation depends on how well your system can run the LLM, and you may also have to fine-tune the prompt and add extra filters yourself.

1. Install the Ollama Python client: `pip install ollama`
2. Download Ollama from the official website: https://ollama.com/download
3. Pull a model, e.g. `ollama pull llama3.1`, or whatever LLM you want to use.
4. Now follow the rest of the steps below.
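As a quick sanity check that your local model responds before wiring it into the scraper, here is a minimal sketch using the `ollama` Python client (assuming the Ollama server is running and `llama3.1` is pulled; this is not CyberScraper's internal code):

```python
# Quick sanity check that a local Ollama model is reachable.
# Assumes `pip install ollama`, a running Ollama server, and a pulled model.
import ollama

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "Reply with one word: online"}],
)
print(response["message"]["content"])  # e.g. "online"
```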
If you prefer to use Docker, follow these steps to set up and run CyberScraper 2077:
1. Ensure you have Docker installed on your system.

2. Clone this repository:

   ```bash
   git clone https://github.com/itsOwen/CyberScraper-2077.git
   cd CyberScraper-2077
   ```

3. Build the Docker image:

   ```bash
   docker build -t cyberscraper-2077 .
   ```

4. Run the container:

   - Without an API key:

     ```bash
     docker run -p 8501:8501 cyberscraper-2077
     ```

   - With an OpenAI API key:

     ```bash
     docker run -p 8501:8501 -e OPENAI_API_KEY="your-actual-api-key" cyberscraper-2077
     ```

   - With a Gemini API key:

     ```bash
     docker run -p 8501:8501 -e GOOGLE_API_KEY="your-actual-api-key" cyberscraper-2077
     ```

5. Open your browser and navigate to http://localhost:8501.
If you want to use Ollama with the Docker setup:

1. Install Ollama on your host machine following the instructions at https://ollama.com/download.

2. Pull a model on your host machine:

   ```bash
   ollama pull llama3.1
   ```

3. Find your host machine's IP address:

   - On Linux/Mac: `ifconfig` or `ip addr show`
   - On Windows: `ipconfig`

4. Run the Docker container with the Ollama base URL set:

   ```bash
   docker run -e OLLAMA_BASE_URL=http://host.docker.internal:11434 -p 8501:8501 cyberscraper-2077
   ```

   On Linux you might need to use your host IP instead:

   ```bash
   docker run -e OLLAMA_BASE_URL=http://<your-host-ip>:11434 -p 8501:8501 cyberscraper-2077
   ```

   Replace `<your-host-ip>` with your actual host machine IP address.

5. In the Streamlit interface, select the Ollama model you want to use (e.g., "ollama:llama3.1").

Note: Ensure that your firewall allows connections to port 11434 for Ollama.
1. Fire up the Streamlit app:

   ```bash
   streamlit run main.py
   ```

2. Open your browser and navigate to http://localhost:8501.

3. Enter the URL of the site you want to scrape, or ask a question about the data you need.

4. Ask the chatbot to extract the data in any format. Select whatever data you want to export, or even everything from the webpage.

5. Watch as CyberScraper 2077 tears through the net, extracting your data faster than you can say "flatline"!
Note: The multi-page scraping feature is currently in beta. While functional, you may encounter occasional issues or unexpected behavior. We appreciate your feedback and patience as we continue to improve this feature.
CyberScraper 2077 now supports multi-page scraping, allowing you to extract data from multiple pages of a website in one go. This feature is perfect for scraping paginated content, search results, or any site with data spread across multiple pages.
When scraping multiple pages, I suggest entering the URL structure each time so the scraper can detect it reliably. It detects nearly all URL types.
1. Basic usage: to scrape multiple pages, use the following format when entering the URL:

   ```
   https://example.com/page 1-5
   https://example.com/p/ 1-6
   https://example.com/xample/something-something-1279?p=1 1-3
   ```

   The first example scrapes pages 1 through 5 of the website.

2. Custom page ranges: you can specify custom page ranges:

   ```
   https://example.com/p/ 1-5,7,9-12
   https://example.com/xample/something-something-1279?p=1 1,7,8,9
   ```

   The first example scrapes pages 1 to 5, page 7, and pages 9 to 12.

3. URL patterns: for websites with different URL structures, you can specify a pattern:

   ```
   https://example.com/search?q=cyberpunk&page={page} 1-5
   ```

   Replace `{page}` with where the page number should appear in the URL.

4. Automatic pattern detection: if you don't specify a pattern, CyberScraper 2077 will attempt to detect the URL pattern automatically. For best results, however, specifying the pattern is recommended.
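To illustrate the page-range syntax, here is a minimal sketch of how a specification like `1-5,7,9-12` can be expanded into concrete page URLs. The function names are hypothetical, not CyberScraper's internal API:

```python
# Illustrative sketch: expand a page-range spec such as "1-5,7,9-12"
# into page numbers and URLs. Hypothetical helpers, not CyberScraper's API.
def expand_pages(spec: str) -> list[int]:
    """Expand '1-5,7,9-12' -> [1, 2, 3, 4, 5, 7, 9, 10, 11, 12]."""
    pages: list[int] = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-", 1)
            pages.extend(range(int(start), int(end) + 1))
        else:
            pages.append(int(part))
    return pages

def build_urls(pattern: str, spec: str) -> list[str]:
    """Fill a '{page}' placeholder with each expanded page number."""
    return [pattern.format(page=p) for p in expand_pages(spec)]

print(build_urls("https://example.com/search?q=cyberpunk&page={page}", "1-3"))
# ['https://...page=1', 'https://...page=2', 'https://...page=3']
```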
- Start with a small range of pages to test before scraping a large number.
- Be mindful of the website's load and your scraping speed to avoid overloading servers.
- Use the `simulate_human` option for more natural scraping behavior on sites with anti-bot measures.
- Regularly check the website's `robots.txt` file and terms of service to ensure compliance (see the sketch after this list).
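For the `robots.txt` check, a URL can be verified with nothing but the Python standard library. This is a standalone helper sketch, not part of CyberScraper itself:

```python
# Standalone robots.txt check using only the standard library.
# A helper sketch, not part of CyberScraper itself.
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

def is_allowed(url: str, user_agent: str = "*") -> bool:
    """Return True if the site's robots.txt permits fetching the URL."""
    parts = urlparse(url)
    rp = RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()
    return rp.can_fetch(user_agent, url)

print(is_allowed("https://news.ycombinator.com/?p=1"))
```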
URL example: `https://news.ycombinator.com/?p=1 1-3` (or `1,2,3,4`).
If you want to scrape a specific page, just enter a query like "please scrape page number 1 or 2". If you want to scrape all pages, give a query like "scrape all pages in CSV" or whatever format you want.
If you encounter errors during multi-page scraping:
- Check your internet connection
- Verify the URL pattern is correct
- Ensure the website allows scraping
- Try reducing the number of pages or increasing the delay between requests
As this feature is in beta, we highly value your feedback. If you encounter any issues or have suggestions for improvement, please:
- Open an issue on our GitHub repository
- Provide detailed information about the problem, including the URL structure and number of pages you were attempting to scrape
- Share any error messages or unexpected behaviors you observed
Your input is crucial in helping us refine and stabilize this feature for future releases.
To enable the one-click Google Sheets upload, set up an OAuth client:
- Go to the Google Cloud Console (https://console.cloud.google.com/).
- Select your project.
- Navigate to "APIs & Services" > "Credentials".
- Find your existing OAuth 2.0 Client ID and delete it.
- Click "Create Credentials" > "OAuth client ID".
- Choose "Web application" as the application type.
- Name your client (e.g., "CyberScraper 2077 Web Client").
- Under "Authorized JavaScript origins", add:
- Under "Authorized redirect URIs", add:
- Click "Create" to generate the new client ID.
- Download the new client configuration JSON file and rename it to `client_secret.json`.
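Once `client_secret.json` is in place, you can test an upload directly from Python. Below is a hedged sketch using the `gspread` library (an assumption on my part: CyberScraper's built-in upload flow may differ, and `scraped_data.csv` is a hypothetical file name):

```python
# Hedged sketch: upload a CSV to Google Sheets with gspread, using the OAuth
# client file created above. CyberScraper's own upload flow may differ.
import csv
import gspread

# Opens a browser for consent on the first run, then caches the token.
gc = gspread.oauth(
    credentials_filename="client_secret.json",
    authorized_user_filename="authorized_user.json",
)

sh = gc.create("CyberScraper export")  # new spreadsheet in your Drive
with open("scraped_data.csv", newline="") as f:  # hypothetical CSV path
    rows = list(csv.reader(f))
sh.sheet1.update(values=rows, range_name="A1")  # write all rows from A1
print("Uploaded to:", sh.url)
```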
Customize the `PlaywrightScraper` settings to fit your scraping needs. If some websites are giving you issues, check how the site behaves and adjust these flags:

```python
use_stealth: bool = True,
simulate_human: bool = False,
use_custom_headers: bool = True,
hide_webdriver: bool = True,
bypass_cloudflare: bool = True,
```
Adjust these settings based on your target website and environment for optimal results.
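For context, here is a minimal standalone Playwright snippet showing the kind of tweaks such flags imply: a custom user agent and hiding `navigator.webdriver`. This is an illustrative sketch, not the project's actual `PlaywrightScraper` implementation:

```python
# Illustrative sketch of stealth-style tweaks in plain Playwright:
# a custom user agent and masking navigator.webdriver.
# Not the project's actual PlaywrightScraper implementation.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    context = browser.new_context(
        user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    )
    # Mask the webdriver flag before any page script runs.
    context.add_init_script(
        "Object.defineProperty(navigator, 'webdriver', {get: () => undefined})"
    )
    page = context.new_page()
    page.goto("https://example.com")
    print(page.title())
    browser.close()
```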
You can also bypass a captcha by adding the `-captcha` parameter at the end of the URL. A browser window will pop up; complete the captcha, then return to your terminal window and press Enter, and the bot will complete its task.
We welcome all cyberpunks, netrunners, and code samurais to contribute to CyberScraper 2077!
Ran into a glitch in the matrix? Open an issue on this repo so we can fix it together.
Q: Is CyberScraper 2077 legal to use?
A: CyberScraper 2077 is designed for ethical web scraping. Always ensure you have the right to scrape a website and respect its robots.txt file.

Q: Can I use this for commercial purposes?
A: Yes, under the terms of the MIT License. But remember, in Night City, there's always a price to pay. Just kidding!
This project is licensed under the MIT License; see the LICENSE file for details. Use it, mod it, sell it, just don't blame us if you end up flatlined.
Got questions? Need support? Want to hire me for a gig?
- Email: [email protected]
- Twitter: @owensingh_
- Website: Portfolio
Listen up, choombas! Before you jack into this code, you better understand the risks:
1. This software is provided "as is", without warranty of any kind, express or implied.
2. The authors are not liable for any damages or losses resulting from the use of this software.
3. This tool is intended for educational and research purposes only. Any illegal use is strictly prohibited.
4. We do not guarantee the accuracy, completeness, or reliability of any data obtained through this tool.
5. By using this software, you acknowledge that you are doing so at your own risk.
6. You are responsible for complying with all applicable laws and regulations in your use of this software.
7. We reserve the right to modify or discontinue the software at any time without notice.
Remember, samurai: In the dark future of the NET, knowledge is power, but it's also a double-edged sword. Use this tool wisely, and may your connection always be strong and your firewalls impenetrable. Stay frosty out there in the digital frontier.
CyberScraper 2077 β Because in 2077, what makes someone a criminal? Getting caught.
Built with ❤️ and chrome by the streets of Night City | © 2077 Owen Singh