Best AI tools for< Parse Html Documents >
20 - AI tool Sites
Daxtra
Daxtra is an AI-powered recruitment technology tool designed to help staffing and recruiting professionals find, parse, match, and engage the best candidates quickly and efficiently. The tool offers a suite of products that seamlessly integrate with existing ATS or CRM systems, automating various recruitment processes such as candidate data loading, CV/resume formatting, information extraction, and job matching. Daxtra's solutions cater to corporates, vendors, job boards, and social media partners, providing a comprehensive set of developer components to enhance recruitment workflows.
Extracta.ai
Extracta.ai is an AI data extraction tool for documents and images that automates data extraction processes with easy integration. It allows users to define custom templates for extracting structured data without the need for training. The platform can extract data from various document types, including invoices, resumes, contracts, receipts, and more, providing accurate and efficient results. Extracta.ai ensures data security, encryption, and GDPR compliance, making it a reliable solution for businesses looking to streamline document processing.
HrFlow.ai
HrFlow.ai is an API-first company and the leading AI-powered HR data automation platform. The company helps +1000 customers (HR software vendors, Staffing agencies, large employers, and headhunting firms) to thrive in a high-volume and high-frequency labor market. The platform provides a complete and fully integrated suite of HR data processing products based on the analysis of hundreds of millions of career paths worldwide -- such as Parsing API, Tagging API, Embedding API, Searching API, Scoring API, and Upskilling API. It also offers a catalog of +200 connectors to build custom scenarios that can automate any business logic.
JADBio
JADBio is an automated machine learning (AutoML) platform designed to accelerate biomarker discovery and drug development processes. It offers a no-code solution that automates the discovery of biomarkers and interprets their role based on research needs. JADBio can parse multi-omics data, including genomics, transcriptome, metagenome, proteome, metabolome, phenotype/clinical data, and images, enabling users to efficiently discover valuable insights. The platform is purpose-built for various conditions such as cancer, immune, endocrine, metabolic system, chronic diseases, aging, infectious diseases, and mental health, offering solutions for early biomarker discovery, drug repurposing, lead identification, compound optimization, trial monitoring, and response to treatment. JADBio is trusted by partners in precision health & medicine and is continuously evolving to disrupt drug discovery times and costs at all stages.
NLTK
NLTK (Natural Language Toolkit) is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum. Thanks to a hands-on guide introducing programming fundamentals alongside topics in computational linguistics, plus comprehensive API documentation, NLTK is suitable for linguists, engineers, students, educators, researchers, and industry users alike.
Whitetable
Whitetable is an AI tool that simplifies the hiring process by providing intelligent AI APIs for ultra-fast and optimal hiring. It offers features such as Resume Parsing API, Question API, Ranking API, and Evaluation API to streamline the recruitment process. Whitetable also provides a free AI-powered job search platform and an AI-powered ATS to help companies find the right candidates faster. With a focus on eliminating bias and improving efficiency, Whitetable is shaping the AI-driven future of hiring.
AI Resume Tailor
AI Resume Tailor is an AI-powered application designed to help job seekers create customized resumes tailored to each job description. It offers features such as resume parsing, AI-powered resume building, PDF formatting, privacy protection, and ATS-friendly templates. The platform ensures that users can easily create professional resumes that stand out to potential employers, increasing their chances of getting hired.
Eden AI
Eden AI is an AI tool designed to make AI easy for product builders. It allows users to orchestrate multiple AI models to fit their business needs. The platform offers a wide range of AI technologies such as Generative AI, Image Analysis, Text Analysis, Video Content Analysis, OCR/Document Parsing, and Speech Transcription. Users can access various AI APIs, build workflows, and integrate AI models seamlessly. Eden AI aims to simplify the process of building AI solutions for businesses by providing standardized APIs, easy integration, and cost-effective solutions.
FormX.ai
FormX.ai is an AI-powered data extraction and conversion tool that automates the process of extracting data from physical documents and converting it into digital formats. It supports a wide range of document types, including invoices, receipts, purchase orders, bank statements, contracts, HR forms, shipping orders, loyalty member applications, annual reports, business certificates, personnel licenses, and more. FormX.ai's pre-configured data extraction models and effortless API integration make it easy for businesses to integrate data extraction into their existing systems and workflows. With FormX.ai, businesses can save time and money on manual data entry and improve the accuracy and efficiency of their data processing.
Explosion
Explosion is a software company specializing in developer tools and tailored solutions for AI, Machine Learning, and Natural Language Processing (NLP). They are the makers of spaCy, one of the leading open-source libraries for advanced NLP. The company offers consulting services and builds developer tools for various AI-related tasks, such as coreference resolution, dependency parsing, image classification, named entity recognition, and more.
Imaginary Programming
Imaginary Programming is an AI tool that allows frontend developers to leverage OpenAI's GPT engine to add human-like intelligence to their code effortlessly. By defining function prototypes in TypeScript, developers can access GPT's capabilities without the need for AI model training. The tool enables users to extract structured data, generate text, classify data based on intent or emotion, and parse unstructured language. Imaginary Programming is designed to help developers tackle new challenges and enhance their projects with AI intelligence.
Rgx.tools
Rgx.tools is an AI-powered text-to-regex generator that helps users create regular expressions quickly and easily. It is a wrapper around OpenAI's gpt-3.5-chat model, which generates clean, readable, and efficient regular expressions based on user input. Rgx.tools is designed to make the process of writing regular expressions less painful and more accessible, even for those with limited experience.
LightFeed
LightFeed is an automated news hub powered by LLM technology that allows users to track, filter, and summarize news from any public website. It offers automated daily updates that can be viewed in a browser, email, or RSS format. Users can create their own news hub with a 10-day free trial and no credit card required. LightFeed employs LLMs like GPT-3.5-turbo and Llama 3 to parse, filter, and summarize web pages into structured and readable feeds. The platform also supports customization of news feeds based on user preferences and provides options for automation and scheduling.
Pare
Pare is an AI-powered platform designed to help individuals grow and manage their personal LinkedIn brand with ease. It offers features such as content scheduling, prompt library, AI-powered content creation, and personalized branding suggestions. With simple pricing and seamless brand management, Pare aims to boost engagement effortlessly for its users.
Behnevis
Behnevis is a Persian (Farsi) keyboard, editor, and speech-to-text tool. It allows users to convert Persian written in English letters (Pinglish or Finglish) to the Persian language script. Users can also convert Persian speech to text using the tool. Behnevis offers a paid premium plan with additional features, but the legacy two-part interface is still available for free without limitations.
RSS to Tweet
RSS to Tweet is an AI-powered tool that helps you automate your Twitter marketing by generating unique, ready-to-post tweets from your RSS feeds. It uses ChatGPT to create engaging and informative tweets that will help you reach a wider audience and grow your Twitter following.
Airparser
Airparser is an AI-powered email and document parser tool that revolutionizes data extraction by utilizing the GPT parser engine. It allows users to automate the extraction of structured data from various sources such as emails, PDFs, documents, and handwritten texts. With features like automatic extraction, export to multiple platforms, and support for multiple languages, Airparser simplifies data extraction processes for individuals and businesses. The tool ensures data security and offers seamless integration with other applications through APIs and webhooks.
PizzaGPT
PizzaGPT is an AI-powered chatbot specifically designed for the Italian market. It is trained on a massive dataset of Italian language and culture, enabling it to understand and respond to user queries in a natural and informative way. With PizzaGPT, users can engage in conversations, ask questions, get recommendations, and access a wealth of information on various topics.
SEOBox
SEOBox is an automated AI-based PR and link-building opportunities monitoring tool that streamlines the quote submission process to matched opportunities. By setting up targeted keywords and filters, users receive timely notifications matching their expertise, saving time and effort. The platform connects users with journalists, content managers, and writers on platforms like HARO, HelpAB2BWriter, and PASE, providing personalized PR brand mentions and link-building opportunities directly to the user's inbox. SEOBox helps users focus on responses, build connections, and enhance their online presence and expert reputation.
Boomerang for Gmail
Boomerang for Gmail is a meeting scheduling and email management tool that helps you save time and be more productive. With Boomerang, you can schedule emails to be sent later, set reminders to follow up on messages, and pause your inbox to avoid distractions. Boomerang also includes a number of AI-powered features, such as Respondable, which helps you write better emails, and Inbox Pause, which helps you manage your email flow more effectively.
20 - Open Source AI Tools
sec-parser
The `sec-parser` project simplifies extracting meaningful information from SEC EDGAR HTML documents by organizing them into semantic elements and a tree structure. It helps in parsing SEC filings for financial and regulatory analysis, analytics and data science, AI and machine learning, causal AI, and large language models. The tool is especially beneficial for AI, ML, and LLM applications by streamlining data pre-processing and feature extraction.
Scrapegraph-ai
ScrapeGraphAI is a Python library that uses Large Language Models (LLMs) and direct graph logic to create web scraping pipelines for websites, documents, and XML files. It allows users to extract specific information from web pages by providing a prompt describing the desired data. ScrapeGraphAI supports various LLMs, including Ollama, OpenAI, Gemini, and Docker, enabling users to choose the most suitable model for their needs. The library provides a user-friendly interface through its `SmartScraper` class, which simplifies the process of building and executing scraping pipelines. ScrapeGraphAI is open-source and available on GitHub, with extensive documentation and examples to guide users. It is particularly useful for researchers and data scientists who need to extract structured data from web pages for analysis and exploration.
extractous
Extractous offers a fast and efficient solution for extracting content and metadata from various document types such as PDF, Word, HTML, and many other formats. It is built with Rust, providing high performance, memory safety, and multi-threading capabilities. The tool eliminates the need for external services or APIs, making data processing pipelines faster and more efficient. It supports multiple file formats, including Microsoft Office, OpenOffice, PDF, spreadsheets, web documents, e-books, text files, images, and email formats. Extractous provides a clear and simple API for extracting text and metadata content, with upcoming support for JavaScript/TypeScript. It is free for commercial use under the Apache 2.0 License.
langchain-rust
LangChain Rust is a library for building applications with Large Language Models (LLMs) through composability. It provides a set of tools and components that can be used to create conversational agents, document loaders, and other applications that leverage LLMs. LangChain Rust supports a variety of LLMs, including OpenAI, Azure OpenAI, Ollama, and Anthropic Claude. It also supports a variety of embeddings, vector stores, and document loaders. LangChain Rust is designed to be easy to use and extensible, making it a great choice for developers who want to build applications with LLMs.
open-parse
Open Parse is a Python library for visually discerning document layouts and chunking them effectively. It is designed to fill the gap in open-source libraries for handling complex documents. Unlike text splitting, which converts a file to raw text and slices it up, Open Parse visually analyzes documents for superior LLM input. It also supports basic markdown for parsing headings, bold, and italics, and has high-precision table support, extracting tables into clean Markdown formats with accuracy that surpasses traditional tools. Open Parse is extensible, allowing users to easily implement their own post-processing steps. It is also intuitive, with great editor support and completion everywhere, making it easy to use and learn.
unstructured
The `unstructured` library provides open-source components for ingesting and pre-processing images and text documents, such as PDFs, HTML, Word docs, and many more. The use cases of `unstructured` revolve around streamlining and optimizing the data processing workflow for LLMs. `unstructured` modular functions and connectors form a cohesive system that simplifies data ingestion and pre-processing, making it adaptable to different platforms and efficient in transforming unstructured data into structured outputs.
WDoc
WDoc is a powerful Retrieval-Augmented Generation (RAG) system designed to summarize, search, and query documents across various file types. It supports querying tens of thousands of documents simultaneously, offers tailored summaries to efficiently manage large amounts of information, and includes features like supporting multiple file types, various LLMs, local and private LLMs, advanced RAG capabilities, advanced summaries, trust verification, markdown formatted answers, sophisticated embeddings, extensive documentation, scriptability, type checking, lazy imports, caching, fast processing, shell autocompletion, notification callbacks, and more. WDoc is ideal for researchers, students, and professionals dealing with extensive information sources.
wdoc
wdoc is a powerful Retrieval-Augmented Generation (RAG) system designed to summarize, search, and query documents across various file types. It aims to handle large volumes of diverse document types, making it ideal for researchers, students, and professionals dealing with extensive information sources. wdoc uses LangChain to process and analyze documents, supporting tens of thousands of documents simultaneously. The system includes features like high recall and specificity, support for various Language Model Models (LLMs), advanced RAG capabilities, advanced document summaries, and support for multiple tasks. It offers markdown-formatted answers and summaries, customizable embeddings, extensive documentation, scriptability, and runtime type checking. wdoc is suitable for power users seeking document querying capabilities and AI-powered document summaries.
sparrow
Sparrow is an innovative open-source solution for efficient data extraction and processing from various documents and images. It seamlessly handles forms, invoices, receipts, and other unstructured data sources. Sparrow stands out with its modular architecture, offering independent services and pipelines all optimized for robust performance. One of the critical functionalities of Sparrow - pluggable architecture. You can easily integrate and run data extraction pipelines using tools and frameworks like LlamaIndex, Haystack, or Unstructured. Sparrow enables local LLM data extraction pipelines through Ollama or Apple MLX. With Sparrow solution you get API, which helps to process and transform your data into structured output, ready to be integrated with custom workflows. Sparrow Agents - with Sparrow you can build independent LLM agents, and use API to invoke them from your system. **List of available agents:** * **llamaindex** - RAG pipeline with LlamaIndex for PDF processing * **vllamaindex** - RAG pipeline with LLamaIndex multimodal for image processing * **vprocessor** - RAG pipeline with OCR and LlamaIndex for image processing * **haystack** - RAG pipeline with Haystack for PDF processing * **fcall** - Function call pipeline * **unstructured-light** - RAG pipeline with Unstructured and LangChain, supports PDF and image processing * **unstructured** - RAG pipeline with Weaviate vector DB query, Unstructured and LangChain, supports PDF and image processing * **instructor** - RAG pipeline with Unstructured and Instructor libraries, supports PDF and image processing. Works great for JSON response generation
blinkid-ios
BlinkID iOS is a mobile SDK that enables developers to easily integrate ID scanning and data extraction capabilities into their iOS applications. The SDK supports scanning and processing various types of identity documents, such as passports, driver's licenses, and ID cards. It provides accurate and fast data extraction, including personal information and document details. With BlinkID iOS, developers can enhance their apps with secure and reliable ID verification functionality, improving user experience and streamlining identity verification processes.
paper-qa
PaperQA is a minimal package for question and answering from PDFs or text files, providing very good answers with in-text citations. It uses OpenAI Embeddings to embed and search documents, and includes a process of embedding docs, queries, searching for top passages, creating summaries, using an LLM to re-score and select relevant summaries, putting summaries into prompt, and generating answers. The tool can be used to answer specific questions related to scientific research by leveraging citations and relevant passages from documents.
llmware
LLMWare is a framework for quickly developing LLM-based applications including Retrieval Augmented Generation (RAG) and Multi-Step Orchestration of Agent Workflows. This project provides a comprehensive set of tools that anyone can use - from a beginner to the most sophisticated AI developer - to rapidly build industrial-grade, knowledge-based enterprise LLM applications. Our specific focus is on making it easy to integrate open source small specialized models and connecting enterprise knowledge safely and securely.
extractor
Extractor is an AI-powered data extraction library for Laravel that leverages OpenAI's capabilities to effortlessly extract structured data from various sources, including images, PDFs, and emails. It features a convenient wrapper around OpenAI Chat and Completion endpoints, supports multiple input formats, includes a flexible Field Extractor for arbitrary data extraction, and integrates with Textract for OCR functionality. Extractor utilizes JSON Mode from the latest GPT-3.5 and GPT-4 models, providing accurate and efficient data extraction.
free-for-life
A massive list including a huge amount of products and services that are completely free! ⭐ Star on GitHub • 🤝 Contribute # Table of Contents * APIs, Data & ML * Artificial Intelligence * BaaS * Code Editors * Code Generation * DNS * Databases * Design & UI * Domains * Email * Font * For Students * Forms * Linux Distributions * Messaging & Streaming * PaaS * Payments & Billing * SSL
Awesome-LLM-RAG-Application
Awesome-LLM-RAG-Application is a repository that provides resources and information about applications based on Large Language Models (LLM) with Retrieval-Augmented Generation (RAG) pattern. It includes a survey paper, GitHub repo, and guides on advanced RAG techniques. The repository covers various aspects of RAG, including academic papers, evaluation benchmarks, downstream tasks, tools, and technologies. It also explores different frameworks, preprocessing tools, routing mechanisms, evaluation frameworks, embeddings, security guardrails, prompting tools, SQL enhancements, LLM deployment, observability tools, and more. The repository aims to offer comprehensive knowledge on RAG for readers interested in exploring and implementing LLM-based systems and products.
langchainrb
Langchain.rb is a Ruby library that makes it easy to build LLM-powered applications. It provides a unified interface to a variety of LLMs, vector search databases, and other tools, making it easy to build and deploy RAG (Retrieval Augmented Generation) systems and assistants. Langchain.rb is open source and available under the MIT License.
20 - OpenAI Gpts
Japanese Hiragana Advisor
This GPT is able to parse a sentence, provide an appropriate translation of the input text and be able to provide a response explaining the structure of a sentence in japanese.
Changelog Assistant
Turns any software update info into structured changelogs in imperative tense.
Quick Code Snippet Generator
Generates concise, copy-paste code snippets quickly no unnecessary text.
BioinformaticsManual
Compile instructions from the web and github for bioinformatics applications. Receive line-by-line instructions and commands to get started
Table to JSON
我們經常在看 REST API 參考文件,文件中呈現 Request/Response 參數通常都是用表格的形式,開發人員都要手動轉換成 JSON 結構,有點小麻煩,但透過這個 GPT 只要上傳截圖就可以自動產生 JSON 範例與 JSON Schema 結構。
JSON Outputter
Takes all input into consideration and creates a JSON-appropriate response. Also useful for creating templates.
GASGPT
Soy un experto en Google Apps Script que ayuda a los principiantes, hablo principalmente español.
Idea To Code GPT
Generates a full & complete Python codebase, after clarifying questions, by following a structured section pattern.
RegExp Builder
This GPT lets you build PCRE Regular Expressions (for use the RegExp constructor).
Bot Psycho - Le pervers narcissique.
Je te parle des pervers narcissique. Je t'informe de leurs traits et de leur comportement. Je t'aide à reconnaitre les signes d'une relation toxique.