Best AI tools for< Data Librarian >
Infographic
20 - AI tool Sites
Elicit
Elicit is a research tool that uses artificial intelligence to help researchers analyze research papers more efficiently. It can summarize papers, extract data, and synthesize findings, saving researchers time and effort. Elicit is used by over 800,000 researchers worldwide and has been featured in publications such as Nature and Science. It is a powerful tool that can help researchers stay up-to-date on the latest research and make new discoveries.
AcademicID
AcademicID is an AI-powered platform that helps students and researchers discover and access academic resources. It provides a comprehensive database of academic papers, journals, and other resources, as well as tools to help users organize and manage their research. AcademicID also offers a variety of features to help users collaborate with others and share their research findings.
Wikidata
Wikidata is a free and open knowledge base that can be read and edited by both humans and machines. It acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others. Wikidata also provides support to many other sites and services beyond just Wikimedia projects!
arXiv
arXiv.org is a free distribution service and an open-access archive for nearly 2.4 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics. Materials on this site are not peer-reviewed by arXiv.
Ex Libris Products & Services
The website is a comprehensive platform offering a suite of software solutions for library management, research, teaching, and learning in the higher education ecosystem. It leverages generative AI, linked open data, and conversational discovery to optimize operations, integration, personalized experiences, and analytic insights. The platform includes various products and services such as Alma, Primo, Leganto, Rapido, Rosetta, and campusM, catering to the unique needs of academic institutions, libraries, and technology powerhouses. The website features success stories, customer testimonials, webinars, learning resources, and community engagement initiatives.
Semantic Scholar
Semantic Scholar is a free, AI-powered research tool for scientific literature. It is based at the Allen Institute for AI and provides access to over 217 million papers from all fields of science. Semantic Scholar uses AI to help users discover and explore scientific literature, and to stay up-to-date on the latest research. The tool also includes a number of features to help users manage their research, such as the ability to save papers, create bibliographies, and share research with others.
SciSpace
SciSpace is an AI-powered tool that helps researchers understand research papers better. It can explain and elaborate most academic texts in simple words. It is a great tool for students, researchers, and anyone who wants to learn more about a particular topic. SciSpace has a user-friendly interface and is easy to use. Simply upload a research paper or enter a URL, and SciSpace will do the rest. It will highlight key concepts, provide definitions, and generate a summary of the paper. SciSpace can also be used to generate citations and find related papers.
Connected Papers
Connected Papers is a search engine for academic papers that uses artificial intelligence to help users find and explore relevant research. It allows users to search for papers by keyword, author, or title, and then explore the connections between them. Connected Papers also provides a variety of tools to help users organize and manage their research, including the ability to create custom collections of papers, add notes and annotations, and share their research with others.
ResearchRabbit
ResearchRabbit is a research tool that helps researchers discover and organize academic papers. It uses artificial intelligence to recommend papers that are relevant to a researcher's interests and to visualize networks of papers and co-authorships. ResearchRabbit also allows researchers to collaborate on collections of papers and to share their findings with others.
OpenRead
OpenRead is an AI-powered research tool that helps users discover, understand, and organize scientific literature. It offers a variety of features to make research more efficient and effective, including semantic search, AI summarization, and note-taking tools. OpenRead is designed to help researchers of all levels, from students to experienced professionals, save time and improve their research outcomes.
Booltool
Booltool is a free online tool that helps you to create and manage boolean searches. With Booltool, you can easily combine multiple search terms using the AND, OR, and NOT operators to create more precise and effective searches. Booltool also provides a variety of other features to help you refine your searches, such as the ability to exclude specific terms, search within a specific domain, and limit your search to a specific date range.
Lumina
Lumina is a research tool that uses artificial intelligence to help researchers find and analyze information more quickly and easily. It can be used to search for articles, books, and other resources, and it can also be used to analyze data and create visualizations. Lumina is designed to make research more efficient and productive.
mypapers.ai
mypapers.ai is an AI tool designed to assist users in managing and analyzing academic papers efficiently. The tool offers features such as exploring papers and authors, toggling between papers and authors, and tracking the journey of research. Users can also access the code on GitHub to further enhance their research capabilities.
Scite
Scite is an award-winning platform for discovering and evaluating scientific articles via Smart Citations. Smart Citations allow users to see how a publication has been cited by providing the context of the citation and a classification describing whether it provides supporting or contrasting evidence for the cited claim.
ArxivPaperAI
ArxivPaperAI is an AI-powered research paper summarizer that helps you quickly and easily understand the key points of academic papers. With ArxivPaperAI, you can:
Open Knowledge Maps
Open Knowledge Maps is the world's largest AI-based search engine for scientific knowledge. It aims to revolutionize discovery by increasing the visibility of research findings for science and society. The platform is open and nonprofit, based on the principles of open science, with a mission to create an inclusive, sustainable, and equitable infrastructure for all users. Users can map research topics with AI, find documents, and identify concepts to enhance their literature search experience.
Dr.Oracle
Dr.Oracle is a personal AI research assistant that helps you find and understand the latest research in your field. With Dr.Oracle, you can search for research papers, track your favorite authors, and get personalized recommendations for new research. Dr.Oracle is the perfect tool for students, researchers, and anyone who wants to stay up-to-date on the latest research in their field.
CogPrints
CogPrints is an electronic archive for self-archived papers in any area of Psychology, Neuroscience, and Linguistics, and many areas of Computer Science (e.g., artificial intelligence, robotics, vision, learning, speech, neural networks), Philosophy (e.g., mind, language, knowledge, science, logic), Biology (e.g., ethology, behavioral ecology, sociobiology, behavior genetics, evolutionary theory), Medicine (e.g., Psychiatry, Neurology, human genetics, Imaging), Anthropology (e.g., primatology, cognitive ethnology, archeology, paleontology), as well as any other portions of the physical, social and mathematical sciences that are pertinent to the study of cognition.
Perplexica
Perplexica is an AI-powered search tool designed to help users discover content within a library efficiently. The tool utilizes advanced algorithms to provide accurate and relevant search results, making it easier for users to find the information they need quickly and easily. With a user-friendly interface and powerful search capabilities, Perplexica is a valuable tool for researchers, students, and anyone looking to access information within a library or database.
vLex
vLex is a legal intelligence platform that offers innovative technology and comprehensive content to empower legal professionals. With over 30 years of experience, vLex provides cutting-edge AI legal solutions designed to streamline workflows, enhance access to legal knowledge, and drive success. The platform integrates advanced technology with extensive legal databases to deliver unparalleled legal intelligence and resources. Vincent AI, a key feature of vLex, leverages generative AI to automate tasks such as workflow optimization, drafting, research, and comparative legal analysis across multiple jurisdictions.
20 - Open Source Tools
opendataeditor
The Open Data Editor (ODE) is a no-code application to explore, validate and publish data in a simple way. It is an open source project powered by the Frictionless Framework. The ODE is currently available for download and testing in beta.
asreview
The ASReview project implements active learning for systematic reviews, utilizing AI-aided pipelines to assist in finding relevant texts for search tasks. It accelerates the screening of textual data with minimal human input, saving time and increasing output quality. The software offers three modes: Oracle for interactive screening, Exploration for teaching purposes, and Simulation for evaluating active learning models. ASReview LAB is designed to support decision-making in any discipline or industry by improving efficiency and transparency in screening large amounts of textual data.
Awesome-Code-LLM
Analyze the following text from a github repository (name and readme text at end) . Then, generate a JSON object with the following keys and provide the corresponding information for each key, in lowercase letters: 'description' (detailed description of the repo, must be less than 400 words,Ensure that no line breaks and quotation marks.),'for_jobs' (List 5 jobs suitable for this tool,in lowercase letters), 'ai_keywords' (keywords of the tool,user may use those keyword to find the tool,in lowercase letters), 'for_tasks' (list of 5 specific tasks user can use this tool to do,in lowercase letters), 'answer' (in english languages)
SirChatalot
A Telegram bot that proves you don't need a body to have a personality. It can use various text and image generation APIs to generate responses to user messages. For text generation, the bot can use: * OpenAI's ChatGPT API (or other compatible API). Vision capabilities can be used with GPT-4 models. Function calling can be used with Function calling. * Anthropic's Claude API. Vision capabilities can be used with Claude 3 models. Function calling can be used with tool use. * YandexGPT API Bot can also generate images with: * OpenAI's DALL-E * Stability AI * Yandex ART This bot can also be used to generate responses to voice messages. Bot will convert the voice message to text and will then generate a response. Speech recognition can be done using the OpenAI's Whisper model. To use this feature, you need to install the ffmpeg library. This bot is also support working with files, see Files section for more details. If function calling is enabled, bot can generate images and search the web (limited).
warc-gpt
WARC-GPT is an experimental retrieval augmented generation pipeline for web archive collections. It allows users to interact with WARC files, extract text, generate text embeddings, visualize embeddings, and interact with a web UI and API. The tool is highly customizable, supporting various LLMs, providers, and embedding models. Users can configure the application using environment variables, ingest WARC files, start the server, and interact with the web UI and API to search for content and generate text completions. WARC-GPT is designed for exploration and experimentation in exploring web archives using AI.
breadboard
Breadboard is a library for prototyping generative AI applications. It is inspired by the hardware maker community and their boundless creativity. Breadboard makes it easy to wire prototypes and share, remix, reuse, and compose them. The library emphasizes ease and flexibility of wiring, as well as modularity and composability.
data-scientist-roadmap2024
The Data Scientist Roadmap2024 provides a comprehensive guide to mastering essential tools for data science success. It includes programming languages, machine learning libraries, cloud platforms, and concepts categorized by difficulty. The roadmap covers a wide range of topics from programming languages to machine learning techniques, data visualization tools, and DevOps/MLOps tools. It also includes web development frameworks and specific concepts like supervised and unsupervised learning, NLP, deep learning, reinforcement learning, and statistics. Additionally, it delves into DevOps tools like Airflow and MLFlow, data visualization tools like Tableau and Matplotlib, and other topics such as ETL processes, optimization algorithms, and financial modeling.
femtoGPT
femtoGPT is a pure Rust implementation of a minimal Generative Pretrained Transformer. It can be used for both inference and training of GPT-style language models using CPUs and GPUs. The tool is implemented from scratch, including tensor processing logic and training/inference code of a minimal GPT architecture. It is a great start for those fascinated by LLMs and wanting to understand how these models work at deep levels. The tool uses random generation libraries, data-serialization libraries, and a parallel computing library. It is relatively fast on CPU and correctness of gradients is checked using the gradient-check method.
litdata
LitData is a tool designed for blazingly fast, distributed streaming of training data from any cloud storage. It allows users to transform and optimize data in cloud storage environments efficiently and intuitively, supporting various data types like images, text, video, audio, geo-spatial, and multimodal data. LitData integrates smoothly with frameworks such as LitGPT and PyTorch, enabling seamless streaming of data to multiple machines. Key features include multi-GPU/multi-node support, easy data mixing, pause & resume functionality, support for profiling, memory footprint reduction, cache size configuration, and on-prem optimizations. The tool also provides benchmarks for measuring streaming speed and conversion efficiency, along with runnable templates for different data types. LitData enables infinite cloud data processing by utilizing the Lightning.ai platform to scale data processing with optimized machines.
sql-eval
This repository contains the code that Defog uses for the evaluation of generated SQL. It's based off the schema from the Spider, but with a new set of hand-selected questions and queries grouped by query category. The testing procedure involves generating a SQL query, running both the 'gold' query and the generated query on their respective database to obtain dataframes with the results, comparing the dataframes using an 'exact' and a 'subset' match, logging these alongside other metrics of interest, and aggregating the results for reporting. The repository provides comprehensive instructions for installing dependencies, starting a Postgres instance, importing data into Postgres, importing data into Snowflake, using private data, implementing a query generator, and running the test with different runners.
machine-learning-research
The 'machine-learning-research' repository is a comprehensive collection of resources related to mathematics, machine learning, deep learning, artificial intelligence, data science, and various scientific fields. It includes materials such as courses, tutorials, books, podcasts, communities, online courses, papers, and dissertations. The repository covers topics ranging from fundamental math skills to advanced machine learning concepts, with a focus on applications in healthcare, genetics, computational biology, precision health, and AI in science. It serves as a valuable resource for individuals interested in learning and researching in the fields of machine learning and related disciplines.
sycamore
Sycamore is a conversational search and analytics platform for complex unstructured data, such as documents, presentations, transcripts, embedded tables, and internal knowledge repositories. It retrieves and synthesizes high-quality answers through bringing AI to data preparation, indexing, and retrieval. Sycamore makes it easy to prepare unstructured data for search and analytics, providing a toolkit for data cleaning, information extraction, enrichment, summarization, and generation of vector embeddings that encapsulate the semantics of data. Sycamore uses your choice of generative AI models to make these operations simple and effective, and it enables quick experimentation and iteration. Additionally, Sycamore uses OpenSearch for indexing, enabling hybrid (vector + keyword) search, retrieval-augmented generation (RAG) pipelining, filtering, analytical functions, conversational memory, and other features to improve information retrieval.
llm-course
The LLM course is divided into three parts: 1. 🧩 **LLM Fundamentals** covers essential knowledge about mathematics, Python, and neural networks. 2. 🧑🔬 **The LLM Scientist** focuses on building the best possible LLMs using the latest techniques. 3. 👷 **The LLM Engineer** focuses on creating LLM-based applications and deploying them. For an interactive version of this course, I created two **LLM assistants** that will answer questions and test your knowledge in a personalized way: * 🤗 **HuggingChat Assistant**: Free version using Mixtral-8x7B. * 🤖 **ChatGPT Assistant**: Requires a premium account. ## 📝 Notebooks A list of notebooks and articles related to large language models. ### Tools | Notebook | Description | Notebook | |----------|-------------|----------| | 🧐 LLM AutoEval | Automatically evaluate your LLMs using RunPod | ![Open In Colab](img/colab.svg) | | 🥱 LazyMergekit | Easily merge models using MergeKit in one click. | ![Open In Colab](img/colab.svg) | | 🦎 LazyAxolotl | Fine-tune models in the cloud using Axolotl in one click. | ![Open In Colab](img/colab.svg) | | ⚡ AutoQuant | Quantize LLMs in GGUF, GPTQ, EXL2, AWQ, and HQQ formats in one click. | ![Open In Colab](img/colab.svg) | | 🌳 Model Family Tree | Visualize the family tree of merged models. | ![Open In Colab](img/colab.svg) | | 🚀 ZeroSpace | Automatically create a Gradio chat interface using a free ZeroGPU. | ![Open In Colab](img/colab.svg) |
LLM-PowerHouse-A-Curated-Guide-for-Large-Language-Models-with-Custom-Training-and-Inferencing
LLM-PowerHouse is a comprehensive and curated guide designed to empower developers, researchers, and enthusiasts to harness the true capabilities of Large Language Models (LLMs) and build intelligent applications that push the boundaries of natural language understanding. This GitHub repository provides in-depth articles, codebase mastery, LLM PlayLab, and resources for cost analysis and network visualization. It covers various aspects of LLMs, including NLP, models, training, evaluation metrics, open LLMs, and more. The repository also includes a collection of code examples and tutorials to help users build and deploy LLM-based applications.
holoscan-sdk
The Holoscan SDK is part of NVIDIA Holoscan, the AI sensor processing platform that combines hardware systems for low-latency sensor and network connectivity, optimized libraries for data processing and AI, and core microservices to run streaming, imaging, and other applications, from embedded to edge to cloud. It can be used to build streaming AI pipelines for a variety of domains, including Medical Devices, High Performance Computing at the Edge, Industrial Inspection and more.
data-prep-kit
Data Prep Kit is a community project aimed at democratizing and speeding up unstructured data preparation for LLM app developers. It provides high-level APIs and modules for transforming data (code, language, speech, visual) to optimize LLM performance across different use cases. The toolkit supports Python, Ray, Spark, and Kubeflow Pipelines runtimes, offering scalability from laptop to datacenter-scale processing. Developers can contribute new custom modules and leverage the data processing library for building data pipelines. Automation features include workflow automation with Kubeflow Pipelines for transform execution.
awesome-mlops
Awesome MLOps is a curated list of tools related to Machine Learning Operations, covering areas such as AutoML, CI/CD for Machine Learning, Data Cataloging, Data Enrichment, Data Exploration, Data Management, Data Processing, Data Validation, Data Visualization, Drift Detection, Feature Engineering, Feature Store, Hyperparameter Tuning, Knowledge Sharing, Machine Learning Platforms, Model Fairness and Privacy, Model Interpretability, Model Lifecycle, Model Serving, Model Testing & Validation, Optimization Tools, Simplification Tools, Visual Analysis and Debugging, and Workflow Tools. The repository provides a comprehensive collection of tools and resources for individuals and teams working in the field of MLOps.
20 - OpenAI Gpts
Smart Sorter
A versatile, user-friendly Sorting Bot for diverse data types, prioritizing privacy and adaptability.
Theses without Subject Discipline info UK
Expert in analyzing UK theses data without subject info
Ordinals API
Knows the docs and can query official ordinal endpoints—Sat Numbers, Inscription IDs, and more.
Adulting with Social Anxiety
Navigating everyday life with friendly support for social anxiety.
Paper Interpreter (international)
Automatically structure and decode academic papers with ease - simply upload a PDF!
Eureka Research Assessment and Improvement
AI tool for self-evaluating and enhancing scientific research capabilities.