Best AI tools for< Research Linguistics >
20 - AI tool Sites
CogPrints
CogPrints is an electronic archive for self-archived papers in any area of Psychology, Neuroscience, and Linguistics, and many areas of Computer Science (e.g., artificial intelligence, robotics, vision, learning, speech, neural networks), Philosophy (e.g., mind, language, knowledge, science, logic), Biology (e.g., ethology, behavioral ecology, sociobiology, behavior genetics, evolutionary theory), Medicine (e.g., Psychiatry, Neurology, human genetics, Imaging), Anthropology (e.g., primatology, cognitive ethnology, archeology, paleontology), as well as any other portions of the physical, social and mathematical sciences that are pertinent to the study of cognition.
Wolfram|Alpha
Wolfram|Alpha is a computational knowledge engine that answers questions using data, algorithms, and artificial intelligence. It can perform calculations, generate graphs, and provide information on a wide range of topics, including mathematics, science, history, and culture. Wolfram|Alpha is used by students, researchers, and professionals around the world to solve problems, learn new things, and make informed decisions.
Wolfram
Wolfram is a comprehensive platform that unifies algorithms, data, notebooks, linguistics, and deployment to provide a powerful computation platform. It offers a range of products and services for various industries, including education, engineering, science, and technology. Wolfram is known for its revolutionary knowledge-based programming language, Wolfram Language, and its flagship product Wolfram|Alpha, a computational knowledge engine. The platform also includes Wolfram Cloud for cloud-based services, Wolfram Engine for software implementation, and Wolfram Data Framework for real-world data analysis.
NLTK
NLTK (Natural Language Toolkit) is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum. Thanks to a hands-on guide introducing programming fundamentals alongside topics in computational linguistics, plus comprehensive API documentation, NLTK is suitable for linguists, engineers, students, educators, researchers, and industry users alike.
Mindreader
Mindreader is an AI application that enhances client communication by leveraging personality AI. It helps users engage with client personalities better through quizzes, linguistics, and physiognomy. The application ensures success during client research and background analysis by combining psychology, biology, and AI. Mindreader's personality AI minimizes misunderstandings with key decision makers and guarantees effective client communication. It offers a proven profiling framework to identify clients' preferred communication styles, enabling users to build strong relationships effortlessly.
Sarvam AI
Sarvam AI is an AI application focused on leading transformative research in AI to develop, deploy, and distribute Generative AI applications in India. The platform aims to build efficient large language models for India's diverse linguistic culture and enable new GenAI applications through bespoke enterprise models. Sarvam AI is also developing an enterprise-grade platform for developing and evaluating GenAI apps, while contributing to open-source models and datasets to accelerate AI innovation.
Neoform AI
Neoform AI is an innovative AI tool that focuses on developing AI models specifically for African dialects. The platform aims to bridge the gap in AI technology by providing solutions tailored to the linguistic diversity of Africa. With a commitment to inclusivity and cultural representation, Neoform AI is revolutionizing the field of artificial intelligence by addressing the unique challenges faced by African languages. Through cutting-edge research and development, Neoform AI is paving the way for greater accessibility and accuracy in AI applications across the continent.
Grok-1.5 Vision
Grok-1.5 Vision (Grok-1.5V) is a groundbreaking multimodal AI model developed by Elon Musk's research lab, x.AI. This advanced model has the potential to revolutionize the field of artificial intelligence and shape the future of various industries. Grok-1.5V combines the capabilities of computer vision, natural language processing, and other AI techniques to provide a comprehensive understanding of the world around us. With its ability to analyze and interpret visual data, Grok-1.5V can assist in tasks such as object recognition, image classification, and scene understanding. Additionally, its natural language processing capabilities enable it to comprehend and generate human language, making it a powerful tool for communication and information retrieval. Grok-1.5V's multimodal nature sets it apart from traditional AI models, allowing it to handle complex tasks that require a combination of visual and linguistic understanding. This makes it a valuable asset for applications in fields such as healthcare, manufacturing, and customer service.
Bay Area AI
Bay Area AI is a technical AI meetup group based in San Francisco, CA, consisting of startup engineers, research scientists, computational linguists, mathematicians, and philosophers. The group focuses on understanding the meaning of text, reasoning, and human intent through technology to build new businesses and enhance the human experience in the modern connected world. They work on building systems with Machine Learning on top of Data Pipelines, exploring open-source solutions, and modeling human behavior in industry for practical results.
Google Research
Google Research is a leading research organization focusing on advancing science and artificial intelligence. They conduct research in various domains such as AI/ML foundations, responsible human-centric technology, science & societal impact, computing paradigms, and algorithms & optimization. Google Research aims to create an environment for diverse research across different time scales and levels of risk, driving advancements in computer science through fundamental and applied research. They publish hundreds of research papers annually, collaborate with the academic community, and work on projects that impact technology used by billions of people worldwide.
Google Research
Google Research is a team of scientists and engineers working on a wide range of topics in computer science, including artificial intelligence, machine learning, and quantum computing. Our mission is to advance the state of the art in these fields and to develop new technologies that can benefit society. We publish hundreds of research papers each year and collaborate with researchers from around the world. Our work has led to the development of many new products and services, including Google Search, Google Translate, and Google Maps.
Google Research Blog
The Google Research Blog is a platform for researchers at Google to share their latest work in artificial intelligence, machine learning, and other related fields. The blog covers a wide range of topics, from theoretical research to practical applications. The goal of the blog is to provide a forum for researchers to share their ideas and findings, and to foster collaboration between researchers at Google and around the world.
Research Center Trustworthy Data Science and Security
The Research Center Trustworthy Data Science and Security is a hub for interdisciplinary research focusing on building trust in artificial intelligence, machine learning, and cyber security. The center aims to develop trustworthy intelligent systems through research in trustworthy data analytics, explainable machine learning, and privacy-aware algorithms. By addressing the intersection of technological progress and social acceptance, the center seeks to enable private citizens to understand and trust technology in safety-critical applications.
Research Studio
Research Studio is a next-level UX research tool that helps you streamline your user research with AI-enhanced analysis. Whether you're a freelance UX designer, user researcher, or agency, Research Studio can help you get the insights you need to make better decisions about your products and services.
HelpMoji Research
HelpMoji Research is an AI-powered product research assistant that helps users conduct internet research without being tracked by digital advertising giants. It allows users to search for product specifications, compare products, and focus on research without being influenced by targeted ads. The tool works on all devices and browsers, providing a seamless research experience.
RapidAI Research Institute
RapidAI Research Institute is an academic institution under the RapidAI open-source organization, a non-enterprise academic institution. It serves as a platform for academic research and collaboration, providing opportunities for aspiring researchers to publish papers and engage in scholarly activities. The institute offers mentorship programs and benefits for members, including access to resources such as internet connectivity, GPU configurations, and storage space. The management team consists of esteemed professionals in the field, ensuring a conducive environment for academic growth and development.
MIRI (Machine Intelligence Research Institute)
MIRI (Machine Intelligence Research Institute) is a non-profit research organization dedicated to ensuring that artificial intelligence has a positive impact on humanity. MIRI conducts foundational mathematical research on topics such as decision theory, game theory, and reinforcement learning, with the goal of developing new insights into how to build safe and beneficial AI systems.
Branded Research
Branded Research, acquired by Dynata, provides access to AI-verified audience insights. It offers a range of research methods, including surveys, webcam studies, and emotional AI. With its advanced algorithms and extensive profiling, Branded helps businesses connect with their target audience and gain valuable insights to drive innovation. The company serves various industries, including tech, consumer goods, healthcare, and research agencies.
Berkeley Artificial Intelligence Research (BAIR) Lab
The Berkeley Artificial Intelligence Research (BAIR) Lab is a renowned research lab at UC Berkeley focusing on computer vision, machine learning, natural language processing, planning, control, and robotics. With over 50 faculty members and 300 graduate students, BAIR conducts research on fundamental advances in AI and interdisciplinary themes like multi-modal deep learning and human-compatible AI.
AIM Research
AIM Research is a leading platform providing insights and analysis on the Artificial Intelligence industry. The website offers a comprehensive range of resources, including research reports, event coverage, news articles, and expert opinions. AIM Research focuses on highlighting the latest trends, innovations, and key players in the AI sector, catering to professionals, researchers, and enthusiasts seeking in-depth knowledge and understanding of AI technologies and applications.
20 - Open Source AI Tools
probsem
ProbSem is a repository that provides a framework to leverage large language models (LLMs) for assigning context-conditional probability distributions over queried strings. It supports OpenAI engines and HuggingFace CausalLM models, and is flexible for research applications in linguistics, cognitive science, program synthesis, and NLP. Users can define prompts, contexts, and queries to derive probability distributions over possible completions, enabling tasks like cloze completion, multiple-choice QA, semantic parsing, and code completion. The repository offers CLI and API interfaces for evaluation, with options to customize models, normalize scores, and adjust temperature for probability distributions.
Prompt4ReasoningPapers
Prompt4ReasoningPapers is a repository dedicated to reasoning with language model prompting. It provides a comprehensive survey of cutting-edge research on reasoning abilities with language models. The repository includes papers, methods, analysis, resources, and tools related to reasoning tasks. It aims to support various real-world applications such as medical diagnosis, negotiation, etc.
RAGLAB
RAGLAB is a modular, research-oriented open-source framework for Retrieval-Augmented Generation (RAG) algorithms. It offers reproductions of 6 existing RAG algorithms and a comprehensive evaluation system with 10 benchmark datasets, enabling fair comparisons between RAG algorithms and easy expansion for efficient development of new algorithms, datasets, and evaluation metrics. The framework supports the entire RAG pipeline, provides advanced algorithm implementations, fair comparison platform, efficient retriever client, versatile generator support, and flexible instruction lab. It also includes features like Interact Mode for quick understanding of algorithms and Evaluation Mode for reproducing paper results and scientific research.
storm
STORM is a LLM system that writes Wikipedia-like articles from scratch based on Internet search. While the system cannot produce publication-ready articles that often require a significant number of edits, experienced Wikipedia editors have found it helpful in their pre-writing stage. **Try out our [live research preview](https://storm.genie.stanford.edu/) to see how STORM can help your knowledge exploration journey and please provide feedback to help us improve the system 🙏!**
cltk
The Classical Language Toolkit (CLTK) is a Python library that provides natural language processing (NLP) capabilities for pre-modern languages. It offers a modular processing pipeline with pre-configured defaults and supports almost 20 languages. Users can install the latest version using pip and access detailed documentation on the official website. The toolkit is designed to meet the unique needs of researchers working with historical languages, filling a void in the NLP landscape that often neglects non-spoken languages and different research goals.
Conference-Acceptance-Rate
The 'Conference-Acceptance-Rate' repository provides acceptance rates for top-tier AI-related conferences in the fields of Natural Language Processing, Computational Linguistics, Computer Vision, Pattern Recognition, Machine Learning, Learning Theory, Artificial Intelligence, Data Mining, Information Retrieval, Speech Processing, and Signal Processing. The data includes acceptance rates for long papers and short papers over several years for each conference, allowing researchers to track trends and make informed decisions about where to submit their work.
BIG-Bench-Mistake
BIG-Bench Mistake is a dataset of chain-of-thought (CoT) outputs annotated with the location of the first logical mistake. It was released as part of a research paper focusing on benchmarking LLMs in terms of their mistake-finding ability. The dataset includes CoT traces for tasks like Word Sorting, Tracking Shuffled Objects, Logical Deduction, Multistep Arithmetic, and Dyck Languages. Human annotators were recruited to identify mistake steps in these tasks, with automated annotation for Dyck Languages. Each JSONL file contains input questions, steps in the chain of thoughts, model's answer, correct answer, and the index of the first logical mistake.
awesome-deeplogic
Awesome deep logic is a curated list of papers and resources focusing on integrating symbolic logic into deep neural networks. It includes surveys, tutorials, and research papers that explore the intersection of logic and deep learning. The repository aims to provide valuable insights and knowledge on how logic can be used to enhance reasoning, knowledge regularization, weak supervision, and explainability in neural networks.
PIXIU
PIXIU is a project designed to support the development, fine-tuning, and evaluation of Large Language Models (LLMs) in the financial domain. It includes components like FinBen, a Financial Language Understanding and Prediction Evaluation Benchmark, FIT, a Financial Instruction Dataset, and FinMA, a Financial Large Language Model. The project provides open resources, multi-task and multi-modal financial data, and diverse financial tasks for training and evaluation. It aims to encourage open research and transparency in the financial NLP field.
ChatGLM3
ChatGLM3 is a conversational pretrained model jointly released by Zhipu AI and THU's KEG Lab. ChatGLM3-6B is the open-sourced model in the ChatGLM3 series. It inherits the advantages of its predecessors, such as fluent conversation and low deployment threshold. In addition, ChatGLM3-6B introduces the following features: 1. A stronger foundation model: ChatGLM3-6B's foundation model ChatGLM3-6B-Base employs more diverse training data, more sufficient training steps, and more reasonable training strategies. Evaluation on datasets from different perspectives, such as semantics, mathematics, reasoning, code, and knowledge, shows that ChatGLM3-6B-Base has the strongest performance among foundation models below 10B parameters. 2. More complete functional support: ChatGLM3-6B adopts a newly designed prompt format, which supports not only normal multi-turn dialogue, but also complex scenarios such as tool invocation (Function Call), code execution (Code Interpreter), and Agent tasks. 3. A more comprehensive open-source sequence: In addition to the dialogue model ChatGLM3-6B, the foundation model ChatGLM3-6B-Base, the long-text dialogue model ChatGLM3-6B-32K, and ChatGLM3-6B-128K, which further enhances the long-text comprehension ability, are also open-sourced. All the above weights are completely open to academic research and are also allowed for free commercial use after filling out a questionnaire.
LLMBox
LLMBox is a comprehensive library designed for implementing Large Language Models (LLMs) with a focus on a unified training pipeline and comprehensive model evaluation. It serves as a one-stop solution for training and utilizing LLMs, offering flexibility and efficiency in both training and utilization stages. The library supports diverse training strategies, comprehensive datasets, tokenizer vocabulary merging, data construction strategies, parameter efficient fine-tuning, and efficient training methods. For utilization, LLMBox provides comprehensive evaluation on various datasets, in-context learning strategies, chain-of-thought evaluation, evaluation methods, prefix caching for faster inference, support for specific LLM models like vLLM and Flash Attention, and quantization options. The tool is suitable for researchers and developers working with LLMs for natural language processing tasks.
ChatTTS
ChatTTS is a generative speech model optimized for dialogue scenarios, providing natural and expressive speech synthesis with fine-grained control over prosodic features. It supports multiple speakers and surpasses most open-source TTS models in terms of prosody. The model is trained with 100,000+ hours of Chinese and English audio data, and the open-source version on HuggingFace is a 40,000-hour pre-trained model without SFT. The roadmap includes open-sourcing additional features like VQ encoder, multi-emotion control, and streaming audio generation. The tool is intended for academic and research use only, with precautions taken to limit potential misuse.
python-tutorial-notebooks
This repository contains Jupyter-based tutorials for NLP, ML, AI in Python for classes in Computational Linguistics, Natural Language Processing (NLP), Machine Learning (ML), and Artificial Intelligence (AI) at Indiana University.
MiniCheck
MiniCheck is an efficient fact-checking tool designed to verify claims against grounding documents using large language models. It provides a sentence-level fact-checking model that can be used to evaluate the consistency of claims with the provided documents. MiniCheck offers different models, including Bespoke-MiniCheck-7B, which is the state-of-the-art and commercially usable. The tool enables users to fact-check multi-sentence claims by breaking them down into individual sentences for optimal performance. It also supports automatic prefix caching for faster inference when repeatedly fact-checking the same document with different claims.
bonito
Bonito is an open-source model for conditional task generation, converting unannotated text into task-specific training datasets for instruction tuning. It is a lightweight library built on top of Hugging Face `transformers` and `vllm` libraries. The tool supports various task types such as question answering, paraphrase generation, sentiment analysis, summarization, and more. Users can easily generate synthetic instruction tuning datasets using Bonito for zero-shot task adaptation.
ice-score
ICE-Score is a tool designed to instruct large language models to evaluate code. It provides a minimum viable product (MVP) for evaluating generated code snippets using inputs such as problem, output, task, aspect, and model. Users can also evaluate with reference code and enable zero-shot chain-of-thought evaluation. The tool is built on codegen-metrics and code-bert-score repositories and includes datasets like CoNaLa and HumanEval. ICE-Score has been accepted to EACL 2024.
babilong
BABILong is a generative benchmark designed to evaluate the performance of NLP models in processing long documents with distributed facts. It consists of 20 tasks that simulate interactions between characters and objects in various locations, requiring models to distinguish important information from irrelevant details. The tasks vary in complexity and reasoning aspects, with test samples potentially containing millions of tokens. The benchmark aims to challenge and assess the capabilities of Large Language Models (LLMs) in handling complex, long-context information.
zshot
Zshot is a highly customizable framework for performing Zero and Few shot named entity and relationships recognition. It can be used for mentions extraction, wikification, zero and few shot named entity recognition, zero and few shot named relationship recognition, and visualization of zero-shot NER and RE extraction. The framework consists of two main components: the mentions extractor and the linker. There are multiple mentions extractors and linkers available, each serving a specific purpose. Zshot also includes a relations extractor and a knowledge extractor for extracting relations among entities and performing entity classification. The tool requires Python 3.6+ and dependencies like spacy, torch, transformers, evaluate, and datasets for evaluation over datasets like OntoNotes. Optional dependencies include flair and blink for additional functionalities. Zshot provides examples, tutorials, and evaluation methods to assess the performance of the components.
Reflection_Tuning
Reflection-Tuning is a project focused on improving the quality of instruction-tuning data through a reflection-based method. It introduces Selective Reflection-Tuning, where the student model can decide whether to accept the improvements made by the teacher model. The project aims to generate high-quality instruction-response pairs by defining specific criteria for the oracle model to follow and respond to. It also evaluates the efficacy and relevance of instruction-response pairs using the r-IFD metric. The project provides code for reflection and selection processes, along with data and model weights for both V1 and V2 methods.
Cherry_LLM
Cherry Data Selection project introduces a self-guided methodology for LLMs to autonomously discern and select cherry samples from open-source datasets, minimizing manual curation and cost for instruction tuning. The project focuses on selecting impactful training samples ('cherry data') to enhance LLM instruction tuning by estimating instruction-following difficulty. The method involves phases like 'Learning from Brief Experience', 'Evaluating Based on Experience', and 'Retraining from Self-Guided Experience' to improve LLM performance.
20 - OpenAI Gpts
Word Etymology
Uncover the fascinating journeys of words with Word Etymology, your expert guide to linguistic treasures!
Research Paper Explorer
Explains Arxiv papers with examples, analogies, and direct PDF links.
Kemi - Research & Creative Assistant
I improve marketing effectiveness by designing stunning research-led assets in a flash!
Research Radar: Tracking social sciences
Spot emerging trends in the latest social science research ( (also see, just "Research Radar" for all disciplines))
AI Research Assistant
Designed to Provide Comprehensive Insights from the AI industry from Reputable Sources.
Research Proposal Maker
Research Proposal Assistant Pro is designed to provide tailored assistance in research writing.
Academic Research Reviewer
Upon uploading a research paper, I provide a concise section wise analysis covering Abstract, Lit Review, Findings, Methodology, and Conclusion. I also critique the work, highlight its strengths, and answer any open questions from my Knowledge base of Open source materials.