MisguidedAttention
A collection of prompts to challenge the reasoning abilities of large language models in presence of misguiding information
Stars: 321
MisguidedAttention is a collection of prompts designed to challenge the reasoning abilities of large language models by presenting them with modified versions of well-known thought experiments, riddles, and paradoxes. The goal is to assess the logical deduction capabilities of these models and observe any shortcomings or fallacies in their responses. The repository includes a variety of prompts that test different aspects of reasoning, such as decision-making, probability assessment, and problem-solving. By analyzing how language models handle these challenges, researchers can gain insights into their reasoning processes and potential biases.
README:
Version 0.3 (January 2025) - Evaluation
This is a collection of prompts to challenge the reasoning abilities of large language models in presence of misguiding information. They are slight variations of commonly known thought experiments, riddles or paradoxes ("trick questions").
The expected behavior is that LLMs solve the problems as stated, by logical deduction. However, many LLMs mistakenly recognize the unmodified problem because it occurs so frequently in their training data. As a consequence, they respond with a solution to the unmodified problem instead of working through the details step by step to solve the modified one. In some cases you can also observe intertwined strands of reasoning, where conflicting thoughts alternate within the same text.
There are parallels to human behavior, where recognizing a familiar pattern triggers previously learned routines even when they do not apply to the current situation. This is known as the Einstellung effect (German: Einstellungseffekt). However, we would expect that a computerized reasoning system would not be subject to such a fallacy...
Feel free to contribute more prompts or suggest improvements! Open an issue or start a discussion.
The ability of LLMs to solve these problems has consistently improved over time, especially with the introduction of internal chain-of-thought reasoning in OpenAI's o1 models, which allows a model to correct earlier mistakes in its response.
To track these changes better, I have started to set up an evaluation benchmark for a subset of the prompts. You can find the current status in the evaluation folder. In addition, I will tag releases of this dataset to track the progress over time.
For reference here are links to explanations of some of the original unmodified problems:
- Trolley problem: https://en.wikipedia.org/wiki/Trolley_problem
- Monty Hall problem: https://en.wikipedia.org/wiki/Monty_Hall_problem
- Barber paradox: https://en.wikipedia.org/wiki/Barber_paradox
- Schrödinger's cat: https://en.wikipedia.org/wiki/Schr%C3%B6dinger%27s_cat
- Unexpected hanging paradox: https://en.wikipedia.org/wiki/Unexpected_hanging_paradox
- River crossing puzzle: https://en.wikipedia.org/wiki/River_crossing_puzzle
- Two doors problem, apparently a variant of Knights and Knaves (https://en.wikipedia.org/wiki/Knights_and_Knaves) that appears in this movie: https://en.wikipedia.org/wiki/Labyrinth_(1986_film)
- Water pouring puzzle: https://en.wikipedia.org/wiki/Water_pouring_puzzle
- Rope burning puzzle: https://en.wikipedia.org/wiki/Rope_burning_puzzle
- Bridge and torch problem: https://en.wikipedia.org/wiki/Bridge_and_torch_problem
"Imagine a runaway trolley is hurtling down a track towards five dead people. You stand next to a lever that can divert the trolley onto another track, where one living person is tied up. Do you pull the lever?"
"Imagine you're on a game show, and there are three doors in front of you. Behind one door is a car, and behind the other two doors are goats. You don't know what's behind any of the doors. You get to choose one door. Let's say you pick Door #1. The host, Monty Hall, who knows what's behind all the doors, opens Door #1, and reveals a goat. Now, you have two doors left: Door #3 and Door #2. You pick Door #3. Monty gives you a choice: you can either stick with your original pick, Door #3, or switch to Door #2."
Thanks to u/TheHoboJed for this one.
"You're on a game show and are presented with three doors. Behind one is a donkey, and behind the other two are luxury cars. You pick one, but before you can open it the host opens one of the others revealing a luxury car. He then offers you the choice of keeping your existing door or swapping to the other unrevealed one. What should you do to win a car?"
Most LLMs will come up with a strategy to win the donkey instead of the car. (Take a look at this artifact to illustrate the difference between normal and Inverse Monty Hall)
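To make the difference concrete, here is a minimal sketch that enumerates every case exactly instead of simulating. The helper `win_probabilities` and its parameters are hypothetical names introduced for illustration. It assumes the host knows the prizes and always opens an unpicked door showing the `revealed` prize, choosing at random when two doors qualify.

```python
from fractions import Fraction


def win_probabilities(prizes, target, revealed):
    """Return (P(win | stay), P(win | switch)) for a three-door game.

    Assumption: the host always opens one of the two unpicked doors that
    hides the `revealed` prize, picking uniformly if both qualify.
    """
    stay = Fraction(0)
    switch = Fraction(0)
    n = len(prizes)
    for pick in range(n):  # every initial pick is equally likely
        others = [d for d in range(n) if d != pick]
        openable = [d for d in others if prizes[d] == revealed]
        for opened in openable:
            weight = Fraction(1, n) * Fraction(1, len(openable))
            remaining = next(d for d in others if d != opened)
            if prizes[pick] == target:
                stay += weight
            if prizes[remaining] == target:
                switch += weight
    return stay, switch


# Classic Monty Hall: one car, the host reveals a goat.
classic = win_probabilities(("car", "goat", "goat"), "car", "goat")
# Inverse variant from the prompt: one donkey, the host reveals a car.
inverse = win_probabilities(("donkey", "car", "car"), "car", "car")
print(classic)  # (Fraction(1, 3), Fraction(2, 3))
print(inverse)  # (Fraction(2, 3), Fraction(1, 3))
```

Under these assumptions the advice flips: switching is better in the classic game, while keeping the original door is better in the inverse one.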
"Imagine there's a small town with a very particular barber. This barber has a unique rule: he shaves all the men in town who visit him. Does the barber shave himself?"
"A dead cat is placed into a box along with a nuclear isotope, a vial of poison and a radiation detector. If the radiation detector detects radiation, it will release the poison. The box is opened one day later. What is the probability of the cat being alive?"
"Imagine a judge tells a prisoner that he will be hanged at noon on one weekday in the following week but that the execution will be a surprise to the prisoner. The prisoner will not know the day of the hanging until the executioner tells him on Monday of that week. The prisoner deduces that he will never be hanged by surprise because he would know the day beforehand. The prisoner is executed on a Friday. Was the execution a surprise to the prisoner?"
There is still some room for interpretation in this question. All LLMs give confusing answers.
Thanks to /u/Hugi_R for inspiring this one
"A farmer is on one side of a river with a wolf, a goat, and a cabbage. When he is crossing the river in a boat, he can only take one item with him at a time. The wolf will eat the goat if left alone together, and the goat will eat the cabbage if left alone together. How can the farmer transport the goat across the river without it being eaten?"
All tested LLMs provide a complex solution to the original problem instead of the much simpler one for this variant.
An even simpler version of the prompt above, thanks to @DrChristophFH.
"There is a man, a sheep and a boat with space for one human and one animal on one side of a river. How do the man and sheep get to the other side of the river?"
Most if not all LLMs will come up with overly complex scenarios.
Further simplification of the prompt to ensure that there are even fewer opportunities to misunderstand the objective.
"A man with his sheep wants to cross a river. He has a boat that can carry both him and the animal. How do both get to the other side of the river?"
Some LLMs get it, but most still come up with messy solutions, hallucinating multiple back-and-forth trips. Many also try to prevent one subject from eating the other, sometimes with hilarious results: for example, the man must be kept from eating the sheep, or the sheep must not eat grass.
Thanks to /u/hvoecking for this one.
"I have a 6- and a 12-liter jug. I want to measure exactly 6 liters."
Some LLMs will get this right, others come up with amazing ways to make things complicated.
A variation of the prompt above.
"I have a 6- and a 12-liter jug. I want to measure exactly 4 liters."
Most LLMs will try various combinations of filling and emptying the jugs instead of first figuring out whether this is possible at all.
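Whether a target is reachable at all can be checked before any pouring. The sketch below uses the standard result for two-jug puzzles: a target is measurable exactly when it is a multiple of the greatest common divisor of the two capacities and does not exceed their combined capacity. The function name `can_measure` is a hypothetical helper, and the check assumes unmarked jugs, an unlimited water supply, and that the measured amount may be split across both jugs.

```python
from math import gcd


def can_measure(a, b, target):
    """Return True if `target` liters can be measured with jugs of size a and b.

    Standard two-jug criterion: the target must be a multiple of gcd(a, b)
    and must fit into the two jugs together.
    """
    if target < 0 or target > a + b:
        return False
    return target % gcd(a, b) == 0


print(can_measure(6, 12, 6))  # True: just fill the 6-liter jug
print(can_measure(6, 12, 4))  # False: every reachable amount is a multiple of 6
```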
"I have a 1- and a 2-liter jug. I want to measure exactly 3 liters."
Most LLMs will come up with overly complex nonsense, triggered by a need to write itemized lists.
"I have a 1 liter jug and another 1-liter jug. I want to measure exactly 1 liters."
The most basic version of the jug problem, which still triggers list writing in many smaller LLMs.
"i have a roasting-jug that can hold 300 nuts and a roasting jug that can hold 700 nuts. I also have a digital kitchen scale. i want to roast exactly 600 nuts. what do i do?"
"You are in a room with two doors. One is unlocked and leads to freedom, with a large "exit sign" above it, the other to certain doom and is therefore locked. There are two guards: one always tells the truth, and the other always lies. You don't know which is which. You can ask one guard one question or just leave. What do you do?"
Almost all LLMs strike up an unnecessary discussion instead of leaving quietly.
Thanks to /u/Avo-ka for this one!
"Which is heavier, 1 kilogram of feathers or 1 pound of steel?"
The large LLMs seem to be able to solve this, but many smaller ones don't get the difference.
"I have 13 coins, one of them is fake. I also have a digital scale. How do I identify the fake coin?"
All LLMs will produce confusing instructions based on a mechanical scale that can only compare weights. They also do not understand how to partition the 13 coins.
"You have two ropes, each of which takes exactly 60 minutes to burn completely. However, the ropes burn unevenly, meaning some parts may burn faster or slower than others. You have no other timing device. How can you measure exactly 20 minutes using these two ropes and matches to light them?"
There is no clear solution to this problem, yet most LLMs will find one.
"You have two ropes, each of which takes exactly 60 minutes to burn completely. However, the ropes burn unevenly, meaning some parts may burn faster or slower than others. You have no other timing device. How can you measure exactly 60 minutes using these two ropes and matches to light them?"
There is a very simple solution to this problem, yet most LLMs will find a complex or incorrect one.
"How do i use a rope to measure 10 minutes?"
Valid solutions do not involve using a rope with a burn time of 60 minutes
Most of these were contributed by @av. Thanks a lot!
"In a room of 30 people, what's the probability that at least two do not share a birthday?"
Most LLMs will provide a solution to the original birthday problem instead.
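A short calculation shows how far apart the two questions are. This sketch rests on two assumptions: birthdays are uniform over 365 days, and "at least two do not share a birthday" is read as "some pair has different birthdays", which is the complement of "all 30 people share one birthday". The function names are hypothetical.

```python
from fractions import Fraction


def p_some_pair_differs(n, days=365):
    """Modified question: P(at least one pair has different birthdays)."""
    if n < 2:
        return Fraction(0)
    # Complement: everyone matches the first person's birthday.
    return 1 - Fraction(1, days) ** (n - 1)


def p_some_pair_shares(n, days=365):
    """Original birthday problem: P(at least one pair shares a birthday)."""
    p_all_distinct = Fraction(1)
    for k in range(n):
        p_all_distinct *= Fraction(days - k, days)
    return 1 - p_all_distinct


print(float(p_some_pair_differs(30)))           # 1.0 (differs from 1 by 365**-29)
print(round(float(p_some_pair_shares(30)), 4))  # 0.7063
```

Under this reading the modified question has an answer that is practically 1, while the original birthday problem gives roughly 0.71.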
"If it takes 50 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?"
"You are locked in a dungeon. A king decides to show you mercy - he summons you and tells you the rules of the challenge to your escape: 'There are three doors, guarded by a guard that always tells the truth. You may ask the guard one question. Two of the doors lead to certain and painful death. One door leads to your escape.' As you approach the doors, the guard, that knows which doors lead to death and which to escape, says to you 'choose the door that you think leads to your escape'. After you choose it, the guard opens the other door, that leads to certain death. Now there are two closed doors - one leading to escape, another to certain death. The guard allows you to change the door."
"How do you maximize your chances of escape?"
"I have a 7 litre bucket that is missing a bottom, and the top was welded and sealed shut. How much water can I hold in it?"
Thanks to @TheJzuken for these two.
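For the machines-and-widgets prompt earlier in this group, the arithmetic can be checked with a small rate calculation. This is a sketch that assumes output scales linearly with both the number of machines and the time; `minutes_needed` is a hypothetical helper.

```python
from fractions import Fraction


def minutes_needed(ref_machines, ref_minutes, ref_widgets, machines, widgets):
    """Time for `machines` machines to make `widgets` widgets.

    The per-machine rate is derived from the reference observation:
    ref_widgets made by ref_machines in ref_minutes.
    """
    rate_per_machine = Fraction(ref_widgets, ref_machines * ref_minutes)
    return Fraction(widgets) / (machines * rate_per_machine)


# Original puzzle: 5 machines, 5 minutes, 5 widgets.
print(minutes_needed(5, 5, 5, 100, 100))   # 5
# Modified prompt: 50 machines, 5 minutes, 5 widgets.
print(minutes_needed(50, 5, 5, 100, 100))  # 50
```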
"Four people come to a rickety bridge at night. The bridge can only support two people at a time, and any group crossing must carry the single torch they share to light their way. Person A takes 1 minute to cross, Person B takes 3 minutes, Person C takes 5 minutes, and Person D takes 10 minutes. When two people cross together, they must move at the slower person's pace. For example, if Person A and D cross together, it takes them 10 minutes. After a crossing, someone must bring the torch back for anyone still waiting. The challenge is to get all four people safely across the bridge in no more than 17 minutes. How can they do it?"
"Four people come to a rickety bridge at night. The bridge can support four people at a time, and any group crossing must carry the single torch they share to light their way. Person A takes 1 minute to cross, Person B takes 3 minutes, Person C takes 5 minutes, and Person D takes 10 minutes. When four people cross together, they must move at the slowest person's pace. For example, if Person A and D cross together, it takes them 10 minutes. After a crossing, someone must bring the torch back for anyone still waiting. The challenge is to get all four people safely across the bridge in no more than 17 minutes. How can they do it?"
Suggested by eyTns
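Both bridge prompts can be checked by exhaustive search. The sketch below runs Dijkstra's algorithm over states of the form (people still on the start side, torch position), with the bridge capacity as a parameter. `min_crossing_time` is a hypothetical helper, and it assumes a group moves at the pace of its slowest member.

```python
import heapq
from itertools import combinations


def min_crossing_time(times, capacity):
    """Minimum total time to move everyone across the bridge."""
    everyone = frozenset(times)
    start = (everyone, True)  # (people on start side, torch on start side?)
    best = {start: 0}
    heap = [(0, 0, start)]    # (elapsed, tie-breaker, state)
    counter = 1
    while heap:
        elapsed, _, (near, torch_near) = heapq.heappop(heap)
        if not near:
            return elapsed    # everyone has crossed
        if elapsed > best[(near, torch_near)]:
            continue          # stale queue entry
        # Only people on the torch's side can cross next.
        pool = near if torch_near else everyone - near
        for size in range(1, capacity + 1):
            for group in combinations(sorted(pool), size):
                cost = max(times[p] for p in group)
                moved = frozenset(group)
                new_near = near - moved if torch_near else near | moved
                state = (new_near, not torch_near)
                if elapsed + cost < best.get(state, float("inf")):
                    best[state] = elapsed + cost
                    heapq.heappush(heap, (elapsed + cost, counter, state))
                    counter += 1
    return None


times = {"A": 1, "B": 3, "C": 5, "D": 10}
print(min_crossing_time(times, capacity=2))  # 20
print(min_crossing_time(times, capacity=4))  # 10
```

With the crossing times from the prompts, the search reports 20 minutes for the two-person bridge, which exceeds the 17-minute limit, and 10 minutes when all four can cross together.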
"You arrive on an island inhabited solely by two types of people - Knights who always tell the truth, and Knaves who always lie. Standing at a fork in the road, you meet two inhabitants named A and B. A says "B is a Knave." B says "A is telling the truth." You need to determine who is who to find the correct path."
"You arrive on an island inhabited solely by two types of people - Knights who always tell the truth, and Knaves who always lie. Standing at a fork in the road, you meet two inhabitants named A and B. A says "B is a Knave." B says "A is a liar." You need to determine who is who to find the correct path."
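The two Knights and Knaves prompts above can be checked by brute force over all role assignments. This is a sketch with a hypothetical helper, `consistent_assignments`. It assumes "A is telling the truth" means "A is a Knight" and "A is a liar" means "A is a Knave".

```python
from itertools import product


def consistent_assignments(statements):
    """Return every Knight/Knave assignment consistent with the statements.

    `statements` maps a speaker to a function of the assignment
    (name -> True if Knight) giving the truth value of the claim.
    A Knight's claim must be true and a Knave's claim must be false.
    """
    names = sorted(statements)
    solutions = []
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        if all(claim(world) == world[name] for name, claim in statements.items()):
            solutions.append(world)
    return solutions


# First prompt: A says "B is a Knave", B says "A is telling the truth".
first = {"A": lambda w: not w["B"], "B": lambda w: w["A"]}
# Second prompt: A says "B is a Knave", B says "A is a liar".
second = {"A": lambda w: not w["B"], "B": lambda w: not w["A"]}

print(len(consistent_assignments(first)))   # 0
print(len(consistent_assignments(second)))  # 2
```

Under these readings, the first pair of statements is contradictory, and the second is satisfied by both mixed assignments, so neither prompt pins down who is who.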
"Two girls went to dinner together and both ordered hot tea. One girl pounded down five of them in about a minute, and the other took her time drinking one. The girl who drank one died, while the girl who drank five survived. However, all of the drinks that were served turned out to contain poison. Why did the girl that drank more hot tea survive?"
Thanks to /u/WiSaGaN for the cue.
If you wonder about the correct solution for this one: There are no ice cubes!
"You're a rabbit and are presented with three rabbit holes. In one is a fox, out to eat you. In the other two there are large stashes of delicious carrots. You pick one, but before you enter it, god reveals a stash of carrots on one of the two others. He then offers you the choice of keeping your selected hole or swapping to the other unrevealed one. What should you do to minimize your chances of being eaten?"
"A farmer is at a river with a wolf, a goat, and a cabbage. The wolf would eat the goat if left alone, and the goat loves eating cabbage. What can the farmer do to feed the goat?"
"A farmer is at a river with a wolf, a goat, and a cabbage. The wolf is strictly vegetarian and loves cabbage. The goat is very protective of vegetables. How can the farmer get the cabbage to the wolf?"
"A farmer is at a river with a wolf, a goat, and a cabbage. The cabbage is actually an undercover detective investigating vegetable theft. The wolf and goat are best friends who run a successful food import business. How can the farmer help the detective gather evidence?"
Claude Sonnet suggested the last two. Turns out that many LLMs will somehow inject a river crossing puzzle solution into these scenarios.
"Doom Slayer needs to teleport from Phobos to Deimos. He has his pet bunny, his pet cacodemon, and a UAC scientist who tagged along. The Doom Slayer can only teleport with one of them at a time. But if he leaves the bunny and the cacodemon together alone, the bunny will eat the cacodemon. And if he leaves the cacodemon and the scientist alone, the cacodemon will eat the scientist. How should the Doom Slayer get himself and all his companions safely to Deimos?"
Suggested by int19h.
"I stole a ball and a bat that together cost $1.10. The bat is $1 more than the ball. What did I pay for the ball?"
Reference: https://en.wikipedia.org/wiki/Cognitive_reflection_test
"Which is heavier, 1 kilogram of steel or 1 feather?"
"Linda is 31 years old, single, outspoken, active in the feminist movement and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which is more probable?"
- A) Linda is a bank teller.
- B) Linda is a bank teller and is active in the feminist movement.
"Linda is 31 years old, single, outspoken, not active in the feminist movement, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which is more probable?"
- A) Linda is a bank teller and is active in the feminist movement.
- B) Linda is a bank teller, active in animal rights, a vegetarian, anti-war, a socialist, and concerned about global poverty.
"Linda is 31 years old, single, outspoken, not a bank teller, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations. Which is more probable?"
- A) Linda is a bank teller.
- B) Linda is a bank teller and is active in the feminist movement.
Suggested by Jona Sassenhagen. See here for details.
"The number of lotus flowers in the pond increases by two every day. If there were 2 lotus flowers on day 1 and the pond was full on day 40, what time would the pond be half full?"
"There is a car behind one door and a goat behind the other two. You picked all the doors 1, 2, and 3, and the host opened the door 2 to show that there is a goat. So, would you change your choice?"
"A pair of rabbits give birth to two baby rabbits each year from two years after birth. If you had one rabbit, how many would it be in 7 years?"
"There are 3 sticks. There are 3 disks on the leftmost stick, in order from large to small. What is the minimum number of disk moves to move them to the rightmost stick?"
"An athlete and a tortoise compete for a run. The distance is 100 meters, but the athlete give 100 meters head-start to the tortoise, and the tortoise's speed is twice the athlete's speed. Can athlete catch up with the tortoise?"
"A and B take turns naming increasing numbers. Each player can name as many increasing numbers as they want. The first player starts with 1. Then B takes over and so on. The person who says 31 loses. What should A do to start first and win?"
All of the ones above by eyTns
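For the lotus-pond prompt in this group, the wording describes linear growth rather than the doubling of the well-known puzzle. The sketch below compares the two, assuming "increases by two" means two additional flowers per day. All function names are hypothetical.

```python
def linear_count(day):
    """Flowers on a given day: 2 on day 1, plus 2 more each day."""
    return 2 + 2 * (day - 1)


def doubling_count(day):
    """Flowers on a given day in the classic puzzle: doubles each day."""
    return 2 * 2 ** (day - 1)


def first_day_at_least(target, count_on_day):
    """First day on which the flower count reaches `target`."""
    day = 1
    while count_on_day(day) < target:
        day += 1
    return day


full_day = 40
print(first_day_at_least(linear_count(full_day) // 2, linear_count))      # 20
print(first_day_at_least(doubling_count(full_day) // 2, doubling_count))  # 39
```

With linear growth the pond is half full on day 20, whereas the classic doubling version gives day 39.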
As suggested by @av, riddles can also be used as a basis for prompts that challenge the reasoning abilities of LLMs.
"I'm tall when I'm young, and I'm taller when I'm old. What am I?"
Definitely not a candle
"What can't you break, even if you never pick it up or touch it?"
Definitely not a promise
"What goes up but never comes up again?"
Definitely not your age
"I never shave, but my beard stays the same. What am I?"
Definitely not a barber
"What has two banks and money?"
Definitely not a river
"What walks on four legs in the morning, four in the afternoon, and four in the evening?"
Definitely not a human
"What occurs once in a second, twice in a moment, but never in a thousand years?"
The answer is not "letter M"
Variant 1: "What happens when a stoppable force meets an immovable object?"
Variant 2: "What happens when a unstoppable force meets a movable object?"
The first variant seems to throw off more LLMs.
For Tasks:
Click tags to check more tools for each tasksFor Jobs:
Alternative AI tools for MisguidedAttention
Similar Open Source Tools
data:image/s3,"s3://crabby-images/c60e9/c60e9608e0ab8683305b5b2fc09d306e673cb8eb" alt="MisguidedAttention Screenshot"
MisguidedAttention
MisguidedAttention is a collection of prompts designed to challenge the reasoning abilities of large language models by presenting them with modified versions of well-known thought experiments, riddles, and paradoxes. The goal is to assess the logical deduction capabilities of these models and observe any shortcomings or fallacies in their responses. The repository includes a variety of prompts that test different aspects of reasoning, such as decision-making, probability assessment, and problem-solving. By analyzing how language models handle these challenges, researchers can gain insights into their reasoning processes and potential biases.
data:image/s3,"s3://crabby-images/10380/10380d35f464539677aa67c1a6901d5393f81df6" alt="Deej-AI Screenshot"
Deej-AI
Deej-A.I. is an advanced machine learning project that aims to revolutionize music recommendation systems by using artificial intelligence to analyze and recommend songs based on their content and characteristics. The project involves scraping playlists from Spotify, creating embeddings of songs, training neural networks to analyze spectrograms, and generating recommendations based on similarities in music features. Deej-A.I. offers a unique approach to music curation, focusing on the 'what' rather than the 'how' of DJing, and providing users with personalized and creative music suggestions.
data:image/s3,"s3://crabby-images/c27df/c27df8709cce748bef0d1eb0b0ff5736652dd0e4" alt="blackmarlin Screenshot"
blackmarlin
Black Marlin is a UCI compliant chess engine fully written in Rust by Doruk Sekercioglu. It supports Chess960 and features a variety of search algorithms, pruning techniques, and evaluation methods. Black Marlin is designed to be efficient and accurate, and it has been shown to perform well against other top chess engines.
data:image/s3,"s3://crabby-images/cb556/cb556709c8e79f04b18647c4da57f64a5c887c99" alt="WritingAIPaper Screenshot"
WritingAIPaper
WritingAIPaper is a comprehensive guide for beginners on crafting AI conference papers. It covers topics like paper structure, core ideas, framework construction, result analysis, and introduction writing. The guide aims to help novices navigate the complexities of academic writing and contribute to the field with clarity and confidence. It also provides tips on readability improvement, logical strength, defensibility, confusion time reduction, and information density increase. The appendix includes sections on AI paper production, a checklist for final hours, common negative review comments, and advice on dealing with paper rejection.
data:image/s3,"s3://crabby-images/84aca/84aca1f9720abbe850f1950d9afafcb614bb2e16" alt="WeeaBlind Screenshot"
WeeaBlind
Weeablind is a program that uses modern AI speech synthesis, diarization, language identification, and voice cloning to dub multi-lingual media and anime. It aims to create a pleasant alternative for folks facing accessibility hurdles such as blindness, dyslexia, learning disabilities, or simply those that don't enjoy reading subtitles. The program relies on state-of-the-art technologies such as ffmpeg, pydub, Coqui TTS, speechbrain, and pyannote.audio to analyze and synthesize speech that stays in-line with the source video file. Users have the option of dubbing every subtitle in the video, setting the start and end times, dubbing only foreign-language content, or full-blown multi-speaker dubbing with speaking rate and volume matching.
data:image/s3,"s3://crabby-images/2b9d6/2b9d6dd80a0ce29e7324e692ef59ad190ee3c22a" alt="dota2ai Screenshot"
dota2ai
The Dota2 AI Framework project aims to provide a framework for creating AI bots for Dota2, focusing on coordination and teamwork. It offers a LUA sandbox for scripting, allowing developers to code bots that can compete in standard matches. The project acts as a proxy between the game and a web service through JSON objects, enabling bots to perform actions like moving, attacking, casting spells, and buying items. It encourages contributions and aims to enhance the AI capabilities in Dota2 modding.
data:image/s3,"s3://crabby-images/f20fa/f20fa9c041209174de5356111839033796c01a4d" alt="bidirectional_streaming_ai_voice Screenshot"
bidirectional_streaming_ai_voice
This repository contains Python scripts that enable two-way voice conversations with Anthropic Claude, utilizing ElevenLabs for text-to-speech, Faster-Whisper for speech-to-text, and Pygame for audio playback. The tool operates by transcribing human audio using Faster-Whisper, sending the transcription to Anthropic Claude for response generation, and converting the LLM's response into audio using ElevenLabs. The audio is then played back through Pygame, allowing for a seamless and interactive conversation between the user and the AI. The repository includes variations of the main script to support different operating systems and configurations, such as using CPU transcription on Linux or employing the AssemblyAI API instead of Faster-Whisper.
data:image/s3,"s3://crabby-images/91643/9164303d039dc51ff140d4160f442a6ebf40cdd8" alt="deep-seek Screenshot"
deep-seek
DeepSeek is a new experimental architecture for a large language model (LLM) powered internet-scale retrieval engine. Unlike current research agents designed as answer engines, DeepSeek aims to process a vast amount of sources to collect a comprehensive list of entities and enrich them with additional relevant data. The end result is a table with retrieved entities and enriched columns, providing a comprehensive overview of the topic. DeepSeek utilizes both standard keyword search and neural search to find relevant content, and employs an LLM to extract specific entities and their associated contents. It also includes a smaller answer agent to enrich the retrieved data, ensuring thoroughness. DeepSeek has the potential to revolutionize research and information gathering by providing a comprehensive and structured way to access information from the vastness of the internet.
data:image/s3,"s3://crabby-images/1fde6/1fde66fa4978e099573328ac16f2e48c6b0dbaf6" alt="commonplace-bot Screenshot"
commonplace-bot
Commonplace Bot is a modern representation of the commonplace book, leveraging modern technological advancements in computation, data storage, machine learning, and networking. It aims to capture, engage, and share knowledge by providing a platform for users to collect ideas, quotes, and information, organize them efficiently, engage with the data through various strategies and triggers, and transform the data into new mediums for sharing. The tool utilizes embeddings and cached transformations for efficient data storage and retrieval, flips traditional engagement rules by engaging with the user, and enables users to alchemize raw data into new forms like art prompts. Commonplace Bot offers a unique approach to knowledge management and creative expression.
data:image/s3,"s3://crabby-images/fbc95/fbc956e5087c07c866f4e0b4babf9eb193f0b945" alt="ClipboardConqueror Screenshot"
ClipboardConqueror
Clipboard Conqueror is a multi-platform omnipresent copilot alternative. Currently requiring a kobold united or openAI compatible back end, this software brings powerful LLM based tools to any text field, the universal copilot you deserve. It simply works anywhere. No need to sign in, no required key. Provided you are using local AI, CC is a data secure alternative integration provided you trust whatever backend you use. *Special thank you to the creators of KoboldAi, KoboldCPP, llamma, openAi, and the communities that made all this possible to figure out.
data:image/s3,"s3://crabby-images/de20b/de20b62e393896c07b6f74646d45acb04890a5a2" alt="qlora-pipe Screenshot"
qlora-pipe
qlora-pipe is a pipeline parallel training script designed for efficiently training large language models that cannot fit on one GPU. It supports QLoRA, LoRA, and full fine-tuning, with efficient model loading and the ability to load any dataset that Axolotl can handle. The script allows for raw text training, resuming training from a checkpoint, logging metrics to Tensorboard, specifying a separate evaluation dataset, training on multiple datasets simultaneously, and supports various models like Llama, Mistral, Mixtral, Qwen-1.5, and Cohere (Command R). It handles pipeline- and data-parallelism using Deepspeed, enabling users to set the number of GPUs, pipeline stages, and gradient accumulation steps for optimal utilization.
data:image/s3,"s3://crabby-images/f6f74/f6f746a7123474e4f57df3e167d9f0d422cdbd0d" alt="skyeye Screenshot"
skyeye
SkyEye is an AI-powered Ground Controlled Intercept (GCI) bot designed for the flight simulator Digital Combat Simulator (DCS). It serves as an advanced replacement for the in-game E-2, E-3, and A-50 AI aircraft, offering modern voice recognition, natural-sounding voices, real-world brevity and procedures, a wide range of commands, and intelligent battlespace monitoring. The tool uses Speech-To-Text and Text-To-Speech technology, can run locally or on a cloud server, and is production-ready software used by various DCS communities.
data:image/s3,"s3://crabby-images/90384/90384ba6ec65691e0abc4a75babc674d1be1bfd0" alt="generative-ai-design-patterns Screenshot"
generative-ai-design-patterns
A catalog of design patterns for building generative AI applications, capturing current best practices in the field. The repository serves as a living catalog on GitHub to help practitioners navigate through the noise and identify areas for improvement. It is too early for a book due to the evolving nature of generative AI in production and the lack of concrete evidence to support certain claims.
data:image/s3,"s3://crabby-images/f2f1c/f2f1c690105a84c5939d8ead25ab3a2cab879688" alt="prompt-tuning-playbook Screenshot"
prompt-tuning-playbook
The LLM Prompt Tuning Playbook is a comprehensive guide for improving the performance of post-trained Language Models (LLMs) through effective prompting strategies. It covers topics such as pre-training vs. post-training, considerations for prompting, a rudimentary style guide for prompts, and a procedure for iterating on new system instructions. The playbook emphasizes the importance of clear, concise, and explicit instructions to guide LLMs in generating desired outputs. It also highlights the iterative nature of prompt development and the need for systematic evaluation of model responses.
data:image/s3,"s3://crabby-images/6afe8/6afe8b743b4a3115b210f5fb0f0d94d6b1ea33f4" alt="Winter Screenshot"
Winter
Winter is a UCI chess engine that has competed at top invite-only computer chess events. It is the top-rated chess engine from Switzerland and has a level of play that is super human but below the state of the art reached by large, distributed, and resource-intensive open-source projects like Stockfish and Leela Chess Zero. Winter has relied on many machine learning algorithms and techniques over the course of its development, including certain clustering methods not used in any other chess programs, such as Gaussian Mixture Models and Soft K-Means. As of Winter 0.6.2, the evaluation function relies on a small neural network for more precise evaluations.
data:image/s3,"s3://crabby-images/5d9ca/5d9cafee32e4c3da221cbdbcb89c9aea67745152" alt="aiohomekit Screenshot"
aiohomekit
aiohomekit is a Python library that implements the HomeKit protocol for controlling HomeKit accessories using asyncio. It is primarily used with Home Assistant, targeting the same versions of Python and following their code standards. The library is still under development and does not offer API guarantees yet. It aims to match the behavior of real HAP controllers, even when not strictly specified, and works around issues like JSON formatting, boolean encoding, header sensitivity, and TCP packet splitting. aiohomekit is primarily tested with Phillips Hue and Eve Extend bridges via Home Assistant, but is known to work with many more devices. It does not support BLE accessories and is intended for client-side use only.
For similar tasks
data:image/s3,"s3://crabby-images/c60e9/c60e9608e0ab8683305b5b2fc09d306e673cb8eb" alt="MisguidedAttention Screenshot"
MisguidedAttention
MisguidedAttention is a collection of prompts designed to challenge the reasoning abilities of large language models by presenting them with modified versions of well-known thought experiments, riddles, and paradoxes. The goal is to assess the logical deduction capabilities of these models and observe any shortcomings or fallacies in their responses. The repository includes a variety of prompts that test different aspects of reasoning, such as decision-making, probability assessment, and problem-solving. By analyzing how language models handle these challenges, researchers can gain insights into their reasoning processes and potential biases.
For similar jobs
sweep
Sweep is an AI junior developer that turns bugs and feature requests into code changes. It automatically handles developer experience improvements like adding type hints and improving test coverage.
teams-ai
The Teams AI Library is a software development kit (SDK) that helps developers create bots that can interact with Teams and Microsoft 365 applications. It is built on top of the Bot Framework SDK and simplifies the process of developing bots that interact with Teams' artificial intelligence capabilities. The SDK is available for JavaScript/TypeScript, .NET, and Python.
ai-guide
This guide is dedicated to Large Language Models (LLMs) that you can run on your home computer. It assumes your PC is a lower-end, non-gaming setup.
classifai
Supercharge WordPress Content Workflows and Engagement with Artificial Intelligence. Tap into leading cloud-based services like OpenAI, Microsoft Azure AI, Google Gemini and IBM Watson to augment your WordPress-powered websites. Publish content faster while improving SEO performance and increasing audience engagement. ClassifAI integrates Artificial Intelligence and Machine Learning technologies to lighten your workload and eliminate tedious tasks, giving you more time to create original content that matters.
chatbot-ui
Chatbot UI is an open-source AI chat app that allows users to create and deploy their own AI chatbots. It is easy to use and can be customized to fit any need. Chatbot UI is perfect for businesses, developers, and anyone who wants to create a chatbot.
BricksLLM
BricksLLM is a cloud-native AI gateway written in Go. Currently, it provides native support for OpenAI, Anthropic, Azure OpenAI and vLLM. BricksLLM aims to provide enterprise-level infrastructure that can power any LLM production use case. Here are some use cases for BricksLLM:
* Set LLM usage limits for users on different pricing tiers
* Track LLM usage on a per-user and per-organization basis
* Block or redact requests containing PII
* Improve LLM reliability with failovers, retries and caching
* Distribute API keys with rate limits and cost limits for internal development/production use cases
* Distribute API keys with rate limits and cost limits for students
uAgents
uAgents is a Python library developed by Fetch.ai that allows for the creation of autonomous AI agents. These agents can perform various tasks on a schedule or take action on various events. uAgents are easy to create and manage, and they are connected to a fast-growing network of other uAgents. They are also secure, with cryptographically secured messages and wallets.
griptape
Griptape is a modular Python framework for building AI-powered applications that securely connect to your enterprise data and APIs. It offers developers the ability to maintain control and flexibility at every step. Griptape's core components include Structures (Agents, Pipelines, and Workflows), Tasks, Tools, Memory (Conversation Memory, Task Memory, and Meta Memory), Drivers (Prompt and Embedding Drivers, Vector Store Drivers, Image Generation Drivers, Image Query Drivers, SQL Drivers, Web Scraper Drivers, and Conversation Memory Drivers), Engines (Query Engines, Extraction Engines, Summary Engines, Image Generation Engines, and Image Query Engines), and additional components (Rulesets, Loaders, Artifacts, Chunkers, and Tokenizers). Griptape enables developers to create AI-powered applications with ease and efficiency.