DataHorse

Chat with your data, modify it, visualize it, create and test machine learning models all in plain English. DataHorse makes data analysis and data science conversational using LLMs.

Stars: 200

DataHorse is an open-source tool and Python library that simplifies data science for everyone. It allows users to interact with data in plain English without requiring technical skills. Users can create graphs, modify data, and build machine learning models to make predictions. The tool is designed to help businesses and individuals quickly understand their data and make data-driven decisions with ease.

README:

🎉 Do data science and data analysis in plain English 🌟

🚀 DataHorse is an open-source tool and Python library that simplifies data science for everyone. It lets users interact with data in plain English 📝, with no technical skills or tutorials 🎥 required. With DataHorse, you can create graphs 📊, modify data 🛠️, and build machine learning models 🤖 to get answers or make predictions. It’s designed to help businesses and individuals 💼, whatever their background, quickly understand their data and make data-driven decisions. ✨

Quick Installation

pip install datahorse

Examples

We’re using the Iris flower dataset as an example to demonstrate how DataHorse simplifies data analysis. This example showcases how our tool can handle real-world data, making it easier to work with and understand.

Setup and usage examples are available in this Google Colab notebook.

import datahorse

df = datahorse.read('https://raw.githubusercontent.com/plotly/datasets/master/iris-data.csv')
df = df.chat('convert species names to numeric codes')

The chat call also accepts two optional arguments:

  • seed: an integer that makes the generated function reproducible across runs.
  • cache_req=True: caches the API request so that identical prompts do not trigger repeated API calls.

df = df.chat('convert species names to numeric codes', seed=42, cache_req=True)
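
To make the caching idea concrete, here is a minimal sketch of how identical prompts can be served from a cache instead of triggering a new request. It is not DataHorse's implementation: the PromptCache class and the fake_api function are hypothetical names introduced only for illustration, and the key is assumed to be derived from the prompt and the seed.

```python
import hashlib
import json


class PromptCache:
    """Memoize responses by (prompt, seed) so identical requests skip the API."""

    def __init__(self, call_api):
        # call_api is a hypothetical function: (prompt, seed) -> str
        self._call_api = call_api
        self._store = {}
        self.api_calls = 0

    def _key(self, prompt, seed):
        # Hash a canonical JSON payload so equal inputs give equal keys.
        payload = json.dumps({"prompt": prompt, "seed": seed}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def ask(self, prompt, seed=None):
        key = self._key(prompt, seed)
        if key not in self._store:
            # Cache miss: this is the only path that reaches the API.
            self.api_calls += 1
            self._store[key] = self._call_api(prompt, seed)
        return self._store[key]


def fake_api(prompt, seed):
    # Stand-in for a real LLM request; returns a deterministic string.
    return f"generated code for: {prompt!r} (seed={seed})"


cache = PromptCache(fake_api)
first = cache.ask("convert species names to numeric codes", seed=42)
second = cache.ask("convert species names to numeric codes", seed=42)
```

In this sketch the second call returns the stored response and api_calls stays at 1. Changing either the prompt or the seed produces a different key and therefore a new request.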

Model training

df.chat('train a classification model and save the model')

Model testing

datahorse.test("path of the saved model",[["list of testing features"]])
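
The second argument above is a nested list, which suggests one inner list of feature values per sample. The sketch below shows one way such a list could be built from records. The to_feature_rows helper, the column names, and the sample values are all illustrative assumptions, not part of DataHorse, and the feature order is assumed to match the one used during training.

```python
# Assumed, illustrative column names; use the ones your model was trained on.
FEATURE_ORDER = ["sepal length", "sepal width", "petal length", "petal width"]


def to_feature_rows(samples, feature_order=FEATURE_ORDER):
    """Turn dict samples into a list of lists, one inner list per sample."""
    rows = []
    for sample in samples:
        missing = [name for name in feature_order if name not in sample]
        if missing:
            raise KeyError(f"sample is missing features: {missing}")
        # Keep a fixed column order so every row lines up with the others.
        rows.append([float(sample[name]) for name in feature_order])
    return rows


# Made-up measurements, used only to show the resulting shape.
samples = [
    {"sepal length": 5.1, "sepal width": 3.5, "petal length": 1.4, "petal width": 0.2},
    {"petal width": 2.3, "petal length": 5.9, "sepal width": 3.0, "sepal length": 6.7},
]
features = to_feature_rows(samples)
```

A list shaped like features could then be passed as the second argument of the test call shown above.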

Library Demo

Guide for running the DataHorse WebUI

Clone the repository

git clone https://github.com/DeDolphins/DataHorse.git

Go to the directory

cd DataHorseUI

Install the requirements

pip install -r requirements.txt

Run DataHorse WebUI

streamlit run app.py

WebUI Demo

Please support the work by giving the repository a star, contributing to it, or following us on LinkedIn.

Star History

⭐️ Star DataHorse to increase our visibility

Contribute

Found a bug or have an improvement in mind? Fantastic!

Got a solution ready? That's even better!

Ready to share it with us? We're all ears!

Start at the contributing guide!
