finagg
A Python package for aggregating and normalizing historical data from popular and free financial APIs.
Stars: 410
README:
finagg is a Python package that provides implementations of popular and free financial APIs, tools for aggregating historical data from those APIs into SQL databases, and tools for transforming aggregated data into features useful for analysis and AI/ML.
- Documentation: https://theogognf.github.io/finagg/
- PyPI: https://pypi.org/project/finagg/
- Repository: https://github.com/theOGognf/finagg
Install with pip for the latest stable version.
pip install finagg
Install from GitHub for the latest unstable version.
git clone https://github.com/theOGognf/finagg.git
pip install ./finagg/
Optionally install the recommended datasets (economic data, company financials, stock histories, etc.) from 3rd party APIs into a local SQL database.
finagg install -ss economic -ts indices -z -r
The installation will point you to where to get free API keys for each API that
requires one and will write those API keys to a local .env file for storage.
Run finagg install --help for more installation options and details.
These are just a few finagg usage samples. See the documentation for all the supported APIs and features.
These methods require internet access and API keys/user agent declarations.
Get Bureau of Economic Analysis (BEA) data.
>>> finagg.bea.api.gdp_by_industry.get(year=[2019]).head(5)
table_id freq year quarter industry industry_description ...
0 1 Q 2019 1 11 Agriculture, forestry, fishing, and hunting ...
1 1 Q 2019 1 111CA Farms ...
2 1 Q 2019 1 113FF Forestry, fishing, and related activities ...
3 1 Q 2019 1 21 Mining ...
4 1 Q 2019 1 211 Oil and gas extraction ...
Get Federal Reserve Economic Data (FRED).
>>> finagg.fred.api.series.observations.get(
... "CPIAUCNS",
... realtime_start=0,
... realtime_end=-1,
... output_type=4
... ).head(5)
realtime_start realtime_end date value series_id
0 1949-04-22 1953-02-26 1949-03-01 169.5 CPIAUCNS
1 1949-05-23 1953-02-26 1949-04-01 169.7 CPIAUCNS
2 1949-06-24 1953-02-26 1949-05-01 169.2 CPIAUCNS
3 1949-07-22 1953-02-26 1949-06-01 169.6 CPIAUCNS
4 1949-08-26 1953-02-26 1949-07-01 168.5 CPIAUCNS
Get Securities and Exchange Commission (SEC) filings.
>>> finagg.sec.api.company_facts.get(ticker="AAPL").head(5)
end value accn fy fp form filed ...
0 2009-06-27 895816758.0 0001193125-09-153165 2009 Q3 10-Q 2009-07-22 ...
1 2009-10-16 900678473.0 0001193125-09-214859 2009 FY 10-K 2009-10-27 ...
2 2009-10-16 900678473.0 0001193125-10-012091 2009 FY 10-K/A 2010-01-25 ...
3 2010-01-15 906794589.0 0001193125-10-012085 2010 Q1 10-Q 2010-01-25 ...
4 2010-04-09 909938383.0 0001193125-10-088957 2010 Q2 10-Q 2010-04-21 ...
These methods require internet access, API keys/user agent declarations, and
downloading and installing raw data through the finagg install or
finagg <api/subpackage> install commands.
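For example, installing just the raw FRED data before exploring the FRED features below might use the subpackage-level form of the command (a hypothetical invocation; run finagg install --help and each subcommand's --help for the actual options):
finagg fred install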
Get the most popular FRED features all in one dataframe.
>>> finagg.fred.feat.economic.from_raw().head(5)
CIVPART LOG_CHANGE(CPIAUCNS) LOG_CHANGE(CSUSHPINSA) FEDFUNDS ...
date ...
2014-10-06 62.8 0.0 0.0 0.09 ...
2014-10-08 62.8 0.0 0.0 0.09 ...
2014-10-13 62.8 0.0 0.0 0.09 ...
2014-10-15 62.8 0.0 0.0 0.09 ...
2014-10-20 62.8 0.0 0.0 0.09 ...
Get quarterly report features from SEC data.
>>> finagg.sec.feat.quarterly.from_raw("AAPL").head(5)
LOG_CHANGE(Assets) LOG_CHANGE(AssetsCurrent) ...
fy fp filed ...
2010 Q1 2010-01-25 0.182629 -0.023676 ...
Q2 2010-04-21 0.000000 0.000000 ...
Q3 2010-07-21 0.000000 0.000000 ...
2011 Q1 2011-01-19 0.459174 0.278241 ...
Q2 2011-04-21 0.000000 0.000000 ...
Get an aggregation of quarterly and daily features for a particular ticker.
>>> finagg.fundam.feat.fundam.from_raw("AAPL").head(5)
PriceBookRatio PriceEarningsRatio
date
2010-01-25 0.175061 2.423509
2010-01-26 0.178035 2.464678
2010-01-27 0.178813 2.475448
2010-01-28 0.177154 2.452471
2010-01-29 0.173825 2.406396
These methods require installing refined data through the finagg install
or finagg <api/subpackage> install commands.
Get the average quarterly report features for a ticker's industry.
>>> finagg.sec.feat.quarterly.industry.from_refined(ticker="AAPL").head(5)
mean ...
name AssetCoverageRatio BookRatio DebtEquityRatio ...
fy fp filed ...
2014 Q1 2014-05-15 10.731301 9.448954 0.158318 ...
Q2 2014-08-14 10.731301 9.448954 0.158318 ...
Q3 2014-11-14 10.731301 9.448954 0.158318 ...
2015 Q1 2015-05-15 16.738972 9.269250 0.294238 ...
Q2 2015-08-13 16.738972 9.269250 0.294238 ...
Get a ticker's quarterly report features normalized against its industry.
>>> finagg.sec.feat.quarterly.normalized.from_refined("AAPL").head(5)
NORM(LOG_CHANGE(Assets)) NORM(LOG_CHANGE(AssetsCurrent)) ...
fy fp filed ...
2010 Q2 2010-04-21 0.000000 0.000000 ...
Q3 2010-07-21 0.000000 0.000000 ...
2011 Q1 2011-01-19 0.978816 0.074032 ...
Q2 2011-04-21 0.000000 0.000000 ...
Q3 2011-07-20 -0.353553 -0.353553 ...
Get tickers sorted by an industry-normalized quarterly report feature.
>>> finagg.sec.feat.quarterly.normalized.get_tickers_sorted_by(
... "NORM(EarningsPerShareBasic)",
... year=2019
... )[:5]
['XRAY', 'TSLA', 'SYY', 'WHR', 'KMB']
Get tickers sorted by an industry-normalized fundamental feature.
>>> finagg.fundam.feat.fundam.normalized.get_tickers_sorted_by(
... "NORM(PriceEarningsRatio)",
... date="2019-01-04"
... )[:5]
['AMD', 'TRGP', 'HPE', 'CZR', 'TSLA']
API keys and user agent declarations are required for most of the APIs. You can set environment variables to expose your API keys and user agents to finagg, or you can pass your API keys and user agents to the implemented APIs programmatically. The following environment variables are used for configuring API keys and user agents:
- BEA_API_KEY is for the Bureau of Economic Analysis's API key. You can get a free API key from the BEA API site.
- FRED_API_KEY is for the Federal Reserve Economic Data API key. You can get a free API key from the FRED API site.
- INDICES_API_USER_AGENT is for scraping popular indices' compositions from Wikipedia and should be equivalent to a browser's user agent declaration. This defaults to a hardcoded value, but it may not always work.
- SEC_API_USER_AGENT is for the Securities and Exchange Commission's API. This should be of the format FIRST_NAME LAST_NAME E_MAIL.
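The finagg install command writes these keys to a local .env file for you, but you can also create one by hand. A hypothetical .env file with placeholder values (substitute your own keys and contact details):
BEA_API_KEY=your-bea-api-key
FRED_API_KEY=your-fred-api-key
INDICES_API_USER_AGENT=Mozilla/5.0 (X11; Linux x86_64)
SEC_API_USER_AGENT=Jane Doe jane.doe@example.com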
finagg's root path, HTTP cache path, and database path are all configurable
through environment variables. By default, all data related to finagg is put
in a ./findata directory relative to a root directory. You can change these
locations by modifying the respective environment variables:
- FINAGG_ROOT_PATH points to the parent directory of the ./findata directory. Defaults to your current working directory.
- FINAGG_HTTP_CACHE_PATH points to the HTTP requests cache SQLite storage. Defaults to ./findata/http_cache.sqlite.
- FINAGG_DATABASE_URL points to the finagg data storage. Defaults to ./findata/finagg.sqlite.
You can change some finagg behavior with other environment variables:
- FINAGG_DISABLE_HTTP_CACHE: Set this to "1" or "True" to disable the HTTP requests cache. Instead of a cacheable session, a default, uncached user session will be used for all requests.
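These variables can live in the same .env file as the API keys. A hypothetical example that relocates all finagg data to a /data directory and leaves the HTTP cache enabled (the paths are placeholders, and the SQLAlchemy-style URL for FINAGG_DATABASE_URL is an assumption based on the variable's name and the SQLAlchemy dependency):
FINAGG_ROOT_PATH=/data
FINAGG_HTTP_CACHE_PATH=/data/findata/http_cache.sqlite
FINAGG_DATABASE_URL=sqlite:////data/findata/finagg.sqlite
FINAGG_DISABLE_HTTP_CACHE=0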
Dependencies:
- pandas for fast, flexible, and expressive representations of relational data.
- requests for HTTP requests to 3rd party APIs.
- requests-cache for caching HTTP requests to avoid getting throttled by 3rd party API servers.
- SQLAlchemy for a SQL Python interface.
- yfinance for historical stock data from Yahoo! Finance.
API references:
- The BEA API and the BEA API key registration link.
- The FRED API and the FRED API key registration link.
- The SEC API.
Related projects:
- FinRL is a collection of financial reinforcement learning environments and tools.
- fredapi is an implementation of the FRED API.
- OpenBBTerminal is an open-source version of the Bloomberg Terminal.
- sec-edgar is an implementation of a file-based SEC EDGAR parser.
- sec-edgar-api is an implementation of the SEC EDGAR REST API.
Aggregate some data, create some analysis notebooks, or create some RL environments using the implemented data features and SQL tables. This project was originally created to make RL environments for financial applications but has since narrowed its focus to aggregating financial data and features. That said, all the implemented features are defined in such a way as to make developing financial AI/ML straightforward, so we encourage you to do just that!
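As a hedged starting-point sketch: the raw-data features shown earlier are all indexed by date, so plain pandas can join them into a single modeling dataframe (this assumes the recommended datasets were installed beforehand via finagg install):
>>> import finagg
>>> economic = finagg.fred.feat.economic.from_raw()
>>> fundam = finagg.fundam.feat.fundam.from_raw("AAPL")
>>> # Both dataframes are indexed by date, so an inner join keeps only
>>> # dates where macro features and company fundamentals both exist.
>>> features = fundam.join(economic, how="inner")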
Implemented APIs may be relatively new and simply may not provide data for a particular ticker or economic data series. For example, earnings per share may not be accessible for all companies through the SEC EDGAR API. In some cases, APIs may raise an HTTP error, causing installations to skip the ticker or series. Additionally, not all tickers and economic data series contain sufficient data for feature normalization. If a ticker or series only has one data point, that data point could be dropped when computing a feature (such as percent change), causing no data to be installed.
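Because coverage varies by ticker and series, bulk exploration code should be written defensively. A minimal sketch of skipping unavailable tickers (the exact exception type finagg raises here is an assumption; adjust it to whatever you observe):
>>> import requests
>>> import finagg
>>> facts = {}
>>> for ticker in ["AAPL", "MSFT"]:
...     try:
...         facts[ticker] = finagg.sec.api.company_facts.get(ticker=ticker)
...     except requests.HTTPError:
...         # Some tickers simply aren't served by the SEC EDGAR API.
...         pass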
Python 3.10 and up are supported. We don't plan on supporting lower versions because 3.10 introduces some nice quality-of-life updates that are used throughout the package.
The package is developed and tested on both Linux and Windows, but we recommend using Linux or WSL in practice. The package performs a good amount of I/O and interprocess operations that could result in a noticeable performance degradation on Windows.