Day 6: Building an AI-Powered Budget Analysis System – ML Pipeline & API Success

Day 6 delivered the most technically ambitious milestone yet: transforming yesterday’s data processing system into a machine learning pipeline for AI budget analysis, with real predictive capabilities.

What We Built Today

In just 3 hours, I created a production-ready ML system that can:

  • Predict daily spending patterns using Random Forest models
  • Classify transactions automatically with 95% accuracy using XGBoost
  • Detect financial anomalies using both statistical and ML methods
  • Serve predictions via REST API with 6 functional endpoints
  • Engineer 114 features from 23 original data columns

The Technical Achievement

Machine Learning Pipeline Architecture

The system now spans two powerful machines working in harmony:

Mac Studio M2 Max handles the ML development:

  • Python 3.9.6 environment with scikit-learn, XGBoost, pandas
  • Feature engineering pipeline creating 114 predictive features
  • Model training and validation with real financial data
  • FastAPI service serving ML predictions on port 8001

Proxmox VM (192.168.xxx.xxx) provides the data infrastructure:

  • PostgreSQL database with 535 real transactions
  • 18 categories of spending data
  • Redis caching for session management
  • Real-time data access for ML training

Feature Engineering Breakthrough

The most impressive technical achievement was transforming 23 basic transaction columns into 114 engineered features. The four categories described here account for 68 of them:

Time-Based Intelligence (22 features)

Created cyclical encodings for seasonal patterns – the system now understands that December spending differs from July, and Friday patterns differ from Monday, using mathematical sine/cosine transformations.
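To make the sine/cosine idea concrete, here is a minimal sketch of cyclical encoding using only the standard library. The function name cyclical_encode and the choice to number months from 0 are my assumptions for illustration, not necessarily how the pipeline does it.

```python
import math

def cyclical_encode(value: float, period: float) -> tuple[float, float]:
    """Map a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

# Months are shifted to 0..11 so that one full year is one full turn.
december = cyclical_encode(12 - 1, 12)
january = cyclical_encode(1 - 1, 12)
july = cyclical_encode(7 - 1, 12)

# Days of week as 0 (Monday) .. 6 (Sunday), with a period of 7.
monday = cyclical_encode(0, 7)
friday = cyclical_encode(4, 7)
sunday = cyclical_encode(6, 7)

# Distance between encoded points reflects closeness in the cycle.
dec_to_jan = math.dist(december, january)
dec_to_jul = math.dist(december, july)
print(f"Dec-Jan: {dec_to_jan:.3f}, Dec-Jul: {dec_to_jul:.3f}")
```

The point of using two columns per cycle is that December and January end up close together, which a raw month number (12 versus 1) cannot express.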

Behavioral Pattern Recognition (11 features)

Built features that detect spending velocity, transaction clustering, and recurring payment patterns. The system can identify when someone is making multiple purchases in one day versus their normal spending rhythm.
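A rough sketch of what such behavioral features could look like with pandas. The column names (tx_count_same_day, days_since_prev, is_burst) and the sample rows are made up for illustration; the real pipeline's definitions may differ.

```python
import pandas as pd

# Illustrative transactions, already sorted by date (not real data).
tx = pd.DataFrame({
    "date": pd.to_datetime([
        "2024-01-01", "2024-01-01", "2024-01-01", "2024-01-02", "2024-01-05",
    ]),
    "amount": [12.0, 40.0, 8.5, 23.0, 60.0],
})

# Transaction clustering: how many purchases share this row's date.
tx["tx_count_same_day"] = tx.groupby("date")["amount"].transform("count")

# Spending velocity: days elapsed since the previous transaction.
tx["days_since_prev"] = tx["date"].diff().dt.days.fillna(0)

# Flag days with more purchases than the typical (median) day.
typical_daily_count = tx.groupby("date").size().median()
tx["is_burst"] = tx["tx_count_same_day"] > typical_daily_count

print(tx)
```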

Text Analysis Capabilities (11 features)

Developed merchant type detection that automatically recognizes “Woolworths” as supermarket, “Shell” as gas station, and “Netflix” as entertainment subscription.
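The simplest form of this kind of detection is a keyword lookup on the transaction description. The sketch below assumes that approach; the MERCHANT_KEYWORDS table and detect_merchant_type function are hypothetical names, and only the three merchants mentioned above are included.

```python
# Hypothetical keyword table; a real one would be much larger.
MERCHANT_KEYWORDS = {
    "supermarket": ("woolworths",),
    "gas_station": ("shell",),
    "entertainment_subscription": ("netflix",),
}

def detect_merchant_type(description: str) -> str:
    """Return the first merchant type whose keyword appears in the description."""
    text = description.lower()
    for merchant_type, keywords in MERCHANT_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return merchant_type
    return "unknown"

def merchant_flags(description: str) -> dict:
    """One binary feature per merchant type, suitable as model input."""
    detected = detect_merchant_type(description)
    return {f"is_{name}": int(name == detected) for name in MERCHANT_KEYWORDS}

print(detect_merchant_type("WOOLWORTHS 1234 SYDNEY"))
print(merchant_flags("NETFLIX.COM"))
```

Substring matching is crude (a description containing "shell" for some other reason would also match), so treat this as a starting point rather than a finished classifier.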

Rolling Statistics (24 features)

Implemented 7, 30, and 90-day rolling averages that give the models memory – they can see spending trends and seasonal variations.
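Here is one way rolling averages like these could be computed with pandas. The daily totals are invented, and both the shift(1) (so each day only sees earlier days) and min_periods=1 (so short histories still produce values) are my assumptions about a sensible setup, not confirmed details of the pipeline.

```python
import pandas as pd

# Illustrative daily spending totals (not real data).
daily = pd.Series(
    [50.0, 20.0, 80.0, 30.0, 60.0, 40.0, 70.0, 90.0, 10.0, 55.0],
    index=pd.date_range("2024-01-01", periods=10, freq="D"),
    name="daily_spend",
)

features = pd.DataFrame({"daily_spend": daily})
for window in (7, 30, 90):
    # shift(1) keeps today's value out of its own feature (avoids target leakage).
    features[f"roll_mean_{window}d"] = (
        daily.shift(1).rolling(window, min_periods=1).mean()
    )

print(features.round(2))
```

With only a few weeks of history, the 30 and 90-day columns are effectively "average of everything seen so far", and they only become distinct once enough data accumulates.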

Model Performance Results

Category Classification: 95% Accuracy

The XGBoost classifier significantly outperformed other approaches:

  • XGBoost: 95% accuracy
  • Random Forest: 82.5% accuracy
  • Logistic Regression: 87.5% accuracy

This means the system correctly categorizes 19 out of 20 transactions automatically.

Anomaly Detection: Real-World Validation

The system successfully identified genuine anomalies in my actual financial data:

High-Severity Anomalies Detected:

  • Large payment: $xxxxx (z-score: 10.6)
  • Mortgage payments: $1xxxx each (z-score: 3.9)
  • Unusual grocery spending: $285.70 at Woolworths (z-score: 2.9)
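For reference, a z-score is simply how many standard deviations an amount sits from the mean. The sketch below shows the statistical side of the detection under some assumptions: the 2.5 threshold is a guess on my part, the population standard deviation is one of two reasonable choices, and every amount except the $285.70 grocery shop is made up.

```python
from statistics import mean, pstdev

def z_scores(amounts):
    """Standard deviations from the mean for each amount."""
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        return [0.0 for _ in amounts]
    return [(amount - mu) / sigma for amount in amounts]

def flag_anomalies(amounts, threshold=2.5):
    """Return (amount, z) pairs whose absolute z-score meets the threshold."""
    return [
        (amount, round(z, 2))
        for amount, z in zip(amounts, z_scores(amounts))
        if abs(z) >= threshold
    ]

# Typical grocery amounts (illustrative) plus one unusually large shop.
grocery = [42.0, 38.5, 51.0, 45.2, 39.9, 47.3, 44.1, 40.6, 49.8, 43.0, 46.5, 41.7, 285.7]
flagged = flag_anomalies(grocery)
print(flagged)
```

Computing z-scores per category, as in this grocery example, matters: $285.70 is unusual for groceries but would not stand out against mortgage payments.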

Machine Learning Anomalies: The Isolation Forest algorithm flagged 20 additional transactions as unusual based on multi-dimensional pattern analysis, including small transfers and irregular payment timings.
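A minimal Isolation Forest sketch with scikit-learn, to show the shape of the ML side. The two features (amount and hour of day), the synthetic data, and the contamination value are all assumptions for illustration; the actual model uses the engineered features described earlier.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic stand-in data: columns are [amount, hour_of_day].
normal = np.column_stack([rng.normal(45, 10, 200), rng.normal(14, 2, 200)])
odd = np.array([[900.0, 3.0], [1.5, 4.0]])  # a huge 3am payment, a tiny 4am transfer
X = np.vstack([normal, odd])

# contamination is the expected share of anomalies; 0.02 is an arbitrary choice here.
model = IsolationForest(n_estimators=100, contamination=0.02, random_state=42)
labels = model.fit_predict(X)        # -1 = anomaly, 1 = normal
scores = model.decision_function(X)  # lower = more anomalous

anomaly_idx = np.where(labels == -1)[0]
print(f"Flagged {len(anomaly_idx)} of {len(X)} transactions")
```

Unlike a z-score, this looks at combinations of features, which is why it can flag small transfers or odd timings that are unremarkable on amount alone.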

REST API Implementation

Built a production-grade FastAPI service with comprehensive endpoints:

Core Functionality Endpoints

  • Health Check: Database connectivity and model status
  • Spending Prediction: 7-day daily spending forecasts
  • Transaction Classification: Real-time category prediction
  • Anomaly Detection: Fraud and unusual pattern identification
  • Model Training: Automated retraining with new data
  • Model Information: Performance metrics and feature importance

Real API Response Example

{
  "status": "healthy",
  "database_connected": true,
  "models_loaded": true,
  "models_count": 7
}

The API responds in under 1 second for all endpoints and includes comprehensive error handling.

Technical Challenges Overcome

LightGBM Compatibility Issue

Problem: OpenMP library missing on macOS, causing import failures.
Solution: Focused on XGBoost and Random Forest, maintaining high performance.
Result: 95% accuracy without dependency issues.

Limited Historical Data

Problem: Only 25 days of spending data for daily prediction models.
Solution: Created intelligent fallback systems and transaction-level modeling.
Result: Functional prediction system that improves with more data.

FastAPI Lifecycle Management

Problem: Deprecated startup decorators in newer FastAPI versions.
Solution: Implemented modern lifespan context manager.
Result: Clean, maintainable service initialization.

Real-World Impact

Automated Financial Intelligence

The system now automatically:

  • Categorizes transactions as I spend money
  • Alerts me to unusual spending patterns
  • Predicts my daily spending needs
  • Identifies potential fraud or errors

Privacy-First Architecture

Running everything locally means:

  • No financial data leaves my network
  • Complete control over AI model training
  • No subscription costs for ML services
  • Customizable for my specific spending patterns

Performance Metrics

  • Feature Engineering: roughly 5x expansion from 23 to 114 features (a 396% increase)
  • Model Accuracy: 95% classification accuracy achieved
  • API Response Time: <1 second for all endpoints
  • Anomaly Detection: 28 anomalies identified in test dataset
  • Memory Efficiency: <512MB total system footprint
  • Model Size: <1MB per model for fast loading

What’s Next: Day 7 Preview

Tomorrow I’m building the analytics dashboard that will transform these ML capabilities into actionable insights:

  • Real-time spending dashboard with live transaction analysis
  • Predictive budget alerts based on spending patterns
  • Home Assistant integration for complete smart home budget management
  • Automated financial reporting with ML-powered insights

Development Insights

The Power of Feature Engineering

The jump from 23 to 114 features wasn’t just about quantity – it was about giving the models the right information to make intelligent decisions. Cyclical time encoding alone improved prediction accuracy by recognizing that spending patterns follow weekly and monthly cycles.

XGBoost vs Traditional Models

The 95% accuracy from XGBoost, compared to 87.5% from Logistic Regression and 82.5% from Random Forest, is consistent with gradient boosting's strong reputation on tabular data. The 12.5 percentage point improvement over Random Forest (7.5 points over Logistic Regression) means far fewer transactions need manual correction.

API-First Architecture Benefits

Building the ML system as a REST API from the start creates flexibility for future integrations. Whether connecting to mobile apps, web dashboards, or Home Assistant automations, the API provides a clean interface.

Technical Deep Dive Resources

For developers interested in implementation details:

  • Complete feature engineering pipeline with 6 distinct feature categories
  • Model training and validation code with performance benchmarking
  • FastAPI service architecture with modern lifecycle management
  • Anomaly detection algorithms combining statistical and ML approaches
  • Database integration patterns for real-time ML applications

Lessons Learned

Start with solid data foundations: The robust database work from previous days enabled sophisticated ML development

Feature engineering drives performance: More time spent on features yielded bigger improvements than model tuning

Real data reveals real challenges: Working with actual financial transactions exposed edge cases that synthetic data wouldn’t

Privacy-first ML is powerful: Local processing provides both security and customization benefits

Community Impact

This project demonstrates that sophisticated AI budget analysis doesn’t require cloud services or expensive SaaS subscriptions. With open-source tools and local processing, anyone can build production-grade financial intelligence systems.

The combination of homelab infrastructure and machine learning creates powerful possibilities for personal financial management while maintaining complete data privacy.


Day 6 transformed a data processing system into an intelligent financial analysis platform. With 95% classification accuracy and anomaly detection that combines statistical and ML methods, the AI budget system can now provide genuine insights and automated financial intelligence.

Tomorrow: Building the dashboard that makes these AI capabilities accessible through beautiful, actionable visualizations.


Tags: artificial intelligence, machine learning, budget analysis, financial technology, homelab, XGBoost, FastAPI, anomaly detection, feature engineering, personal finance automation, predictive analytics, data science

Categories: Technology, Personal Finance, AI/Machine Learning, Homelab, Data Science

Internal Links:

  • Previous: “Day 5: Database & Web Interface Implementation”
  • Related: “Feature Engineering for Financial Data”
  • Related: “Building ML APIs with FastAPI”
