Commit 1d98267a authored by Christos Anastasiou's avatar Christos Anastasiou

Initial commit for GitLab submission

  Includes:
  - Data preprocessing pipeline (cleaning/)
  - ML evaluation & backtesting (analysis/)
  - Shared utilities (common/)
  - Configuration & orchestration

  As described in the Appendix of the thesis.
  Excludes: results/, datasets, thesis documents (due to size)
parent 683e53a7

.gitignore

0 → 100644
+96 −0
# Experiment results (large files)
results/
exported/
workspace/
new_kastoria/

# Datasets (large files)
data/
dataset/
cu-bems_dataset/
WATTS_ML_SEPTEMBER/

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
venv/
env/
ENV/
*.egg-info/
dist/
build/

# Jupyter Notebook
.ipynb_checkpoints
*.ipynb

# IDEs
.idea/
.vscode/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Temporary files
*.log
*.tmp
*.temp
*.bak
*.backup
*.backup_*

# Model outputs
catboost_info/
hmm_outputs/
ai_sanity_outputs/

# Large CSV/JSON files
*.csv
!watts_experiments.yaml
!requirements.txt
aggregated_wmape.csv
all_results_summary*.csv
algorithm_rankings.json
*_results*.json
*_summary*.csv

# Documentation files (kept only in the main repo)
THESIS_CHAPTERS/
thesis_figures/
*.md
!README.md
!GITLAB_README.md
*.docx
*.pdf
*.txt
!requirements.txt

# Scripts that are not needed
create_*.py
fix_*.py
analyze_*.py
compare_*.py
collect_*.py
merge_*.py
renumber_*.py
translate_*.py
convert_*.py
extract_*.py
*_temp.py
*_test.py

# Unneeded configuration files
*.yaml
!watts_experiments.yaml
schema.json
pyproject.toml

# Environment variables
.env

README.md

0 → 100644
+122 −0
# ThesisML - Machine Learning for Energy Forecasting in Smart Buildings

Master's Thesis of Christos Anastasiou - University of Western Macedonia, 2025

## Description

This repository contains the implementation code for forecasting energy consumption in smart buildings using machine learning techniques.

## Project Structure

```
ThesisML/
├── cleaning/                    # Data preprocessing pipeline
│   └── clean_energy_dataset_v3.py
├── analysis/                    # ML evaluation and backtesting
│   ├── walkforward_backtest.py
│   └── dual_ai_economic_validation.py
├── common/                      # Shared utilities
│   ├── extra_models_and_metrics.py
│   └── features.py
├── interactive_watts_runner.py  # Orchestration runner
├── watts_experiments.yaml       # Configuration file
└── results/                     # Output directory (not included in repo)
```

## Key Files

### 1. Data Preprocessing
- **`cleaning/clean_energy_dataset_v3.py`**: Energy data preprocessing pipeline

### 2. ML Evaluation
- **`analysis/walkforward_backtest.py`**: Walk-forward validation for time series
- **`analysis/dual_ai_economic_validation.py`**: Dual AI validation (GPT-4o + Claude Opus 4)

### 3. Shared Utilities
- **`common/extra_models_and_metrics.py`**: ML models and evaluation metrics
- **`common/features.py`**: Feature engineering for time series

### 4. Configuration & Orchestration
- **`interactive_watts_runner.py`**: Main script for running experiments
- **`watts_experiments.yaml`**: Experiment configuration file

## Installation

```bash
# Clone the repository
git clone [REPO_URL]
cd ThesisML

# Create a virtual environment
python3 -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt
```

## Usage

### Basic Execution

```bash
python interactive_watts_runner.py
```

### Examples

```bash
# Walk-forward backtest
python analysis/walkforward_backtest.py

# Economic validation
python analysis/dual_ai_economic_validation.py
```

## Requirements

- Python 3.8+
- pandas, numpy
- scikit-learn
- lightgbm, xgboost, catboost
- PyYAML

See `requirements.txt` for the full list.

## Datasets

The datasets are **not** included in the repository due to their size.
A description of the datasets and instructions for accessing them are provided in the thesis.

## Results

Experiment results are stored in the `results/` directory, which is **not** included in the repository due to its size.

## Author

Christos Anastasiou, Student ID 265


## License

This code is provided for academic purposes.

## Citation

If you use this code, please cite:

```
Anastasiou, C. (2025). Energy Consumption Forecasting in Smart Buildings
Using Machine Learning Techniques. Master's Thesis,
University of Western Macedonia.
```

## Contact

For questions or comments, please contact the author through the university.

---

**Note**: This repository contains the code described in the Appendix of the thesis.
analysis/dual_ai_economic_validation.py

0 → 100644
+430 −0
"""
Dual AI Economic Validation
Sends the same prompt to OpenAI + Claude and cross-validates the answers

Approach A (Traditional): wMAPE → 15% conversion → €30.83/year
Approach B (AI-recommended): wMAPE → AI reasoning → €X/year

Cross-validates Approach B with 2 independent AI models
"""

import os
import json
import pandas as pd
from dotenv import load_dotenv
import openai
import anthropic

load_dotenv()

# ==============================================================================
# CONFIGURATION - loaded from the .env file
# ==============================================================================

OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-4o")
ANTHROPIC_MODEL = os.getenv("ANTHROPIC_MODEL", "claude-opus-4-1-20250805")

print("="*80)
print("DUAL AI ECONOMIC VALIDATION")
print("Cross-validation: OpenAI vs Anthropic")
print("="*80)

print(f"\n🔧 Configuration:")
print(f"   OpenAI model: {OPENAI_MODEL}")
print(f"   Anthropic model: {ANTHROPIC_MODEL}")

# ==============================================================================
# DATA - our ML results
# ==============================================================================

SYSTEMS_DATA = {
    'ZWave2_computers': {
        'name': 'Office Computers',
        'annual_kwh': 423,
        'wMAPE': 3.16,
        'best_model': 'LightGBM',
        'type': 'Office equipment, variable load during working hours'
    },
    'ZWave_Node_016': {
        'name': 'Lab Equipment (Miranet)',
        'annual_kwh': 74,
        'wMAPE': 4.24,
        'best_model': 'Ridge',
        'type': 'Lab instrumentation, steady load'
    },
    'WavePlug_UPS': {
        'name': 'UPS System',
        'annual_kwh': 395,
        'wMAPE': 5.37,
        'best_model': 'ElasticNet',
        'type': 'Critical infrastructure, continuous load'
    },
    'ZWave_Node_031': {
        'name': 'Server',
        'annual_kwh': 87,
        'wMAPE': 7.83,
        'best_model': 'KNN',
        'type': 'Always-on server, stochastic load'
    },
    'ZW095_Multi': {
        'name': 'Multi-phase meter (ZW095)',
        'annual_kwh': 3036,
        'wMAPE': 27.71,
        'best_model': 'ElasticNet',
        'type': 'Campus multi-phase monitoring, high variability'
    }
}

TARIFF = 0.15  # €/kWh
TOTAL_KWH = sum(s['annual_kwh'] for s in SYSTEMS_DATA.values())
TOTAL_COST = TOTAL_KWH * TARIFF
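# Illustrative sanity check (derived from the SYSTEMS_DATA entries above):
# 423 + 74 + 395 + 87 + 3036 = 4015 kWh/year, so the annual baseline cost
# at €0.15/kWh is 4015 * 0.15 = €602.25.
_expected_kwh = 423 + 74 + 395 + 87 + 3036
assert _expected_kwh == 4015
assert abs(_expected_kwh * 0.15 - 602.25) < 1e-9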

# ==============================================================================
# PROMPT - identical for OpenAI and Claude
# ==============================================================================

def create_prompt():
    """Builds the structured prompt"""

    systems_text = ""
    for i, (key, data) in enumerate(SYSTEMS_DATA.items(), 1):
        systems_text += f"""
**System {i}: {data['name']}**
- Current annual consumption: {data['annual_kwh']} kWh
- ML model: {data['best_model']}
- Forecasting accuracy: {data['wMAPE']}% wMAPE ({"excellent" if data['wMAPE'] < 10 else "moderate"})
- Type: {data['type']}
"""

    prompt = f"""You are an energy management expert specializing in building electrical systems and ML-based optimization.

I have successfully developed ML forecasting models for 5 building electrical systems with the following performance:
{systems_text}

**Context:**
- Total current consumption: {TOTAL_KWH:,} kWh/year (€{TOTAL_COST:.2f}/year at €{TARIFF}/kWh)
- Greek university setting (limited automation, no sophisticated BEMS)
- Systems monitored for 9.5 months (Oct 2024 - July 2025)

**Your task:**

Provide REALISTIC, CONSERVATIVE estimates for potential energy savings if these ML forecasts are used for optimization.

For EACH system, estimate:
1. What percentage of current consumption could be reduced?
2. What specific actions would achieve this? (e.g., "Schedule office computers to sleep mode during non-working hours")
3. Key assumptions
4. Confidence level (low/medium/high)

Then provide:
- Total potential annual savings (€/year)
- Overall percentage reduction

**Important constraints:**
- Be CONSERVATIVE - this is for a Master's thesis, not a marketing pitch
- Consider that UPS and Server are critical (limited optimization potential)
- Multi-phase system (ZW095) has poor forecasting accuracy (27.71% wMAPE)
- Greek context: limited demand response, no time-of-use pricing in 2024-2025
- No assumptions about HVAC (we don't monitor it)

Return your response as JSON with this structure:
{{
  "systems": [
    {{
      "name": "System name",
      "current_kwh": 423,
      "reduction_percent": 10,
      "new_kwh": 380.7,
      "savings_eur": 6.35,
      "optimization_actions": ["Action 1", "Action 2"],
      "confidence": "medium"
    }},
    ...
  ],
  "totals": {{
    "current_annual_kwh": {TOTAL_KWH},
    "new_annual_kwh": ...,
    "annual_savings_eur": ...,
    "overall_reduction_percent": ...
  }},
  "overall_assessment": "Brief summary of realistic potential"
}}

Use conservative, defensible estimates. Better to underestimate than overestimate.
"""
    return prompt
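# Illustrative check of the accuracy label embedded in the prompt above:
# a wMAPE below 10% is labelled "excellent", anything else "moderate".
assert ("excellent" if 3.16 < 10 else "moderate") == "excellent"   # ZWave2_computers
assert ("excellent" if 27.71 < 10 else "moderate") == "moderate"   # ZW095_Multi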

# ==============================================================================
# OPENAI CALL
# ==============================================================================

def query_openai(prompt):
    """Sends the prompt to OpenAI"""

    print("\n" + "="*80)
    print(f"🤖 Calling OpenAI ({OPENAI_MODEL})...")
    print("="*80)

    try:
        client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

        response = client.chat.completions.create(
            model=OPENAI_MODEL,
            messages=[
                {"role": "system", "content": "You are an expert energy economist. Always respond with valid JSON only."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.3,
            response_format={"type": "json_object"}
        )

        result = json.loads(response.choices[0].message.content)

        print(f"✅ OpenAI response received!")
        print(f"   Annual savings: €{result['totals']['annual_savings_eur']:.2f}")
        print(f"   Reduction: {result['totals']['overall_reduction_percent']:.1f}%")

        return {
            'success': True,
            'model': OPENAI_MODEL,
            'result': result
        }

    except Exception as e:
        print(f"❌ OpenAI error: {e}")
        return {'success': False, 'error': str(e)}

# ==============================================================================
# CLAUDE CALL
# ==============================================================================

def query_claude(prompt):
    """Sends the prompt to Claude"""

    print("\n" + "="*80)
    print(f"🤖 Calling Claude ({ANTHROPIC_MODEL})...")
    print("="*80)

    try:
        client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

        response = client.messages.create(
            model=ANTHROPIC_MODEL,
            max_tokens=3000,
            temperature=0.3,
            messages=[
                {"role": "user", "content": prompt}
            ]
        )

        content = response.content[0].text

        # Parse JSON
        if "```json" in content:
            content = content.split("```json")[1].split("```")[0].strip()
        elif "```" in content:
            content = content.split("```")[1].split("```")[0].strip()

        result = json.loads(content)

        print(f"✅ Claude response received!")
        print(f"   Annual savings: €{result['totals']['annual_savings_eur']:.2f}")
        print(f"   Reduction: {result['totals']['overall_reduction_percent']:.1f}%")

        return {
            'success': True,
            'model': ANTHROPIC_MODEL,
            'result': result
        }

    except Exception as e:
        print(f"❌ Claude error: {e}")
        return {'success': False, 'error': str(e)}

# ==============================================================================
# CROSS-VALIDATION
# ==============================================================================

def cross_validate(openai_response, claude_response):
    """Compares the two responses"""

    print("\n" + "="*80)
    print("CROSS-VALIDATION ANALYSIS")
    print("="*80)

    if not openai_response['success'] or not claude_response['success']:
        print("⚠️  Cannot cross-validate - one or both models failed")
        return None

    openai_savings = openai_response['result']['totals']['annual_savings_eur']
    claude_savings = claude_response['result']['totals']['annual_savings_eur']

    openai_pct = openai_response['result']['totals']['overall_reduction_percent']
    claude_pct = claude_response['result']['totals']['overall_reduction_percent']

    # Calculate agreement
    avg_savings = (openai_savings + claude_savings) / 2
    diff_pct = abs(openai_savings - claude_savings) / avg_savings * 100
    agreement_pct = 100 - diff_pct

    print(f"\n📊 Results Comparison:")
    print(f"   OpenAI:  €{openai_savings:.2f}/year ({openai_pct:.1f}% reduction)")
    print(f"   Claude:  €{claude_savings:.2f}/year ({claude_pct:.1f}% reduction)")
    print(f"   Average: €{avg_savings:.2f}/year")
    print(f"   Difference: {diff_pct:.1f}%")
    print(f"   Agreement: {agreement_pct:.1f}%")

    if agreement_pct >= 85:  # 15% tolerance
        print(f"\n✅ CONSENSUS ACHIEVED ({agreement_pct:.1f}% agreement)")
        print(f"   Consensus estimate: €{avg_savings:.2f}/year")
        consensus = {
            'achieved': True,
            'consensus_savings_eur': avg_savings,
            'consensus_reduction_pct': (openai_pct + claude_pct) / 2,
            'agreement_pct': agreement_pct,
            'openai_estimate': openai_savings,
            'claude_estimate': claude_savings
        }
    else:
        print(f"\n⚠️  NO CONSENSUS ({agreement_pct:.1f}% agreement < 85% threshold)")
        print(f"   Models disagree significantly - manual review needed")
        consensus = {
            'achieved': False,
            'agreement_pct': agreement_pct,
            'openai_estimate': openai_savings,
            'claude_estimate': claude_savings,
            'reason': 'Estimates differ by more than 15%'
        }

    return consensus
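# Worked example (illustrative): hypothetical estimates of €40 and €50 give
# avg = 45.0, diff = |40 - 50| / 45 * 100 ≈ 22.2% and agreement ≈ 77.8%,
# which falls below the 85% consensus threshold used above.
_avg = (40 + 50) / 2
_diff_pct = abs(40 - 50) / _avg * 100
assert round(100 - _diff_pct, 1) == 77.8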

# ==============================================================================
# COMPARISON WITH TRADITIONAL APPROACH
# ==============================================================================

def compare_with_traditional(consensus):
    """Compares with the traditional Approach A"""

    print("\n" + "="*80)
    print("COMPARISON: AI-Validated vs Traditional")
    print("="*80)

    # Traditional estimate from Phase 1
    traditional_savings = 30.83  # €/year (from phase1_traditional_results.json)
    traditional_pct = 5.12  # %

    if consensus and consensus['achieved']:
        ai_savings = consensus['consensus_savings_eur']
        ai_pct = consensus['consensus_reduction_pct']

        print(f"\n📊 Traditional Approach (Phase 1):")
        print(f"   Formula: wMAPE improvement × 15% conversion factor")
        print(f"   Savings: €{traditional_savings:.2f}/year ({traditional_pct:.1f}% reduction)")

        print(f"\n🤖 AI-Validated Approach (Dual validation):")
        print(f"   Method: AI reasoning (OpenAI + Claude consensus)")
        print(f"   Savings: €{ai_savings:.2f}/year ({ai_pct:.1f}% reduction)")

        diff = abs(traditional_savings - ai_savings)
        diff_pct = (diff / traditional_savings) * 100

        print(f"\n📈 Comparison:")
        print(f"   Difference: €{diff:.2f} ({diff_pct:.1f}%)")

        if diff_pct < 30:
            print(f"   ✅ Both approaches CONVERGE (within 30%)")
            print(f"   → Supports €{min(traditional_savings, ai_savings):.0f}-€{max(traditional_savings, ai_savings):.0f}/year range")
        else:
            print(f"   ⚠️  Approaches DIVERGE (>30% difference)")
            print(f"   → Further investigation needed")

        return {
            'traditional_savings': traditional_savings,
            'ai_savings': ai_savings,
            'converge': diff_pct < 30,
            'range_min': min(traditional_savings, ai_savings),
            'range_max': max(traditional_savings, ai_savings)
        }
    else:
        print("\n⚠️  Cannot compare - no consensus achieved")
        return None
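# Consistency note (illustrative): the Phase 1 figure of €30.83/year is
# about 5.12% of the €602.25 annual baseline (4015 kWh × €0.15/kWh),
# since 602.25 * 0.0512 ≈ 30.84.
assert round(602.25 * 0.0512, 1) == 30.8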

# ==============================================================================
# MAIN
# ==============================================================================

def main():
    """Main execution"""

    # Create prompt
    prompt = create_prompt()

    # Query both AIs
    openai_response = query_openai(prompt)
    claude_response = query_claude(prompt)

    # Cross-validate
    consensus = cross_validate(openai_response, claude_response)

    # Compare with traditional
    comparison = compare_with_traditional(consensus)

    # Save results
    output = {
        'metadata': {
            'date': '2025-10-08',
            'openai_model': OPENAI_MODEL,
            'claude_model': ANTHROPIC_MODEL
        },
        'openai': openai_response,
        'claude': claude_response,
        'consensus': consensus,
        'comparison_with_traditional': comparison
    }

    with open('analysis/dual_ai_validation_results.json', 'w') as f:
        json.dump(output, f, indent=2, default=str)

    print("\n✅ Full results saved to: analysis/dual_ai_validation_results.json")

    # Summary table
    if consensus and consensus['achieved']:
        print("\n" + "="*80)
        print("FINAL SUMMARY TABLE")
        print("="*80)

        data = [
            {
                'Method': 'Traditional (Phase 1)',
                'Savings (€/year)': f"{comparison['traditional_savings']:.2f}",
                'Reduction (%)': '5.1%',
                'Source': 'wMAPE × 15% conversion'
            },
            {
                'Method': 'AI-Validated (OpenAI)',
                'Savings (€/year)': f"{consensus['openai_estimate']:.2f}",
                'Reduction (%)': f"{openai_response['result']['totals']['overall_reduction_percent']:.1f}%",
                'Source': 'AI reasoning (GPT-4o)'
            },
            {
                'Method': 'AI-Validated (Claude)',
                'Savings (€/year)': f"{consensus['claude_estimate']:.2f}",
                'Reduction (%)': f"{claude_response['result']['totals']['overall_reduction_percent']:.1f}%",
                'Source': 'AI reasoning (Claude Opus)'
            },
            {
                'Method': '**CONSENSUS**',
                'Savings (€/year)': f"**€{consensus['consensus_savings_eur']:.2f}**",
                'Reduction (%)': f"**{consensus['consensus_reduction_pct']:.1f}%**",
                'Source': f"**Dual validation ({consensus['agreement_pct']:.0f}% agreement)**"
            }
        ]

        df = pd.DataFrame(data)
        print("\n" + df.to_string(index=False))

        df.to_csv('analysis/table_dual_ai_validation.csv', index=False)
        print("\n✅ Table saved: analysis/table_dual_ai_validation.csv")

if __name__ == '__main__':
    main()
+159 −6

File changed. Preview size limit exceeded, changes collapsed.

+195 −0

File added. Preview size limit exceeded, changes collapsed.