Commit 1d98267a authored by Christos Anastasiou's avatar Christos Anastasiou

Initial commit for GitLab submission

  Includes:
  - Data preprocessing pipeline (cleaning/)
  - ML evaluation & backtesting (analysis/)
  - Shared utilities (common/)
  - Configuration & orchestration

  As described in the Appendix of the thesis.
  Excludes: results/, datasets, thesis documents (due to size)
parent 683e53a7

.gitignore

0 → 100644
+96 −0
# Experiment results (large files)
results/
exported/
workspace/
new_kastoria/

# Datasets (large files)
data/
dataset/
cu-bems_dataset/
WATTS_ML_SEPTEMBER/

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
venv/
env/
ENV/
*.egg-info/
dist/
build/

# Jupyter Notebook
.ipynb_checkpoints
*.ipynb

# IDEs
.idea/
.vscode/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Temporary files
*.log
*.tmp
*.temp
*.bak
*.backup
*.backup_*

# Model outputs
catboost_info/
hmm_outputs/
ai_sanity_outputs/

# Large CSV/JSON files
*.csv
!watts_experiments.yaml
!requirements.txt
aggregated_wmape.csv
all_results_summary*.csv
algorithm_rankings.json
*_results*.json
*_summary*.csv

# Documentation files (kept only in the main repo)
THESIS_CHAPTERS/
thesis_figures/
*.md
!README.md
!GITLAB_README.md
*.docx
*.pdf
*.txt
!requirements.txt

# Scripts that are not needed
create_*.py
fix_*.py
analyze_*.py
compare_*.py
collect_*.py
merge_*.py
renumber_*.py
translate_*.py
convert_*.py
extract_*.py
*_temp.py
*_test.py

# Unneeded configuration files
*.yaml
!watts_experiments.yaml
schema.json
pyproject.toml

# Environment variables
.env

README.md

0 → 100644
+122 −0
# ThesisML - Machine Learning for Energy Forecasting in Smart Buildings

Master's Thesis of Christos Anastasiou - University of Western Macedonia, 2025

## Description

This repository contains the implementation code for forecasting energy consumption in smart buildings using machine learning techniques.

## Project Structure

```
ThesisML/
├── cleaning/                    # Data preprocessing pipeline
│   └── clean_energy_dataset_v3.py
├── analysis/                    # ML evaluation and backtesting
│   ├── walkforward_backtest.py
│   └── dual_ai_economic_validation.py
├── common/                      # Shared utilities
│   ├── extra_models_and_metrics.py
│   └── features.py
├── interactive_watts_runner.py  # Orchestration runner
├── watts_experiments.yaml       # Configuration file
└── results/                     # Output directory (not included in repo)
```

## Key Files

### 1. Data Preprocessing
- **`cleaning/clean_energy_dataset_v3.py`**: Energy data preprocessing pipeline

### 2. ML Evaluation
- **`analysis/walkforward_backtest.py`**: Walk-forward validation for time series
- **`analysis/dual_ai_economic_validation.py`**: Dual AI validation (GPT-4o + Claude Opus 4)

### 3. Shared Utilities
- **`common/extra_models_and_metrics.py`**: ML models and evaluation metrics
- **`common/features.py`**: Feature engineering for time series

### 4. Configuration & Orchestration
- **`interactive_watts_runner.py`**: Main script for running experiments
- **`watts_experiments.yaml`**: Experiment configuration file

## Installation

```bash
# Clone the repository
git clone [REPO_URL]
cd ThesisML

# Create a virtual environment
python3 -m venv venv
source venv/bin/activate  # Linux/Mac
# or
venv\Scripts\activate  # Windows

# Install dependencies
pip install -r requirements.txt
```

## Usage

### Basic Execution

```bash
python interactive_watts_runner.py
```

### Examples

```bash
# Walk-forward backtest
python analysis/walkforward_backtest.py

# Economic validation
python analysis/dual_ai_economic_validation.py
```

## Requirements

- Python 3.8+
- pandas, numpy
- scikit-learn
- lightgbm, xgboost, catboost
- PyYAML

See `requirements.txt` for the full list.

## Datasets

The datasets are **not** included in the repository due to their size.
A description of the datasets and instructions for accessing them are provided in the thesis.

## Results

Experiment results are stored in the `results/` directory, which is **not** included in the repository due to its size.

## Author

Christos Anastasiou, Student ID 265


## License

This code is provided for academic purposes.

## Citation

If you use this code, please cite:

```
Anastasiou, C. (2025). Energy Consumption Forecasting in Smart Buildings
Using Machine Learning Techniques. Master's Thesis,
University of Western Macedonia.
```

## Contact

For questions or comments, please contact the author through the university.

---

**Note**: This repository contains the code described in the Appendix of the thesis.
analysis/dual_ai_economic_validation.py

0 → 100644
+430 −0
"""
Dual AI Economic Validation
Sends the same prompt to OpenAI + Claude and cross-validates the answers

Approach A (Traditional): wMAPE → 15% conversion → €30.83/year
Approach B (AI-recommended): wMAPE → AI reasoning → €X/year

Cross-validates Approach B with 2 independent AI models
"""

import os
import json
import pandas as pd
from dotenv import load_dotenv
import openai
import anthropic

load_dotenv()

# ==============================================================================
# CONFIGURATION - loaded from the .env file
# ==============================================================================

OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-4o")
ANTHROPIC_MODEL = os.getenv("ANTHROPIC_MODEL", "claude-opus-4-1-20250805")

print("="*80)
print("DUAL AI ECONOMIC VALIDATION")
print("Cross-validation: OpenAI vs Anthropic")
print("="*80)

print(f"\n🔧 Configuration:")
print(f"   OpenAI model: {OPENAI_MODEL}")
print(f"   Anthropic model: {ANTHROPIC_MODEL}")

# ==============================================================================
# DATA - our ML results
# ==============================================================================

SYSTEMS_DATA = {
    'ZWave2_computers': {
        'name': 'Office Computers',
        'annual_kwh': 423,
        'wMAPE': 3.16,
        'best_model': 'LightGBM',
        'type': 'Office equipment, variable load during working hours'
    },
    'ZWave_Node_016': {
        'name': 'Lab Equipment (Miranet)',
        'annual_kwh': 74,
        'wMAPE': 4.24,
        'best_model': 'Ridge',
        'type': 'Lab instrumentation, steady load'
    },
    'WavePlug_UPS': {
        'name': 'UPS System',
        'annual_kwh': 395,
        'wMAPE': 5.37,
        'best_model': 'ElasticNet',
        'type': 'Critical infrastructure, continuous load'
    },
    'ZWave_Node_031': {
        'name': 'Server',
        'annual_kwh': 87,
        'wMAPE': 7.83,
        'best_model': 'KNN',
        'type': 'Always-on server, stochastic load'
    },
    'ZW095_Multi': {
        'name': 'Multi-phase meter (ZW095)',
        'annual_kwh': 3036,
        'wMAPE': 27.71,
        'best_model': 'ElasticNet',
        'type': 'Campus multi-phase monitoring, high variability'
    }
}

TARIFF = 0.15  # €/kWh
TOTAL_KWH = sum(s['annual_kwh'] for s in SYSTEMS_DATA.values())
TOTAL_COST = TOTAL_KWH * TARIFF
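# Illustrative sanity check (derived from the SYSTEMS_DATA entries above):
# 423 + 74 + 395 + 87 + 3036 = 4015 kWh/year, so the annual baseline cost
# at €0.15/kWh is 4015 * 0.15 = €602.25.
_expected_kwh = 423 + 74 + 395 + 87 + 3036
assert _expected_kwh == 4015
assert abs(_expected_kwh * 0.15 - 602.25) < 1e-9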

# ==============================================================================
# PROMPT - identical for OpenAI and Claude
# ==============================================================================

def create_prompt():
    """Builds the structured prompt"""

    systems_text = ""
    for i, (key, data) in enumerate(SYSTEMS_DATA.items(), 1):
        systems_text += f"""
**System {i}: {data['name']}**
- Current annual consumption: {data['annual_kwh']} kWh
- ML model: {data['best_model']}
- Forecasting accuracy: {data['wMAPE']}% wMAPE ({"excellent" if data['wMAPE'] < 10 else "moderate"})
- Type: {data['type']}
"""

    prompt = f"""You are an energy management expert specializing in building electrical systems and ML-based optimization.

I have successfully developed ML forecasting models for 5 building electrical systems with the following performance:
{systems_text}

**Context:**
- Total current consumption: {TOTAL_KWH:,} kWh/year (€{TOTAL_COST:.2f}/year at €{TARIFF}/kWh)
- Greek university setting (limited automation, no sophisticated BEMS)
- Systems monitored for 9.5 months (Oct 2024 - July 2025)

**Your task:**

Provide REALISTIC, CONSERVATIVE estimates for potential energy savings if these ML forecasts are used for optimization.

For EACH system, estimate:
1. What percentage of current consumption could be reduced?
2. What specific actions would achieve this? (e.g., "Schedule office computers to sleep mode during non-working hours")
3. Key assumptions
4. Confidence level (low/medium/high)

Then provide:
- Total potential annual savings (€/year)
- Overall percentage reduction

**Important constraints:**
- Be CONSERVATIVE - this is for a Master's thesis, not a marketing pitch
- Consider that UPS and Server are critical (limited optimization potential)
- Multi-phase system (ZW095) has poor forecasting accuracy (27.71% wMAPE)
- Greek context: limited demand response, no time-of-use pricing in 2024-2025
- No assumptions about HVAC (we don't monitor it)

Return your response as JSON with this structure:
{{
  "systems": [
    {{
      "name": "System name",
      "current_kwh": 423,
      "reduction_percent": 10,
      "new_kwh": 380.7,
      "savings_eur": 6.35,
      "optimization_actions": ["Action 1", "Action 2"],
      "confidence": "medium"
    }},
    ...
  ],
  "totals": {{
    "current_annual_kwh": {TOTAL_KWH},
    "new_annual_kwh": ...,
    "annual_savings_eur": ...,
    "overall_reduction_percent": ...
  }},
  "overall_assessment": "Brief summary of realistic potential"
}}

Use conservative, defensible estimates. Better to underestimate than overestimate.
"""
    return prompt
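# Illustrative check of the accuracy label embedded in the prompt above:
# a wMAPE below 10% is labelled "excellent", anything else "moderate".
assert ("excellent" if 3.16 < 10 else "moderate") == "excellent"   # ZWave2_computers
assert ("excellent" if 27.71 < 10 else "moderate") == "moderate"   # ZW095_Multi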

# ==============================================================================
# OPENAI CALL
# ==============================================================================

def query_openai(prompt):
    """Sends the prompt to OpenAI"""

    print("\n" + "="*80)
    print(f"🤖 Calling OpenAI ({OPENAI_MODEL})...")
    print("="*80)

    try:
        client = openai.OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

        response = client.chat.completions.create(
            model=OPENAI_MODEL,
            messages=[
                {"role": "system", "content": "You are an expert energy economist. Always respond with valid JSON only."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.3,
            response_format={"type": "json_object"}
        )

        result = json.loads(response.choices[0].message.content)

        print(f"✅ OpenAI response received!")
        print(f"   Annual savings: €{result['totals']['annual_savings_eur']:.2f}")
        print(f"   Reduction: {result['totals']['overall_reduction_percent']:.1f}%")

        return {
            'success': True,
            'model': OPENAI_MODEL,
            'result': result
        }

    except Exception as e:
        print(f"❌ OpenAI error: {e}")
        return {'success': False, 'error': str(e)}

# ==============================================================================
# CLAUDE CALL
# ==============================================================================

def query_claude(prompt):
    """Sends the prompt to Claude"""

    print("\n" + "="*80)
    print(f"🤖 Calling Claude ({ANTHROPIC_MODEL})...")
    print("="*80)

    try:
        client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

        response = client.messages.create(
            model=ANTHROPIC_MODEL,
            max_tokens=3000,
            temperature=0.3,
            messages=[
                {"role": "user", "content": prompt}
            ]
        )

        content = response.content[0].text

        # Parse JSON
        if "```json" in content:
            content = content.split("```json")[1].split("```")[0].strip()
        elif "```" in content:
            content = content.split("```")[1].split("```")[0].strip()

        result = json.loads(content)

        print(f"✅ Claude response received!")
        print(f"   Annual savings: €{result['totals']['annual_savings_eur']:.2f}")
        print(f"   Reduction: {result['totals']['overall_reduction_percent']:.1f}%")

        return {
            'success': True,
            'model': ANTHROPIC_MODEL,
            'result': result
        }

    except Exception as e:
        print(f"❌ Claude error: {e}")
        return {'success': False, 'error': str(e)}

# ==============================================================================
# CROSS-VALIDATION
# ==============================================================================

def cross_validate(openai_response, claude_response):
    """Compares the two responses"""

    print("\n" + "="*80)
    print("CROSS-VALIDATION ANALYSIS")
    print("="*80)

    if not openai_response['success'] or not claude_response['success']:
        print("⚠️  Cannot cross-validate - one or both models failed")
        return None

    openai_savings = openai_response['result']['totals']['annual_savings_eur']
    claude_savings = claude_response['result']['totals']['annual_savings_eur']

    openai_pct = openai_response['result']['totals']['overall_reduction_percent']
    claude_pct = claude_response['result']['totals']['overall_reduction_percent']

    # Calculate agreement
    avg_savings = (openai_savings + claude_savings) / 2
    diff_pct = abs(openai_savings - claude_savings) / avg_savings * 100
    agreement_pct = 100 - diff_pct

    print(f"\n📊 Results Comparison:")
    print(f"   OpenAI:  €{openai_savings:.2f}/year ({openai_pct:.1f}% reduction)")
    print(f"   Claude:  €{claude_savings:.2f}/year ({claude_pct:.1f}% reduction)")
    print(f"   Average: €{avg_savings:.2f}/year")
    print(f"   Difference: {diff_pct:.1f}%")
    print(f"   Agreement: {agreement_pct:.1f}%")

    if agreement_pct >= 85:  # 15% tolerance
        print(f"\n✅ CONSENSUS ACHIEVED ({agreement_pct:.1f}% agreement)")
        print(f"   Consensus estimate: €{avg_savings:.2f}/year")
        consensus = {
            'achieved': True,
            'consensus_savings_eur': avg_savings,
            'consensus_reduction_pct': (openai_pct + claude_pct) / 2,
            'agreement_pct': agreement_pct,
            'openai_estimate': openai_savings,
            'claude_estimate': claude_savings
        }
    else:
        print(f"\n⚠️  NO CONSENSUS ({agreement_pct:.1f}% agreement < 85% threshold)")
        print(f"   Models disagree significantly - manual review needed")
        consensus = {
            'achieved': False,
            'agreement_pct': agreement_pct,
            'openai_estimate': openai_savings,
            'claude_estimate': claude_savings,
            'reason': 'Estimates differ by more than 15%'
        }

    return consensus
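# Worked example (illustrative): hypothetical estimates of €40 and €50 give
# avg = 45.0, diff = |40 - 50| / 45 * 100 ≈ 22.2% and agreement ≈ 77.8%,
# which falls below the 85% consensus threshold used above.
_avg = (40 + 50) / 2
_diff_pct = abs(40 - 50) / _avg * 100
assert round(100 - _diff_pct, 1) == 77.8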

# ==============================================================================
# COMPARISON WITH TRADITIONAL APPROACH
# ==============================================================================

def compare_with_traditional(consensus):
    """Compares with the traditional Approach A"""

    print("\n" + "="*80)
    print("COMPARISON: AI-Validated vs Traditional")
    print("="*80)

    # Traditional estimate from Phase 1
    traditional_savings = 30.83  # €/year (from phase1_traditional_results.json)
    traditional_pct = 5.12  # %

    if consensus and consensus['achieved']:
        ai_savings = consensus['consensus_savings_eur']
        ai_pct = consensus['consensus_reduction_pct']

        print(f"\n📊 Traditional Approach (Phase 1):")
        print(f"   Formula: wMAPE improvement × 15% conversion factor")
        print(f"   Savings: €{traditional_savings:.2f}/year ({traditional_pct:.1f}% reduction)")

        print(f"\n🤖 AI-Validated Approach (Dual validation):")
        print(f"   Method: AI reasoning (OpenAI + Claude consensus)")
        print(f"   Savings: €{ai_savings:.2f}/year ({ai_pct:.1f}% reduction)")

        diff = abs(traditional_savings - ai_savings)
        diff_pct = (diff / traditional_savings) * 100

        print(f"\n📈 Comparison:")
        print(f"   Difference: €{diff:.2f} ({diff_pct:.1f}%)")

        if diff_pct < 30:
            print(f"   ✅ Both approaches CONVERGE (within 30%)")
            print(f"   → Supports €{min(traditional_savings, ai_savings):.0f}-€{max(traditional_savings, ai_savings):.0f}/year range")
        else:
            print(f"   ⚠️  Approaches DIVERGE (>30% difference)")
            print(f"   → Further investigation needed")

        return {
            'traditional_savings': traditional_savings,
            'ai_savings': ai_savings,
            'converge': diff_pct < 30,
            'range_min': min(traditional_savings, ai_savings),
            'range_max': max(traditional_savings, ai_savings)
        }
    else:
        print("\n⚠️  Cannot compare - no consensus achieved")
        return None
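# Consistency note (illustrative): the Phase 1 figure of €30.83/year is
# about 5.12% of the €602.25 annual baseline (4015 kWh × €0.15/kWh),
# since 602.25 * 0.0512 ≈ 30.84.
assert round(602.25 * 0.0512, 1) == 30.8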

# ==============================================================================
# MAIN
# ==============================================================================

def main():
    """Main execution"""

    # Create prompt
    prompt = create_prompt()

    # Query both AIs
    openai_response = query_openai(prompt)
    claude_response = query_claude(prompt)

    # Cross-validate
    consensus = cross_validate(openai_response, claude_response)

    # Compare with traditional
    comparison = compare_with_traditional(consensus)

    # Save results
    output = {
        'metadata': {
            'date': '2025-10-08',
            'openai_model': OPENAI_MODEL,
            'claude_model': ANTHROPIC_MODEL
        },
        'openai': openai_response,
        'claude': claude_response,
        'consensus': consensus,
        'comparison_with_traditional': comparison
    }

    with open('analysis/dual_ai_validation_results.json', 'w') as f:
        json.dump(output, f, indent=2, default=str)

    print("\n✅ Full results saved to: analysis/dual_ai_validation_results.json")

    # Summary table
    if consensus and consensus['achieved']:
        print("\n" + "="*80)
        print("FINAL SUMMARY TABLE")
        print("="*80)

        data = [
            {
                'Method': 'Traditional (Phase 1)',
                'Savings (€/year)': f"{comparison['traditional_savings']:.2f}",
                'Reduction (%)': '5.1%',
                'Source': 'wMAPE × 15% conversion'
            },
            {
                'Method': 'AI-Validated (OpenAI)',
                'Savings (€/year)': f"{consensus['openai_estimate']:.2f}",
                'Reduction (%)': f"{openai_response['result']['totals']['overall_reduction_percent']:.1f}%",
                'Source': 'AI reasoning (GPT-4o)'
            },
            {
                'Method': 'AI-Validated (Claude)',
                'Savings (€/year)': f"{consensus['claude_estimate']:.2f}",
                'Reduction (%)': f"{claude_response['result']['totals']['overall_reduction_percent']:.1f}%",
                'Source': 'AI reasoning (Claude Opus)'
            },
            {
                'Method': '**CONSENSUS**',
                'Savings (€/year)': f"**€{consensus['consensus_savings_eur']:.2f}**",
                'Reduction (%)': f"**{consensus['consensus_reduction_pct']:.1f}%**",
                'Source': f"**Dual validation ({consensus['agreement_pct']:.0f}% agreement)**"
            }
        ]

        df = pd.DataFrame(data)
        print("\n" + df.to_string(index=False))

        df.to_csv('analysis/table_dual_ai_validation.csv', index=False)
        print("\n✅ Table saved: analysis/table_dual_ai_validation.csv")

if __name__ == '__main__':
    main()
+159 −6

File changed. Preview size limit exceeded, changes collapsed.

+195 −0

File added. Preview size limit exceeded, changes collapsed.