Configuration Guide¶
Overview¶
The Lean Proof Automation MCP uses a comprehensive configuration system to control all heuristic parameters for scoring and ranking automation potential. This guide explains the configuration architecture, all available parameters, and how to customize the system for your needs.
Architecture¶
The configuration system follows hexagonal architecture principles:
- Core logic depends on the
HeuristicsConfigprotocol (port), not concrete implementations - Configuration is immutable once loaded - changes require reloading
- Validation happens at load time with clear error messages
- Configuration is versioned for compatibility tracking
Configuration Sources (Priority Order)¶
- Explicit parameter:
config_pathin tool request - Environment variable:
LEAN_PROOF_AUTO_MCP_CONFIG - Package default: Built-in
heuristics.yaml
Quick Start¶
Using Default Configuration¶
The package ships with a default configuration that works well for typical Lean codebases. No configuration is required:
{
"file": "MyFile.lean",
"skip_already_automated": false
}
Using Custom Configuration¶
Create a custom YAML file and reference it:
{
"file": "MyFile.lean",
"skip_already_automated": false,
"config_path": "/path/to/custom_heuristics.yaml"
}
Or set the environment variable:
export LEAN_PROOF_AUTO_MCP_CONFIG=/path/to/custom_heuristics.yaml
Configuration File Format¶
All configuration files must:
- Be valid YAML
- Include version: "1.0" at the top
- Follow the schema defined below
- Use values in valid ranges (validated at load time)
Minimal Example¶
version: "1.0"
confidence:
base_score: 0.3
bonuses:
proof_structure: 0.2
tactic_detection: 0.2
structural_tactics: 0.15
term_mode_patterns: 0.15
automation_tactics: 0.1
indentation_consistency: 0.05
penalties:
sorry: -0.3
error_patterns: -0.1
thresholds:
proof_length_min: 1
proof_length_max: 50
proof_length_max_bonus: 0.05
# ... (all other sections required)
Configuration Parameters¶
Version¶
Required: Must be "1.0" for API 1.0 compatibility.
version: "1.0"
Confidence Calculation¶
Controls how confident the system is in proof detection and analysis.
confidence:
base_score: 0.3 # Starting confidence level [0.0, 1.0]
bonuses:
proof_structure: 0.2 # Bonus for well-structured proofs [0.0, 1.0]
tactic_detection: 0.2 # Bonus for detecting tactics [0.0, 1.0]
structural_tactics: 0.15 # Bonus for structural tactics [0.0, 1.0]
term_mode_patterns: 0.15 # Bonus for term-mode patterns [0.0, 1.0]
automation_tactics: 0.1 # Bonus for automation tactics [0.0, 1.0]
indentation_consistency: 0.05 # Bonus for consistent indentation [0.0, 1.0]
penalties:
sorry: -0.3 # Penalty for incomplete proofs [-1.0, 0.0]
error_patterns: -0.1 # Penalty for error indicators [-1.0, 0.0]
thresholds:
proof_length_min: 1 # Minimum proof lines for confidence (positive integer)
proof_length_max: 50 # Maximum proof lines for full confidence (positive integer)
proof_length_max_bonus: 0.05 # Bonus for appropriate length [0.0, 1.0]
Use Cases:
- Increase base_score if your proofs are generally well-structured
- Adjust proof_length_max based on typical proof complexity in your codebase
- Increase sorry penalty if you want to strongly discourage incomplete proofs
Aesop Scoring¶
Aesop works well on structural proofs with constructors and destructors.
aesop_scoring:
base_score: 0.2 # Starting score for aesop potential [0.0, 1.0]
confidence_bonuses:
high: { threshold: 0.8, bonus: 0.3 } # High confidence boost
good: { threshold: 0.6, bonus: 0.2 } # Good confidence boost
moderate: { threshold: 0.4, bonus: 0.1 } # Moderate confidence boost
low: { threshold: 0.0, bonus: 0.05 } # Any confidence is better than none
structural_tactics:
bonus_per_tactic: 0.08 # Bonus per structural tactic [0.0, 1.0]
max_bonus: 0.3 # Maximum bonus from structural tactics [0.0, 1.0]
term_mode:
bonus_per_pattern: 0.1 # Bonus per term-mode pattern [0.0, 1.0]
max_bonus: 0.25 # Maximum bonus from term-mode patterns [0.0, 1.0]
proof_length_bonuses:
- { max_lines: 5, bonus: 0.2 } # Very short proofs
- { max_lines: 10, bonus: 0.15 } # Short proofs
- { max_lines: 20, bonus: 0.1 } # Medium proofs
penalties:
rewrite_heavy: { threshold: 5, penalty: -0.15 } # Heavy rewriting
rewrite_moderate: { threshold: 2, penalty: -0.08 } # Moderate rewriting
simp_heavy: { threshold: 3, penalty: -0.1 } # Heavy simp usage
bonuses:
tactic_mode: 0.05 # Bonus for tactic-mode proofs [0.0, 1.0]
Use Cases:
- Increase base_score if your codebase has many structural proofs
- Adjust proof_length_bonuses based on typical proof lengths
- Increase rewrite_heavy.penalty if you want to discourage rewrite-heavy proofs
Grind Scoring¶
Grind excels at equational reasoning and simplification.
grind_scoring:
base_score: 0.15 # Starting score for grind potential [0.0, 1.0]
confidence_bonuses:
high: { threshold: 0.8, bonus: 0.25 } # High confidence boost
good: { threshold: 0.6, bonus: 0.18 } # Good confidence boost
moderate: { threshold: 0.4, bonus: 0.1 } # Moderate confidence boost
low: { threshold: 0.0, bonus: 0.05 } # Any confidence is better than none
rewrite:
bonus_per_count: 0.06 # Bonus per rewrite occurrence [0.0, 1.0]
max_bonus: 0.35 # Maximum bonus from rewrites [0.0, 1.0]
simp:
bonus_per_count: 0.08 # Bonus per simp occurrence [0.0, 1.0]
max_bonus: 0.25 # Maximum bonus from simp [0.0, 1.0]
algebraic_terms:
bonus_per_pattern: 0.08 # Bonus per algebraic term pattern [0.0, 1.0]
max_bonus: 0.2 # Maximum bonus from algebraic terms [0.0, 1.0]
proof_length_bonuses:
- { min_lines: 5, max_lines: 30, bonus: 0.15 } # Medium-length proofs
- { max_lines: 5, bonus: 0.08 } # Short proofs
penalties:
induction: -0.15 # Penalty for induction [-1.0, 0.0]
cases: -0.1 # Penalty for case analysis [-1.0, 0.0]
algebraic_tactics:
bonus_per_tactic: 0.08 # Bonus per algebraic tactic [0.0, 1.0]
max_bonus: 0.2 # Maximum bonus from algebraic tactics [0.0, 1.0]
bonuses:
pure_term_proof: 0.1 # Bonus for pure term-mode proofs [0.0, 1.0]
Use Cases:
- Increase base_score if your codebase has many equational proofs
- Adjust rewrite.bonus_per_count based on typical rewrite usage
- Decrease induction penalty if you want to encourage grind on inductive proofs
Annotation Value Scoring¶
Estimates ROI for adding automation annotations.
annotation_value_scoring:
base_score: 0.05 # Starting annotation value [0.0, 1.0]
confidence_bonuses:
high: { threshold: 0.8, bonus: 0.25 } # High confidence proofs
good: { threshold: 0.6, bonus: 0.18 } # Good confidence proofs
moderate: { threshold: 0.4, bonus: 0.1 } # Some confidence
low: { threshold: 0.0, bonus: 0.05 } # Any confidence
penalties:
low_confidence: -0.1 # Penalty for low confidence [-1.0, 0.0]
proof_length_bonuses:
- { min_lines: 20, bonus: 0.3 } # Long proofs
- { min_lines: 10, bonus: 0.2 } # Medium proofs
- { min_lines: 5, bonus: 0.15 } # Short proofs
- { min_lines: 2, bonus: 0.1 } # Very short proofs
local_lemmas:
bonus_per_count: 0.08 # Bonus per local lemma [0.0, 1.0]
max_bonus: 0.25 # Maximum bonus from local lemmas [0.0, 1.0]
tactic_diversity_bonuses:
- { min_diversity: 5, bonus: 0.15 } # High diversity
- { min_diversity: 3, bonus: 0.1 } # Medium diversity
- { min_diversity: 0, bonus: 0.05 } # Any detected patterns
rewrite_bonuses:
- { min_count: 3, bonus: 0.12 } # Many rewrites
- { min_count: 0, bonus: 0.08 } # Some rewrites
simp_bonuses:
- { min_count: 2, bonus: 0.08 } # Multiple simps
- { min_count: 0, bonus: 0.05 } # Some simp usage
bonuses:
term_application: 0.1 # Complex term applications [0.0, 1.0]
structural_proof: 0.1 # Structural proofs with cases [0.0, 1.0]
Use Cases:
- Increase proof_length_bonuses if you want to prioritize longer proofs
- Adjust local_lemmas.bonus_per_count based on typical local lemma usage
- Increase tactic_diversity_bonuses if you value diverse proof strategies
Subgoal Potential Scoring¶
Estimates potential for automating individual subgoals.
subgoal_potential_scoring:
confidence_multipliers:
high: { threshold: 0.8, multiplier: 0.8 } # High confidence
good: { threshold: 0.6, multiplier: 0.7 } # Good confidence
moderate: { threshold: 0.4, multiplier: 0.6 } # Some confidence
low: { threshold: 0.0, multiplier: 0.5 } # Low confidence
none: { threshold: 0.0, multiplier: 0.0 } # No confidence
bonuses:
induction_or_cases: 0.25 # Can automate branches [0.0, 1.0]
term_application: 0.1 # Term applications [0.0, 1.0]
algebraic_term: 0.12 # Algebraic terms [0.0, 1.0]
structure_cases:
bonus_per_case: 0.04 # Bonus per case branch [0.0, 1.0]
max_bonus: 0.2 # Maximum bonus from cases [0.0, 1.0]
rewrite_blocks:
bonus_per_block: 0.08 # Bonus per rewrite block [0.0, 1.0]
max_bonus: 0.2 # Maximum bonus from rewrite blocks [0.0, 1.0]
Use Cases:
- Increase induction_or_cases bonus if you want to prioritize case analysis
- Adjust confidence_multipliers based on your confidence in proof detection
- Increase structure_cases.bonus_per_case if you have many case-heavy proofs
Risk Scoring¶
Estimates risk of automation breaking or causing issues.
risk_scoring:
global_change_risk:
rewrite_heavy: { threshold: 5, risk: 0.6 } # Heavy rewriting
simp_heavy: { threshold: 3, risk: 0.4 } # Heavy simp
simp_question_mark: 0.8 # simp? is very risky [0.0, 1.0]
simp_risk:
heavy: { threshold: 3, risk: 0.6 } # Heavy simp usage
moderate: { threshold: 1, risk: 0.3 } # Moderate simp usage
local_lemma_risk:
heavy: { threshold: 3, risk: 0.5 } # Many local lemmas
moderate: { threshold: 1, risk: 0.2 } # Some local lemmas
low_confidence_risk:
very_low: { threshold: 0.4, risk: 0.7 } # Very low confidence
low: { threshold: 0.6, risk: 0.4 } # Low confidence
component_weights:
global_change: 0.3 # Weight for global change risk [0.0, 1.0]
simp: 0.3 # Weight for simp risk [0.0, 1.0]
local_lemma: 0.2 # Weight for local lemma risk [0.0, 1.0]
confidence: 0.2 # Weight for confidence risk [0.0, 1.0]
Use Cases:
- Increase simp_question_mark risk if you want to strongly discourage simp?
- Adjust component_weights to emphasize different risk factors
- Increase local_lemma_risk if you're concerned about local lemma usage
Impact Scoring¶
Estimates time saved by automation.
impact_scoring:
proof_length_scores:
- { max_lines: 0, score: 0.0 } # No proof
- { max_lines: 5, score: 0.2 } # Very short
- { max_lines: 10, score: 0.4 } # Short
- { max_lines: 20, score: 0.6 } # Medium
- { max_lines: 40, score: 0.8 } # Long
- { min_lines: 40, score: 1.0 } # Very long
reusability:
score_per_local_lemma: 0.2 # Score per local lemma [0.0, 1.0]
max_score: 1.0 # Maximum reusability score [0.0, 1.0]
component_weights:
proof_length: 0.5 # Weight for proof length [0.0, 1.0]
annotation_value: 0.3 # Weight for annotation value [0.0, 1.0]
reusability: 0.2 # Weight for reusability [0.0, 1.0]
Use Cases:
- Adjust proof_length_scores based on typical proof lengths in your codebase
- Increase reusability.score_per_local_lemma if you value reusable proofs
- Adjust component_weights to emphasize different impact factors
Success Likelihood Scoring¶
Estimates likelihood of successful automation.
success_likelihood_scoring:
complexity_penalties:
induction_or_cases: 0.2 # Penalty for structural complexity [0.0, 1.0]
long_proof: { threshold: 30, penalty: 0.1 } # Penalty for long proofs
many_local_lemmas: { threshold: 3, penalty: 0.1 } # Penalty for many local lemmas
max_penalty: 0.5 # Maximum complexity penalty [0.0, 1.0]
component_weights:
max_potential: 0.6 # Weight for maximum automation potential [0.0, 1.0]
confidence: 0.3 # Weight for confidence [0.0, 1.0]
complexity_penalty: -0.1 # Weight for complexity penalty (negative)
Use Cases:
- Adjust complexity_penalties based on your tolerance for complex proofs
- Increase max_penalty if you want to strongly discourage complex proofs
- Adjust component_weights to emphasize different success factors
Objectives¶
Different ranking strategies for different use cases.
objectives:
maximize_success:
description: "Prioritize theorems most likely to be automated successfully"
use_case: "When you want quick wins and high success rate"
weights:
success_likelihood: 0.50
impact: 0.10
annotation_value: 0.10
subgoal_potential: 0.10
risk: -0.20
maximize_impact:
description: "Prioritize theorems that save the most time when automated"
use_case: "When you want maximum ROI on automation effort"
weights:
success_likelihood: 0.20
impact: 0.40
annotation_value: 0.30
subgoal_potential: 0.05
risk: -0.05
maximize_subgoal_automation:
description: "Prioritize theorems with good partial automation opportunities"
use_case: "When you want to automate proof steps rather than whole goals"
weights:
success_likelihood: 0.15
impact: 0.15
annotation_value: 0.20
subgoal_potential: 0.40
risk: -0.10
balanced:
description: "Balanced weighting across all factors"
use_case: "When you want a general-purpose ranking"
weights:
success_likelihood: 0.25
impact: 0.25
annotation_value: 0.20
subgoal_potential: 0.20
risk: -0.10
Use Cases: - Create custom objectives by adding new entries - Adjust weights to emphasize different factors - Use negative weights for risk to penalize risky theorems
Already-Automated Detection¶
Detects theorems that already use automation tactics or attributes.
already_automated:
penalties:
tactic_usage: 0.3 # Penalty for using automation tactics [0.0, 1.0]
attribute: 0.5 # Penalty for automation attributes [0.0, 1.0]
trivial_proof: 0.8 # Penalty for trivial proofs [0.0, 1.0]
tactic_patterns:
- "by aesop"
- "by grind"
- "by simp"
- "by omega"
- "by decide"
- "by tauto"
- "aesop"
- "grind"
- "simp"
attribute_patterns:
- "@[aesop"
- "@[simp"
trivial_patterns:
- ":= rfl"
- "by rfl"
- ":= trivial"
- "by trivial"
Use Cases: - Add custom tactic patterns to detect additional automation tactics - Adjust penalties based on how much you want to discourage already-automated theorems - Add custom attribute patterns for your own automation attributes
Tiers¶
Percentile thresholds for S/A/B/C/D tier assignment.
tiers:
s_tier_percentile: 10 # Top 10% are S-tier (positive number < 100)
a_tier_percentile: 25 # Top 10-25% are A-tier (positive number < 100)
b_tier_percentile: 50 # Top 25-50% are B-tier (positive number < 100)
c_tier_percentile: 75 # Top 50-75% are C-tier (positive number < 100)
# Bottom 25% are D-tier (implicit)
Use Cases:
- Adjust percentiles to make tiers more or less selective
- Use smaller s_tier_percentile for stricter S-tier classification
- Use larger percentiles for more lenient tier classification
Validation¶
Configuration files are validated at load time. Invalid configurations fail fast with clear error messages:
Version Validation¶
version: "2.0" # ❌ Invalid version
Error: Configuration version must be "1.0", got "2.0"
Range Validation¶
confidence:
base_score: 1.5 # ❌ Out of range
Error: confidence.base_score must be in [0.0, 1.0], got 1.5
Threshold Validation¶
tiers:
s_tier_percentile: 30
a_tier_percentile: 20 # ❌ Not monotonic
Error: Tier thresholds must be strictly increasing
Example Custom Configurations¶
Conservative Configuration¶
Prioritize safety and low risk:
version: "1.0"
# ... (standard sections)
objectives:
conservative:
description: "Prioritize safe, low-risk automation"
use_case: "When you want to minimize breaking changes"
weights:
success_likelihood: 0.40
impact: 0.15
annotation_value: 0.15
subgoal_potential: 0.10
risk: -0.40 # Strong risk penalty
risk_scoring:
component_weights:
global_change: 0.4 # Emphasize global change risk
simp: 0.3
local_lemma: 0.2
confidence: 0.1
Aggressive Configuration¶
Prioritize impact and automation potential:
version: "1.0"
# ... (standard sections)
objectives:
aggressive:
description: "Prioritize maximum automation potential"
use_case: "When you want to automate as much as possible"
weights:
success_likelihood: 0.30
impact: 0.50
annotation_value: 0.20
subgoal_potential: 0.15
risk: -0.05 # Weak risk penalty
aesop_scoring:
base_score: 0.3 # Higher base score
confidence_bonuses:
high: { threshold: 0.8, bonus: 0.4 } # Larger bonuses
Research Configuration¶
Prioritize experimentation and exploration:
version: "1.0"
# ... (standard sections)
tiers:
s_tier_percentile: 5 # Top 5% are S-tier (stricter)
a_tier_percentile: 15 # Top 5-15% are A-tier
b_tier_percentile: 40 # Top 15-40% are B-tier
c_tier_percentile: 70 # Top 40-70% are C-tier
confidence:
base_score: 0.2 # Lower base (more conservative)
bonuses:
proof_structure: 0.3 # Higher bonuses for good structure
Troubleshooting¶
Configuration Not Loading¶
Problem: Custom configuration not being used.
Solution: Check configuration source in response metadata:
{
"metadata": {
"config_source": "default" // Should be "path" or "environment"
}
}
Verify: 1. File path is correct and accessible 2. Environment variable is set correctly 3. YAML syntax is valid
Validation Errors¶
Problem: Configuration fails to load with validation error.
Solution: Check error message for specific field and valid range:
Error: confidence.base_score must be in [0.0, 1.0], got 1.5
Fix the invalid value and reload.
Unexpected Scores¶
Problem: Scores don't match expectations.
Solution:
1. Check objective weights in configuration
2. Verify component scores in response (use include_components: true)
3. Adjust relevant parameters incrementally
4. Test with small changes to understand impact
Best Practices¶
- Start with defaults: Use the package default configuration initially
- Make incremental changes: Adjust one parameter at a time
- Test thoroughly: Verify changes produce expected results
- Document changes: Add comments explaining why parameters were changed
- Version control: Track configuration files in version control
- Share configurations: Share successful configurations with your team
- Validate early: Test configuration loading before running full analysis
See Also¶
- Tool Contract Documentation - API reference
- Overview - Quick start context
- Default configuration:
src/lean_proof_auto_mcp/heuristics.yaml