User Guide
Overview
This comprehensive guide covers all aspects of using pycefrl to analyze Python code complexity. Whether you're a beginner or an experienced developer, you'll find detailed information on how to get the most out of pycefrl.
Main Concepts
CEFR Framework Adaptation
pycefrl is inspired by the Common European Framework of Reference for Languages (CEFR), which defines six levels of language proficiency from A1 (beginner) to C2 (proficient).
We apply this concept to Python code:
- A1 (Basic): Simple data structures, basic assignments, print statements
- A2 (Elementary): File operations, simple loops, basic function calls
- B1 (Intermediate): Functions with parameters, classes, exception handling
- B2 (Upper Intermediate): Decorators, inheritance, advanced OOP
- C1 (Advanced): Comprehensions, generators, metaclasses
- C2 (Proficient): Complex nested comprehensions, advanced design patterns
Abstract Syntax Trees (AST)
pycefrl analyzes Python code by parsing it into an Abstract Syntax Tree (AST) using Python's built-in ast module. This allows precise identification of code constructs without executing the code.
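To make the idea concrete, here is a minimal sketch of static inspection with the standard ast module. It is not pycefrl's own implementation; the `count_node_types` helper and the sample source are made up for illustration. It parses a snippet and counts node types without running it.

```python
import ast
from collections import Counter

# Illustrative source text; it is parsed, never executed.
SOURCE = """
def greet(names):
    return [f"Hello, {n}" for n in names]

print(greet(["Ada", "Guido"]))
"""


def count_node_types(source: str) -> Counter:
    """Parse source into an AST and count how often each node class appears."""
    tree = ast.parse(source)
    return Counter(type(node).__name__ for node in ast.walk(tree))


counts = count_node_types(SOURCE)
for name in ("FunctionDef", "ListComp", "Call"):
    print(name, counts[name])
```

A classifier in the spirit of pycefrl would then look up each node type in a level table instead of only counting it.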
Level Dictionary
The dicc.txt file (generated by dict.py) maps specific AST node patterns to CEFR levels. This mapping is customizable through the configuration.cfg file.
Installation & Setup
Initial Setup
- Clone the repository:
git clone https://github.com/raux/pycefrl.git
cd pycefrl
- Install dependencies:
pip3 install -r requirements.txt
- Generate level dictionary:
python3 dict.py
See the Installation Guide for detailed instructions.
Basic Usage
Analyzing a Directory
The most common use case is analyzing a local project directory:
python3 pycerfl.py directory /path/to/project
What happens:
- pycefrl scans for all .py files in the directory
- Each file is parsed into an AST
- Each code element is classified by level
- Results are saved to data.json and data.csv
Example Output Structure:
project/
├── data.json                 # Complete results
├── data.csv                  # CSV format
├── DATA_JSON/
│   ├── summary_data.json     # Aggregated statistics
│   ├── total_data.json       # File-level breakdown
│   └── project.json          # Repository summary
└── DATA_CSV/
    ├── file1.csv             # Individual file results
    └── file2.csv
Analyzing GitHub Repositories
Analyze any public GitHub repository:
python3 pycerfl.py repo https://github.com/username/repository
Process:
- Repository is cloned to a temporary location
- All Python files are analyzed
- Results include repository name in output
- Temporary clone is retained for reference
Useful for:
- Comparing different projects
- Studying open-source codebases
- Benchmarking your code against established projects
Analyzing GitHub Users
Analyze all public repositories of a GitHub user:
python3 pycerfl.py user username
Process:
- Fetches list of user's public repositories via GitHub API
- Clones and analyzes each repository
- Aggregates results across all repositories
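The first step of this process can be sketched with the standard library alone. This is an illustration, not pycefrl's actual code: the helper names are hypothetical, only the first page of results (up to 100 repositories) is requested, and the optional token parameter is an assumption about how you might authenticate. The endpoint and the `clone_url` field come from the public GitHub REST API.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_repos_request(username, token=None, per_page=100):
    """Build the request for a user's public repositories (first page only)."""
    url = f"{API_ROOT}/users/{username}/repos?per_page={per_page}"
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # Authenticated requests get a higher rate limit.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


def clone_urls(payload):
    """Extract the clone URL of every repository in an API response."""
    return [repo["clone_url"] for repo in payload]


def list_public_repos(username, token=None):
    """Perform the network call; raises urllib.error.HTTPError on failure."""
    with urllib.request.urlopen(build_repos_request(username, token)) as resp:
        return clone_urls(json.load(resp))
```

Calling `list_public_repos("username")` returns a list of URLs that could be passed one by one to the repo mode shown earlier.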
Useful for:
- Assessing a developer's overall coding style
- Portfolio analysis
- Skill level progression tracking
Output Files Explained
data.json
Complete analysis data in JSON array format:
[
{
"Repository": "myproject",
"File Name": "main.py",
"Class": "Simple Function",
"Start Line": 10,
"End Line": 15,
"Displacement": 4,
"Level": "B1"
}
]
Fields:
- Repository: Source repository or directory name
- File Name: Python file being analyzed
- Class: Type of code element
- Start Line: Line where element starts
- End Line: Line where element ends
- Displacement: Indentation level (column offset)
- Level: CEFR level assignment
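Because every record carries the same fields, the array is easy to regroup. The sketch below assumes only the field names documented above; the sample records and the `levels_per_file` helper are made up for illustration. In practice you would load the list with `json.load(open("data.json"))`.

```python
import json
from collections import Counter, defaultdict

# Made-up records that follow the documented field names.
SAMPLE = json.loads("""
[
  {"Repository": "myproject", "File Name": "main.py", "Class": "Simple Function",
   "Start Line": 10, "End Line": 15, "Displacement": 4, "Level": "B1"},
  {"Repository": "myproject", "File Name": "main.py", "Class": "Simple List",
   "Start Line": 3, "End Line": 3, "Displacement": 0, "Level": "A1"},
  {"Repository": "myproject", "File Name": "utils.py", "Class": "Simple List",
   "Start Line": 7, "End Line": 7, "Displacement": 0, "Level": "A1"}
]
""")


def levels_per_file(records):
    """Count how many elements of each level every file contains."""
    per_file = defaultdict(Counter)
    for record in records:
        per_file[record["File Name"]][record["Level"]] += 1
    return per_file


for file_name, levels in sorted(levels_per_file(SAMPLE).items()):
    print(file_name, dict(levels))
```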
data.csv
Same information in CSV format for spreadsheet analysis:
Repository,File Name,Class,Start Line,End Line,Displacement,Level
myproject,main.py,Simple Function,10,15,4,B1
summary_data.json
Aggregated level statistics:
{
"Levels": {
"A1": 450,
"A2": 380,
"B1": 120,
"B2": 45,
"C1": 28,
"C2": 12
},
"Class": {
"Simple List": 81,
"Simple Function": 42,
"Simple Class": 15
}
}
Use this for:
- Quick overview of code complexity
- Comparing multiple projects
- Generating reports
total_data.json
File-level breakdown:
{
"myproject": {
"main.py": {
"Levels": {
"A1": 25,
"A2": 15,
"B1": 8
}
}
}
}
Use this for:
- Identifying complex files
- Prioritizing refactoring efforts
- File-by-file comparison
Advanced Usage
Custom Configuration
Edit configuration.cfg to customize level assignments:
[A1]
Simple List = ast.List
Simple Assignment = ast.Assign
[B1]
Function = ast.FunctionDef
After editing, regenerate the dictionary:
python3 dict.py
Customization scenarios:
- Adjust difficulty levels for your team's skill distribution
- Focus on specific Python features
- Create domain-specific complexity metrics
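Before regenerating the dictionary, it can help to check what an edited configuration maps to. The sketch below assumes that configuration.cfg is INI-compatible, as the snippet above suggests; the real file may contain constructs that dict.py handles differently. `load_level_map` is a hypothetical helper, not part of pycefrl.

```python
import configparser

# Same shape as the snippet above; the real file is much longer.
CONFIG_TEXT = """
[A1]
Simple List = ast.List
Simple Assignment = ast.Assign

[B1]
Function = ast.FunctionDef
"""


def load_level_map(text):
    """Return {node pattern: (level, label)} from INI-style configuration text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the original key case ("Simple List")
    parser.read_string(text)
    mapping = {}
    for level in parser.sections():
        for label, node_pattern in parser[level].items():
            mapping[node_pattern] = (level, label)
    return mapping


for node_pattern, (level, label) in load_level_map(CONFIG_TEXT).items():
    print(f"{node_pattern:20} {level}  {label}")
```

Inverting the file this way makes it easy to spot a node pattern that was accidentally listed under two levels, since the later entry silently replaces the earlier one.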
Filtering Results
Use command-line tools to filter results:
# Find all C1/C2 elements (match the Level column only, not file names)
awk -F',' '$7 == "C1" || $7 == "C2"' data.csv
# Count elements by level (skip the header row)
tail -n +2 data.csv | cut -d',' -f7 | sort | uniq -c
# Find files with most complex elements
awk -F',' '$7 == "C1" || $7 == "C2" {print $2}' data.csv | sort | uniq -c | sort -rn
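The same filtering can be done in Python when the shell is not available or when fields may contain commas. This is a sketch under the assumption that data.csv uses the header shown earlier; the sample rows, including the "Generator Function" and "Metaclass" labels, are invented for the example.

```python
import csv
import io
from collections import Counter

# Invented rows that follow the documented header.
SAMPLE_CSV = """Repository,File Name,Class,Start Line,End Line,Displacement,Level
myproject,main.py,Simple Function,10,15,4,B1
myproject,main.py,Generator Function,20,28,0,C1
myproject,utils.py,Simple List,3,3,0,A1
myproject,core.py,Metaclass,5,30,0,C2
myproject,core.py,Generator Function,40,44,4,C1
"""


def advanced_elements_per_file(handle, levels=("C1", "C2")):
    """Count rows whose Level column is one of the given levels, per file."""
    reader = csv.DictReader(handle)
    return Counter(row["File Name"] for row in reader if row["Level"] in levels)


ranking = advanced_elements_per_file(io.StringIO(SAMPLE_CSV))
for file_name, count in ranking.most_common():
    print(count, file_name)
```

For a real run, replace `io.StringIO(SAMPLE_CSV)` with `open("data.csv", newline="")`.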
Batch Analysis
Create scripts to analyze multiple projects:
#!/bin/bash
# Make sure the output directory exists before moving results into it
mkdir -p results
for dir in projects/*; do
    echo "Analyzing $dir..."
    python3 pycerfl.py directory "$dir"
    mv data.json "results/$(basename "$dir")_data.json"
done
Integration with CI/CD
Add code complexity checks to your CI pipeline:
# .github/workflows/complexity.yml
name: Code Complexity Check
on: [push, pull_request]
jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      - name: Analyze complexity
        run: |
          git clone https://github.com/raux/pycefrl.git
          cd pycefrl
          pip3 install -r requirements.txt
          python3 dict.py
          python3 pycerfl.py directory ../
      - name: Check complexity threshold
        run: |
          # Add custom checks here
          echo "Analysis complete"
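One possible custom check for the "Check complexity threshold" step is a small script that turns the summary into an exit code. This is a sketch, not something shipped with pycefrl: the function names, the 40% default threshold, and the location of summary_data.json relative to the working directory are all assumptions to adapt to your pipeline.

```python
import json


def advanced_share(levels):
    """Fraction of all elements that are classified C1 or C2."""
    total = sum(levels.values())
    if total == 0:
        return 0.0
    return (levels.get("C1", 0) + levels.get("C2", 0)) / total


def exit_code_for(levels, max_share=0.40):
    """Return 1 (fail the job) when the C1/C2 share exceeds max_share, else 0."""
    return 1 if advanced_share(levels) > max_share else 0


def gate(path, max_share=0.40):
    """Read a summary_data.json file and return the exit code for the CI step."""
    with open(path, encoding="utf-8") as handle:
        levels = json.load(handle)["Levels"]
    print(f"C1/C2 share: {advanced_share(levels):.1%} (limit {max_share:.0%})")
    return exit_code_for(levels, max_share)
```

Saved as a script that ends with `sys.exit(gate("DATA_JSON/summary_data.json"))`, it would make the workflow fail whenever the share crosses the limit.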
Streamlit Interface
Launching the App
python3 -m streamlit run app.py
Features
- Mode Selection
  - Directory analysis
  - GitHub repository analysis
  - GitHub user analysis
- Real-Time Monitoring
  - Live execution logs
  - CPU and RAM usage statistics
  - Progress indicators
- Interactive Visualizations
  - Bubble Chart: Category vs Level with size representing frequency
  - Heatmap: File vs Level distribution
  - Treemap: Hierarchical drill-down (Level → Category → Element)
- Export Options
  - Download JSON reports
  - Download CSV reports
  - Copy charts to clipboard
Best Practices with Streamlit
- Performance: Start with small projects to understand output
- GitHub Rate Limits: Be mindful of API rate limits for user analysis
- System Resources: Monitor CPU/RAM for large repository analyses
- Data Export: Always export results for offline analysis
Results Dashboard
Using the Web Dashboard
- Open dashboard.html in your browser
- Click "Load JSON File" to upload results
- Explore visualizations and statistics
- Use filters to focus on specific elements
- Export filtered data as CSV
Dashboard Features
- Level Distribution Cards: Quick statistics for each CEFR level
- Bar Charts: Visual representation of level distribution
- Element Frequency: Top 10 most common code constructs
- File Analysis Table: Detailed file-by-file breakdown
- Search & Filter: Focus on specific files or complexity levels
- CSV Export: Download filtered results
Interpreting Results
Understanding Level Distribution
High A1/A2 (60%+):
- ✅ Easy to read and maintain
- ✅ Good for beginners
- ⚠️ May underutilize Python features
- 💡 Consider using comprehensions, context managers
Balanced (30-40% each tier):
- ✅ Well-structured code
- ✅ Appropriate complexity
- ✅ Good separation of concerns
- Ideal for most projects
High C1/C2 (40%+):
- ✅ Advanced Python usage
- ⚠️ May be difficult for juniors
- ⚠️ Potential over-engineering
- 💡 Consider simplifying where possible
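These rules of thumb can be applied to the "Levels" block of summary_data.json with a few lines of Python. The sketch below assumes that a "tier" means the A, B, and C groups and that the thresholds are the ones listed above; the function names and the "mixed" fallback label are illustrative, not pycefrl output.

```python
def tier_shares(levels):
    """Collapse the six CEFR levels into A, B, and C tiers, as fractions of the total."""
    tiers = {"A": 0, "B": 0, "C": 0}
    for level, count in levels.items():
        if level[:1] in tiers:
            tiers[level[0]] += count
    total = sum(tiers.values())
    return {tier: (count / total if total else 0.0) for tier, count in tiers.items()}


def describe(levels):
    """Label a distribution using the thresholds from this section."""
    shares = tier_shares(levels)
    if shares["A"] >= 0.60:
        return "high A1/A2"
    if shares["C"] >= 0.40:
        return "high C1/C2"
    if all(0.30 <= share <= 0.40 for share in shares.values()):
        return "balanced"
    return "mixed"  # none of the three profiles applies


# Counts taken from the summary_data.json example in this guide.
example = {"A1": 450, "A2": 380, "B1": 120, "B2": 45, "C1": 28, "C2": 12}
print(describe(example), tier_shares(example))
```

For the example counts, the A tier holds 830 of 1035 elements (about 80%), so the distribution falls under the first profile.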
Code Quality Indicators
Red Flags:
- Very high C2 percentage (>20%) in utility code
- Single files with extreme level variation
- Many B2+ elements in configuration files
- No B1+ elements in large codebases (underutilization)
Green Flags:
- Gradual increase in complexity from utilities to core logic
- Consistent style within modules
- Appropriate use of advanced features
- Good balance matching team skill level
Actionable Insights
- Refactoring Priorities
  - Files with high C1/C2 concentration
  - Long functions with multiple complexity levels
  - Duplicate complex patterns
- Learning Opportunities
  - Codebases at your target skill level
  - Files demonstrating specific patterns well
  - Projects with balanced complexity
- Code Review Focus
  - New C2 elements in simple modules
  - Complexity increases in PRs
  - Inconsistencies with project norms
Common Tasks
Task 1: Assess Project Complexity
# Analyze project
python3 pycerfl.py directory ./myproject
# View summary
cat DATA_JSON/summary_data.json
# Calculate complexity score
python3 -c "
import json
with open('DATA_JSON/summary_data.json') as f:
    data = json.load(f)
levels = data['Levels']
total = sum(levels.values())
scores = {'A1': 1, 'A2': 2, 'B1': 3, 'B2': 4, 'C1': 5, 'C2': 6}
weighted = sum(levels.get(l, 0) * scores[l] for l in scores)
avg = weighted / total if total > 0 else 0
print(f'Average complexity: {avg:.2f}')
"
Task 2: Compare Two Projects
# Analyze both projects
python3 pycerfl.py directory ./project1
cp DATA_JSON/summary_data.json project1_summary.json
python3 pycerfl.py directory ./project2
cp DATA_JSON/summary_data.json project2_summary.json
# Compare
python3 -c "
import json
with open('project1_summary.json') as f1, open('project2_summary.json') as f2:
    p1 = json.load(f1)['Levels']
    p2 = json.load(f2)['Levels']
print('Level | Project1 | Project2')
print('-------|----------|----------')
for level in ['A1', 'A2', 'B1', 'B2', 'C1', 'C2']:
    print(f'{level:6} | {p1.get(level, 0):8} | {p2.get(level, 0):8}')
"
Task 3: Track Changes Over Time
#!/bin/bash
# track_complexity.sh
mkdir -p complexity_history
# Get current complexity
python3 pycerfl.py directory ./src
timestamp=$(date +%Y%m%d_%H%M%S)
cp DATA_JSON/summary_data.json "complexity_history/${timestamp}.json"
# Generate report
python3 -c "
import json
import glob
import os
files = sorted(glob.glob('complexity_history/*.json'))
print('Timestamp | A1 | A2 | B1 | B2 | C1 | C2')
print('-------------------|-----|-----|----|----|----|----|')
for f in files:
    timestamp = os.path.basename(f).replace('.json', '')
    with open(f) as fp:
        data = json.load(fp)
    levels = data['Levels']
    print(f'{timestamp} | {levels.get(\"A1\", 0):3} | {levels.get(\"A2\", 0):3} | {levels.get(\"B1\", 0):2} | {levels.get(\"B2\", 0):2} | {levels.get(\"C1\", 0):2} | {levels.get(\"C2\", 0):2}')
"
Task 4: Identify Refactoring Candidates
# Find files with high complexity
python3 pycerfl.py directory ./src
python3 -c "
import json
with open('DATA_JSON/total_data.json') as f:
data = json.load(f)
for repo, files in data.items():
    print(f'\nRepository: {repo}')
    for filename, stats in files.items():
        if 'Levels' in stats:
            levels = stats['Levels']
            complex_count = levels.get('C1', 0) + levels.get('C2', 0)
            total = sum(levels.values())
            if total > 0 and complex_count / total > 0.3:
                print(f'  {filename}: {complex_count}/{total} complex elements ({complex_count/total*100:.1f}%)')
"
Troubleshooting
Issue: No results generated
Check:
- Are there Python files in the target directory?
- Does dicc.txt exist? Run python3 dict.py if not
- Check for syntax errors in target files
- Verify write permissions
Issue: Incomplete results
Possible causes:
- Files with syntax errors are skipped
- Empty or comment-only files produce no results
- Very simple files may have few detectable elements
Solution: Check console output for errors or warnings
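To find out in advance which files would be affected, you can try parsing every file yourself. The sketch below is independent of pycefrl: `find_unparsable` is a hypothetical helper that uses the same standard ast module and reports files that fail to parse. The temporary directory exists only to make the example self-contained.

```python
import ast
import tempfile
from pathlib import Path


def find_unparsable(root):
    """Return (path, reason) for every .py file under root that ast cannot parse."""
    problems = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, ValueError) as exc:
            # ValueError also covers undecodable bytes (UnicodeDecodeError).
            problems.append((path, f"{type(exc).__name__}: {exc}"))
    return problems


# Demonstration on a throwaway directory with one valid and one broken file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "good.py").write_text("x = 1\n", encoding="utf-8")
    Path(tmp, "bad.py").write_text("def broken(:\n", encoding="utf-8")
    bad_names = [path.name for path, _ in find_unparsable(tmp)]

print(bad_names)  # prints ['bad.py']
```

Running `find_unparsable("/path/to/project")` before an analysis tells you which files to fix or exclude.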
Issue: GitHub API errors
Rate limiting:
- GitHub API has rate limits
- Wait a few minutes and retry
- Consider using a GitHub token for higher limits
Issue: Memory errors
For very large projects:
# Analyze in smaller chunks
for dir in src/*/; do
    python3 pycerfl.py directory "$dir"
    # Process results
done
Best Practices
- Regular Analysis: Run periodic analyses to track complexity evolution
- Baseline Metrics: Establish complexity baselines for your projects
- Team Standards: Use results to define coding standards
- Code Reviews: Include complexity metrics in review process
- Learning Tool: Study well-designed projects to improve skills
- Continuous Improvement: Set goals for complexity reduction
- Context Matters: High complexity isn't always bad; consider the domain
- Balance: Aim for appropriate complexity, not minimal complexity
Next Steps
- API Reference - Detailed API documentation
- Examples - Practical usage examples
- Contributing - Contribute to pycefrl
- Dashboard - Visualize your results
Getting Help
- GitHub Issues - Report bugs or request features
- Discussions - Ask questions and share ideas
- Examples - See real-world usage patterns