User Guide

Overview

This guide explains how to use pycefrl to analyze the complexity of Python code, from installation and basic commands to interpreting the results. It is written for both beginners and experienced developers.

Main Concepts

CEFR Framework Adaptation

pycefrl is inspired by the Common European Framework of Reference for Languages (CEFR), which defines six levels of language proficiency from A1 (beginner) to C2 (proficient).

We apply this concept to Python code:

A1 (Basic): Simple data structures, basic assignments, print statements
A2 (Elementary): File operations, simple loops, basic function calls
B1 (Intermediate): Functions with parameters, classes, exception handling
B2 (Upper Intermediate): Decorators, inheritance, advanced OOP
C1 (Advanced): Comprehensions, generators, metaclasses
C2 (Proficient): Complex nested comprehensions, advanced design patterns

Abstract Syntax Trees (AST)

pycefrl analyzes Python code by parsing it into an Abstract Syntax Tree (AST) using Python’s built-in ast module. This allows precise identification of code constructs without executing the code.
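
To make the idea concrete, here is a minimal sketch of AST-based inspection using only the standard `ast` module. The helper `describe_nodes` and the sample source are illustrative assumptions, not pycefrl's own code.

```python
import ast

def describe_nodes(source: str) -> list[tuple[str, int, int]]:
    """Return (node type, line, column offset) for every positioned node."""
    tree = ast.parse(source)  # parses only; the source is never executed
    found = []
    for node in ast.walk(tree):
        # Statements and expressions carry position information;
        # the Module node and Load/Store contexts do not.
        if hasattr(node, "lineno"):
            found.append((type(node).__name__, node.lineno, node.col_offset))
    return sorted(found, key=lambda item: (item[1], item[2]))

SAMPLE = "values = [1, 2, 3]\nfor v in values:\n    print(v)\n"

if __name__ == "__main__":
    for name, line, col in describe_nodes(SAMPLE):
        print(f"{name:10} line {line}, col {col}")
```

Each tuple identifies a construct (for example `Assign`, `List`, or `For`) together with its position, which is the kind of information a level classifier can work from.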

Level Dictionary

The dicc.txt file (generated by dict.py) maps specific AST node patterns to CEFR levels. This mapping is customizable through the configuration.cfg file.

Installation & Setup

Initial Setup

Clone the repository:

git clone https://github.com/raux/pycefrl.git
cd pycefrl

Install dependencies:
```
pip3 install -r requirements.txt
```
Generate level dictionary:
```
python3 dict.py
```

See the Installation Guide for detailed instructions.

Basic Usage

Analyzing a Directory

The most common use case is analyzing a local project directory:

python3 pycerfl.py directory /path/to/project

What happens:

pycefrl scans for all .py files in the directory
Each file is parsed into an AST
Each code element is classified by level
Results are saved to data.json and data.csv
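
The steps above can be sketched roughly as follows. This is a simplified illustration, not pycefrl's implementation: `LEVEL_BY_NODE` is a tiny hypothetical mapping standing in for the real level dictionary, the function names are invented, and the recursive scan is an assumption. The record fields mirror those described for data.json later in this guide.

```python
import ast
from pathlib import Path

# Hypothetical stand-in for the level dictionary (an assumption, not dicc.txt).
LEVEL_BY_NODE = {
    ast.List: ("Simple List", "A1"),
    ast.Assign: ("Simple Assignment", "A1"),
    ast.FunctionDef: ("Function", "B1"),
}

def classify_source(source: str, file_name: str, repository: str) -> list[dict]:
    """Parse one file's source and return a record per recognized element."""
    records = []
    for node in ast.walk(ast.parse(source)):
        entry = LEVEL_BY_NODE.get(type(node))
        if entry is None:
            continue  # node type not covered by the sample mapping
        label, level = entry
        records.append({
            "Repository": repository,
            "File Name": file_name,
            "Class": label,
            "Start Line": node.lineno,
            "End Line": node.end_lineno,
            "Displacement": node.col_offset,
            "Level": level,
        })
    return records

def analyze_directory(root: str) -> list[dict]:
    """Scan root (recursively, by assumption) for .py files and classify each."""
    base = Path(root).resolve()
    records = []
    for path in sorted(base.rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        records.extend(classify_source(source, path.name, base.name))
    return records
```

Calling `analyze_directory("/path/to/project")` would return a list of dictionaries that could then be written out with `json.dump` or `csv.DictWriter`.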

Example Output Structure:

project/
├── data.json              # Complete results
├── data.csv               # CSV format
├── DATA_JSON/
│   ├── summary_data.json  # Aggregated statistics
│   ├── total_data.json    # File-level breakdown
│   └── project.json       # Repository summary
└── DATA_CSV/
    ├── file1.csv          # Individual file results
    └── file2.csv

Analyzing GitHub Repositories

Analyze any public GitHub repository:

python3 pycerfl.py repo https://github.com/username/repository

Process:

Repository is cloned to a temporary location
All Python files are analyzed
Results include repository name in output
Temporary clone is retained for reference

Useful for:

Comparing different projects
Studying open-source codebases
Benchmarking your code against established projects

Analyzing GitHub Users

Analyze all public repositories of a GitHub user:

python3 pycerfl.py user username

Process:

Fetches list of user’s public repositories via GitHub API
Clones and analyzes each repository
Aggregates results across all repositories
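
For orientation, the first step can be approximated with the standard library alone. The sketch below is not how pycefrl itself is implemented; it assumes the public GitHub REST endpoint for listing a user's repositories, and the function names are hypothetical. Passing a personal access token is optional and raises the rate limit.

```python
import json
import urllib.request

# Public GitHub REST endpoint; per_page=100 is the largest page size.
API_URL = "https://api.github.com/users/{username}/repos?per_page=100"

def build_request(username, token=None):
    """Prepare the API request, adding a token header when one is given."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(API_URL.format(username=username), headers=headers)

def clone_urls(payload: str) -> list[str]:
    """Extract the clone URL of each repository from the JSON response body."""
    return [repo["clone_url"] for repo in json.loads(payload)]

def list_public_repos(username, token=None):
    """Fetch the first page of a user's public repositories (network call)."""
    with urllib.request.urlopen(build_request(username, token)) as response:
        return clone_urls(response.read().decode("utf-8"))
```

This sketch reads only the first page of results, so users with more than 100 public repositories would need pagination on top of it.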

Useful for:

Assessing a developer’s overall coding style
Portfolio analysis
Skill level progression tracking

Output Files Explained

data.json

Complete analysis data in JSON array format:

[
  {
    "Repository": "myproject",
    "File Name": "main.py",
    "Class": "Simple Function",
    "Start Line": 10,
    "End Line": 15,
    "Displacement": 4,
    "Level": "B1"
  }
]

Fields:

Repository: Source repository or directory name
File Name: Python file being analyzed
Class: Type of code element
Start Line: Line where element starts
End Line: Line where element ends
Displacement: Indentation level (column offset)
Level: CEFR level assignment
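
As a small illustration of working with these fields, the sketch below groups records by file and counts levels. The helper names are hypothetical, and the second sample record is made up for the example.

```python
import json
from collections import Counter

# Sample in the data.json layout; the second record is invented for illustration.
SAMPLE = """
[
  {"Repository": "myproject", "File Name": "main.py", "Class": "Simple Function",
   "Start Line": 10, "End Line": 15, "Displacement": 4, "Level": "B1"},
  {"Repository": "myproject", "File Name": "main.py", "Class": "Simple List",
   "Start Line": 11, "End Line": 11, "Displacement": 8, "Level": "A1"}
]
"""

def levels_per_file(records):
    """Map each file name to a Counter of the levels found in it."""
    counts = {}
    for record in records:
        counts.setdefault(record["File Name"], Counter())[record["Level"]] += 1
    return counts

def line_span(record):
    """Number of source lines an element covers, inclusive of both ends."""
    return record["End Line"] - record["Start Line"] + 1

records = json.loads(SAMPLE)

if __name__ == "__main__":
    print(levels_per_file(records))
    print(line_span(records[0]))
```

To use real output, replace `json.loads(SAMPLE)` with `json.load` on an open handle to your data.json.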

data.csv

Same information in CSV format for spreadsheet analysis:

Repository,File Name,Class,Start Line,End Line,Displacement,Level
myproject,main.py,Simple Function,10,15,4,B1

summary_data.json

Aggregated level statistics:

{
  "Levels": {
    "A1": 450,
    "A2": 380,
    "B1": 120,
    "B2": 45,
    "C1": 28,
    "C2": 12
  },
  "Class": {
    "Simple List": 81,
    "Simple Function": 42,
    "Simple Class": 15
  }
}

Use this for:

Quick overview of code complexity
Comparing multiple projects
Generating reports

total_data.json

File-level breakdown:

{
  "myproject": {
    "main.py": {
      "Levels": {
        "A1": 25,
        "A2": 15,
        "B1": 8
      }
    }
  }
}

Use this for:

Identifying complex files
Prioritizing refactoring efforts
File-by-file comparison

Advanced Usage

Custom Configuration

Edit configuration.cfg to customize level assignments:

[A1]
Simple List = ast.List
Simple Assignment = ast.Assign

[B1]
Function = ast.FunctionDef

After editing, regenerate the dictionary:

python3 dict.py

Customization scenarios:

Adjust difficulty levels for your team’s skill distribution
Focus on specific Python features
Create domain-specific complexity metrics
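
To see how such a file can be read programmatically, here is a minimal sketch using the standard `configparser` module. It is not a description of how dict.py works; `load_level_map` is a hypothetical helper, and it assumes every value has the form `ast.<NodeName>` as in the excerpt above.

```python
import ast
import configparser

SAMPLE_CFG = """\
[A1]
Simple List = ast.List
Simple Assignment = ast.Assign

[B1]
Function = ast.FunctionDef
"""

def load_level_map(text: str) -> dict:
    """Map each AST node class to its (label, level) pair."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the case of labels such as "Simple List"
    parser.read_string(text)
    mapping = {}
    for level in parser.sections():
        for label, target in parser[level].items():
            # Assumption: values look like "ast.List"; unknown names raise AttributeError.
            node_type = getattr(ast, target.removeprefix("ast."))
            mapping[node_type] = (label, level)
    return mapping

if __name__ == "__main__":
    for node_type, (label, level) in load_level_map(SAMPLE_CFG).items():
        print(f"{node_type.__name__:12} {level}  {label}")
```

Moving an entry to a different section, for example placing `Function` under `[A2]`, changes the level reported for that construct in this sketch.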

Filtering Results

Use command-line tools to filter results:

# Find all C1/C2 elements (match the Level column only, skip the header)
awk -F',' 'NR > 1 && ($7 == "C1" || $7 == "C2")' data.csv

# Count elements by level (skip the header row)
tail -n +2 data.csv | cut -d',' -f7 | sort | uniq -c

# Find files with the most complex elements
awk -F',' '$7 == "C1" || $7 == "C2" {print $2}' data.csv | sort | uniq -c | sort -rn
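
The same filters can be written in Python, which avoids problems with quoted fields that contain commas. This is a sketch under the assumption that data.csv uses the header shown earlier; the helper names and the sample rows beyond the first are invented for illustration.

```python
import csv
import io
from collections import Counter

# Sample in the data.csv layout; rows two and three are made up.
SAMPLE_CSV = """\
Repository,File Name,Class,Start Line,End Line,Displacement,Level
myproject,main.py,Simple Function,10,15,4,B1
myproject,utils.py,Generator Function,3,9,0,C1
myproject,utils.py,Simple List,12,12,4,A1
"""

ADVANCED = {"C1", "C2"}

def load_rows(handle):
    """Read all rows as dictionaries keyed by the header names."""
    return list(csv.DictReader(handle))

def advanced_rows(rows):
    """Keep only elements classified as C1 or C2."""
    return [row for row in rows if row["Level"] in ADVANCED]

def count_by_level(rows):
    """Count elements per level."""
    return Counter(row["Level"] for row in rows)

def files_by_advanced_count(rows):
    """List (file, count) pairs, files with the most C1/C2 elements first."""
    return Counter(row["File Name"] for row in advanced_rows(rows)).most_common()

if __name__ == "__main__":
    rows = load_rows(io.StringIO(SAMPLE_CSV))
    print(count_by_level(rows))
    print(files_by_advanced_count(rows))
```

For real data, pass `open("data.csv", newline="")` to `load_rows` instead of the in-memory sample.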

Batch Analysis

Create scripts to analyze multiple projects:

#!/bin/bash
# Make sure the output folder exists before moving results into it
mkdir -p results
for dir in projects/*/; do
  echo "Analyzing $dir..."
  python3 pycerfl.py directory "$dir"
  mv data.json "results/$(basename "$dir")_data.json"
done

Integration with CI/CD

Add code complexity checks to your CI pipeline:

# .github/workflows/complexity.yml
name: Code Complexity Check
on: [push, pull_request]
jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'
      - name: Analyze complexity
        run: |
          git clone https://github.com/raux/pycefrl.git
          cd pycefrl
          pip3 install -r requirements.txt
          python3 dict.py
          python3 pycerfl.py directory ../
      - name: Check complexity threshold
        run: |
          # Add custom checks here
          echo "Analysis complete"
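
One possible custom check is a small script that fails the job when the share of C1/C2 elements exceeds a limit. The sketch below is an assumption about what such a check might look like: the script name `check_complexity.py`, the function names, and the 40% default (borrowed from the "High C1/C2" guideline in this guide) are all illustrative.

```python
import json
import sys

def advanced_share(levels: dict) -> float:
    """Fraction of all elements that are classified C1 or C2."""
    total = sum(levels.values())
    if total == 0:
        return 0.0
    return (levels.get("C1", 0) + levels.get("C2", 0)) / total

def exceeds_limit(levels: dict, limit: float = 0.4) -> bool:
    """True when the C1/C2 share is above the allowed limit."""
    return advanced_share(levels) > limit

def main(path: str, limit: float) -> int:
    """Read a summary file and return a process exit code (0 = pass)."""
    with open(path, encoding="utf-8") as handle:
        levels = json.load(handle)["Levels"]
    share = advanced_share(levels)
    print(f"C1/C2 share: {share:.1%} (limit {limit:.0%})")
    return 1 if share > limit else 0

# Usage: python3 check_complexity.py <summary_data.json> <limit>
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], float(sys.argv[2])))
```

A non-zero exit code makes the workflow step fail, so the threshold step could call this script on the generated summary_data.json in place of the `echo` line.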

Streamlit Interface

Launching the App

python3 -m streamlit run app.py

Features

Mode Selection
- Directory analysis
- GitHub repository analysis
- GitHub user analysis
Real-Time Monitoring
- Live execution logs
- CPU and RAM usage statistics
- Progress indicators
Interactive Visualizations
- Bubble Chart: Category vs Level with size representing frequency
- Heatmap: File vs Level distribution
- Treemap: Hierarchical drill-down (Level → Category → Element)
Export Options
- Download JSON reports
- Download CSV reports
- Copy charts to clipboard

Best Practices with Streamlit

Performance: Start with small projects to understand output
GitHub Rate Limits: Be mindful of API rate limits for user analysis
System Resources: Monitor CPU/RAM for large repository analyses
Data Export: Always export results for offline analysis

Results Dashboard

Using the Web Dashboard

Open dashboard.html in your browser
Click “Load JSON File” to upload results
Explore visualizations and statistics
Use filters to focus on specific elements
Export filtered data as CSV

Dashboard Features

Level Distribution Cards: Quick statistics for each CEFR level
Bar Charts: Visual representation of level distribution
Element Frequency: Top 10 most common code constructs
File Analysis Table: Detailed file-by-file breakdown
Search & Filter: Focus on specific files or complexity levels
CSV Export: Download filtered results

Interpreting Results

Understanding Level Distribution

High A1/A2 (60%+):

✅ Easy to read and maintain
✅ Good for beginners
⚠️ May underutilize Python features
💡 Consider using comprehensions, context managers

Balanced (30-40% in each of the A, B, and C tiers):

✅ Well-structured code
✅ Appropriate complexity
✅ Good separation of concerns
👍 Ideal for most projects

High C1/C2 (40%+):

✅ Advanced Python usage
⚠️ May be difficult for juniors
⚠️ Potential over-engineering
💡 Consider simplifying where possible
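
These rules of thumb can be applied to the "Levels" counts from summary_data.json with a few lines of code. The sketch below is only one reading of the thresholds above; the function names and the "unclassified" fallback for distributions that match none of the three profiles are assumptions.

```python
# Group the six levels into the three tiers used by the guidelines.
TIERS = {"A": ("A1", "A2"), "B": ("B1", "B2"), "C": ("C1", "C2")}

def tier_shares(levels: dict) -> dict:
    """Return the percentage of elements falling in each tier."""
    total = sum(levels.get(name, 0) for names in TIERS.values() for name in names)
    if total == 0:
        return {tier: 0.0 for tier in TIERS}
    return {
        tier: 100 * sum(levels.get(name, 0) for name in names) / total
        for tier, names in TIERS.items()
    }

def profile(levels: dict) -> str:
    """Label a distribution using the 60% / 40% / 30-40% guidelines."""
    shares = tier_shares(levels)
    if shares["A"] >= 60:
        return "high A1/A2"
    if shares["C"] >= 40:
        return "high C1/C2"
    if all(30 <= share <= 40 for share in shares.values()):
        return "balanced"
    return "unclassified"

if __name__ == "__main__":
    summary = {"A1": 450, "A2": 380, "B1": 120, "B2": 45, "C1": 28, "C2": 12}
    print(tier_shares(summary), profile(summary))
```

With the sample counts from the summary_data.json section, about 80% of elements fall in the A tier, so the distribution is labeled "high A1/A2".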

Code Quality Indicators

Red Flags:

Very high C2 percentage (>20%) in utility code
Single files with extreme level variation
Many B2+ elements in configuration files
No B1+ elements in large codebases (underutilization)

Green Flags:

Gradual increase in complexity from utilities to core logic
Consistent style within modules
Appropriate use of advanced features
Good balance matching team skill level

Actionable Insights

Refactoring Priorities
- Files with high C1/C2 concentration
- Long functions with multiple complexity levels
- Duplicate complex patterns
Learning Opportunities
- Codebases at your target skill level
- Files demonstrating specific patterns well
- Projects with balanced complexity
Code Review Focus
- New C2 elements in simple modules
- Complexity increases in PRs
- Inconsistencies with project norms

Common Tasks

Task 1: Assess Project Complexity

# Analyze project
python3 pycerfl.py directory ./myproject

# View summary
cat DATA_JSON/summary_data.json

# Calculate complexity score
python3 -c "
import json
with open('DATA_JSON/summary_data.json') as f:
    data = json.load(f)
    levels = data['Levels']
    total = sum(levels.values())
    scores = {'A1': 1, 'A2': 2, 'B1': 3, 'B2': 4, 'C1': 5, 'C2': 6}
    weighted = sum(levels.get(l, 0) * scores[l] for l in scores)
    avg = weighted / total if total > 0 else 0
    print(f'Average complexity: {avg:.2f}')
"

Task 2: Compare Two Projects

# Analyze both projects
python3 pycerfl.py directory ./project1
cp DATA_JSON/summary_data.json project1_summary.json

python3 pycerfl.py directory ./project2
cp DATA_JSON/summary_data.json project2_summary.json

# Compare
python3 -c "
import json
with open('project1_summary.json') as f1, open('project2_summary.json') as f2:
    p1 = json.load(f1)['Levels']
    p2 = json.load(f2)['Levels']
    print('Level  | Project1 | Project2')
    print('-------|----------|----------')
    for level in ['A1', 'A2', 'B1', 'B2', 'C1', 'C2']:
        print(f'{level:6} | {p1.get(level, 0):8} | {p2.get(level, 0):8}')
"

Task 3: Track Changes Over Time

#!/bin/bash
# track_complexity.sh

mkdir -p complexity_history

# Get current complexity
python3 pycerfl.py directory ./src
timestamp=$(date +%Y%m%d_%H%M%S)
cp DATA_JSON/summary_data.json "complexity_history/${timestamp}.json"

# Generate report
python3 -c "
import json
import glob
import os

files = sorted(glob.glob('complexity_history/*.json'))
order = ['A1', 'A2', 'B1', 'B2', 'C1', 'C2']
# Header and separator use the same column widths as the data rows
print('Timestamp       | ' + ' | '.join(f'{level:>3}' for level in order))
print('----------------|' + '|'.join(['-----'] * len(order)))
for f in files:
    timestamp = os.path.basename(f).replace('.json', '')
    with open(f) as fp:
        levels = json.load(fp)['Levels']
    print(f'{timestamp} | ' + ' | '.join(f'{levels.get(level, 0):3}' for level in order))
"

Task 4: Identify Refactoring Candidates

# Find files with high complexity
python3 pycerfl.py directory ./src

python3 -c "
import json
with open('DATA_JSON/total_data.json') as f:
    data = json.load(f)
    for repo, files in data.items():
        print(f'\nRepository: {repo}')
        for filename, stats in files.items():
            if 'Levels' in stats:
                levels = stats['Levels']
                complex_count = levels.get('C1', 0) + levels.get('C2', 0)
                total = sum(levels.values())
                if total > 0 and complex_count / total > 0.3:
                    print(f'  {filename}: {complex_count}/{total} complex elements ({complex_count/total*100:.1f}%)')
"

Troubleshooting

Issue: No results generated

Check:

Are there Python files in the target directory?
Does dicc.txt exist? Run python3 dict.py if not
Check for syntax errors in target files
Verify write permissions

Issue: Incomplete results

Possible causes:

Files with syntax errors are skipped
Empty or comment-only files produce no results
Very simple files may have few detectable elements

Solution: Check console output for errors or warnings
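
If the console output is not conclusive, you can check which files fail to parse on their own. The sketch below uses the standard `ast` module; the helper names are hypothetical and this is not part of pycefrl.

```python
import ast
from pathlib import Path

def syntax_problem(source: str, filename: str = "<string>"):
    """Return (filename, line, message) if the source fails to parse, else None."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as exc:
        return (filename, exc.lineno, exc.msg)
    return None

def find_unparsable(root: str) -> list:
    """Collect syntax problems for every .py file under root."""
    problems = []
    for path in sorted(Path(root).rglob("*.py")):
        problem = syntax_problem(path.read_text(encoding="utf-8"), str(path))
        if problem is not None:
            problems.append(problem)
    return problems
```

Calling `find_unparsable("./src")` returns one tuple per file that Python's own parser rejects. Those files are the likeliest candidates for having been skipped.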

Issue: GitHub API errors

Rate limiting:

GitHub API has rate limits
Wait a few minutes and retry
Consider using a GitHub token for higher limits

Issue: Memory errors

For very large projects:

# Analyze in smaller chunks
for dir in src/*/; do
  python3 pycerfl.py directory "$dir"
  # Process results
done
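
When a project is analyzed in chunks, the per-chunk summaries can be combined afterwards. The sketch below assumes you copied each chunk's summary_data.json into a folder of your choosing (here called `chunk_results`, a hypothetical name) and that each file has the "Levels" layout shown earlier. The function names are illustrative.

```python
import json
from collections import Counter
from pathlib import Path

def merge_levels(summaries) -> dict:
    """Add up the per-level counts of several summary dictionaries."""
    total = Counter()
    for summary in summaries:
        total.update(summary.get("Levels", {}))  # update() adds counts
    return dict(total)

def load_summaries(folder: str) -> list:
    """Load every JSON summary stored in the given folder."""
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(Path(folder).glob("*.json"))
    ]
```

For example, `merge_levels(load_summaries("chunk_results"))` yields a single level distribution for the whole project.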

Best Practices

Regular Analysis: Run periodic analyses to track complexity evolution
Baseline Metrics: Establish complexity baselines for your projects
Team Standards: Use results to define coding standards
Code Reviews: Include complexity metrics in review process
Learning Tool: Study well-designed projects to improve skills
Continuous Improvement: Set goals for complexity reduction
Context Matters: High complexity isn’t always bad - consider the domain
Balance: Aim for appropriate complexity, not minimal complexity

Next Steps

API Reference - Detailed API documentation
Examples - Practical usage examples
Contributing - Contribute to pycefrl
Dashboard - Visualize your results

Getting Help

GitHub Issues - Report bugs or request features
Discussions - Ask questions and share ideas
Examples - See real-world usage patterns