User Guide

Overview

This guide explains how to use pycefrl to analyze Python code complexity. Whether you’re a beginner or an experienced developer, you’ll find detailed guidance on getting the most out of pycefrl.

Main Concepts

CEFR Framework Adaptation

pycefrl is inspired by the Common European Framework of Reference for Languages (CEFR), which defines six levels of language proficiency from A1 (beginner) to C2 (proficient).

We apply this concept to Python code:

  • A1 (Basic): Simple data structures, basic assignments, print statements
  • A2 (Elementary): File operations, simple loops, basic function calls
  • B1 (Intermediate): Functions with parameters, classes, exception handling
  • B2 (Upper Intermediate): Decorators, inheritance, advanced OOP
  • C1 (Advanced): Comprehensions, generators, metaclasses
  • C2 (Proficient): Complex nested comprehensions, advanced design patterns

Abstract Syntax Trees (AST)

pycefrl analyzes Python code by parsing it into an Abstract Syntax Tree (AST) using Python’s built-in ast module. This allows precise identification of code constructs without executing the code.
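For intuition, here is a minimal sketch (not pycefrl’s actual classifier) of how the `ast` module exposes the constructs that pycefrl classifies:

```python
import ast

source = """
def greet(name):
    return [c.upper() for c in name]
"""

# Parse the source into a tree without executing it.
tree = ast.parse(source)

# Walk the tree and report recognizable constructs with their line numbers.
for node in ast.walk(tree):
    if isinstance(node, (ast.FunctionDef, ast.ListComp)):
        print(type(node).__name__, node.lineno)
# FunctionDef 2
# ListComp 3
```

pycefrl applies the same idea at scale, matching each node type against the level dictionary.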

Level Dictionary

The dicc.txt file (generated by dict.py) maps specific AST node patterns to CEFR levels. This mapping is customizable through the configuration.cfg file.

Installation & Setup

Initial Setup

  1. Clone the repository:
    git clone https://github.com/raux/pycefrl.git
    cd pycefrl
    
  2. Install dependencies:
    pip3 install -r requirements.txt
    
  3. Generate level dictionary:
    python3 dict.py
    

See the Installation Guide for detailed instructions.

Basic Usage

Analyzing a Directory

The most common use case is analyzing a local project directory:

python3 pycerfl.py directory /path/to/project

What happens:

  1. pycefrl scans for all .py files in the directory
  2. Each file is parsed into an AST
  3. Each code element is classified by level
  4. Results are saved to data.json and data.csv

Example Output Structure:

project/
├── data.json              # Complete results
├── data.csv               # CSV format
├── DATA_JSON/
│   ├── summary_data.json  # Aggregated statistics
│   ├── total_data.json    # File-level breakdown
│   └── project.json       # Repository summary
└── DATA_CSV/
    ├── file1.csv          # Individual file results
    └── file2.csv

Analyzing GitHub Repositories

Analyze any public GitHub repository:

python3 pycerfl.py repo https://github.com/username/repository

Process:

  1. Repository is cloned to a temporary location
  2. All Python files are analyzed
  3. Results include repository name in output
  4. Temporary clone is retained for reference

Useful for:

  • Comparing different projects
  • Studying open-source codebases
  • Benchmarking your code against established projects

Analyzing GitHub Users

Analyze all public repositories of a GitHub user:

python3 pycerfl.py user username

Process:

  1. Fetches list of user’s public repositories via GitHub API
  2. Clones and analyzes each repository
  3. Aggregates results across all repositories

Useful for:

  • Assessing a developer’s overall coding style
  • Portfolio analysis
  • Skill level progression tracking

Output Files Explained

data.json

Complete analysis data in JSON array format:

[
  {
    "Repository": "myproject",
    "File Name": "main.py",
    "Class": "Simple Function",
    "Start Line": 10,
    "End Line": 15,
    "Displacement": 4,
    "Level": "B1"
  }
]

Fields:

  • Repository: Source repository or directory name
  • File Name: Python file being analyzed
  • Class: Type of code element
  • Start Line: Line where element starts
  • End Line: Line where element ends
  • Displacement: Indentation level (column offset)
  • Level: CEFR level assignment
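Because data.json is a flat JSON array, it is easy to post-process. A short sketch, assuming the exact field names listed above, that extracts only the advanced elements:

```python
import json

def advanced_elements(path='data.json'):
    """Return (file, line, class, level) for every C1/C2 element in data.json."""
    with open(path) as f:
        elements = json.load(f)
    return [(e['File Name'], e['Start Line'], e['Class'], e['Level'])
            for e in elements
            if e['Level'] in ('C1', 'C2')]

# Example:
# for name, line, cls, level in advanced_elements():
#     print(f'{name}:{line} {cls} ({level})')
```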

data.csv

Same information in CSV format for spreadsheet analysis:

Repository,File Name,Class,Start Line,End Line,Displacement,Level
myproject,main.py,Simple Function,10,15,4,B1

summary_data.json

Aggregated level statistics:

{
  "Levels": {
    "A1": 450,
    "A2": 380,
    "B1": 120,
    "B2": 45,
    "C1": 28,
    "C2": 12
  },
  "Class": {
    "Simple List": 81,
    "Simple Function": 42,
    "Simple Class": 15
  }
}

Use this for:

  • Quick overview of code complexity
  • Comparing multiple projects
  • Generating reports
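For a quick overview, the raw counts in the "Levels" object can be converted into percentage shares with a few lines of Python (a sketch using the structure shown above):

```python
def level_percentages(levels):
    """Convert raw per-level counts into rounded percentage shares."""
    total = sum(levels.values())
    if total == 0:
        return {}
    return {level: round(100 * count / total, 1) for level, count in levels.items()}

# With the counts from the example above:
print(level_percentages({'A1': 450, 'A2': 380, 'B1': 120, 'B2': 45, 'C1': 28, 'C2': 12}))
# {'A1': 43.5, 'A2': 36.7, 'B1': 11.6, 'B2': 4.3, 'C1': 2.7, 'C2': 1.2}
```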

total_data.json

File-level breakdown:

{
  "myproject": {
    "main.py": {
      "Levels": {
        "A1": 25,
        "A2": 15,
        "B1": 8
      }
    }
  }
}

Use this for:

  • Identifying complex files
  • Prioritizing refactoring efforts
  • File-by-file comparison

Advanced Usage

Custom Configuration

Edit configuration.cfg to customize level assignments:

[A1]
Simple List = ast.List
Simple Assignment = ast.Assign

[B1]
Function = ast.FunctionDef

After editing, regenerate the dictionary:

python3 dict.py

Customization scenarios:

  • Adjust difficulty levels for your team’s skill distribution
  • Focus on specific Python features
  • Create domain-specific complexity metrics
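Since configuration.cfg follows the INI format shown above, it can be inspected with the standard configparser module. This is only a sketch of how such a file maps construct names to levels; pycefrl’s own dict.py may parse it differently:

```python
import configparser

def load_level_map(path='configuration.cfg'):
    """Return {construct name: CEFR level} from an INI-style level config."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # preserve the case of construct names like "Simple List"
    cfg.read(path)
    return {name: level for level in cfg.sections() for name in cfg[level]}

# Example:
# load_level_map()  ->  {'Simple List': 'A1', 'Simple Assignment': 'A1', 'Function': 'B1'}
```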

Filtering Results

Use command-line tools to filter results:

# Find all C1/C2 elements
cat data.csv | grep -E "C1|C2"

# Count elements by level
cat data.csv | cut -d',' -f7 | sort | uniq -c

# Find files with most complex elements
cat data.csv | awk -F',' '{if($7=="C1" || $7=="C2") print $2}' | sort | uniq -c | sort -rn

Batch Analysis

Create scripts to analyze multiple projects:

#!/bin/bash
for dir in projects/*; do
  echo "Analyzing $dir..."
  python3 pycerfl.py directory "$dir"
  mv data.json "results/$(basename $dir)_data.json"
done

Integration with CI/CD

Add code complexity checks to your CI pipeline:

# .github/workflows/complexity.yml
name: Code Complexity Check
on: [push, pull_request]
jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: '3.9'
      - name: Analyze complexity
        run: |
          git clone https://github.com/raux/pycefrl.git
          cd pycefrl
          pip3 install -r requirements.txt
          python3 dict.py
          python3 pycerfl.py directory ../
      - name: Check complexity threshold
        run: |
          # Add custom checks here
          echo "Analysis complete"

Streamlit Interface

Launching the App

python3 -m streamlit run app.py

Features

  1. Mode Selection
    • Directory analysis
    • GitHub repository analysis
    • GitHub user analysis
  2. Real-Time Monitoring
    • Live execution logs
    • CPU and RAM usage statistics
    • Progress indicators
  3. Interactive Visualizations
    • Bubble Chart: Category vs Level with size representing frequency
    • Heatmap: File vs Level distribution
    • Treemap: Hierarchical drill-down (Level → Category → Element)
  4. Export Options
    • Download JSON reports
    • Download CSV reports
    • Copy charts to clipboard

Best Practices with Streamlit

  • Performance: Start with small projects to understand output
  • GitHub Rate Limits: Be mindful of API rate limits for user analysis
  • System Resources: Monitor CPU/RAM for large repository analyses
  • Data Export: Always export results for offline analysis

Results Dashboard

Using the Web Dashboard

  1. Open dashboard.html in your browser
  2. Click “Load JSON File” to upload results
  3. Explore visualizations and statistics
  4. Use filters to focus on specific elements
  5. Export filtered data as CSV

Dashboard Features

  • Level Distribution Cards: Quick statistics for each CEFR level
  • Bar Charts: Visual representation of level distribution
  • Element Frequency: Top 10 most common code constructs
  • File Analysis Table: Detailed file-by-file breakdown
  • Search & Filter: Focus on specific files or complexity levels
  • CSV Export: Download filtered results

Interpreting Results

Understanding Level Distribution

High A1/A2 (60%+):

  • ✅ Easy to read and maintain
  • ✅ Good for beginners
  • ⚠️ May underutilize Python features
  • 💡 Consider using comprehensions, context managers

Balanced (30-40% each tier):

  • ✅ Well-structured code
  • ✅ Appropriate complexity
  • ✅ Good separation of concerns
  • 👍 Ideal for most projects

High C1/C2 (40%+):

  • ✅ Advanced Python usage
  • ⚠️ May be difficult for juniors
  • ⚠️ Potential over-engineering
  • 💡 Consider simplifying where possible
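These heuristics are easy to script. A sketch with hypothetical thresholds mirroring the bands above (60% basic, 40% advanced), applied to the "Levels" object from summary_data.json:

```python
def complexity_profile(levels):
    """Label a {level: count} distribution using the rough bands above."""
    total = sum(levels.values())
    if total == 0:
        return 'empty'
    basic = (levels.get('A1', 0) + levels.get('A2', 0)) / total
    advanced = (levels.get('C1', 0) + levels.get('C2', 0)) / total
    if basic >= 0.6:
        return 'mostly basic'     # high A1/A2: easy to read, may underuse Python
    if advanced >= 0.4:
        return 'mostly advanced'  # high C1/C2: powerful but harder for juniors
    return 'balanced'
```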

Code Quality Indicators

Red Flags:

  • Very high C2 percentage (>20%) in utility code
  • Single files with extreme level variation
  • Many B2+ elements in configuration files
  • No B1+ elements in large codebases (underutilization)

Green Flags:

  • Gradual increase in complexity from utilities to core logic
  • Consistent style within modules
  • Appropriate use of advanced features
  • Good balance matching team skill level

Actionable Insights

  1. Refactoring Priorities
    • Files with high C1/C2 concentration
    • Long functions with multiple complexity levels
    • Duplicate complex patterns
  2. Learning Opportunities
    • Codebases at your target skill level
    • Files demonstrating specific patterns well
    • Projects with balanced complexity
  3. Code Review Focus
    • New C2 elements in simple modules
    • Complexity increases in PRs
    • Inconsistencies with project norms

Common Tasks

Task 1: Assess Project Complexity

# Analyze project
python3 pycerfl.py directory ./myproject

# View summary
cat DATA_JSON/summary_data.json

# Calculate complexity score
python3 -c "
import json
with open('DATA_JSON/summary_data.json') as f:
    data = json.load(f)
    levels = data['Levels']
    total = sum(levels.values())
    scores = {'A1': 1, 'A2': 2, 'B1': 3, 'B2': 4, 'C1': 5, 'C2': 6}
    weighted = sum(levels.get(l, 0) * scores[l] for l in scores)
    avg = weighted / total if total > 0 else 0
    print(f'Average complexity: {avg:.2f}')
"

Task 2: Compare Two Projects

# Analyze both projects
python3 pycerfl.py directory ./project1
cp DATA_JSON/summary_data.json project1_summary.json

python3 pycerfl.py directory ./project2
cp DATA_JSON/summary_data.json project2_summary.json

# Compare
python3 -c "
import json
with open('project1_summary.json') as f1, open('project2_summary.json') as f2:
    p1 = json.load(f1)['Levels']
    p2 = json.load(f2)['Levels']
    print('Level  | Project1 | Project2')
    print('-------|----------|----------')
    for level in ['A1', 'A2', 'B1', 'B2', 'C1', 'C2']:
        print(f'{level:6} | {p1.get(level, 0):8} | {p2.get(level, 0):8}')
"

Task 3: Track Changes Over Time

#!/bin/bash
# track_complexity.sh

mkdir -p complexity_history

# Get current complexity
python3 pycerfl.py directory ./src
timestamp=$(date +%Y%m%d_%H%M%S)
cp DATA_JSON/summary_data.json "complexity_history/${timestamp}.json"

# Generate report
python3 -c "
import json
import glob
import os

files = sorted(glob.glob('complexity_history/*.json'))
print('Timestamp          | A1  | A2  | B1 | B2 | C1 | C2')
print('-------------------|-----|-----|----|----|----|---')
for f in files:
    timestamp = os.path.basename(f).replace('.json', '')
    with open(f) as fp:
        data = json.load(fp)
        levels = data['Levels']
        print(f'{timestamp} | {levels.get(\"A1\", 0):3} | {levels.get(\"A2\", 0):3} | {levels.get(\"B1\", 0):2} | {levels.get(\"B2\", 0):2} | {levels.get(\"C1\", 0):2} | {levels.get(\"C2\", 0):2}')
"

Task 4: Identify Refactoring Candidates

# Find files with high complexity
python3 pycerfl.py directory ./src

python3 -c "
import json
with open('DATA_JSON/total_data.json') as f:
    data = json.load(f)
    for repo, files in data.items():
        print(f'\nRepository: {repo}')
        for filename, stats in files.items():
            if 'Levels' in stats:
                levels = stats['Levels']
                complex_count = levels.get('C1', 0) + levels.get('C2', 0)
                total = sum(levels.values())
                if total > 0 and complex_count / total > 0.3:
                    print(f'  {filename}: {complex_count}/{total} complex elements ({complex_count/total*100:.1f}%)')
"

Troubleshooting

Issue: No results generated

Check:

  1. Are there Python files in the target directory?
  2. Does dicc.txt exist? Run python3 dict.py if not
  3. Check for syntax errors in target files
  4. Verify write permissions
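Since files with syntax errors are skipped, it can help to locate them before rerunning the analysis. A sketch using the standard ast module:

```python
import ast
from pathlib import Path

def files_with_syntax_errors(root='.'):
    """Return (path, message) for each .py file under root that fails to parse."""
    problems = []
    for path in Path(root).rglob('*.py'):
        try:
            ast.parse(path.read_text(encoding='utf-8'), filename=str(path))
        except SyntaxError as err:
            problems.append((str(path), f'line {err.lineno}: {err.msg}'))
    return problems

# Example:
# for path, msg in files_with_syntax_errors('./src'):
#     print(f'{path}: {msg}')
```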

Issue: Incomplete results

Possible causes:

  • Files with syntax errors are skipped
  • Empty or comment-only files produce no results
  • Very simple files may have few detectable elements

Solution: Check console output for errors or warnings

Issue: GitHub API errors

Rate limiting:

  • GitHub API has rate limits
  • Wait a few minutes and retry
  • Consider using a GitHub token for higher limits
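To check your remaining quota (and confirm that a token is being honoured), GitHub exposes a rate_limit endpoint. A sketch using only the standard library; reading the token from a GITHUB_TOKEN environment variable is a convention, not something pycefrl requires:

```python
import json
import urllib.request

API = 'https://api.github.com/rate_limit'

def build_request(token=None):
    """Build the rate-limit request; an Authorization header raises the quota."""
    req = urllib.request.Request(API)
    if token:
        req.add_header('Authorization', f'Bearer {token}')
    return req

def remaining_requests(token=None):
    """Return how many core API requests are left in the current window."""
    with urllib.request.urlopen(build_request(token)) as resp:
        return json.load(resp)['resources']['core']['remaining']

# Example (makes a network call):
# import os
# print(remaining_requests(os.environ.get('GITHUB_TOKEN')))
```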

Issue: Memory errors

For very large projects:

# Analyze in smaller chunks
for dir in src/*/; do
  python3 pycerfl.py directory "$dir"
  # Process results
done

Best Practices

  1. Regular Analysis: Run periodic analyses to track complexity evolution
  2. Baseline Metrics: Establish complexity baselines for your projects
  3. Team Standards: Use results to define coding standards
  4. Code Reviews: Include complexity metrics in review process
  5. Learning Tool: Study well-designed projects to improve skills
  6. Continuous Improvement: Set goals for complexity reduction
  7. Context Matters: High complexity isn’t always bad; consider the domain
  8. Balance: Aim for appropriate complexity, not minimal complexity

Next Steps

Getting Help