API Reference
Command Line Interface
pycerfl.py
Main entry point for the pycefrl analyzer.
Syntax
python3 pycerfl.py <mode> <target>
Modes
directory
Analyze Python files in a local directory.
python3 pycerfl.py directory <path>
Arguments:
path: Path to the directory containing Python files (absolute or relative)
Examples:
python3 pycerfl.py directory .
python3 pycerfl.py directory /home/user/projects/myapp
python3 pycerfl.py directory ../another-project
repo
Analyze a GitHub repository.
python3 pycerfl.py repo <github_url>
Arguments:
github_url: Full clone URL of the GitHub repository
Examples:
python3 pycerfl.py repo https://github.com/django/django
python3 pycerfl.py repo https://github.com/requests/requests
Note: The repository will be cloned to a temporary location for analysis.
user
Analyze all public repositories of a GitHub user.
python3 pycerfl.py user <username>
Arguments:
username: GitHub username
Examples:
python3 pycerfl.py user guido
python3 pycerfl.py user kennethreitz
Note: This will analyze all publicly accessible repositories for the user.
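The three modes can also be driven from another Python script. The following is a minimal sketch, assuming the analyzer is started as a subprocess from the directory that contains pycerfl.py; the helper names `build_command` and `run_analysis` are hypothetical and not part of pycefrl, and `sys.executable` stands in for `python3`.

```python
import subprocess
import sys

# Modes documented above; the helper names below are hypothetical.
VALID_MODES = ("directory", "repo", "user")

def build_command(mode, target, script="pycerfl.py"):
    """Return the argv list for one analyzer run, validating the mode first."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {VALID_MODES}")
    # sys.executable is the interpreter running this script (stands in for python3).
    return [sys.executable, script, mode, target]

def run_analysis(mode, target):
    """Run the analyzer and raise CalledProcessError if it exits non-zero."""
    return subprocess.run(build_command(mode, target), check=True)
```

Calling `run_analysis("directory", ".")` would then correspond to the first directory example above.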
Output Files
The analyzer generates the following files:
| File | Description |
|---|---|
| data.json | Complete analysis data in JSON format |
| data.csv | Complete analysis data in CSV format |
| DATA_JSON/summary_data.json | Aggregated level statistics |
| DATA_JSON/total_data.json | File-level breakdown |
| DATA_JSON/<repo_name>.json | Individual repository summaries |
| DATA_CSV/<file_name>.csv | Individual file analyses |
Core Modules
levels.py
Contains the level assignment logic and CEFR classification system.
Key Constants
# CEFR Levels
LEVELS = ['A1', 'A2', 'B1', 'B2', 'C1', 'C2']
# Code element categories
Literals = ['ast.List', 'ast.Tuple', 'ast.Dict']
Variables = ['ast.Name']
Expressions = ['ast.Call', 'ast.IfExp', 'ast.Attribute']
Comprehensions = ['ast.ListComp', 'ast.GeneratorExp', 'ast.DictComp']
Statements = ['ast.Assign', 'ast.AugAssign', 'ast.Raise', 'ast.Assert', 'ast.Pass']
Imports = ['ast.Import', 'ast.ImportFrom']
ControlFlow = ['ast.If', 'ast.For', 'ast.While', 'ast.Break', 'ast.Continue', 'ast.Try', 'ast.With']
FunctionsClass = ['ast.FunctionDef', 'ast.Lambda', 'ast.Return', 'ast.Yield', 'ast.ClassDef']
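The category lists store node types as strings such as 'ast.List'. As a rough illustration of how such strings relate to the classes in Python's `ast` module, the sketch below resolves them with `getattr` and uses the result to filter nodes. The `resolve` helper is hypothetical; how levels.py itself consumes these lists is not shown here.

```python
import ast

# Two of the category lists from levels.py, copied from the listing above.
Literals = ['ast.List', 'ast.Tuple', 'ast.Dict']
ControlFlow = ['ast.If', 'ast.For', 'ast.While', 'ast.Break', 'ast.Continue', 'ast.Try', 'ast.With']

def resolve(names):
    """Turn 'ast.X' strings into the matching ast node classes (hypothetical helper)."""
    return tuple(getattr(ast, name.split('.', 1)[1]) for name in names)

tree = ast.parse("for i in [1, 2]:\n    if i:\n        break\n")

# Collect the class names of every control-flow node found in the snippet.
control_nodes = [
    type(node).__name__
    for node in ast.walk(tree)
    if isinstance(node, resolve(ControlFlow))
]
```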
Functions
asignar_Nivel(node, dicc)
Assigns a CEFR level to an AST node based on the level dictionary.
Parameters:
- node: AST node object
- dicc: Dictionary mapping code patterns to CEFR levels
Returns:
str: CEFR level (e.g., ‘A1’, ‘B1’, ‘C2’)
Example:
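import ast
import json
from levels import asignar_Nivel

# Load the level dictionary generated by dict.py
with open('dicc.txt', 'r') as f:
    level_dict = json.load(f)

code = "x = [1, 2, 3]"
tree = ast.parse(code)
level = asignar_Nivel(tree.body[0], level_dict)
# Expected: 'A1' under the example configuration, where a simple assignment and a simple list both map to A1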
ClassIterTree.py
Provides tree iteration utilities for traversing Abstract Syntax Trees (AST).
Class: IterTree
Iterator for traversing Python AST nodes.
Methods
__init__(self, tree, dicc)
Initialize the tree iterator.
Parameters:
- tree: AST tree object from ast.parse()
- dicc: Level dictionary for node classification
__iter__(self)
Returns iterator object.
Returns:
self: Iterator instance
__next__(self)
Advances to the next node in the tree.
Returns:
dict: Node information containing:
- class: Element type
- level: CEFR level
- lineno: Start line number
- end_lineno: End line number
- col_offset: Column offset (displacement)
Raises:
StopIteration: When no more nodes to iterate
Example:
import ast
from ClassIterTree import IterTree

code = """
def greet(name):
    print(f"Hello, {name}")
"""
tree = ast.parse(code)
# level_dict is the level dictionary loaded from dicc.txt
iterator = IterTree(tree, level_dict)
for node_info in iterator:
    print(f"Found {node_info['class']} at line {node_info['lineno']}, level {node_info['level']}")
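To make the iterator protocol described above concrete, here is a minimal stand-in class. It is a sketch only and not the project's implementation: it assumes traversal with `ast.walk`, and it assumes a dictionary shaped like `{'ast.For': ('Simple For Loop', 'A2')}`, which may differ from the real contents of dicc.txt. The class name `SketchIterTree` is hypothetical.

```python
import ast

class SketchIterTree:
    """Hypothetical stand-in for IterTree; not the project's implementation."""

    def __init__(self, tree, dicc):
        self._nodes = ast.walk(tree)  # generator over every node in the tree
        self._dicc = dicc             # assumed shape: {'ast.For': ('Simple For Loop', 'A2')}

    def __iter__(self):
        return self

    def __next__(self):
        # next() on the exhausted generator raises StopIteration, which ends a for loop.
        while True:
            node = next(self._nodes)
            key = f"ast.{type(node).__name__}"
            if key in self._dicc:
                name, level = self._dicc[key]
                return {
                    'class': name,
                    'level': level,
                    'lineno': getattr(node, 'lineno', None),
                    'end_lineno': getattr(node, 'end_lineno', None),
                    'col_offset': getattr(node, 'col_offset', None),
                }

sample_dict = {'ast.List': ('Simple List', 'A1'), 'ast.For': ('Simple For Loop', 'A2')}
sample_tree = ast.parse("x = [1, 2]\nfor i in x:\n    pass\n")
found = list(SketchIterTree(sample_tree, sample_dict))
```

Nodes whose type is missing from the dictionary are skipped here; the real class may handle them differently.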
getjson.py
Handles JSON output generation and data formatting.
Functions
read_Json(option, filename_Dir, list_Results)
Generates JSON output files from analysis results.
Parameters:
- option: Source identifier (repository/directory name)
- filename_Dir: File name being analyzed
- list_Results: List of analysis results for the file
Outputs:
- Creates/updates data.json with all results
- Creates/updates DATA_JSON/summary_data.json with aggregated statistics
- Creates/updates DATA_JSON/total_data.json with file-level data
- Creates individual repository JSON files in DATA_JSON/
Data Structure:
{
  "Repository": "project-name",
  "File Name": "main.py",
  "Class": "Simple Function",
  "Start Line": 5,
  "End Line": 10,
  "Displacement": 4,
  "Level": "B1"
}
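The record above appears to carry the same information as the per-node result dictionary (class, level, lineno, end_lineno, col_offset), plus the repository and file name. A minimal sketch of that correspondence follows; the field mapping is inferred from the matching descriptions in this reference, and the `to_record` helper is hypothetical rather than a function in getjson.py.

```python
def to_record(repository, file_name, node_info):
    """Build one output record from a per-node result dictionary (assumed mapping)."""
    return {
        "Repository": repository,
        "File Name": file_name,
        "Class": node_info["class"],
        "Start Line": node_info["lineno"],
        "End Line": node_info["end_lineno"],
        "Displacement": node_info["col_offset"],  # column offset of the element
        "Level": node_info["level"],
    }

record = to_record(
    "project-name",
    "main.py",
    {"class": "Simple Function", "level": "B1", "lineno": 5, "end_lineno": 10, "col_offset": 4},
)
```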
getcsv.py
Handles CSV output generation.
Functions
read_FileCsv(option, filename_Dir, list_Results)
Generates CSV output files from analysis results.
Parameters:
- option: Source identifier (repository/directory name)
- filename_Dir: File name being analyzed
- list_Results: List of analysis results for the file
Outputs:
- Creates/updates data.csv with all results
- Creates individual file CSV files in DATA_CSV/
CSV Format:
Repository,File Name,Class,Start Line,End Line,Displacement,Level
project-name,main.py,Simple Function,5,10,4,B1
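Because the CSV has a header row, it can be consumed with the standard `csv` module. The sketch below tallies elements per level; the first data row is the one shown above, the other two rows are made up for illustration, and `count_levels` is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# First data row is from the format example; the last two are illustrative only.
SAMPLE = """Repository,File Name,Class,Start Line,End Line,Displacement,Level
project-name,main.py,Simple Function,5,10,4,B1
project-name,main.py,Simple List,12,12,4,A1
project-name,utils.py,Simple List,3,3,0,A1
"""

def count_levels(csv_file):
    """Count rows per CEFR level from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    return Counter(row["Level"] for row in reader)

level_counts = count_levels(io.StringIO(SAMPLE))
```

For a real run, pass `open('data.csv', newline='')` instead of the in-memory sample.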
dict.py
Dictionary generation utility.
Usage
python3 dict.py
Purpose:
- Reads configuration.cfg to get level assignments
- Generates dicc.txt mapping code patterns to CEFR levels
- Must be run before first analysis and after configuration changes
Output:
dicc.txt: Text file with level mappings
Data Structures
Analysis Result Dictionary
Each analyzed code element produces a dictionary with these fields:
{
    'class': str,        # Type of code element
    'level': str,        # CEFR level (A1-C2)
    'lineno': int,       # Starting line number
    'end_lineno': int,   # Ending line number
    'col_offset': int    # Column offset (indentation)
}
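Since CEFR levels are ordered from A1 to C2, a list of these result dictionaries can be reduced to the most advanced level it contains. The following is a small sketch that uses the LEVELS list from levels.py for the ordering; `highest_level` is a hypothetical helper, not part of pycefrl.

```python
# Ordering taken from the LEVELS constant in levels.py.
LEVELS = ['A1', 'A2', 'B1', 'B2', 'C1', 'C2']

def highest_level(results):
    """Return the most advanced CEFR level present, or None for an empty list."""
    if not results:
        return None
    # LEVELS.index gives each level's rank, so max() picks the most advanced one.
    return max((r['level'] for r in results), key=LEVELS.index)

sample_results = [
    {'class': 'Simple List', 'level': 'A1', 'lineno': 1, 'end_lineno': 1, 'col_offset': 0},
    {'class': 'Function', 'level': 'B1', 'lineno': 3, 'end_lineno': 6, 'col_offset': 0},
    {'class': 'Simple For Loop', 'level': 'A2', 'lineno': 4, 'end_lineno': 5, 'col_offset': 4},
]
top = highest_level(sample_results)
```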
Summary Data Format
summary_data.json contains aggregated statistics:
{
  "Levels": {
    "A1": 450,
    "A2": 380,
    "B1": 120,
    "B2": 45,
    "C1": 28,
    "C2": 12
  },
  "Class": {
    "Simple List": 81,
    "Simple Function": 42,
    "Simple Class": 15
  }
}
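The raw counts under "Levels" are easier to compare as shares of the total. The sketch below converts the example counts above into percentages; `level_percentages` is a hypothetical helper.

```python
# "Levels" block copied from the summary example above.
levels = {"A1": 450, "A2": 380, "B1": 120, "B2": 45, "C1": 28, "C2": 12}

def level_percentages(counts):
    """Return each level's share of all counted elements, rounded to one decimal."""
    total = sum(counts.values())
    if total == 0:
        return {level: 0.0 for level in counts}
    return {level: round(100 * n / total, 1) for level, n in counts.items()}

shares = level_percentages(levels)
# With 1035 elements in total, A1 accounts for 43.5% and C2 for 1.2%.
```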
Total Data Format
total_data.json provides file-level breakdowns:
{
  "repository-name": {
    "file1.py": {
      "Levels": {
        "A1": 25,
        "A2": 15,
        "B1": 8
      }
    },
    "file2.py": {
      "Levels": {
        "A1": 30,
        "B1": 12
      }
    }
  }
}
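The per-file counts can be rolled up into one total per repository. This sketch does so for the example structure above; `repo_totals` is a hypothetical helper.

```python
from collections import Counter

# Structure copied from the total_data.json example above.
total_data = {
    "repository-name": {
        "file1.py": {"Levels": {"A1": 25, "A2": 15, "B1": 8}},
        "file2.py": {"Levels": {"A1": 30, "B1": 12}},
    }
}

def repo_totals(total):
    """Sum the per-file level counts for each repository."""
    out = {}
    for repo, files in total.items():
        counts = Counter()
        for file_data in files.values():
            counts.update(file_data["Levels"])  # adds counts level by level
        out[repo] = dict(counts)
    return out

totals = repo_totals(total_data)
```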
Configuration
configuration.cfg
INI-style configuration file for customizing level assignments.
Format
[LEVEL_NAME]
Element_Name = ast.NodeType
Example
[A1]
Simple List = ast.List
Simple Tuple = ast.Tuple
Simple Assignment = ast.Assign
[A2]
Simple For Loop = ast.For
Simple If statements = ast.If
Import = ast.Import
[B1]
Function = ast.FunctionDef
Simple Class = ast.ClassDef
[B2]
Function with decorators = ast.FunctionDef
Class inheritance = ast.ClassDef
[C1]
List comprehension = ast.ListComp
Generator = ast.GeneratorExp
[C2]
Nested comprehension = ast.ListComp
Metaclass = ast.ClassDef
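A file in this format can be read with Python's standard `configparser`. The sketch below is only an illustration of the format, not a description of what dict.py does or of the layout of dicc.txt; `load_levels` is a hypothetical helper, and the embedded text is a shortened copy of the example above.

```python
import configparser

# Shortened copy of the example configuration.
EXAMPLE_CFG = """
[A1]
Simple List = ast.List
Simple Assignment = ast.Assign

[B1]
Function = ast.FunctionDef
"""

def load_levels(text):
    """Map each element name to a (node type, level) pair."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep element names as written instead of lowercasing them
    parser.read_string(text)
    return {
        element: (node_type, level)
        for level in parser.sections()
        for element, node_type in parser.items(level)
    }

mapping = load_levels(EXAMPLE_CFG)
```

Keying by element name rather than node type matters because several elements can share one node type, as ast.FunctionDef and ast.ClassDef do in the full example.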
Sections
- [A1]: Basic level - fundamental Python structures
- [A2]: Elementary level - basic control flow and I/O
- [B1]: Intermediate level - functions, classes, exceptions
- [B2]: Upper intermediate - advanced OOP, decorators
- [C1]: Advanced - comprehensions, generators, advanced patterns
- [C2]: Proficient - complex nested structures, metaclasses
AST Node Types
pycefrl recognizes the following Python AST node types:
Literals
- ast.List - List literals
- ast.Tuple - Tuple literals
- ast.Dict - Dictionary literals
- ast.Set - Set literals
Variables & Attributes
- ast.Name - Variable names
- ast.Attribute - Attribute access (e.g., obj.attr)
Expressions
- ast.Call - Function/method calls
- ast.IfExp - Ternary expressions
- ast.BinOp - Binary operations
- ast.UnaryOp - Unary operations
- ast.Compare - Comparisons
Statements
- ast.Assign - Assignments
- ast.AugAssign - Augmented assignments (+=, -=, etc.)
- ast.Raise - Raise exceptions
- ast.Assert - Assertions
- ast.Pass - Pass statements
Imports
- ast.Import - Import statements
- ast.ImportFrom - From…import statements
Control Flow
- ast.If - If statements
- ast.For - For loops
- ast.While - While loops
- ast.Break - Break statements
- ast.Continue - Continue statements
- ast.Try - Try/except blocks
- ast.With - Context managers
Functions & Classes
- ast.FunctionDef - Function definitions
- ast.Lambda - Lambda expressions
- ast.Return - Return statements
- ast.Yield - Yield expressions
- ast.ClassDef - Class definitions
Comprehensions
- ast.ListComp - List comprehensions
- ast.DictComp - Dictionary comprehensions
- ast.SetComp - Set comprehensions
- ast.GeneratorExp - Generator expressions
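When deciding which node types to list in configuration.cfg, it can help to see which ones a given snippet actually produces. The sketch below counts statement and expression nodes using only the standard `ast` module; `node_type_counts` is a hypothetical helper and does not use pycefrl itself.

```python
import ast
from collections import Counter

def node_type_counts(source):
    """Count statement and expression node types, named the way configuration.cfg names them."""
    tree = ast.parse(source)
    return Counter(
        f"ast.{type(node).__name__}"
        for node in ast.walk(tree)
        # Skip helper nodes such as Load/Store contexts and operators.
        if isinstance(node, (ast.stmt, ast.expr))
    )

counts = node_type_counts("squares = [n * n for n in range(10)]")
```

For this one-line snippet the counter includes ast.Assign, ast.ListComp, ast.BinOp, ast.Call and ast.Name entries.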
Integration Examples
Using as a Python Module
import ast
from ClassIterTree import IterTree
from levels import asignar_Nivel
import json

# Load level dictionary
with open('dicc.txt', 'r') as f:
    level_dict = json.load(f)

# Parse Python code
code = """
def factorial(n):
    if n <= 1:
        return 1
    return n * factorial(n-1)
"""
tree = ast.parse(code)

# Analyze
results = []
iterator = IterTree(tree, level_dict)
for node_info in iterator:
    results.append(node_info)

# Display results
for result in results:
    print(f"{result['class']:30} | Level {result['level']} | Lines {result['lineno']}-{result['end_lineno']}")
Custom Analysis Script
import os
import ast
from ClassIterTree import IterTree

def analyze_directory(path, level_dict):
    """Analyze all Python files in a directory tree."""
    results = {}
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith('.py'):
                filepath = os.path.join(root, file)
                with open(filepath, 'r') as f:
                    try:
                        tree = ast.parse(f.read())
                        iterator = IterTree(tree, level_dict)
                        # Key by full path so same-named files in different folders do not overwrite each other
                        results[filepath] = list(iterator)
                    except SyntaxError:
                        print(f"Syntax error in {filepath}")
    return results
Error Handling
Common Errors
| Error | Cause | Solution |
|---|---|---|
| FileNotFoundError: dicc.txt | Level dictionary not generated | Run python3 dict.py |
| SyntaxError | Invalid Python syntax in target file | Fix syntax errors in target code |
| ModuleNotFoundError | Missing dependencies | Run pip3 install -r requirements.txt |
| KeyError in level assignment | Node type not in configuration | Add missing node type to configuration.cfg |
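Two of the errors in the table can be caught early in a custom script. The sketch below checks that the level dictionary exists before any analysis starts and turns a SyntaxError into a short message instead of a traceback. Both helpers, `require_dictionary` and `safe_parse`, are hypothetical and not part of pycefrl.

```python
import ast
import os

def require_dictionary(path="dicc.txt"):
    """Fail early, with the documented fix, if the level dictionary is missing."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"{path} not found; run `python3 dict.py` first")
    return path

def safe_parse(source, filename="<unknown>"):
    """Return (tree, None) on success or (None, message) on a syntax error."""
    try:
        return ast.parse(source, filename=filename), None
    except SyntaxError as exc:
        return None, f"{filename}:{exc.lineno}: {exc.msg}"

tree, error = safe_parse("x = 1", "ok.py")
bad_tree, bad_error = safe_parse("def f(:\n    pass\n", "bad.py")
```

Returning the message rather than printing it lets the caller decide whether to skip the file or stop the run.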
Performance Considerations
- Large repositories: Analysis time is proportional to code size
- GitHub API: Rate limiting applies to user/repository queries
- Memory usage: Large projects may require significant memory for AST parsing
- File I/O: Multiple output files are generated; ensure sufficient disk space
Next Steps
- Quick Start Guide - Get started quickly
- User Guide - Comprehensive usage guide
- Examples - Practical examples
- Contributing - Contribute to pycefrl