Automated Code Quality Analysis Using Machine Learning
A computer vision system that analyzes screenshots of source code to detect common coding errors automatically and give educational feedback. It combines OCR preprocessing with trained ML models to identify syntax errors, style violations, and logic mistakes in code images.
Solution: Fine-tuned Tesseract with code-specific character sets and implemented custom post-processing to fix common OCR errors in programming syntax.
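As a rough illustration of the post-processing step, the sketch below repairs a few typical OCR confusions in recognized code. The substitution tables and the name `fix_ocr_text` are assumptions made for this example, not the project's actual rule set.

```python
import re

# Hypothetical character-level fixes: typographic glyphs that OCR engines
# often emit in place of the ASCII characters used in source code.
CHAR_FIXES = str.maketrans({
    "\u201c": '"',  # left curly double quote -> straight double quote
    "\u201d": '"',  # right curly double quote -> straight double quote
    "\u2018": "'",  # left curly single quote -> apostrophe
    "\u2019": "'",  # right curly single quote -> apostrophe
    "\u2013": "-",  # en dash -> hyphen-minus
})

# Hypothetical keyword-level fixes for digit/letter confusions (1 vs l).
KEYWORD_FIXES = {"pr1nt": "print", "whi1e": "while", "e1se": "else"}
_KEYWORD_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, KEYWORD_FIXES)) + r")\b"
)


def fix_ocr_text(text: str) -> str:
    """Apply character fixes first, then whole-word keyword fixes."""
    text = text.translate(CHAR_FIXES)
    return _KEYWORD_RE.sub(lambda m: KEYWORD_FIXES[m.group(1)], text)
```

This sketch rewrites text inside string literals and comments too, so a real pipeline would probably need to restrict the keyword fixes to code tokens.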
Solution: Built a normalization layer that standardizes different indentation styles and formatting conventions before analysis.
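A normalization layer of this kind might look like the minimal sketch below. It assumes that normalizing means unifying line endings, expanding tabs to a fixed width, and trimming trailing whitespace and surrounding blank lines. The function name `normalize_code` and the default `tab_size` of 4 are illustrative choices, not taken from the project.

```python
def normalize_code(source: str, tab_size: int = 4) -> str:
    """Return source with consistent line endings, indentation, and spacing."""
    # Unify Windows (CRLF) and old Mac (CR) line endings to LF.
    lines = source.replace("\r\n", "\n").replace("\r", "\n").split("\n")

    # Expand tabs to spaces and drop trailing whitespace on every line.
    cleaned = [line.expandtabs(tab_size).rstrip() for line in lines]

    # Remove blank lines at the start and end of the snippet.
    while cleaned and not cleaned[0]:
        cleaned.pop(0)
    while cleaned and not cleaned[-1]:
        cleaned.pop()

    # End non-empty output with exactly one newline.
    return "\n".join(cleaned) + "\n" if cleaned else ""
```

Note that `expandtabs` also expands tabs inside string literals, which is acceptable for style analysis but would change program behavior if the output were executed.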
Solution: Created a synthetic data generator that produces code screenshots with programmatically inserted errors, augmented with real student submissions.
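The error-insertion half of such a generator can be sketched as follows. This is only an illustration under stated assumptions: the two mutation types, the `LabeledSample` record, and the `inject_error` function are hypothetical. Rendering the mutated code to an image and mixing in real student submissions are left out.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabeledSample:
    code: str          # possibly mutated source text
    error_type: str    # mutation name, or "none" if nothing was applicable
    line_number: int   # 1-based line of the mutation, 0 if none


def _drop_trailing_colon(line: str) -> Optional[str]:
    stripped = line.rstrip()
    return stripped[:-1] if stripped.endswith(":") else None


def _drop_closing_paren(line: str) -> Optional[str]:
    idx = line.rfind(")")
    return line[:idx] + line[idx + 1:] if idx != -1 else None


# Hypothetical mutation catalog: each function returns the mutated line,
# or None when the mutation does not apply to that line.
MUTATIONS = {
    "missing_colon": _drop_trailing_colon,
    "unbalanced_paren": _drop_closing_paren,
}


def inject_error(source: str, rng: random.Random) -> LabeledSample:
    """Apply one randomly chosen applicable mutation and label the result."""
    lines = source.split("\n")
    candidates = []
    for i, line in enumerate(lines):
        for name, mutate in MUTATIONS.items():
            mutated = mutate(line)
            if mutated is not None:
                candidates.append((i, name, mutated))
    if not candidates:
        return LabeledSample(source, "none", 0)
    i, name, mutated = rng.choice(candidates)
    lines[i] = mutated
    return LabeledSample("\n".join(lines), name, i + 1)
```

Passing a seeded `random.Random` keeps the generated dataset reproducible. Because the mutations are purely textual, a label can be wrong when the edited character sits inside a string or comment, so a stricter generator would likely verify each sample with a parser.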