002 : OCR Sudoku Solver

"Leveraging OCR and neural networks to solve Sudoku puzzles from images."
ROGER Robin

Presentation Video

\0/

About the Project

Brief Overview:

The OCR Sudoku Solver is a tool capable of solving any Sudoku puzzle by recognizing and processing an image of the grid. This project allowed me to explore the functionality of neural networks and gain hands-on experience with image processing techniques, including the Hough Transform algorithm.

Objective:

The main goal of the project was to develop a system that recognizes and solves Sudoku puzzles from images.

Here are the instruction of the project:

Features

Core Functionalities:

OCR for Sudoku grid detection.
Neural network-based digit recognition.
Efficient Sudoku-solving algorithm.
Interactive result display and correction system.

Workflow

Step-by-Step Process:

Image Preprocessing: Grayscale conversion, noise removal, and rotation correction.
Grid Detection: Identifying and extracting the Sudoku grid from the image.
Digit Recognition: Neural network-based classification of grid digits.
Sudoku Solver: Algorithm to complete the grid.
Result Display: Overlaying the solution onto the original image.

Image Preprocessing

Image preprocessing is the first crucial step in the Sudoku-solving pipeline. Its purpose is to optimize the input image's quality by addressing issues like noise, color dominance, and rotation to ensure accurate subsequent steps.

1. Contrast Enhancement and Grayscale Conversion

Enhance the contrast by identifying dominant colors and converting them to white, followed by grayscale transformation. This step improves detail visibility and prepares the image for further processing.

Figure 1: Before and After Preprocessing

2. Grayscale Conversion Algorithms

Algorithms Considered:

Weighted Average (Luminance): Adjusts for human eye sensitivity with the formula:
Gray = 0.22 * R + 0.65 * G + 0.11 * B
Chosen Formula (Simplified): For computational efficiency:
Gray = 0.299 * R + 0.587 * G + 0.114 * B
HSL (Hue-Saturation-Luminance): Extracts luminance for improved consistency.

Figure 2: Grayscale Conversion

3. Adaptive Binarization

Converts the image to black and white by analyzing grayscale intensity and dynamically calculating thresholds based on pixel distribution. Adaptive thresholds ensure robustness against varied lighting conditions.

Figure 3: Adaptive Binarization

4. Rotation Correction

Detects and corrects skewed grids for accurate grid alignment:

Manual Rotation: Uses affine transformations to align the grid with the image edges.
Automatic Rotation: Applies edge detection (Canny filter) and Hough Transform to determine the dominant angle for automatic alignment.

Figure 4: Rotation Correction

Grid Detection

Overview:

Grid detection involves identifying and extracting the Sudoku grid from an image using edge detection, histograms, and segmentation techniques. This ensures precise localization of the grid for further processing.

Key Steps:

Edge Detection: - Used Roberts filter for gradient-based edge detection, highlighting horizontal and vertical gradients to identify contours in the image.

Figure 1: Gradient Edge Detection

Combining Gradients: Combined horizontal (Gh) and vertical (Gv) gradients to compute the overall gradient strength: G(x, y) = √(Gh(x, y)² + Gv(x, y)²). The result was thresholded to create a binary image highlighting the grid contours.

Figure 2: Combined Gradient and Contour Detection

Histogram-Based Grid Localization: Used histograms to count black pixels along rows and columns, identifying the grid's top, bottom, left, and right edges.

Figure 3: Histogram-Based Grid Localization

Grid Segmentation: Resized and normalized the grid to 900 pixels for consistency. The grid was divided into 81 individual cells, saved as separate PNG files.

Figure 4: Cropped Grid and Individual Cells

Digit Recognition

Overview:

Neural networks are a fundamental component of this project, enabling accurate recognition of Sudoku digits. Without a properly configured neural network, the system would fail to recognize grid digits correctly, preventing successful puzzle-solving.

Key Steps:

Neural Network Implementation: The project leveraged pre-existing C-based structures for matrices and neural networks, including operations like matrix multiplication, addition, and transposition. This provided a solid foundation for implementing feedforward and backpropagation algorithms.

Figure 1: Matrix and Neural Network Structures

Training for XOR Problem: The XOR network architecture consisted of 2 input neurons, 2 hidden layer neurons, and 1 output neuron. This helped the team understand the fundamentals of learning rates, backpropagation, and hyperparameter tuning.

Figure 2: XOR Neural Network Architecture

Digit Recognition Neural Network: The final neural network was trained to recognize digits (0–9) using a dataset from Kaggle, consisting of 6,000 labeled images. Each image was preprocessed and converted into a C-structured object for training.

Figure 3: Dataset Format and Image Structure

Performance and Results: - Final architecture: 784 input neurons, 1 hidden layer with 200 neurons, and 10 output neurons (one for each digit). - Learning rate: 0.08. - Activation functions: Leaky ReLU and derivative of ReLU. Achieved 96% accuracy on the training dataset and 100% success on Sudoku test images.

Figure 4: Recognition Results on Sudoku Test Images

Result Display

Overview:

The result display integrates image processing, neural network-based digit recognition, and user interaction to overlay the solution onto the original Sudoku image. This robust approach ensures transparency and user control throughout the solving process.

Key Steps:

Interface Overview: The user interface begins with a home page presenting the project, followed by options to explore various sections like Images, Task Breakdown, and Download.

Figure 1: Home and Download Pages

Application Interaction: - Users can choose between "Solve" and "Step by Step" modes. - In "Step by Step," users view each stage of the solving process. - The "Solve" mode provides direct solving functionality.

Figure 2: Application Options and Step-by-Step View

Image Selection and Preprocessing: Users select a Sudoku image via a file browser. The application processes the image, detects contours, segments cells, and converts digits into usable data.

Figure 3: File Browser for Image Selection

Digit Recognition and User Validation: The processed image is fed to the neural network for digit recognition. Results are displayed with confidence levels: - High confidence digits appear in green. - Low confidence digits appear in red, allowing manual correction by the user.

Figure 4: Digit Recognition and Correction

Final Solution: Once all corrections are made, the application overlays the solved Sudoku onto the original image, ensuring an accurate and visually appealing result.

Figure 5: Final Solved Sudoku

Key Technologies

Tools and Technologies Used:

Programming Language: C
Libraries: SDL
Techniques: Optical Character Recognition (OCR), Convolutional Neural Networks (CNN), Sudoku-solving algorithms

Highlight Innovation:

A unique approach in this project was the use of the Hough Transform for grid detection, which enabled precise extraction of the Sudoku grid from images. Additionally, implementing a neural network in C required leveraging efficient data structures and optimizing matrix operations for high performance.

Team Contributions

Team Members:

TRAHANT Thomas: User interface and application logic
BALVAY Evarist: Neural network implementation
MERCIER Eliott: Preprocessing pipeline
LEMEILLEUR Yako: Neural network optimization
ROGER Robin: Grid detection using the Hough Transform

My Role:

My primary contribution was developing the grid detection module. By utilizing the Hough Transform algorithm, I ensured precise identification and extraction of the Sudoku grid, which was crucial for subsequent steps in the project.

Project Report

Download the Project

You can download the executable for this project by clicking the button below:

Download Project