002 : OCR Sudoku Solver

"Leveraging OCR and neural networks to solve Sudoku puzzles from images."

ROGER Robin

Presentation Video

\0/

About the Project

Brief Overview:

The OCR Sudoku Solver is a tool capable of solving any Sudoku puzzle by recognizing and processing an image of the grid. This project allowed me to explore the functionality of neural networks and gain hands-on experience with image processing techniques, including the Hough Transform algorithm.

Objective:

The main goal of the project was to develop a system that recognizes and solves Sudoku puzzles from images.

Here are the instruction of the project:

Features

 Core Functionalities:

  • OCR for Sudoku grid detection.
  • Neural network-based digit recognition.
  • Efficient Sudoku-solving algorithm.
  • Interactive result display and correction system.

Workflow

 Step-by-Step Process:

  • Image Preprocessing: Grayscale conversion, noise removal, and rotation correction.
  • Grid Detection: Identifying and extracting the Sudoku grid from the image.
  • Digit Recognition: Neural network-based classification of grid digits.
  • Sudoku Solver: Algorithm to complete the grid.
  • Result Display: Overlaying the solution onto the original image.

Image Preprocessing

Image preprocessing is the first crucial step in the Sudoku-solving pipeline. Its purpose is to optimize the input image's quality by addressing issues like noise, color dominance, and rotation to ensure accurate subsequent steps.

1. Contrast Enhancement and Grayscale Conversion

Enhance the contrast by identifying dominant colors and converting them to white, followed by grayscale transformation. This step improves detail visibility and prepares the image for further processing.

Original Image Preprocessed Image

Figure 1: Before and After Preprocessing

2. Grayscale Conversion Algorithms

Algorithms Considered:

  • Weighted Average (Luminance): Adjusts for human eye sensitivity with the formula:
    Gray = 0.22 * R + 0.65 * G + 0.11 * B
  • Chosen Formula (Simplified): For computational efficiency:
    Gray = 0.299 * R + 0.587 * G + 0.114 * B
  • HSL (Hue-Saturation-Luminance): Extracts luminance for improved consistency.
Original Image Grayscale Image

Figure 2: Grayscale Conversion

3. Adaptive Binarization

Converts the image to black and white by analyzing grayscale intensity and dynamically calculating thresholds based on pixel distribution. Adaptive thresholds ensure robustness against varied lighting conditions.

Original Image Black and White Image

Figure 3: Adaptive Binarization

4. Rotation Correction

Detects and corrects skewed grids for accurate grid alignment:

  • Manual Rotation: Uses affine transformations to align the grid with the image edges.
  • Automatic Rotation: Applies edge detection (Canny filter) and Hough Transform to determine the dominant angle for automatic alignment.
Not Rotated Image Rotated Image

Figure 4: Rotation Correction

Grid Detection

Overview:

Grid detection involves identifying and extracting the Sudoku grid from an image using edge detection, histograms, and segmentation techniques. This ensures precise localization of the grid for further processing.

 Key Steps:

  • Edge Detection: - Used Roberts filter for gradient-based edge detection, highlighting horizontal and vertical gradients to identify contours in the image.
  • Gradient Example Horizontal and Vertical Gradients

    Figure 1: Gradient Edge Detection

  • Combining Gradients: Combined horizontal (Gh) and vertical (Gv) gradients to compute the overall gradient strength: G(x, y) = √(Gh(x, y)² + Gv(x, y)²). The result was thresholded to create a binary image highlighting the grid contours.
  • Thresholded Gradient Contours Highlighted

    Figure 2: Combined Gradient and Contour Detection

  • Histogram-Based Grid Localization: Used histograms to count black pixels along rows and columns, identifying the grid's top, bottom, left, and right edges.
  • Grid Detection via Histogram

    Figure 3: Histogram-Based Grid Localization

  • Grid Segmentation: Resized and normalized the grid to 900 pixels for consistency. The grid was divided into 81 individual cells, saved as separate PNG files.
  • Grid Cropping Cells Binary Grid

    Figure 4: Cropped Grid and Individual Cells

Digit Recognition

Overview:

Neural networks are a fundamental component of this project, enabling accurate recognition of Sudoku digits. Without a properly configured neural network, the system would fail to recognize grid digits correctly, preventing successful puzzle-solving.

 Key Steps:

  • Neural Network Implementation: The project leveraged pre-existing C-based structures for matrices and neural networks, including operations like matrix multiplication, addition, and transposition. This provided a solid foundation for implementing feedforward and backpropagation algorithms.
  • Matrix and Neural Network Structures

    Figure 1: Matrix and Neural Network Structures

  • Training for XOR Problem: The XOR network architecture consisted of 2 input neurons, 2 hidden layer neurons, and 1 output neuron. This helped the team understand the fundamentals of learning rates, backpropagation, and hyperparameter tuning.
  • XOR Neural Network Architecture

    Figure 2: XOR Neural Network Architecture

  • Digit Recognition Neural Network: The final neural network was trained to recognize digits (0–9) using a dataset from Kaggle, consisting of 6,000 labeled images. Each image was preprocessed and converted into a C-structured object for training.
  • Dataset Format Example Image Structure in C

    Figure 3: Dataset Format and Image Structure

  • Performance and Results: - Final architecture: 784 input neurons, 1 hidden layer with 200 neurons, and 10 output neurons (one for each digit). - Learning rate: 0.08. - Activation functions: Leaky ReLU and derivative of ReLU. Achieved 96% accuracy on the training dataset and 100% success on Sudoku test images.
  • Sudoku Recognition Results

    Figure 4: Recognition Results on Sudoku Test Images

Result Display

Overview:

The result display integrates image processing, neural network-based digit recognition, and user interaction to overlay the solution onto the original Sudoku image. This robust approach ensures transparency and user control throughout the solving process.

 Key Steps:

  • Interface Overview: The user interface begins with a home page presenting the project, followed by options to explore various sections like Images, Task Breakdown, and Download.
  • Home Page Download Page

    Figure 1: Home and Download Pages

  • Application Interaction: - Users can choose between "Solve" and "Step by Step" modes. - In "Step by Step," users view each stage of the solving process. - The "Solve" mode provides direct solving functionality.
  • Options in Application Step by Step View

    Figure 2: Application Options and Step-by-Step View

  • Image Selection and Preprocessing: Users select a Sudoku image via a file browser. The application processes the image, detects contours, segments cells, and converts digits into usable data.
  • File Browser

    Figure 3: File Browser for Image Selection

  • Digit Recognition and User Validation: The processed image is fed to the neural network for digit recognition. Results are displayed with confidence levels: - High confidence digits appear in green. - Low confidence digits appear in red, allowing manual correction by the user.
  • Recognition and Correction

    Figure 4: Digit Recognition and Correction

  • Final Solution: Once all corrections are made, the application overlays the solved Sudoku onto the original image, ensuring an accurate and visually appealing result.
  • Final Solved Sudoku

    Figure 5: Final Solved Sudoku

Key Technologies

Tools and Technologies Used:

  • Programming Language: C
  • Libraries: SDL
  • Techniques: Optical Character Recognition (OCR), Convolutional Neural Networks (CNN), Sudoku-solving algorithms

Highlight Innovation:

A unique approach in this project was the use of the Hough Transform for grid detection, which enabled precise extraction of the Sudoku grid from images. Additionally, implementing a neural network in C required leveraging efficient data structures and optimizing matrix operations for high performance.

Team Contributions

Team Members:

  • TRAHANT Thomas: User interface and application logic
  • BALVAY Evarist: Neural network implementation
  • MERCIER Eliott: Preprocessing pipeline
  • LEMEILLEUR Yako: Neural network optimization
  • ROGER Robin: Grid detection using the Hough Transform

My Role:

My primary contribution was developing the grid detection module. By utilizing the Hough Transform algorithm, I ensured precise identification and extraction of the Sudoku grid, which was crucial for subsequent steps in the project.

Project Report

Download the Project

You can download the executable for this project by clicking the button below:

Download Project