ocr

Optical Character Recognition With Tesseract OCR On Ubuntu 7.04

| | | | | | | |

This guide describes how to set up Tesseract OCR on Ubuntu 7.04. OCR means "Optical Character Recognition". The resulting system will be able to convert images with embedded text to text files. Tesseract is licensed under the Apache License v2.0.

Fight Image Spam With FuzzyOCR And SpamAssassin On Debian/Ubuntu

| | | | | | | | |

This tutorial describes how to scan emails for image spam with FuzzyOCR. FuzzyOCR is a plugin for SpamAssassin which is aimed at unsolicited bulk mail containing images as the main content carrier. Using different methods, it analyzes the content and properties of images to distinguish between normal mails (ham) and spam mails. FuzzyOCR tries to keep the system load low by scanning only mails that have not already been categorized as spam by SpamAssassin, thus avoiding unnecessary work.

Syndicate content