Date of publication: 2017-08-31 06:39
Compare the character slices with a character library. If performance is not the main concern, try to find the characters within different font libraries, until you can identify the font used. Then stick with that font for character recognition.
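To make the comparison with a character library concrete, here is a minimal sketch. It assumes every character slice and every library template has already been binarized and scaled to the same size; the names `match_score`, `recognise` and `library` are made up for this example, and real engines use more robust features than a pixel-by-pixel comparison.

```python
def match_score(glyph, template):
    """Fraction of pixels that agree between two equally sized binary images."""
    total = len(glyph) * len(glyph[0])
    same = sum(g == t
               for glyph_row, template_row in zip(glyph, template)
               for g, t in zip(glyph_row, template_row))
    return same / total


def recognise(glyph, library, top_n=3):
    """Return the top_n (character, score) candidates, best match first.

    library is a hypothetical dict mapping a character to its template image.
    """
    scored = [(ch, match_score(glyph, tpl)) for ch, tpl in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Keeping several candidates instead of only the best one lets a later step (a dictionary check, for example) pick the final character.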
Line detection and removal. This step is required to improve page layout analysis, to achieve better recognition quality for underlined text, to detect tables, etc. (I decided to complete this part at the end.)
Recognition of characters. This is the main algorithm of OCR: an image of every character must be converted to the appropriate character code. Sometimes this algorithm produces several character codes for uncertain images. For instance, recognition of the image of the "I" character can produce the codes "I", "|", "6" and "l", and the final character code will be selected later.
For noise reduction, replace any pixel that does not have a neighbour (north, east, south or west) with the same color (or a similar color, using a tolerance threshold) with the average of its neighbours.
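A rough sketch of that noise filter, assuming a grayscale image stored as a list of rows of integers. The function name `remove_isolated_pixels` and the default tolerance of 10 are my own choices for illustration.

```python
def remove_isolated_pixels(img, tol=10):
    """Replace each pixel that has no similar 4-neighbour by the mean of its neighbours."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # read from img, write to a copy
    for y in range(h):
        for x in range(w):
            neighbours = [img[ny][nx]
                          for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                          if 0 <= ny < h and 0 <= nx < w]
            # "isolated" means every existing neighbour differs by more than tol
            if neighbours and all(abs(img[y][x] - n) > tol for n in neighbours):
                out[y][x] = sum(neighbours) // len(neighbours)
    return out
```

Strokes that are at least two pixels long survive, because each of their pixels has a similar neighbour.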
You should use an adaptive threshold instead of the Otsu method. I think this will be helpful: http:///~shafait/papers/Shafait-efficient-binarization- This method will automatically remove the noise.
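To give an idea of what adaptive thresholding does, here is a simplified sketch: each pixel is compared with the mean of a window around it, and an integral image (summed-area table) is used to get each window sum in constant time. This is a plain local-mean rule, not necessarily the exact method from the linked paper; `window` and `ratio` are assumed parameters that would need tuning.

```python
def adaptive_threshold(img, window=15, ratio=0.85):
    """Return a binary image: 1 where a pixel is darker than ratio * local mean."""
    h, w = len(img), len(img[0])
    # integral[y][x] holds the sum of img[0..y-1][0..x-1]
    integral = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            integral[y + 1][x + 1] = integral[y][x + 1] + row_sum

    half = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - half), min(h, y + half + 1)
        for x in range(w):
            x0, x1 = max(0, x - half), min(w, x + half + 1)
            area = (y1 - y0) * (x1 - x0)
            total = (integral[y1][x1] - integral[y0][x1]
                     - integral[y1][x0] + integral[y0][x0])
            # pixel < ratio * (total / area), written without the division
            if img[y][x] * area < total * ratio:
                out[y][x] = 1
    return out
```

Because the threshold follows the local brightness, uneven lighting that breaks a single global threshold is handled much better.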
Search for vertical white gaps for layout detection. Slice along each vertical gap. For each slice, now search for horizontal gaps, and slice again. If the slices have the same (or a similar) height, you are at line level. Otherwise repeat the vertical/horizontal slicing until you only have lines left. The last step then is again a vertical slicing, giving you the single characters (or ligatures in some cases). Long and narrow or short and wide slices are lines.
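The last two levels of that slicing (lines, then characters) can be sketched with projection profiles. This assumes a binarized image given as a list of rows with 1 for ink and 0 for background; the helper names `runs`, `slice_lines` and `slice_chars` are made up for this example.

```python
def runs(profile):
    """Return (start, end) pairs of consecutive non-zero entries; end is exclusive."""
    found, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            found.append((start, i))       # a white gap ends the run
            start = None
    if start is not None:
        found.append((start, len(profile)))
    return found


def slice_lines(img):
    """Split the image at horizontal white gaps; returns (top, bottom) row ranges."""
    return runs([sum(row) for row in img])


def slice_chars(img, top, bottom):
    """Split one text line at vertical white gaps; returns (left, right) column ranges."""
    width = len(img[0])
    column_profile = [sum(img[y][x] for y in range(top, bottom)) for x in range(width)]
    return runs(column_profile)
```

Words could be found the same way by treating only gaps wider than some threshold as separators. Touching characters will not be split by this approach and need extra handling.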
Detecting image features like resolution and inversion, so that we can finally convert it to a straightened image for further processing. (I have completed the code for rotating the image but cannot yet detect the angle by which the image has to be rotated, so I am still working on the angle detection part.)
I am working on a project in which I have to develop an OCR algorithm (I have to read the text from an image and then convert it to a different language). So my first task is to get the text from the image.
So I need help. I have completed the line detection part (getting n images from a paragraph containing n lines) but am stuck on the next part: getting words and characters. If you know good links related to OCR and character recognition, please post them here.
Page layout analysis. In this step I am trying to identify the text zones present in the image, so that only those portions are used for recognition and the rest of the image is left out.
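For the angle detection, one common approach is the projection profile method: try a range of candidate angles and keep the one for which the ink pixels pile up most sharply into rows. Below is a minimal sketch of that idea. It assumes the ink pixels are given as (x, y) coordinates with y pointing down; the function name `estimate_skew`, the search range and the step size are assumptions for illustration.

```python
import math
from collections import Counter


def estimate_skew(points, max_angle=10.0, step=0.5):
    """Estimate the slope angle (degrees) of text lines from ink pixel coordinates.

    For each candidate angle the points are projected onto the axis perpendicular
    to that direction. Sharp peaks in the resulting histogram (a large sum of
    squared bin counts) mean the candidate is aligned with the text lines.
    """
    best_angle, best_score = 0.0, -1
    steps = int(round(2 * max_angle / step))
    for i in range(steps + 1):
        angle = -max_angle + i * step
        a = math.radians(angle)
        s, c = math.sin(a), math.cos(a)
        # points on a line y = x * tan(a) + k all project to the same value
        bins = Counter(round(y * c - x * s) for x, y in points)
        score = sum(n * n for n in bins.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

The result is the slope of the text lines in image coordinates. Whether you then rotate by that angle or by its negative depends on the sign convention of your rotation code, so check it on a test image first.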
In the original image, replace each character with the background color, which is determined for each pixel of the character by interpolating pixels that are not part of the character. This gives you the background image, if any.
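One simple way to do that interpolation, as a sketch: for every character pixel, look north, south, west and east for the nearest pixel that is not part of a character, and average those values weighted by inverse distance. It assumes a grayscale image plus a mask of the same size (1 for character pixels); the name `fill_background` and the choice of inverse-distance weights are assumptions on my part, and other interpolation schemes would work too.

```python
def fill_background(img, mask):
    """Replace masked (character) pixels by interpolating nearby background pixels."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue  # background pixel, keep as is
            weighted_sum = weight_total = 0.0
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx, dist = y + dy, x + dx, 1
                # walk until we leave the character or the image
                while 0 <= ny < h and 0 <= nx < w and mask[ny][nx]:
                    ny, nx, dist = ny + dy, nx + dx, dist + 1
                if 0 <= ny < h and 0 <= nx < w:
                    weighted_sum += img[ny][nx] / dist
                    weight_total += 1.0 / dist
            if weight_total:
                out[y][x] = round(weighted_sum / weight_total)
    return out
```

Along a single row with a gradient background this reduces to linear interpolation between the two background pixels on either side of the character.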