Computer Vision News - November 2019

3 Summary Tesseract OCR with Python 29 Another way to remove noise from the image is by blurring. Image blurring is usually achieved by convolving the image with a low-pass filter kernel. I use the median blurring since, from my experience, it works the best. I'll manipulate the image a bit more by resizing it into a smaller size. The interpolation, which is performed to decrease the size of the image, fills the tiny holes in the digits and makes the recognition task easier. These two manipulations are performed in the next two lines of code. Now all we have to do is to run our new code and boom! We got the correct results: It is worth mentioning that although this image is with a low noise level, the above tricks will help you recognize much noisier images. Conclusion Tesseract is an easy touse tool for accurateOCR. It is able to recognize bounding boxes for characters on an image and to extract them using a pre-trained LSTM network. In cases where the text is aligned and properly located in the image, Tesseract is doing very good job. For natural images in most cases the tricks we have seen above can help us to get better results, however, in some cases, it is unavoidable to train our own model. "In cases where the text is aligned and properly located in the image, Tesseract is doing very good job."

RkJQdWJsaXNoZXIy NTc3NzU=