23-01-2010, 08:09 PM
Optical character reading has been a topic of research for several decades. However, performance that competes with humans is still a distant reality. One of the primary reasons for human beings' superior performance is our ability to invoke varying knowledge stores that are relevant to the given situation and integrate them to arrive at a meaningful and consistent interpretation. In this paper, we identify the knowledge sources and discuss their role in Devnagari script recognition.

An optical character recognition system segments a text zone into text lines, text lines into words, and words into characters. These characters are then recognized. At each stage, there is a possibility of ambiguity. Ambiguities can be resolved using varying knowledge sources at different levels.

Many of these knowledge sources are independent of the specific document under consideration, for example script composition rules, a word dictionary, and the syntax and semantics of natural language. On the other hand, character shapes, font, layout, and similar properties are specific to the document and can be obtained through training. The domain knowledge also forms part of the context. Heterogeneous knowledge sources are integrated with the help of a blackboard architecture.
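To make the blackboard idea concrete, here is a minimal Python sketch of how independent knowledge sources could cooperate on one ambiguous word. It is not the paper's implementation: the class and function names (Blackboard, composition_source, hypothesis_source, dictionary_source, recognize) are hypothetical, Latin letters stand in for Devnagari characters, the per-character scores are made-up numbers of the kind a trained shape classifier might produce, and the sources run in a fixed order rather than under a real blackboard controller.

```python
from itertools import product


class Blackboard:
    """Shared workspace that every knowledge source reads and updates."""

    def __init__(self, char_candidates):
        # One dict per character position: candidate label -> shape score.
        self.char_candidates = char_candidates
        self.word_hypotheses = {}  # candidate word -> combined score
        self.best_word = None


def composition_source(bb, is_valid):
    """Document-independent source: drop candidates that break script rules."""
    bb.char_candidates = [
        {label: score for label, score in cands.items() if is_valid(pos, label)}
        for pos, cands in enumerate(bb.char_candidates)
    ]


def hypothesis_source(bb):
    """Combine per-character candidates into scored word hypotheses."""
    per_position = [list(cands.items()) for cands in bb.char_candidates]
    for combo in product(*per_position):
        word = "".join(label for label, _ in combo)
        score = 1.0
        for _, s in combo:
            score *= s
        bb.word_hypotheses[word] = score


def dictionary_source(bb, lexicon):
    """Document-independent source: keep only words found in the dictionary."""
    known = {w: s for w, s in bb.word_hypotheses.items() if w in lexicon}
    if known:
        bb.best_word = max(known, key=known.get)


def recognize(char_candidates, lexicon, is_valid=lambda pos, label: True):
    """Run the sources in a fixed order (a simplification of blackboard control)."""
    bb = Blackboard(char_candidates)
    composition_source(bb, is_valid)
    hypothesis_source(bb)
    dictionary_source(bb, lexicon)
    return bb.best_word


if __name__ == "__main__":
    # Assumed classifier output for a three-character word image.
    candidates = [{"c": 0.6, "e": 0.4}, {"a": 0.9}, {"t": 0.5, "l": 0.5}]
    print(recognize(candidates, lexicon={"eat", "cab"}))  # prints "eat"
```

In this toy run the shape scores alone favour "cat" or "cal", but neither is in the dictionary, so the dictionary source settles on "eat". That is the kind of ambiguity resolution the abstract attributes to integrating several knowledge sources.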