Monday, February 17, 2014

The Word Based Parsing Technique (part-4)

4.4. Arabic rules modeling

Most previous attempts to parse Arabic sentences have used the word segmentation approach. These attempts (such as those discussed in chapter 3) address the problem by dividing the analysis into several sub-processes, each resolving ambiguity at its own level.

The approach presented here addresses the problem differently. It encodes all the information needed to disambiguate the words' PoS tags and their syntactic relationships in a form the Link Parser can use.

The word segmentation approach simplifies the analysis because each processed unit carries less information. The word-based approach, by contrast, requires that all information about a word's components and their morphosyntactic rules be supplied to the parser. In this case, the parser acts as both a morphosyntactic parser and a PoS tagger.
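The word-based idea can be illustrated with a toy, Link Grammar-style lexicon. In it, an unsegmented token such as wdrs (ودرس, "and studied", Buckwalter transliteration) carries the coordination connector of its attached w- clitic inside its own disjunct, so no separate segmentation step is needed. This is a minimal sketch, not the Link Parser's API or the author's DLG rules. The connector names `S` and `CJ`, the three-word lexicon, and the helper functions are all hypothetical. For simplicity, connectors on each side of a word are compared as unordered multisets, whereas the real formalism also constrains their order.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Disjunct:
    left: tuple[str, ...]   # connectors linking to words on the left (end in '-')
    right: tuple[str, ...]  # connectors linking to words on the right (end in '+')


def split_connector(conn: str) -> tuple[str, str, str]:
    """Split e.g. 'Ss+' into head 'S', subscript 's', direction '+'."""
    m = re.fullmatch(r"([A-Z]+)([a-z*]*)([+-])", conn)
    if m is None:
        raise ValueError(f"bad connector: {conn!r}")
    return m.group(1), m.group(2), m.group(3)


def connectors_match(left_word_conn: str, right_word_conn: str) -> bool:
    """A '+' connector on the left word matches a '-' connector on the right word
    when the heads are equal and the subscripts agree position by position
    ('*' is a wildcard; missing trailing characters also act as wildcards)."""
    h1, s1, d1 = split_connector(left_word_conn)
    h2, s2, d2 = split_connector(right_word_conn)
    subs_ok = all(x == y or "*" in (x, y) for x, y in zip(s1, s2))
    return d1 == "+" and d2 == "-" and h1 == h2 and subs_ok


# Hypothetical lexicon: whole words (clitics included), no segmentation.
LEXICON: dict[str, list[Disjunct]] = {
    "ktb": [Disjunct((), ("S+",)), Disjunct((), ("S+", "CJ+"))],  # "wrote"
    "Alwld": [Disjunct(("S-",), ())],                             # "the boy"
    "wdrs": [Disjunct(("CJ-",), ())],                             # "and studied" (w- + verb)
}


def is_valid_linkage(words, lexicon, links) -> bool:
    """links: (i, j, conn_on_i, conn_on_j) with i < j."""
    n = len(words)
    if n == 0:
        return False
    used_left = [[] for _ in range(n)]
    used_right = [[] for _ in range(n)]
    for i, j, ci, cj in links:
        if not (0 <= i < j < n) or not connectors_match(ci, cj):
            return False
        used_right[i].append(ci)
        used_left[j].append(cj)
    # Planarity: no two links may cross.
    for (a, b, *_), (c, d, *_) in combinations(links, 2):
        if a < c < b < d or c < a < d < b:
            return False
    # Connectivity: every word must be reachable from word 0.
    adj = {k: set() for k in range(n)}
    for i, j, *_ in links:
        adj[i].add(j)
        adj[j].add(i)
    seen, stack = {0}, [0]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    if len(seen) != n:
        return False
    # Each word's used connectors must equal one of its disjuncts.
    for k, w in enumerate(words):
        if not any(
            sorted(dj.left) == sorted(used_left[k]) and sorted(dj.right) == sorted(used_right[k])
            for dj in lexicon.get(w, [])
        ):
            return False
    return True
```

For "ktb Alwld wdrs" (the boy wrote and studied), the linkage `[(0, 1, "S+", "S-"), (0, 2, "CJ+", "CJ-")]` is accepted, while a linkage that leaves wdrs unattached is rejected because the word graph is not connected.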

As a starting point, all the rules presented by Casbeer et al. [42] were examined in detail. Casbeer added a considerable number of rules for morphological modeling.

Arabic is highly inflectional, and its writing system joins some words that act as functional units in Arabic grammar, such as coordinating conjunctions and prepositions, to adjacent words. For these reasons, and because of the new formalism introduced in the previous section, the author had to write the rules from scratch.

The developed rules model a considerable part of Arabic grammar: coordination, prepositional phrases, quantifiers, nominal sentences, verbal sentences, verbal negation, relative clauses, nominal compound clauses, adjective phrases, adverbs, adverbial clauses, tense markers, aspectual verbs, and, most importantly, multi-function words that can act as nouns, verbs, prepositions, adverbs, etc., which are handled by the techniques described in this chapter. The developed grammar contains 211 rules written in DLG.
