Monday, February 17, 2014

The Word Based Parsing Technique (part-1)

This series presents the process of creating a “word-based parser”, together with the algorithms developed for it. The next section gives an overview of the Link-Parser structure, followed by the dictionaries, “rule preparation”, “word minimization”, “unknown word” handling, and finally the modification of the formalism to include dependency information.

4.1. Introduction

The Link-Parsing system depends on three components to do its job. The first is a dictionary containing all the words of the language, classified into PoS groups. The second is the set of grammar rules, and the third is an implementation of the parsing algorithm. Figure 4.1 depicts the relation between the three components.

Figure 4.1: Link Parsing overview

Every PoS group and/or word group is assigned linking requirements. A single undiacritized word may have several PoS tags (and thus several sets of linking requirements). The parser tries to find all possible contexts in which the linking requirements of every word are met, resulting in a set of linkages (sentence contexts).
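
The idea that one word can carry several PoS tags, each with its own linking requirements, can be sketched roughly as follows. This is only an illustration: the words (`w1`, `w2`, `w3`), the tag names, and the requirement strings are all hypothetical placeholders (the requirement strings are loosely modelled on link-grammar connector notation, where `+` points right and `-` points left), and the function only enumerates candidate readings. It does not check whether the requirements are actually met, which is the parser's job.

```python
from itertools import product

# Hypothetical toy dictionary: word -> possible PoS groups (illustrative names only).
WORD_TAGS = {
    "w1": ["NOUN", "VERB"],
    "w2": ["NOUN"],
    "w3": ["ADJ", "NOUN"],
}

# Hypothetical linking requirements per PoS group, kept as opaque strings.
TAG_REQUIREMENTS = {
    "NOUN": "{A-} & S+",
    "VERB": "S-",
    "ADJ": "A+",
}


def candidate_readings(words):
    """Return every assignment of one (word, tag, requirement) triple per word.

    A real parser would keep only the assignments whose linking
    requirements can all be satisfied; this sketch skips that step.
    """
    per_word = [
        [(word, tag, TAG_REQUIREMENTS[tag]) for tag in WORD_TAGS[word]]
        for word in words
    ]
    return list(product(*per_word))


if __name__ == "__main__":
    for reading in candidate_readings(["w1", "w2", "w3"]):
        print([f"{word}/{tag}" for word, tag, _ in reading])
```

Even in this tiny example the number of candidate readings is the product of the number of tags per word, which is why ambiguity grows quickly with sentence length.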

The resulting linkages contain all possible combinations of the linking requirements of every word in valid contexts. As of this writing, the algorithm does not group contexts where the only difference between two contexts is the word group (PoS tag); thus, the number of linkages may overflow a 32-bit integer.
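
To see how quickly the count can exceed a 32-bit integer, here is a small back-of-the-envelope sketch. It assumes, purely for illustration, that every combination of per-word alternatives yields a valid linkage, so the product is only an upper bound; the function names and the example numbers are hypothetical and not taken from the parser.

```python
from math import prod

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


def linkage_upper_bound(alternatives_per_word):
    """Upper bound on linkages: the product of alternatives for each word."""
    return prod(alternatives_per_word)


def overflows_int32(count):
    """True if the count cannot be stored in a signed 32-bit integer."""
    return count > INT32_MAX


if __name__ == "__main__":
    # Hypothetical sentence: 16 words, 4 ungrouped alternatives each.
    bound = linkage_upper_bound([4] * 16)
    print(bound, overflows_int32(bound))
```

With 16 words and 4 alternatives each, the bound is 4^16 = 2^32, which is already beyond the signed 32-bit range. Python integers are arbitrary precision, so the sketch itself does not overflow; an implementation that uses a fixed-width counter would.
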
The following sections explain the dictionary creation and rule creation processes, and then present the new formalism.
