2.3.5.1. Overview
The SG formalism provides a facility for describing natural languages in a Lisp-like structure. A slot grammar depends on its lexicon as the source for deriving syntactic analyses and predicate-argument structure. Each index word may specify slots that must be filled to establish the correct relational analysis for that word. An index word entry in the lexicon may be a single word or a multi-word phrase. A slot definition associated with an index word consists of the slot name and its disjunctive features. Features can be thought of as positions in a bit vector, each modeling a different aspect of the lexical unit; some features may be defined as references to sets of other features. An index word in the lexicon may have an ordered set of complement slots. The slots defined for an index word serve both as the possible grammatical relations for that word and as the arguments of its word-sense predicates. [34]
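To make the lexicon structure concrete, the following is a minimal sketch of how an index word, its ordered complement slots, and bit-vector features could be represented. All names here (`FEATURES`, `Slot`, `LexEntry`, the example entry) are hypothetical illustrations, not McCord's actual implementation.

```python
from dataclasses import dataclass, field

# Hypothetical feature inventory; each feature name maps to a bit position.
FEATURES = {"sing": 0, "plur": 1, "def": 2, "fem": 3}

def feature_bits(names):
    """Encode a set of feature names as a bit vector."""
    bits = 0
    for name in names:
        bits |= 1 << FEATURES[name]
    return bits

@dataclass
class Slot:
    name: str        # grammatical relation, e.g. "subj" or "obj"
    features: int    # disjunctive features a filler must match

@dataclass
class LexEntry:
    index_word: str                            # single word or multi-word phrase
    slots: list = field(default_factory=list)  # ordered complement slots

# Example entry: a verb whose subject may be singular or plural,
# and whose object must be definite.
entry = LexEntry("read", [
    Slot("subj", feature_bits({"sing", "plur"})),
    Slot("obj", feature_bits({"def"})),
])

def satisfies(filler_bits, slot):
    """A filler satisfies a slot if it shares at least one of the
    slot's disjunctive features (bitwise AND is non-zero)."""
    return (filler_bits & slot.features) != 0
```

Encoding features as bit positions makes the disjunctive-match test a single AND operation, which is one plausible reading of the bit-vector description above.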
The SG parsing algorithm is a bottom-up chart parser that starts from a lexicon entry for a word and incorporates other words by filling slots, beginning with the start slot. A phrase filling a slot modifies the current phrase, yielding a higher phrase. The relation between the current phrase and the higher phrase is defined either in the lexicon or in the syntax rules. [35]
SG rules are command-based: they can be thought of as a list of instructions for matching and modifying the phrases being processed. SG rules also perform unification on the features of the incorporated nodes. Each phrase node carries features obtained from its underlying words, possibly altered by instructions in its syntax definition; such modifications can also propagate back to the words themselves. For example, if a word can be either a plural or a singular noun, and its slot is combined with a verb slot carrying the singular feature, the instructions in the combining slot may remove the plural feature from the noun. [35]
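The number-narrowing example above can be sketched as a simple set intersection. This is a hypothetical illustration of the idea, not the actual SG rule engine; `unify_number` and the feature names are assumptions.

```python
def unify_number(word_feats, slot_feats):
    """Intersect a word's number features with a slot's requirements.

    Returns the narrowed feature set, or None if unification fails
    (no common reading survives).
    """
    common = word_feats & slot_feats
    return common if common else None

# A noun ambiguous between singular and plural...
noun_feats = {"sing", "plur"}
# ...fills the subject slot of a verb requiring a singular subject.
subj_slot_feats = {"sing"}

narrowed = unify_number(noun_feats, subj_slot_feats)
print(narrowed)  # {'sing'} -- the plural reading is removed from the noun
```

Failure of the intersection (an empty result) is what would force the parser to reject this slot filling and try another combination.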
2.3.5.2. Application on Arabic language
McCord & Cavalli-Sforza [34] developed an Arabic slot grammar that utilized the slot grammar formalism [35] [36]. It used BAMA [37] as its morphological analyzer, with BAMA's features serving as the default features in its lexicon. The lexicon also contained specific vocalized verb stems obtained from the PATB and sorted by frequency. BAMA is consulted when a word does not exist in the lexicon. It was not reported how the parsing system disambiguates among the possible analyses obtained from BAMA, apart from the fallback triggered when a selected analysis is not satisfied by the syntax rules. Since the system uses the ATB stems in the given order, this raises the possibility of producing an analysis for a sentence that is not the correct one. [36]
The formalism permits semantic modeling through features at the lexical level (which can take part in the processing of syntax rules), but it was not reported that this grammar contained semantic modeling to any noticeable extent. Rough testing on ATB3 produced complete parses for over 70% of the sentences, without verifying the PoS tags produced for the words or the correctness of the derivations. [36]
2.3.6. Lexical Functional Grammar (LFG)
2.3.6.1. Overview
LFG is a context-sensitive approach to modeling syntax: it is more expressive than context-free grammar, but at a cost in efficiency. LFG belongs to the family of unification grammars, which incorporate unification constraints. In contrast to context-free grammars, the rules for lexical units state the conditions under which a lexical unit can occur (i.e. they are context-sensitive). LFG attaches features, or attributes, to lexical units and enforces constraints on these structures. [1]
Features in LFG can be atomic, a set of possible values (underspecification), a feature structure (such as the feature structure of an embedded clause), or a pointer to another value within the feature structure. The lexicon is the key source of the features available to each lexical unit. LFG also imposes two more general constraints on functional structures: completeness and coherence. Completeness guarantees that all required arguments are satisfied, while coherence guarantees that the supplied arguments adhere to the predicate's argument constraints. As in slot grammars, a unification process combines different feature structures into unified ones, merging the compatible features of child phrases into a more general phrase. An LFG parse yields two types of constructs: the C-structure, which represents the constituent structure of a sentence (much like a CFG parse tree), and the F-structure, which represents the unified features of the sentence (a dependency-like structure). [1]
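The core LFG notions just described can be sketched in a few lines: F-structures as nested dictionaries, recursive unification that fails on an atomic clash, and simple completeness/coherence checks against a predicate's argument list. This is an assumed toy representation for illustration, not an actual LFG implementation.

```python
def unify(f1, f2):
    """Recursively unify two feature structures; None signals a clash."""
    result = dict(f1)
    for attr, val in f2.items():
        if attr not in result:
            result[attr] = val
        elif isinstance(result[attr], dict) and isinstance(val, dict):
            sub = unify(result[attr], val)
            if sub is None:
                return None
            result[attr] = sub
        elif result[attr] != val:
            return None  # atomic values clash
    return result

def complete(fstruct, required_args):
    """Completeness: every argument the predicate needs is present."""
    return all(arg in fstruct for arg in required_args)

def coherent(fstruct, required_args,
             governable=frozenset({"SUBJ", "OBJ", "OBL"})):
    """Coherence: every governable function present is licensed
    by the predicate's argument list."""
    return all(attr in required_args
               for attr in fstruct if attr in governable)

# Toy F-structure for a transitive reading: unification merges the
# subject's features, then the global constraints are checked.
subj = unify({"NUM": "sg"}, {"PRED": "girl", "DEF": "+"})
fs = {"PRED": "read<SUBJ,OBJ>", "SUBJ": subj, "OBJ": {"PRED": "book"}}
print(complete(fs, ["SUBJ", "OBJ"]))   # True
print(coherent(fs, ["SUBJ", "OBJ"]))   # True
```

Dropping the `OBJ` key would make `complete` fail, while adding an unlicensed `OBL` would make `coherent` fail, mirroring how the two constraints rule out incomplete and over-saturated parses.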
2.3.6.2. Application on Arabic
Attia [38] developed an Arabic parser based on LFG. The syntax-analysis part of the parser takes a stream of tokens as input; a token can be a single (delimited) lexical unit representing a multi-word expression, a word, or a clitic (some other types exist). It is based on the same checking paradigm used in McCord's Slot Grammar. Two sample sentences were passed to the web interface for testing:
Sentence #2.1:
البنت التي لم تقرأ الكتاب نجحت في الإمتحان (the girl who did not read the book passed the exam)
Sentence #2.2:
ولم يذكر مزيدا من المعلومات أو الإيضاحات عن القرار (and he did not mention further information or clarifications about the decision)
As Figures 2.3, 2.4, 2.5 and 2.6 show, the analyses produced by the system for the two sentences demonstrate the robustness of the developed grammar, which can still produce an analysis when part of the input is not identified; such parts are labeled “Fragments” in the C-structure and “REST” in the F-structure.
Figure 2.3: C-Structure of sentence 2.1

Figure 2.4: C-Structure of sentence 2.2

Figure 2.5: F-Structure of sentence 2.1

Figure 2.6: F-Structure of sentence 2.2