Monday, February 17, 2014

The Word Based Parsing Technique (part-3)

4.3. Directed Link grammar

Link grammars are superset of projective dependency grammars. Dependency grammar takes into account the relationship direction, while link grammar does not. Graphs generated by link grammar are not guaranteed to be acyclic. It is reported that the English grammar presented with the Link grammar system has cyclic modeling for some grammar cases.

Information extraction needs base phrases boundaries and relationships modeling which are hard to be achieved without directed acyclic graphs. Removing cycles of the link grammar and converting its output to directed graphs were done previously in information extraction and mentioned as an essential need. It was also mentioned that it was not error free in the mentioned context. [62].

Casbeer’s modeling of Arabic preposition, used a link named P for modeling the prepositional phrase attachment. An attachment of prepositional phrase to a word following it, is not supported by his rule-set. A simple demonstration of the rules is shown in Table 4.1.

نشرب : ADV+ & P+
كثيرا :       ADV-
في :       P- & PO+
الصيف:   PO-
Table 4.1: Sample Arabic grammar using Link grammar formalism

It can be seen that the link  P connects the verb to the preposition. By modifying the verb to accept linking on P with a word before (P-), two prepositions might connect to each other in an incomplete sentence. But the need for the parser is to parse, in the first place, which is judging the correctness (which implies completeness in natural languages, [58]), and two prepositional phrases alone do not compose a complete Arabic sentence (except in answers to questions).

In that case there will be a need for a new link to model this type of connection. As link grammar is projective and depends on the positioning information that are implicitly modeled through the order of the presented links [39], the need for additional links will increase when facing the same problem with other constructs.

This study claims that using smaller number of links in modeling natural language syntax is easier for the grammar maintainer and minimizes the possibility of errors of modeling a grammar rule with a link in such a way that might contradict with other linking decisions in the grammar.

One way of achieving the minimum number of links and also modeling dependency direction on the produced graph is by adding a dependency direction marker in the link grammar formalism.

A common implementation for dependency modeling over typed set of connectors is to annotate the connector with either “in” or “out”. A word having an “in” connector, means that the direction of dependency is from the other word to this word (and also that the other word is depending on this word). A word having an “out” marked connector, means that this word is depending on the word that it will connect through this connector. For a link to be established between two words whose rules are defined in directed-link grammar formalism, two connectors should satisfy those restrictions by Sleator & Temperly [39], and they should have different marker (one “in” and the other “out”).

A sample grammar using the modified formalism is presented in Table 4.2.
نشرب : I|ADV+ & I|P+
كثيرا  : O|ADV-
في : O|P- & I|PO+
الصيف: O|PO-
Table 4.2: Sample Arabic grammar using directed-link grammar formalism

It can be seen in table 4.2 that the connectors now preceded by either (I|) or (O|). I or O denotes “in” and “out” respectively. And the vertical bar is a special separator that separates the direction of the connector from its linking specification.

Modifying the sample grammar defined in Table 4.2 is easy to make it support forward prepositional attachment. Table 4.3 illustrates the new modification.


نشرب: I|ADV+ & (I|P+ or I|P-)
كثيرا:  O|ADV-
في   :  (O|P- & I|PO+) or (I|PO+ & O|P+)
الصيف: O|PO-

Table 4.3: Sample Arabic grammar that support forward prepositional phrase attachment

The grammar presented in table 4.3 shows that no new links are used, and this grammar will not accept two prepositional phrases to connect to each other on P link, as P connector is only defined for the preposition to have Out direction. And as the matching rules for our directed-link grammar formalism guarantees that no two words having the same connector type (either “out” or “in”) will be connected on that link for those two connectors, it will keep the number of needed link names to minimum.

The other important point that is needed to be covered, is to prove that the new formalism does not break or contradict with link-grammar formalism and parser. Also, it will be needed to prove that it doesn’t break the context freeness of the Link-Grammar formalism.

If it can be proved that the Directed-Link Grammar is a subset of (i.e. convertible to) Link Grammar, then all of the mentioned requirements are met.

I|X+ LX+
I|X- RX-
O|X+ RX+
O|X- LX-
Table 4.4: proposed conversion from directed-link grammar to link-grammar

Table 4.4 states a proposed conversion from Directed Link Grammar connector specification to Link-Grammar connector specification. As the directed connectors are convertible to be link grammar compatible, the only concern that is left, is that the conversion preserves the needed semantics for which the new formalism applies.

The connector I|X+ connects the current word to some word that follows it. If we consider the stream of words as progressing from left to right, then this word is on the left of the link, and the compatible connector for it is O|X-, which is for a word on the right of the current word. For the link X in this case, the head word is on the left of the link. It can be seen in Table 4.4 that these two connectors have this information encoded as the prefix “L” in the corresponding Link-Grammar compatible connectors. Thus, a connector that expects the head word to be on the left side of the link (will have the prefix L), is not compatible with connectors that expects the head word to be on the right side of the link (will have the prefix R), as both will have different prefixes, regardless of the direction.

As the connector conversion does not produce erroneous connections, and it has been proved that the semantics of the conversion is compatible with the semantics of the Directed-Link Grammar formalism, it can be claimed that Directed Link Grammar is a subset of Link-Grammar.

One additional rule that makes Directed Link Grammar formalism (DLG) a dependency compatible among other dependency formalisms, is that a single word cannot have two “Out” connectors in a single disjunction in its disjunction set (for the disjunction form of its connection requirements).

A preprocessing tool has been developed to convert the DLG grammars to LG. It converts the connectors and checks the rules in order not to have more than one connection marked as “Out” in the same rule.

No comments:

Post a Comment