Computing Actants in Portuguese
For example, the following sentence:
A Maria vai tendo razão , e o Pedro nomeou o Luís seu chefe .
conveys two different semantic actions: one in which Maria has something, and one in which Pedro appoints someone to something. This can be captured in the following manually created figure, where the root of each tree, the verb, captures the gist of the action, and the leaves under each verb represent its actants (the elements strongly related to it).
The goal of the algorithm described here is to compute this result automatically. Its input is the dependency tree of the sentence. Processing the example sentence with a dependency parser (SyntaxNet) yields the following result:
1 A _ DET art||F|S Definite=Def|Gender=Fem|Number=Sing|PronType=Art|fPOS=DET++art||F|S 2 det _ _
2 Maria _ PROPN prop|F|S Gender=Fem|Number=Sing|fPOS=PROPN++prop|F|S 3 nsubj _ _
3 vai _ VERB v-fin|PR|3S|IND Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|fPOS=VERB++v-fin|PR|3S|IND 0 ROOT _ _
4 tendo _ VERB v-ger VerbForm=Ger|fPOS=VERB++v-ger 3 ccomp _ _
5 razão _ NOUN n|F|S Gender=Fem|Number=Sing|fPOS=NOUN++n|F|S 4 dobj _ _
6 , _ PUNCT punc fPOS=PUNCT++punc 3 punct _ _
7 e _ CONJ conj-c|| fPOS=CONJ++conj-c|| 3 cc _ _
8 o _ DET art||M|S Definite=Def|Gender=Masc|Number=Sing|PronType=Art|fPOS=DET++art||M|S 9 det _ _
9 Pedro _ PROPN prop|M|S Gender=Masc|Number=Sing|fPOS=PROPN++prop|M|S 10 nsubj _ _
10 nomeou _ VERB v-fin|PS|3S|IND Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin|fPOS=VERB++v-fin|PS|3S|IND 3 conj _ _
11 o _ DET art||M|S Definite=Def|Gender=Masc|Number=Sing|PronType=Art|fPOS=DET++art||M|S 12 det _ _
12 Luís _ PROPN prop|M|S Gender=Masc|Number=Sing|fPOS=PROPN++prop|M|S 10 dobj _ _
13 seu _ DET pron-det||M|S Gender=Masc|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|fPOS=DET++pron-det||M|S 14 det _ _
14 chefe _ NOUN n|M|S Gender=Masc|Number=Sing|fPOS=NOUN++n|M|S 12 nmod _ _
15 . _ PUNCT punc fPOS=PUNCT++punc 3 punct _ _
This output encodes the dependencies between the tokens of the original sentence and is best visualized as a tree, as illustrated in the following image.
The root of the sentence is the token "vai", a verb that represents one of the semantic actions described in the sentence. The other action is captured by the token "nomeou", also a verb.
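As an illustration of how this input can be read, the following is a minimal sketch that loads CoNLL rows and locates the root and the verb tokens. The `Token` type and the `parse_conll` function are hypothetical names introduced here, not part of the tool. The sample rows are abridged from the parser output above, with the XPOS and FEATS columns replaced by `_`.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int
    form: str
    upos: str
    head: int      # id of the governing token, 0 for the sentence root
    deprel: str


def parse_conll(text: str) -> list[Token]:
    """Read 10-column CoNLL rows (assumes no whitespace inside a column)."""
    tokens = []
    for line in text.strip().splitlines():
        cols = line.split()
        if line.startswith("#") or len(cols) != 10:
            continue  # skip comments and malformed rows
        tokens.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    return tokens


# Abridged rows from the example sentence (XPOS/FEATS replaced by "_").
SAMPLE = """\
2 Maria _ PROPN _ _ 3 nsubj _ _
3 vai _ VERB _ _ 0 ROOT _ _
4 tendo _ VERB _ _ 3 ccomp _ _
10 nomeou _ VERB _ _ 3 conj _ _
"""

tokens = parse_conll(SAMPLE)
root = next(t for t in tokens if t.head == 0)
verbs = [t.form for t in tokens if t.upos == "VERB"]
print(root.form, verbs)
```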
Given the sentence dependency tree, we start by computing the actant cores and then the full actant syntagmas (phrases). The syntagma core is the token in the actant phrase that most strongly represents the actant. For example, in the actant phrase "A Maria" the core is "Maria", and in the actant phrase "seu chefe" the core is "chefe".
Step 1: Split
Given that a single sentence can convey more than one action, concept or thought, the first step for computing the actant cores is to split the original dependency tree into several trees, one for each semantic action. The result of the split for the example sentence is illustrated in the following figure, with one tree for each individual action. The root of each tree should be a verb representing the gist of the action. Note that this sentence has three tokens tagged as verbs (vai, tendo, nomeou), but only two sub-trees are computed. This is because not all verbs represent an action or thought by themselves; auxiliary verbs, for example, do not.
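The split can be sketched as follows. The rule used here is an assumption made for illustration, not necessarily the one the tool applies: a token starts a new tree if it is the sentence root or a verb attached by `conj`, and every other token joins the tree of its nearest such ancestor. Tokens are `(id, form, upos, head, deprel)` tuples taken from the parse above; `split_actions` is a hypothetical name.

```python
TOKENS = [
    (1, "A", "DET", 2, "det"),
    (2, "Maria", "PROPN", 3, "nsubj"),
    (3, "vai", "VERB", 0, "ROOT"),
    (4, "tendo", "VERB", 3, "ccomp"),
    (5, "razão", "NOUN", 4, "dobj"),
    (6, ",", "PUNCT", 3, "punct"),
    (7, "e", "CONJ", 3, "cc"),
    (8, "o", "DET", 9, "det"),
    (9, "Pedro", "PROPN", 10, "nsubj"),
    (10, "nomeou", "VERB", 3, "conj"),
    (11, "o", "DET", 12, "det"),
    (12, "Luís", "PROPN", 10, "dobj"),
    (13, "seu", "DET", 14, "det"),
    (14, "chefe", "NOUN", 12, "nmod"),
    (15, ".", "PUNCT", 3, "punct"),
]


def split_actions(tokens):
    """Group token ids by the action root they belong to (assumed rule)."""
    by_id = {tok[0]: tok for tok in tokens}

    def is_action_root(tok):
        # Sentence root, or a verb coordinated with another verb.
        return tok[3] == 0 or (tok[2] == "VERB" and tok[4] == "conj")

    def action_root(tok):
        while not is_action_root(tok):
            tok = by_id[tok[3]]  # climb towards the sentence root
        return tok[0]

    trees = {}
    for tok in tokens:
        trees.setdefault(action_root(tok), []).append(tok[0])
    return trees


trees = split_actions(TOKENS)
for root_id, members in trees.items():
    print(root_id, members)
```

Under this rule "tendo" stays in the tree rooted at "vai", so the three verbs produce only two trees.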
Step 2: Simplify
Once the different actions in the sentence are isolated, with a separate tree for each one, the next step is to simplify these individual trees by discarding information that does not help when computing the actant cores. For example, auxiliary verbs and complements are collapsed into single nodes, because the action is conveyed by the main verb. The following figure illustrates the resulting trees after the simplification process for our running example. In the left tree, the verb nodes "vai" and "tendo" were collapsed into a single node. (Other simplifications may be added in the future.)
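One possible way to express this collapse is sketched below. It assumes, purely for illustration, that a verb attached to another verb by `ccomp` is absorbed into its head, and that the children of the absorbed verb are re-attached to the merged node. `collapse_verb_chain` is a hypothetical name, and longer chains of nested verbs are not handled.

```python
# First action tree of the running example: (id, form, upos, head, deprel).
TREE = [
    (1, "A", "DET", 2, "det"),
    (2, "Maria", "PROPN", 3, "nsubj"),
    (3, "vai", "VERB", 0, "ROOT"),
    (4, "tendo", "VERB", 3, "ccomp"),
    (5, "razão", "NOUN", 4, "dobj"),
]


def collapse_verb_chain(tree):
    """Merge verb -> ccomp verb pairs into one node (assumed rule)."""
    by_id = {tok[0]: tok for tok in tree}
    # Map each absorbed verb id to the id of the verb that absorbs it.
    absorbed = {
        tid: head
        for tid, _, upos, head, rel in tree
        if upos == "VERB" and rel == "ccomp"
        and head in by_id and by_id[head][2] == "VERB"
    }
    result = []
    for tid, form, upos, head, rel in tree:
        if tid in absorbed:
            continue  # this node now lives inside its head verb
        parts = [form] + [by_id[a][1] for a, h in absorbed.items() if h == tid]
        new_head = absorbed.get(head, head)  # re-attach orphaned children
        result.append((tid, " ".join(parts), upos, new_head, rel))
    return result


simplified = collapse_verb_chain(TREE)
for node in simplified:
    print(node)
```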
Step 3: Compute Actants Cores Tokens
In this step the goal is to actually compute the actant cores in each individual tree. To do so, an individual score is computed for each token in each tree. This score is either positive or negative, depending on the type of relation that links the node to its head and on its POS tag. Dependency relations that convey a strong relation with the verb, e.g. nsubj and dobj, have higher positive scores. Weaker relations such as det, or punctuation (punct), have lower or negative scores; for example, a comma (,) is never a verb actant. Following the same reasoning, some POS tags (e.g. nouns) are more often used to represent concepts and ideas, and are therefore more likely to act as actant cores, so these tags have high positive scores. In contrast, determiners and other common stop-words usually convey little or no semantic meaning, so they have low positive or negative scores.
In short, a positive score indicates that the token is an actant core, and a token with a negative score is not. An advantage of this approach is that it does not require knowing beforehand how many actants each verb has, a number that can also vary from sentence to sentence. The following figure illustrates the actant cores in each individual tree, i.e. for each individual action. The verb node represents the action itself.
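A minimal sketch of this scoring scheme is shown below. The numeric weights in `REL_SCORE` and `POS_SCORE` are invented for illustration and are not the values used by the tool; only their signs follow the description above. The score of a token is assumed to be the sum of its relation weight and its POS weight.

```python
# Hypothetical weights: only the signs follow the description in the text.
REL_SCORE = {"nsubj": 2.0, "dobj": 2.0, "nmod": 1.0,
             "det": -2.0, "cc": -2.0, "punct": -5.0}
POS_SCORE = {"NOUN": 1.0, "PROPN": 1.0,
             "DET": -1.0, "CONJ": -1.0, "PUNCT": -5.0}

# Second action tree of the running example: (id, form, upos, head, deprel).
TREE = [
    (8, "o", "DET", 9, "det"),
    (9, "Pedro", "PROPN", 10, "nsubj"),
    (10, "nomeou", "VERB", 0, "ROOT"),
    (11, "o", "DET", 12, "det"),
    (12, "Luís", "PROPN", 10, "dobj"),
    (13, "seu", "DET", 14, "det"),
    (14, "chefe", "NOUN", 12, "nmod"),
]


def actant_cores(tree):
    """Return the forms of tokens whose combined score is positive."""
    cores = []
    for _, form, upos, _, rel in tree:
        if upos == "VERB":
            continue  # the verb is the action itself, not an actant
        score = REL_SCORE.get(rel, 0.0) + POS_SCORE.get(upos, 0.0)
        if score > 0:
            cores.append(form)
    return cores


cores = actant_cores(TREE)
print(cores)
```

Because each token is scored on its own, nothing in this sketch fixes the number of actants per verb in advance.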
Step 4: Compute Actants Syntagmas
Given the actant scores computed in the previous step and the trees computed in Step 1, we can compute each full actant syntagma (a phrase) by following the sub-tree of each core and concatenating its tokens in order. This yields the final actants:
$ actants input.conll
A Maria vai tendo razão , e o Pedro nomeou o Luís seu chefe .
# Actants syntagma cores
Verb: tendo
= Maria
= razão
Verb: nomeou
= Pedro
= Luís
= chefe
# Actants syntagmas
Verb: tendo
= A Maria
= razão
Verb: nomeou
= o Pedro
= o Luís
= seu chefe
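The expansion from cores to syntagmas can be sketched as follows. One detail is an assumption inferred from the output above rather than stated in the text: the traversal of a core's sub-tree stops at any other core, which is why "seu chefe" is reported separately from "o Luís" even though "chefe" depends on "Luís". `syntagmas` is a hypothetical name.

```python
# Second action tree, reduced to (id, form, head).
TREE = [
    (8, "o", 9), (9, "Pedro", 10), (10, "nomeou", 0),
    (11, "o", 12), (12, "Luís", 10), (13, "seu", 14), (14, "chefe", 12),
]
CORES = [9, 12, 14]  # ids of Pedro, Luís, chefe


def syntagmas(tree, cores):
    """Expand each core to its phrase, without crossing into other cores."""
    children = {}
    for tid, _, head in tree:
        children.setdefault(head, []).append(tid)
    forms = {tid: form for tid, form, _ in tree}
    core_set = set(cores)

    phrases = []
    for core in cores:
        stack, members = [core], []
        while stack:
            node = stack.pop()
            members.append(node)
            # Descend only into children that are not cores themselves.
            stack.extend(c for c in children.get(node, []) if c not in core_set)
        # Token ids follow sentence order, so sorting restores word order.
        phrases.append(" ".join(forms[i] for i in sorted(members)))
    return phrases


phrases = syntagmas(TREE, CORES)
print(phrases)
```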
Benchmark
Applying this to the available benchmark yields the following results:
# Uma empresa portuense apresenta computadores novos .
Gold:
A1: Uma empresa portuense
A2: computadores novos
Acts:
A1: Uma empresa portuense
A2: computadores novos
# A empresa apresenta os computadores ao público .
Gold:
A1: A empresa
A2: os computadores
A3: ao público
Acts:
A1: A empresa
A2: os computadores
A3: público
# O miúdo precisa de ajuda .
Gold:
A1: O miúdo
A2: de ajuda
Acts:
A1: O miúdo
A2: de ajuda
# A cidade fica na montanha .
Gold:
A1: A cidade
A2: na montanha
Acts:
A1: A cidade
A2: na montanha
# Pousar o livro na mesa .
Gold:
A1: o livro
A2: na mesa
Acts:
A1: o livro
A2: na mesa
# Este documento remonta ao ano 1840 .
Gold:
A1: Este documento
A2: ao ano 1840
Acts:
A1: Este documento
A2: ao ano 1840
# A sessão durou três horas .
Gold:
A1: A sessão
A2: três horas
Acts:
A1: A sessão
A2: três horas
# Ele comporta-se como um homem .
Gold:
A1: Ele
A2: como um homem
Acts:
A1: Ele
A2: como um homem
# Ele nomeou o Pedro chefe .
Gold:
A1: Ele
A2: o Pedro
A3: chefe
Acts:
A1: Ele
A2: o Pedro
A3: chefe
Recall: 100%
Precision: 95%
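The matching criterion behind these figures is not stated here. As a hedged illustration, the sketch below computes precision and recall under exact string matching for the first two benchmark sentences only. Under this criterion, 19 of the 20 predicted actants listed above match the gold standard, which agrees with the reported precision; the reported recall presumably relies on a looser criterion. `precision_recall` is a hypothetical name.

```python
def precision_recall(gold, predicted):
    """Exact-match precision and recall over parallel lists of actants."""
    matched = sum(len(set(g) & set(p)) for g, p in zip(gold, predicted))
    n_predicted = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    return matched / n_predicted, matched / n_gold


# First two benchmark sentences, copied from the listing above.
GOLD = [
    ["Uma empresa portuense", "computadores novos"],
    ["A empresa", "os computadores", "ao público"],
]
ACTS = [
    ["Uma empresa portuense", "computadores novos"],
    ["A empresa", "os computadores", "público"],
]

precision, recall = precision_recall(GOLD, ACTS)
print(f"Precision: {precision:.0%}  Recall: {recall:.0%}")
```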