Computing Actants in Portuguese

For example, the following sentence:

A Maria vai tendo razão , e o Pedro nomeou o Luís seu chefe .

conveys two different semantic actions: one related to Maria having something, and another to Pedro appointing someone as something. This can be captured in the following manually created figure, where the root of each tree captures the gist of the action, represented by the verb, and the leaves of each tree represent the actants (the elements strongly related to the verb).

Initial Tree

The goal of the algorithm described here is to compute this result automatically. The input of the algorithm is the dependency tree of the sentence. Processing the example sentence with a dependency parser (SyntaxNet) yields the following result:

1 A _ DET art||F|S  Definite=Def|Gender=Fem|Number=Sing|PronType=Art|fPOS=DET++art||F|S 2 det _ _
2 Maria _ PROPN prop|F|S  Gender=Fem|Number=Sing|fPOS=PROPN++prop|F|S 3 nsubj _ _
3 vai _ VERB  v-fin|PR|3S|IND Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|fPOS=VERB++v-fin|PR|3S|IND  0 ROOT  _ _
4 tendo _ VERB  v-ger VerbForm=Ger|fPOS=VERB++v-ger 3 ccomp _ _
5 razão _ NOUN  n|F|S Gender=Fem|Number=Sing|fPOS=NOUN++n|F|S 4 dobj  _ _
6 , _ PUNCT punc  fPOS=PUNCT++punc  3 punct _ _
7 e _ CONJ  conj-c||  fPOS=CONJ++conj-c|| 3 cc  _ _
8 o _ DET art||M|S  Definite=Def|Gender=Masc|Number=Sing|PronType=Art|fPOS=DET++art||M|S  9 det _ _
9 Pedro _ PROPN prop|M|S  Gender=Masc|Number=Sing|fPOS=PROPN++prop|M|S  10  nsubj _ _
10  nomeou  _ VERB  v-fin|PS|3S|IND Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin|fPOS=VERB++v-fin|PS|3S|IND  3 conj  _ _
11  o _ DET art||M|S  Definite=Def|Gender=Masc|Number=Sing|PronType=Art|fPOS=DET++art||M|S  12  det _ _
12  Luís  _ PROPN prop|M|S  Gender=Masc|Number=Sing|fPOS=PROPN++prop|M|S  10  dobj  _ _
13  seu _ DET pron-det||M|S Gender=Masc|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|fPOS=DET++pron-det||M|S  14  det _ _
14  chefe _ NOUN  n|M|S Gender=Masc|Number=Sing|fPOS=NOUN++n|M|S  12  nmod  _ _
15  . _ PUNCT punc  fPOS=PUNCT++punc  3 punct _ _

This output encodes the dependencies between the tokens of the original sentence, and is easier to read when drawn as a tree, as illustrated in the following image.

Parsing Tree

The root of the sentence is the token vai, a verb that represents one of the semantic actions described in the sentence. The other action is captured by the token nomeou, also a verb.
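As a rough illustration of how the parser output above can be loaded, the following Python sketch reads whitespace-separated CoNLL rows and looks up the root and its direct dependents. The `Token` class and `parse_conll` function are hypothetical names introduced here, not part of the tool, and the sample rows are abbreviated (the XPOS and FEATS columns are replaced by `_`).

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int     # 1-based token id
    form: str    # surface form
    upos: str    # universal POS tag
    head: int    # id of the head token, 0 for the sentence root
    deprel: str  # dependency relation to the head


def parse_conll(text):
    """Parse whitespace-separated CoNLL rows into Token objects."""
    tokens = []
    for line in text.strip().splitlines():
        cols = line.split()
        if len(cols) < 10:
            continue  # skip blank or malformed rows
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        tokens.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    return tokens


# Abbreviated rows for tokens 2 to 5 of the example sentence.
SAMPLE = """
2 Maria _ PROPN _ _ 3 nsubj _ _
3 vai _ VERB _ _ 0 ROOT _ _
4 tendo _ VERB _ _ 3 ccomp _ _
5 razão _ NOUN _ _ 4 dobj _ _
"""

tokens = parse_conll(SAMPLE)
root = next(t for t in tokens if t.head == 0)
dependents = [t.form for t in tokens if t.head == root.idx]
print(root.form, dependents)
```

Splitting on any whitespace works for the rows shown above because none of the ten columns contains a space.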

Given the sentence dependency tree, we start by computing the actant cores, and then the full actant syntagmas (phrases). The actant core is the token in the actant phrase that most strongly represents the actant. For example, in the actant phrase A Maria the core is Maria, and in the actant phrase seu chefe the core is chefe.

Step 1: Split

Since a single sentence can convey more than one action, concept or thought, the first step in computing the actant cores is to split the original dependency tree into several trees, one for each semantic action. The following figure illustrates the result of the split for the example sentence: one tree for each individual action, where the root of each tree should be a verb representing the gist of the action. Note that this sentence has three tokens tagged as verbs (vai, tendo and nomeou), but only two sub-trees are computed. This is because not all verbs represent an action or thought by themselves, e.g. auxiliary verbs.

Deps Trees
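The split can be sketched in Python as follows. The exact splitting rule is not spelled out here, so this sketch assumes one: a new action tree starts at the sentence root and at every verb attached through a conj relation. Tokens are written as hypothetical `(id, form, upos, head, deprel)` tuples copied from the parser output above, and `split_actions` is an illustrative name.

```python
# (id, form, upos, head, deprel) for the example sentence.
SENTENCE = [
    (1, "A", "DET", 2, "det"),
    (2, "Maria", "PROPN", 3, "nsubj"),
    (3, "vai", "VERB", 0, "ROOT"),
    (4, "tendo", "VERB", 3, "ccomp"),
    (5, "razão", "NOUN", 4, "dobj"),
    (6, ",", "PUNCT", 3, "punct"),
    (7, "e", "CONJ", 3, "cc"),
    (8, "o", "DET", 9, "det"),
    (9, "Pedro", "PROPN", 10, "nsubj"),
    (10, "nomeou", "VERB", 3, "conj"),
    (11, "o", "DET", 12, "det"),
    (12, "Luís", "PROPN", 10, "dobj"),
    (13, "seu", "DET", 14, "det"),
    (14, "chefe", "NOUN", 12, "nmod"),
    (15, ".", "PUNCT", 3, "punct"),
]


def split_actions(tokens):
    """Return {action_root_id: [token ids]}, one entry per action tree."""
    by_id = {t[0]: t for t in tokens}

    def is_action_root(t):
        _, _, upos, head, deprel = t
        # Assumed rule: the sentence root, or a verb coordinated via conj.
        return head == 0 or (upos == "VERB" and deprel == "conj")

    def action_root(tid):
        # Climb towards the sentence root until an action root is reached.
        while not is_action_root(by_id[tid]):
            tid = by_id[tid][3]
        return tid

    trees = {}
    for t in tokens:
        trees.setdefault(action_root(t[0]), []).append(t[0])
    return trees


trees = split_actions(SENTENCE)
for root_id, members in trees.items():
    print(root_id, members)
```

Under this assumed rule, tendo stays inside the tree rooted at vai because it is attached through ccomp rather than conj, which is consistent with only two sub-trees being produced for the example.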

Step 2: Simplify

Once the different actions in the sentence are isolated, each in its own tree, the next step is to simplify these individual trees by discarding information that does not help in computing the actant cores. For example, auxiliary verbs and complements are collapsed into single nodes, because the action is conveyed by the main verb. The following figure illustrates the resulting trees after simplification for our running example: in the left tree, the verb nodes vai and tendo were collapsed into a single node. (Other simplifications may be added in the future.)

Simple Trees
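A minimal sketch of one such simplification is shown below, assuming that a verb attached to another verb through ccomp is merged into its head and that its dependents are re-attached to the merged node. Both the rule and the `collapse_verbs` name are assumptions made for illustration. Tokens use the same hypothetical `(id, form, upos, head, deprel)` tuples as the parser output.

```python
# Left action tree of the example, before simplification.
LEFT_TREE = [
    (1, "A", "DET", 2, "det"),
    (2, "Maria", "PROPN", 3, "nsubj"),
    (3, "vai", "VERB", 0, "ROOT"),
    (4, "tendo", "VERB", 3, "ccomp"),
    (5, "razão", "NOUN", 4, "dobj"),
    (6, ",", "PUNCT", 3, "punct"),
    (7, "e", "CONJ", 3, "cc"),
    (15, ".", "PUNCT", 3, "punct"),
]


def collapse_verbs(tree):
    """Merge verb -> ccomp -> verb chains into a single verb node."""
    by_id = {t[0]: t for t in tree}

    def absorbed(t):
        # Assumed rule: a verb that is the ccomp of another verb is merged.
        parent = by_id.get(t[3])
        return (t[2] == "VERB" and t[4] == "ccomp"
                and parent is not None and parent[2] == "VERB")

    def target(tid):
        # Follow absorbed verbs up to the verb node that survives.
        while absorbed(by_id[tid]):
            tid = by_id[tid][3]
        return tid

    extra = {}  # surviving verb id -> forms of the verbs merged into it
    for t in tree:
        if absorbed(t):
            extra.setdefault(target(t[0]), []).append(t[1])

    result = []
    for tid, form, upos, head, deprel in tree:
        if absorbed(by_id[tid]):
            continue  # this node now lives inside its head verb
        if tid in extra:
            # Assumes the merged verbs follow the head verb in the sentence.
            form = " ".join([form] + extra[tid])
        if head in by_id:
            head = target(head)  # re-attach dependents of merged verbs
        result.append((tid, form, upos, head, deprel))
    return result


simplified = collapse_verbs(LEFT_TREE)
for token in simplified:
    print(token)
```

After the merge, razão hangs directly from the combined vai tendo node, so it is linked to the verb node by the same dobj relation it had with tendo.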

Step 3: Compute Actant Core Tokens

In this step, the goal is to actually compute the actant cores in each individual tree. To do so, an individual score is computed for each token in each tree. This score is either positive or negative, depending on the type of dependency relation that links the token to its head and on its POS tag. Dependency relations that convey strong relations with verbs, e.g. nsubj and dobj, have higher positive scores. Weaker relations, like det or punctuation (punct), have lower or negative scores; for example, a comma (,) is never a verb actant. Following the same reasoning, some POS tags (e.g. nouns) are more often used to represent concepts and ideas, and are therefore more likely to act as actant cores, so these tags have high positive scores. In contrast, determiners and other common stop-words usually carry little or no semantic meaning, so they have low positive or negative scores.

In short, a positive score indicates that the token is an actant core, while tokens with a negative score are not. This approach has the advantage of not needing to know beforehand how many actants each specific verb has, a number that can also vary between sentences. The following figure illustrates the actant cores in each individual tree, i.e. for each individual action. The verb node represents the action itself.

Cores Trees
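The scoring idea can be sketched as below. The numeric weights in `DEPREL_SCORE` and `POS_SCORE` are invented for illustration and are not the values used by the tool. Summing the two scores is likewise an assumption about how they are combined. Tokens are hypothetical `(id, form, upos, head, deprel)` tuples.

```python
# Illustrative weights only: strong verb relations and content-word tags get
# positive scores, function words and punctuation get negative ones.
DEPREL_SCORE = {"nsubj": 2.0, "dobj": 2.0, "nmod": 1.0,
                "det": -2.0, "cc": -2.0, "punct": -5.0}
POS_SCORE = {"NOUN": 1.0, "PROPN": 1.0,
             "DET": -1.0, "CONJ": -2.0, "PUNCT": -5.0}

# Right action tree of the example sentence.
RIGHT_TREE = [
    (8, "o", "DET", 9, "det"),
    (9, "Pedro", "PROPN", 10, "nsubj"),
    (10, "nomeou", "VERB", 3, "conj"),
    (11, "o", "DET", 12, "det"),
    (12, "Luís", "PROPN", 10, "dobj"),
    (13, "seu", "DET", 14, "det"),
    (14, "chefe", "NOUN", 12, "nmod"),
]


def token_score(token):
    """Assumed combination: relation score plus POS score."""
    _, _, upos, _, deprel = token
    return DEPREL_SCORE.get(deprel, 0.0) + POS_SCORE.get(upos, 0.0)


def actant_cores(tree):
    """Return the forms of the tokens with a positive score."""
    cores = []
    for token in tree:
        if token[2] == "VERB":
            continue  # the verb node is the action itself, not an actant
        if token_score(token) > 0:
            cores.append(token[1])
    return cores


cores = actant_cores(RIGHT_TREE)
print(cores)
```

With these illustrative weights the three determiners score below zero, while Pedro, Luís and chefe score above zero, so no fixed number of actants per verb has to be specified.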

Step 4: Compute Actant Syntagmas

Given the actant cores computed in the previous step, and the different trees computed in Step 1, we can compute the full actant syntagma (a phrase) by following the sub-tree of each core and concatenating its tokens in order. This yields the final actants:

$ actants input.conll 
A Maria vai tendo razão , e o Pedro nomeou o Luís seu chefe .

# Actants syntagma cores
 Verb: tendo
  = Maria
  = razão
 Verb: nomeou
  = Pedro
  = Luís
  = chefe

# Actants syntagmas
 Verb: tendo
  = A Maria
  = razão
 Verb: nomeou
  = o Pedro
  = o Luís
  = seu chefe
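The syntagma expansion shown in the output above can be sketched as a sub-tree walk. One detail is assumed here, inferred from the output rather than stated: the walk does not descend into another actant core, which is why o Luís and seu chefe come out as separate syntagmas even though chefe depends on Luís in the parse. The `syntagma` function and the `(id, form, upos, head, deprel)` tuples are hypothetical.

```python
# Right action tree of the example sentence.
RIGHT_TREE = [
    (8, "o", "DET", 9, "det"),
    (9, "Pedro", "PROPN", 10, "nsubj"),
    (10, "nomeou", "VERB", 3, "conj"),
    (11, "o", "DET", 12, "det"),
    (12, "Luís", "PROPN", 10, "dobj"),
    (13, "seu", "DET", 14, "det"),
    (14, "chefe", "NOUN", 12, "nmod"),
]
CORE_IDS = {9, 12, 14}  # Pedro, Luís, chefe


def syntagma(tree, core_id, core_ids):
    """Join the tokens under a core, in sentence order."""
    children = {}
    for tid, _, _, head, _ in tree:
        children.setdefault(head, []).append(tid)
    forms = {t[0]: t[1] for t in tree}

    collected, stack = [], [core_id]
    while stack:
        tid = stack.pop()
        collected.append(tid)
        # Assumed rule: do not descend into the sub-tree of another core.
        stack.extend(c for c in children.get(tid, []) if c not in core_ids)
    return " ".join(forms[tid] for tid in sorted(collected))


syntagmas = [syntagma(RIGHT_TREE, core, CORE_IDS) for core in sorted(CORE_IDS)]
print(syntagmas)
```

Sorting the collected token ids restores the original word order, so the determiner always precedes its core in the resulting phrase.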


Applying this to the available benchmark yields the following results:

# Uma empresa portuense apresenta computadores novos .
  A1: Uma empresa portuense
  A2: computadores novos
  A1: Uma empresa portuense
  A2: computadores novos
# A empresa apresenta os computadores ao público .
  A1: A empresa
  A2: os computadores
  A3: ao público
  A1: A empresa
  A2: os computadores
  A3: público
# O miúdo precisa de ajuda .
  A1: O miúdo
  A2: de ajuda
  A1: O miúdo
  A2: de ajuda
# A cidade fica na montanha .
  A1: A cidade
  A2: na montanha
  A1: A cidade
  A2: na montanha
# Pousar o livro na mesa .
  A1: o livro
  A2: na mesa
  A1: o livro
  A2: na mesa
# Este documento remonta ao ano 1840 .
  A1: Este documento
  A2: ao ano 1840
  A1: Este documento
  A2: ao ano 1840
# A sessão durou três horas .
  A1: A sessão
  A2: três horas
  A1: A sessão
  A2: três horas
# Ele comporta-se como um homem .
  A1: Ele
  A2: como um homem
  A1: Ele
  A2: como um homem
# Ele nomeou o Pedro chefe .
  A1: Ele
  A2: o Pedro
  A3: chefe
  A1: Ele
  A2: o Pedro
  A3: chefe

Recall: 100%
Precision: 95%