Computing Actants in Portuguese

For example, the following sentence:

A Maria vai tendo razão , e o Pedro nomeou o Luís seu chefe .

conveys two different semantic actions: one in which Maria has something, and one in which Pedro appoints someone as something. This can be captured in the following manually created figure, where the root of each tree captures the gist of the action, represented by the verb, and the leaves of each tree represent the actants (the elements most strongly related to the verb).

Initial Tree

The goal of the algorithm described here is to compute this result automatically. Its input is the dependency tree of the sentence. Processing the example sentence with a dependency parser (SyntaxNet) yields the following result:

1 A _ DET art||F|S  Definite=Def|Gender=Fem|Number=Sing|PronType=Art|fPOS=DET++art||F|S 2 det _ _
2 Maria _ PROPN prop|F|S  Gender=Fem|Number=Sing|fPOS=PROPN++prop|F|S 3 nsubj _ _
3 vai _ VERB  v-fin|PR|3S|IND Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|fPOS=VERB++v-fin|PR|3S|IND  0 ROOT  _ _
4 tendo _ VERB  v-ger VerbForm=Ger|fPOS=VERB++v-ger 3 ccomp _ _
5 razão _ NOUN  n|F|S Gender=Fem|Number=Sing|fPOS=NOUN++n|F|S 4 dobj  _ _
6 , _ PUNCT punc  fPOS=PUNCT++punc  3 punct _ _
7 e _ CONJ  conj-c||  fPOS=CONJ++conj-c|| 3 cc  _ _
8 o _ DET art||M|S  Definite=Def|Gender=Masc|Number=Sing|PronType=Art|fPOS=DET++art||M|S  9 det _ _
9 Pedro _ PROPN prop|M|S  Gender=Masc|Number=Sing|fPOS=PROPN++prop|M|S  10  nsubj _ _
10  nomeou  _ VERB  v-fin|PS|3S|IND Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin|fPOS=VERB++v-fin|PS|3S|IND  3 conj  _ _
11  o _ DET art||M|S  Definite=Def|Gender=Masc|Number=Sing|PronType=Art|fPOS=DET++art||M|S  12  det _ _
12  Luís  _ PROPN prop|M|S  Gender=Masc|Number=Sing|fPOS=PROPN++prop|M|S  10  dobj  _ _
13  seu _ DET pron-det||M|S Gender=Masc|Number=Sing|Number[psor]=Sing|Person=3|Poss=Yes|PronType=Prs|fPOS=DET++pron-det||M|S  14  det _ _
14  chefe _ NOUN  n|M|S Gender=Masc|Number=Sing|fPOS=NOUN++n|M|S  12  nmod  _ _
15  . _ PUNCT punc  fPOS=PUNCT++punc  3 punct _ _

These lines encode the dependencies between the tokens of the original sentence, which are best visualized as a tree, as illustrated in the following image.

Parsing Tree
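
As a minimal sketch of how such parser output could be loaded, the snippet below reads tab-separated, 10-column CoNLL lines and keeps only the fields used in the rest of this description. The names Token, parse_conll and children_of are hypothetical and not part of the actual tool, and the sample rows replace the XPOS and FEATS columns with "_" for brevity.

```python
from dataclasses import dataclass


@dataclass
class Token:
    id: int      # 1-based position in the sentence
    form: str    # surface form
    upos: str    # coarse POS tag (4th column)
    head: int    # id of the governing token, 0 for the root
    deprel: str  # dependency relation to the head


def parse_conll(text):
    """Parse tab-separated, 10-column CoNLL lines into Token objects."""
    tokens = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        tokens.append(Token(int(cols[0]), cols[1], cols[3], int(cols[6]), cols[7]))
    return tokens


def children_of(tokens):
    """Map each head id to the ids of its direct dependents."""
    kids = {}
    for tok in tokens:
        kids.setdefault(tok.head, []).append(tok.id)
    return kids


# First three rows of the example; XPOS and FEATS are replaced by "_" (assumption).
rows = [
    ["1", "A", "_", "DET", "_", "_", "2", "det", "_", "_"],
    ["2", "Maria", "_", "PROPN", "_", "_", "3", "nsubj", "_", "_"],
    ["3", "vai", "_", "VERB", "_", "_", "0", "ROOT", "_", "_"],
]
sample = "\n".join("\t".join(r) for r in rows)
tokens = parse_conll(sample)
kids = children_of(tokens)
root = next(t for t in tokens if t.head == 0)
print(root.form, kids[root.id])
```

The head-to-children map is what turns the flat listing into a tree that can be walked from the root.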

The root of the sentence is the token vai, a verb that represents one of the semantic actions described in the sentence; the other action is captured by the token nomeou, also a verb.

Given the sentence dependency tree, we start by computing the actant cores, and then the full actant syntagmas (phrases). The actant core is the token in the actant phrase that most strongly represents the actant. For example, in the actant phrase A Maria the actant core is Maria, and in the actant phrase seu chefe the core is chefe.

Step 1: Split

Given that a single sentence can convey more than one action, concept or thought, the first step in computing the actant cores is to split the original dependency tree into several trees, one for each semantic action. The result of the split for the example sentence is illustrated in the following figure, with one tree for each individual action. The root of each tree should be a verb representing the gist of the action. Note that this sentence has three tokens tagged as verbs (vai, tendo and nomeou), but only two sub-trees are computed. This is because not all verbs represent an action or thought by themselves, e.g. auxiliary verbs.

Deps Trees
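
The split can be sketched as follows. The exact criterion used by the tool is not stated here, so this example assumes a simple rule: the sentence root and every verb attached through a conj relation start their own tree, and every other token joins the tree of its nearest such ancestor. The function name split_actions and the tuple layout are hypothetical.

```python
# (id, form, upos, head, deprel) for the example sentence, taken from the parse above.
TOKENS = [
    (1, "A", "DET", 2, "det"),
    (2, "Maria", "PROPN", 3, "nsubj"),
    (3, "vai", "VERB", 0, "ROOT"),
    (4, "tendo", "VERB", 3, "ccomp"),
    (5, "razão", "NOUN", 4, "dobj"),
    (6, ",", "PUNCT", 3, "punct"),
    (7, "e", "CONJ", 3, "cc"),
    (8, "o", "DET", 9, "det"),
    (9, "Pedro", "PROPN", 10, "nsubj"),
    (10, "nomeou", "VERB", 3, "conj"),
    (11, "o", "DET", 12, "det"),
    (12, "Luís", "PROPN", 10, "dobj"),
    (13, "seu", "DET", 14, "det"),
    (14, "chefe", "NOUN", 12, "nmod"),
    (15, ".", "PUNCT", 3, "punct"),
]


def is_action_root(token):
    """Assumed rule: the sentence root or a coordinated verb starts a new tree."""
    _, _, upos, _, deprel = token
    return deprel == "ROOT" or (upos == "VERB" and deprel == "conj")


def split_actions(tokens):
    """Return {action_root_id: [token ids]} with one entry per action tree."""
    by_id = {t[0]: t for t in tokens}
    trees = {}
    for tok in tokens:
        node = tok
        while not is_action_root(node):
            node = by_id[node[3]]  # climb towards the nearest action root
        trees.setdefault(node[0], []).append(tok[0])
    return trees


trees = split_actions(TOKENS)
forms = {t[0]: t[1] for t in TOKENS}
for root_id, ids in trees.items():
    print(forms[root_id], "->", " ".join(forms[i] for i in ids))
```

Under this assumed rule, tendo stays in the tree rooted at vai because it is attached through ccomp, which is why three verbs produce only two trees.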

Step 2: Simplify

Once the different actions in the sentence are isolated, with a separate tree for each one, the next step is to simplify these individual trees by discarding information that does not help in computing actant cores. For example, auxiliary verbs and their complements are collapsed into single nodes, because the action is conveyed by the main verb. The following figure illustrates the resulting trees after the simplification process for our running example; in the left tree, the verb nodes vai and tendo were collapsed into a single node. (Other simplifications may be added in the future.)

Simple Trees
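
A rough sketch of the collapse, under the assumption that a verb attached to another verb through ccomp is merged into its head and that its dependents are re-attached to the merged node. The function collapse_verbs and the choice of ccomp as the only merged relation are illustrative, not the tool's actual rule set.

```python
# Left tree from Step 1 as (id, form, upos, head, deprel).
LEFT_TREE = [
    (1, "A", "DET", 2, "det"),
    (2, "Maria", "PROPN", 3, "nsubj"),
    (3, "vai", "VERB", 0, "ROOT"),
    (4, "tendo", "VERB", 3, "ccomp"),
    (5, "razão", "NOUN", 4, "dobj"),
    (6, ",", "PUNCT", 3, "punct"),
    (7, "e", "CONJ", 3, "cc"),
    (15, ".", "PUNCT", 3, "punct"),
]


def collapse_verbs(tokens, merge_rels=("ccomp",)):
    """Merge verb-to-verb links (assumed: ccomp) into a single node."""
    by_id = {t[0]: t for t in tokens}
    merged_into = {}  # id of absorbed verb -> id of the verb absorbing it
    for tid, _, upos, head, deprel in tokens:
        parent = by_id.get(head)
        if upos == "VERB" and deprel in merge_rels and parent is not None and parent[2] == "VERB":
            merged_into[tid] = head

    def resolve(tid):
        # Follow merges until reaching a node that survives.
        while tid in merged_into:
            tid = merged_into[tid]
        return tid

    simplified = []
    for tid, _, upos, head, deprel in tokens:
        if tid in merged_into:
            continue  # this node now lives inside another node
        # Surface forms of the node and everything merged into it, in sentence order.
        parts = [t[1] for t in sorted(tokens) if resolve(t[0]) == tid]
        simplified.append((tid, " ".join(parts), upos, resolve(head), deprel))
    return simplified


for tok in collapse_verbs(LEFT_TREE):
    print(tok)
```

After the collapse, razão hangs directly from the merged verb node, so it can be scored as a direct dependent of the action in the next step.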

Step 3: Compute Actant Core Tokens

In this step, the goal is to actually compute the actant cores in each individual tree. To do so, an individual score is computed for each token in each tree. This score is either positive or negative, depending on the type of relation that links the token to its head and on its POS tag. Dependency relations that convey a strong relation with the verb, e.g. nsubj and dobj, have higher positive scores. Weaker relations, like det or punctuation (punct), have lower or negative scores; for example, a comma (,) is never a verb actant. Following the same reasoning, some POS tags (e.g. nouns) are more often used to represent concepts and ideas, and hence are more likely to act as actant cores, so these tags have high positive scores. In contrast, determiners and other common stop-words, which usually convey little or no semantic meaning, have low positive or negative scores.

In short, a positive score indicates that the token is an actant core, while tokens with a negative score are not. This approach has the advantage of not needing to know beforehand how many actants each specific verb has, a number that can also vary between sentences. The following figure illustrates the actant cores in each individual tree, i.e. for each individual action. The verb node represents the action itself.

Cores Trees
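
The scoring idea can be illustrated with a small sketch. The weights below are invented for this example (the real values are not given here), and the score is assumed to be the sum of a relation weight and a POS weight; only the sign of the total is used.

```python
# Hypothetical weights: strong verb relations and content POS tags are positive,
# function words and punctuation are negative.
REL_SCORE = {"nsubj": 2, "dobj": 2, "nmod": 1, "det": -2, "cc": -2, "punct": -3}
POS_SCORE = {"NOUN": 1, "PROPN": 1, "DET": -1, "CONJ": -1, "PUNCT": -2}

# Right tree from Step 1 as (id, form, upos, head, deprel).
RIGHT_TREE = [
    (8, "o", "DET", 9, "det"),
    (9, "Pedro", "PROPN", 10, "nsubj"),
    (10, "nomeou", "VERB", 0, "ROOT"),
    (11, "o", "DET", 12, "det"),
    (12, "Luís", "PROPN", 10, "dobj"),
    (13, "seu", "DET", 14, "det"),
    (14, "chefe", "NOUN", 12, "nmod"),
]


def score(token):
    """Sum of the relation weight and the POS weight; unknown labels count as 0."""
    _, _, upos, _, deprel = token
    return REL_SCORE.get(deprel, 0) + POS_SCORE.get(upos, 0)


def actant_cores(tree, verb_id):
    """Tokens with a positive score, excluding the verb node (the action itself)."""
    return [t for t in tree if t[0] != verb_id and score(t) > 0]


for tok in RIGHT_TREE:
    print(tok[1], score(tok))
print([t[1] for t in actant_cores(RIGHT_TREE, verb_id=10)])
```

Because the decision depends only on the sign of each token's score, the number of actants per verb does not need to be fixed in advance.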

Step 4: Compute Actant Syntagmas

Given the actant scores computed in the previous step and the trees computed in Step 1, we can compute the full actant syntagma (a phrase) by following the sub-tree of each core and concatenating its tokens in order. This yields the final actants:

$ actants input.conll 
A Maria vai tendo razão , e o Pedro nomeou o Luís seu chefe .

# Actants syntagma cores
 Verb: tendo
  = Maria
  = razão
 Verb: nomeou
  = Pedro
  = Luís
  = chefe

# Actants syntagmas
 Verb: tendo
  = A Maria
  = razão
 Verb: nomeou
  = o Pedro
  = o Luís
  = seu chefe
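
A minimal sketch of the sub-tree walk for the nomeou tree. It assumes that the walk stops whenever it reaches another core, which is what keeps o Luís and seu chefe as separate syntagmas even though chefe depends on Luís in the parse. The function name syntagma is hypothetical.

```python
# Right tree from Step 1 as (id, form, upos, head, deprel).
RIGHT_TREE = [
    (8, "o", "DET", 9, "det"),
    (9, "Pedro", "PROPN", 10, "nsubj"),
    (10, "nomeou", "VERB", 0, "ROOT"),
    (11, "o", "DET", 12, "det"),
    (12, "Luís", "PROPN", 10, "dobj"),
    (13, "seu", "DET", 14, "det"),
    (14, "chefe", "NOUN", 12, "nmod"),
]
CORE_IDS = {9, 12, 14}  # cores found in Step 3: Pedro, Luís, chefe


def syntagma(core_id, tree, core_ids):
    """Join the core with its descendants in sentence order, not crossing other cores."""
    kids = {}
    for tid, _, _, head, _ in tree:
        kids.setdefault(head, []).append(tid)
    collected, stack = [], [core_id]
    while stack:
        node = stack.pop()
        collected.append(node)
        # Assumed rule: a dependent that is itself a core starts its own syntagma.
        stack.extend(k for k in kids.get(node, []) if k not in core_ids)
    forms = {t[0]: t[1] for t in tree}
    return " ".join(forms[i] for i in sorted(collected))


for core in sorted(CORE_IDS):
    print("=", syntagma(core, RIGHT_TREE, CORE_IDS))
```

Sorting the collected ids before joining is what restores the original word order of each phrase.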

Benchmark

Applying this to the available benchmark yields the following results:

# Uma empresa portuense apresenta computadores novos .
 Gold:
  A1: Uma empresa portuense
  A2: computadores novos
 Acts:
  A1: Uma empresa portuense
  A2: computadores novos
# A empresa apresenta os computadores ao público .
 Gold:
  A1: A empresa
  A2: os computadores
  A3: ao público
 Acts:
  A1: A empresa
  A2: os computadores
  A3: público
# O miúdo precisa de ajuda .
 Gold:
  A1: O miúdo
  A2: de ajuda
 Acts:
  A1: O miúdo
  A2: de ajuda
# A cidade fica na montanha .
 Gold:
  A1: A cidade
  A2: na montanha
 Acts:
  A1: A cidade
  A2: na montanha
# Pousar o livro na mesa .
 Gold:
  A1: o livro
  A2: na mesa
 Acts:
  A1: o livro
  A2: na mesa
# Este documento remonta ao ano 1840 .
 Gold:
  A1: Este documento
  A2: ao ano 1840
 Acts:
  A1: Este documento
  A2: ao ano 1840
# A sessão durou três horas .
 Gold:
  A1: A sessão
  A2: três horas
 Acts:
  A1: A sessão
  A2: três horas
# Ele comporta-se como um homem .
 Gold:
  A1: Ele
  A2: como um homem
 Acts:
  A1: Ele
  A2: como um homem
# Ele nomeou o Pedro chefe .
 Gold:
  A1: Ele
  A2: o Pedro
  A3: chefe
 Acts:
  A1: Ele
  A2: o Pedro
  A3: chefe

Recall: 100%
Precision: 95%