Monday, October 27, 2014

Triplet Extraction from Sentence [Implementation]

Many sentiment analysis tasks require extracting a triplet, i.e. Subject - Verb - Object, from a sentence. While there are many approaches to the problem, I recently came across a fairly easy-to-implement algorithm in a research paper (http://ailab.ijs.si/dunja/SiKDD2007/Papers/Rusu_Trippels.pdf).

The algorithm

function TRIPLET-EXTRACTION(sentence) returns a solution, or failure
 result ← EXTRACT-SUBJECT(NP_subtree) ∪ EXTRACT-PREDICATE(VP_subtree) ∪ EXTRACT-OBJECT(VP_siblings)
 if result ≠ failure then return result
 else return failure

function EXTRACT-ATTRIBUTES(word) returns a solution, or failure
 // search among the word’s siblings
 if adjective(word)
  result ← all RB siblings
 else
  if noun(word)
   result ← all DT, PRP$, POS, JJ, CD, ADJP, QP, NP siblings
  else
   if verb(word)
    result ← all ADVP siblings
    
 // search among the word’s uncles
 if noun(word) or adjective(word)
  if uncle = PP
   result ← uncle subtree
 else
  if verb(word) and (uncle = verb)
   result ← uncle subtree
 if result ≠ failure then return result
 else return failure

function EXTRACT-SUBJECT(NP_subtree) returns a solution, or failure
 subject ← first noun found in NP_subtree
 subjectAttributes ← EXTRACT-ATTRIBUTES(subject)
 result ← subject ∪ subjectAttributes
 if result ≠ failure then return result
 else return failure
 
function EXTRACT-PREDICATE(VP_subtree) returns a solution, or failure
 predicate ← deepest verb found in VP_subtree
 predicateAttributes ← EXTRACT-ATTRIBUTES(predicate)
 result ← predicate ∪ predicateAttributes
 if result ≠ failure then return result
 else return failure
 
function EXTRACT-OBJECT(VP_subtree) returns a solution, or failure
 siblings ← find NP, PP and ADJP siblings of VP_subtree
 for each value in siblings do
  if value = NP or PP
   object ← first noun in value
  else
   object ← first adjective in value
 objectAttributes ← EXTRACT-ATTRIBUTES(object)
 result ← object ∪ objectAttributes
 if result ≠ failure then return result
 else return failure
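
To make the pseudocode above concrete, here is a rough Java sketch of the subject, predicate and object steps, without attributes. It is not the code from my repository: the `Node` class, the helper names and the hand-built example tree are all made up for illustration, and a real parser would supply its own tree type. It also makes two assumptions: "first noun" means first in left-to-right order, and the object is searched among the siblings of the predicate verb, which is how I read "siblings of VP_subtree".

```java
import java.util.ArrayList;
import java.util.List;

class TripletSketch {
    /** Hypothetical minimal tree node; a real parser supplies its own tree type. */
    static final class Node {
        final String label;   // phrase label or POS tag, e.g. "NP", "VBN"
        final String word;    // surface word for tagged tokens, null for phrases
        final List<Node> children = new ArrayList<>();
        Node parent;

        Node(String label, String word, Node... kids) {
            this.label = label;
            this.word = word;
            for (Node k : kids) { k.parent = this; children.add(k); }
        }
    }

    static Node phrase(String label, Node... kids) { return new Node(label, null, kids); }
    static Node tok(String tag, String word) { return new Node(tag, word); }

    /** First tagged token, left to right, whose POS tag starts with the prefix. */
    static Node first(Node n, String tagPrefix) {
        if (n.word != null) return n.label.startsWith(tagPrefix) ? n : null;
        for (Node c : n.children) {
            Node hit = first(c, tagPrefix);
            if (hit != null) return hit;
        }
        return null;
    }

    static int depth(Node n) {
        int d = 0;
        while (n.parent != null) { n = n.parent; d++; }
        return d;
    }

    /** Deepest verb token (tag VB*) under n; on a tie the leftmost one wins. */
    static Node deepestVerb(Node n) {
        if (n.word != null) return n.label.startsWith("VB") ? n : null;
        Node best = null;
        for (Node c : n.children) {
            Node cand = deepestVerb(c);
            if (cand != null && (best == null || depth(cand) > depth(best))) best = cand;
        }
        return best;
    }

    /** Returns {subject, predicate, object}, or null (failure) if any part is missing. */
    static String[] extract(Node sentence) {
        Node np = null, vp = null;
        for (Node c : sentence.children) {
            if (np == null && c.label.equals("NP")) np = c;
            if (vp == null && c.label.equals("VP")) vp = c;
        }
        if (np == null || vp == null) return null;
        Node subject = first(np, "NN");
        Node predicate = deepestVerb(vp);
        if (subject == null || predicate == null) return null;
        Node object = null;
        // Assumption: look among the siblings of the predicate verb, keep the first hit.
        for (Node sib : predicate.parent.children) {
            if (sib.label.equals("NP") || sib.label.equals("PP")) object = first(sib, "NN");
            else if (sib.label.equals("ADJP")) object = first(sib, "JJ");
            if (object != null) break;
        }
        if (object == null) return null;
        return new String[] {subject.word, predicate.word, object.word};
    }

    /** Hand-built tree for "A rare black squirrel has become a regular visitor to a suburban garden". */
    static Node example() {
        Node subjectNp = phrase("NP", tok("DT", "A"), tok("JJ", "rare"),
                tok("JJ", "black"), tok("NN", "squirrel"));
        Node objectNp = phrase("NP",
                phrase("NP", tok("DT", "a"), tok("JJ", "regular"), tok("NN", "visitor")),
                phrase("PP", tok("TO", "to"),
                        phrase("NP", tok("DT", "a"), tok("JJ", "suburban"), tok("NN", "garden"))));
        Node vp = phrase("VP", tok("VBZ", "has"), phrase("VP", tok("VBN", "become"), objectNp));
        return phrase("S", subjectNp, vp);
    }

    public static void main(String[] args) {
        // prints "squirrel - become - visitor"
        System.out.println(String.join(" - ", extract(example())));
    }
}
```

Returning null plays the role of "failure" in the pseudocode. The pseudocode's loop in EXTRACT-OBJECT overwrites the object on every iteration; the sketch keeps the first match instead, which is a choice on my part.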


Implementation


The algorithm above works on the parse tree produced by a parser such as the Stanford Parser or the OpenNLP parser. I used the Stanford Parser and passed the parse tree it generated to my triplet extractor. For my work, I needed the sentence triplets along with only those attributes that support sentiment, not all of them. So my implementation skips the extraction of attributes from the "word's uncles" described in the algorithm.
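
As a rough illustration of the sibling-only attribute search, here is a small Java sketch that follows the first half of EXTRACT-ATTRIBUTES from the pseudocode. Again, this is not the code from my repository: `Node` and the helper names are hypothetical, words are classified by tag prefix (NN*, JJ*, VB*) as a simplifying assumption, and it collects every sibling the pseudocode lists rather than only the sentiment-related ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AttributeSketch {
    /** Hypothetical minimal tree node; a real parser supplies its own tree type. */
    static final class Node {
        final String label;   // phrase label or POS tag
        final String word;    // surface word for tagged tokens, null for phrases
        final List<Node> children = new ArrayList<>();
        Node parent;

        Node(String label, String word, Node... kids) {
            this.label = label;
            this.word = word;
            for (Node k : kids) { k.parent = this; children.add(k); }
        }
    }

    static Node phrase(String label, Node... kids) { return new Node(label, null, kids); }
    static Node tok(String tag, String word) { return new Node(tag, word); }

    // Sibling labels that count as noun attributes in the pseudocode.
    static final List<String> NOUN_ATTRS =
            Arrays.asList("DT", "PRP$", "POS", "JJ", "CD", "ADJP", "QP", "NP");

    /** Appends all words under n, left to right. */
    static void words(Node n, List<String> out) {
        if (n.word != null) out.add(n.word);
        for (Node c : n.children) words(c, out);
    }

    /** Attribute words found among the siblings of the given token; uncles are ignored. */
    static List<String> siblingAttributes(Node token) {
        List<String> out = new ArrayList<>();
        if (token.parent == null) return out;
        for (Node sib : token.parent.children) {
            if (sib == token) continue;
            boolean keep;
            if (token.label.startsWith("JJ")) keep = sib.label.equals("RB");
            else if (token.label.startsWith("NN")) keep = NOUN_ATTRS.contains(sib.label);
            else if (token.label.startsWith("VB")) keep = sib.label.equals("ADVP");
            else keep = false;
            if (keep) words(sib, out);
        }
        return out;
    }

    public static void main(String[] args) {
        Node noun = tok("NN", "squirrel");
        phrase("NP", tok("DT", "A"), tok("JJ", "rare"), tok("JJ", "black"), noun);
        // prints "[A, rare, black]"
        System.out.println(siblingAttributes(noun));
    }
}
```

An empty list here corresponds to "no attributes found". Filtering the result down to sentiment-bearing words would be an extra step on top of this.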

I have implemented the algorithm in Java. You can find my work here: (https://github.com/SushantKafle/TripletExtraction).
