Monday, October 27, 2014

Triplet Extraction from Sentence [Implementation]

Many sentiment analysis tasks require extracting a triplet, i.e. Subject - Verb - Object, from a sentence. For example, "The cat chased a mouse" yields (cat, chased, mouse). While there are many approaches to the problem, I recently came across a fairly easy-to-implement algorithm in a research paper (http://ailab.ijs.si/dunja/SiKDD2007/Papers/Rusu_Trippels.pdf).

The algorithm

function TRIPLET-EXTRACTION(sentence) returns a solution, or failure
 result ← EXTRACT-SUBJECT(NP_subtree) ∪ EXTRACT-PREDICATE(VP_subtree) ∪ EXTRACT-OBJECT(VP_siblings)
 if result ≠ failure then return result
 else return failure

function EXTRACT-ATTRIBUTES(word) returns a solution, or failure
 // search among the word’s siblings
 if adjective(word)
  result ← all RB siblings
 else
  if noun(word)
   result ← all DT, PRP$, POS, JJ, CD, ADJP, QP, NP siblings
  else
   if verb(word)
    result ← all ADVP siblings
    
 // search among the word’s uncles
 if noun(word) or adjective(word)
  if uncle = PP
   result ← uncle subtree
 else
  if verb(word) and (uncle = verb)
   result ← uncle subtree
 if result ≠ failure then return result
 else return failure

function EXTRACT-SUBJECT(NP_subtree) returns a solution, or failure
 subject ← first noun found in NP_subtree
 subjectAttributes ← EXTRACT-ATTRIBUTES(subject)
 result ← subject ∪ subjectAttributes
 if result ≠ failure then return result
 else return failure
 
function EXTRACT-PREDICATE(VP_subtree) returns a solution, or failure
 predicate ← deepest verb found in VP_subtree
 predicateAttributes ← EXTRACT-ATTRIBUTES(predicate)
 result ← predicate ∪ predicateAttributes
 if result ≠ failure then return result
 else return failure
 
function EXTRACT-OBJECT(VP_subtree) returns a solution, or failure
 siblings ← find NP, PP and ADJP siblings of VP_subtree
 for each value in siblings do
  if value = NP or PP
   object ← first noun in value
  else
   object ← first adjective in value
 objectAttributes ← EXTRACT-ATTRIBUTES(object)
 result ← object ∪ objectAttributes
 if result ≠ failure then return result
 else return failure
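
To make the subject, predicate and object steps concrete, here is a minimal Python sketch of them. It is not the code from the paper or from my Java implementation. It assumes the parser output is a Penn Treebank style bracketed string, that "first noun" means the first noun in breadth-first order, that the object is searched among the siblings of the deepest verb, and that the first sibling yielding an object wins. The names Node, parse, first_leaf, deepest_verb and extract_triplet are made up for this illustration, and attributes are left out.

```python
import re
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                       # POS tag or phrase label, e.g. "NP", "VBD"
    word: Optional[str] = None       # set only on leaves
    children: List["Node"] = field(default_factory=list)


def parse(text):
    """Parse a bracketed parse-tree string into a Node tree."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        pos += 1                     # skip "("
        node = Node(tokens[pos])
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read())
            else:
                node.word = tokens[pos]
                pos += 1
        pos += 1                     # skip ")"
        return node

    return read()


def first_leaf(node, prefix):
    """Breadth-first search for the first leaf whose tag starts with prefix."""
    queue = deque([node])
    while queue:
        current = queue.popleft()
        if current.word is not None and current.label.startswith(prefix):
            return current
        queue.extend(current.children)
    return None


def deepest_verb(node, parent=None, depth=0):
    """Return (depth, verb_leaf, parent) for the deepest VB* leaf, or None."""
    best = None
    if node.word is not None and node.label.startswith("VB"):
        best = (depth, node, parent)
    for child in node.children:
        found = deepest_verb(child, node, depth + 1)
        if found and (best is None or found[0] > best[0]):
            best = found
    return best


def extract_triplet(root):
    """Return (subject, predicate, object) as words, or None on failure."""
    sentence = root.children[0] if root.label == "ROOT" else root
    np = next((c for c in sentence.children if c.label == "NP"), None)
    vp = next((c for c in sentence.children if c.label == "VP"), None)
    if np is None or vp is None:
        return None
    subject = first_leaf(np, "NN")
    found = deepest_verb(vp)
    if subject is None or found is None:
        return None
    _, verb, verb_parent = found
    obj = None
    for sibling in verb_parent.children:     # assumption: siblings of the verb
        if sibling.label in ("NP", "PP"):
            obj = first_leaf(sibling, "NN")
        elif sibling.label == "ADJP":
            obj = first_leaf(sibling, "JJ")
        if obj is not None:
            break                            # assumption: first match wins
    if obj is None:
        return None
    return subject.word, verb.word, obj.word


tree = parse("(ROOT (S (NP (DT The) (NN cat)) (VP (VBD chased) (NP (DT a) (NN mouse))) (. .)))")
print(extract_triplet(tree))  # ('cat', 'chased', 'mouse')
```

Returning None stands in for "failure" in the pseudocode. Note that this sketch only recognises NN* tags as nouns, so a pronoun subject (PRP) makes it fail.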


Implementation


The above algorithm works on the parse tree generated by a parser such as the Stanford Parser or the OpenNLP parser. I used the Stanford Parser and passed the parse tree it generated to my triplet extractor. For my work, I needed the sentence triplets along with only those attributes that support sentiment (not all of them). So my implementation skips the extraction of attributes from the "word's uncles" described in the algorithm.
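
As a rough illustration of what "siblings only" attribute extraction looks like, here is a small Python sketch. It is not taken from my Java code. It assumes a tree written as nested tuples, (label, word) for a leaf and (label, child, ...) for a phrase, and it uses the full sibling lists from the pseudocode rather than a sentiment-specific subset. The names is_leaf, leaves, allowed_labels and sibling_attributes are hypothetical.

```python
# Sibling labels that count as attributes of a noun, as listed in the pseudocode.
NOUN_ATTRS = {"DT", "PRP$", "POS", "JJ", "CD", "ADJP", "QP", "NP"}


def is_leaf(tree):
    """A leaf is (tag, word); a phrase is (label, child, child, ...)."""
    return len(tree) == 2 and isinstance(tree[1], str)


def leaves(tree):
    """Collect the words under a node, left to right."""
    if is_leaf(tree):
        return [tree[1]]
    return [word for child in tree[1:] for word in leaves(child)]


def allowed_labels(tag):
    """Which sibling labels are attributes for a word with this POS tag."""
    if tag.startswith("JJ"):
        return {"RB"}
    if tag.startswith("NN"):
        return NOUN_ATTRS
    if tag.startswith("VB"):
        return {"ADVP"}
    return set()


def sibling_attributes(parent, target):
    """Words of the target's siblings that qualify as attributes (uncles are skipped)."""
    wanted = allowed_labels(target[0])
    words = []
    for child in parent[1:]:
        if child is not target and child[0] in wanted:
            words.extend(leaves(child))
    return words


np = ("NP", ("DT", "a"), ("JJ", "small"), ("NN", "mouse"))
vp = ("VP", ("ADVP", ("RB", "quickly")), ("VBD", "chased"), np)

print(sibling_attributes(np, np[3]))  # ['a', 'small']
print(sibling_attributes(vp, vp[2]))  # ['quickly']
```

Because only the parent's children are inspected, the PP and verb "uncle" subtrees from the second half of EXTRACT-ATTRIBUTES never enter the result.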

I have implemented the algorithm in Java. You can find my work here: (https://github.com/SushantKafle/TripletExtraction).
