Many sentiment analysis tasks require extracting triplets of the form Subject - Verb - Object from a sentence. While there are many approaches to the problem, I recently stumbled upon a fairly easy-to-implement algorithm in a research paper (http://ailab.ijs.si/dunja/SiKDD2007/Papers/Rusu_Trippels.pdf).
The algorithm
    function TRIPLET-EXTRACTION(sentence) returns a solution, or failure
        result ← EXTRACT-SUBJECT(NP_subtree) ∪ EXTRACT-PREDICATE(VP_subtree) ∪ EXTRACT-OBJECT(VP_siblings)
        if result ≠ failure then return result
        else return failure

    function EXTRACT-ATTRIBUTES(word) returns a solution, or failure
        // search among the word’s siblings
        if adjective(word)
            result ← all RB siblings
        else if noun(word)
            result ← all DT, PRP$, POS, JJ, CD, ADJP, QP, NP siblings
        else if verb(word)
            result ← all ADVP siblings
        // search among the word’s uncles
        if noun(word) or adjective(word)
            if uncle = PP
                result ← uncle subtree
        else if verb(word) and (uncle = verb)
            result ← uncle subtree
        if result ≠ failure then return result
        else return failure

    function EXTRACT-SUBJECT(NP_subtree) returns a solution, or failure
        subject ← first noun found in NP_subtree
        subjectAttributes ← EXTRACT-ATTRIBUTES(subject)
        result ← subject ∪ subjectAttributes
        if result ≠ failure then return result
        else return failure

    function EXTRACT-PREDICATE(VP_subtree) returns a solution, or failure
        predicate ← deepest verb found in VP_subtree
        predicateAttributes ← EXTRACT-ATTRIBUTES(predicate)
        result ← predicate ∪ predicateAttributes
        if result ≠ failure then return result
        else return failure

    function EXTRACT-OBJECT(VP_subtree) returns a solution, or failure
        siblings ← find NP, PP and ADJP siblings of VP_subtree
        for each value in siblings do
            if value = NP or PP
                object ← first noun in value
            else
                object ← first adjective in value
            objectAttributes ← EXTRACT-ATTRIBUTES(object)
        result ← object ∪ objectAttributes
        if result ≠ failure then return result
        else return failure
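To make the pseudocode more concrete, here is a minimal Java sketch of the core subject / predicate / object search, leaving out attribute extraction. It is not the code from my repository, and it does not call any parser library: the `Node` class, the `parse` helper, and every method name are made up for illustration. It assumes the parse tree arrives as a Penn Treebank style bracketed string, that the subject is found breadth-first, and that the object is taken from the first NP, PP or ADJP sibling of the predicate that yields one. The last choice is my own reading of the loop in EXTRACT-OBJECT.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class TripletSketch {
    /** Hypothetical minimal tree node; a leaf stores the word in its label. */
    static final class Node {
        final String label;
        Node parent;
        final List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        // A pre-terminal is a POS tag node whose single child is a word.
        boolean isPreTerminal() { return children.size() == 1 && children.get(0).children.isEmpty(); }
        String word() { return children.get(0).label; }
    }

    /** Parses a bracketed tree such as "(S (NP (NN dog)) (VP (VBZ barks)))". */
    static Node parse(String bracketed) {
        String[] tokens = bracketed.replace("(", " ( ").replace(")", " ) ").trim().split("\\s+");
        return parse(tokens, new int[] {0});
    }

    private static Node parse(String[] t, int[] pos) {
        if (!t[pos[0]].equals("(")) return new Node(t[pos[0]++]); // a leaf word
        pos[0]++;                                  // consume "("
        Node node = new Node(t[pos[0]++]);         // constituent or POS label
        while (!t[pos[0]].equals(")")) {
            Node child = parse(t, pos);
            child.parent = node;
            node.children.add(child);
        }
        pos[0]++;                                  // consume ")"
        return node;
    }

    /** Breadth-first search for the first word whose POS tag starts with tagPrefix. */
    static Node firstByTag(Node root, String tagPrefix) {
        Deque<Node> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            Node n = queue.poll();
            if (n.isPreTerminal() && n.label.startsWith(tagPrefix)) return n;
            queue.addAll(n.children);
        }
        return null;
    }

    static int depth(Node n) {
        int d = 0;
        while (n.parent != null) { n = n.parent; d++; }
        return d;
    }

    /** Returns the verb (tag VB*) that sits deepest in the given subtree. */
    static Node deepestVerb(Node root) {
        Node best = null;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.isPreTerminal() && n.label.startsWith("VB")
                    && (best == null || depth(n) > depth(best))) best = n;
            for (Node c : n.children) stack.push(c);
        }
        return best;
    }

    static Node child(Node parent, String label) {
        for (Node c : parent.children) if (c.label.equals(label)) return c;
        return null;
    }

    /** Returns {subject, predicate, object}, or null to signal failure. */
    static String[] extract(Node root) {
        Node s = root.label.equals("S") ? root : child(root, "S");
        if (s == null) return null;
        Node np = child(s, "NP"), vp = child(s, "VP");
        if (np == null || vp == null) return null;
        Node subject = firstByTag(np, "NN");       // first noun in the NP subtree
        Node predicate = deepestVerb(vp);          // deepest verb in the VP subtree
        if (subject == null || predicate == null) return null;
        Node object = null;
        for (Node sib : predicate.parent.children) { // siblings of the predicate
            if (object != null) break;               // assumption: first match wins
            if (sib.label.equals("NP") || sib.label.equals("PP")) object = firstByTag(sib, "NN");
            else if (sib.label.equals("ADJP")) object = firstByTag(sib, "JJ");
        }
        if (object == null) return null;
        return new String[] {subject.word(), predicate.word(), object.word()};
    }

    public static void main(String[] args) {
        // Hand-written parse tree, standing in for real parser output.
        String tree = "(ROOT (S (NP (DT A) (JJ rare) (JJ black) (NN squirrel))"
                + " (VP (VBZ has) (VP (VBN become) (NP (DT a) (JJ regular) (NN visitor))"
                + " (PP (TO to) (NP (DT a) (JJ suburban) (NN garden))))) (. .)))";
        System.out.println(String.join(" - ", extract(parse(tree)))); // squirrel - become - visitor
    }
}
```

Returning null plays the role of "failure" in the pseudocode. A real implementation would walk the parser's own tree objects instead of re-parsing a string.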
Implementation
The above algorithm works on the parse tree generated by a parser such as the Stanford Parser or the OpenNLP parser. I used the Stanford Parser and passed the parse tree it generated to my triplet extractor. For my work, I needed the sentence triplets along with only those attributes that support sentiment, not all of them. So my implementation skips the extraction of attributes from the word's "uncles" (the siblings of the word's parent node) described in the algorithm.
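As a rough illustration of a siblings-only attribute lookup, the sketch below applies just the first half of EXTRACT-ATTRIBUTES and never looks at uncles. It follows the tag lists from the pseudocode rather than the narrower set of sentiment-related attributes my own code keeps, and the `Constituent` class and `attributes` method are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SiblingAttributes {
    /** Hypothetical holder for a sibling constituent: its tag and its text. */
    static final class Constituent {
        final String tag;
        final String text;
        Constituent(String tag, String text) { this.tag = tag; this.text = text; }
    }

    // Sibling tags that count as attributes of a noun, as listed in the pseudocode.
    static final Set<String> NOUN_ATTRIBUTE_TAGS = new HashSet<>(
            Arrays.asList("DT", "PRP$", "POS", "JJ", "CD", "ADJP", "QP", "NP"));

    /** Collects attributes of a word from its siblings only; uncles are ignored. */
    static List<String> attributes(String wordTag, List<Constituent> siblings) {
        List<String> result = new ArrayList<>();
        for (Constituent c : siblings) {
            boolean keep;
            if (wordTag.startsWith("JJ")) {          // adjective: adverb siblings
                keep = c.tag.equals("RB");
            } else if (wordTag.startsWith("NN")) {   // noun: determiners, adjectives, ...
                keep = NOUN_ATTRIBUTE_TAGS.contains(c.tag);
            } else if (wordTag.startsWith("VB")) {   // verb: adverbial phrases
                keep = c.tag.equals("ADVP");
            } else {
                keep = false;
            }
            if (keep) result.add(c.text);
        }
        return result;
    }

    public static void main(String[] args) {
        // Siblings of the noun "squirrel" in "A rare black squirrel".
        List<Constituent> siblings = Arrays.asList(
                new Constituent("DT", "A"),
                new Constituent("JJ", "rare"),
                new Constituent("JJ", "black"));
        System.out.println(attributes("NN", siblings)); // [A, rare, black]
    }
}
```

An empty list here corresponds to finding no attributes. Filtering the result down to sentiment-bearing words would be a further step on top of this.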
I have implemented the algorithm in Java. You can find my work here: (https://github.com/SushantKafle/TripletExtraction).