Class: TextRank::RankFilter::CollapseAdjacent
- Inherits: Object
  - Object
    - TextRank::RankFilter::CollapseAdjacent
- Defined in: lib/text_rank/rank_filter/collapse_adjacent.rb
Overview
A rank filter which attempts to collapse highly ranked, single-token keywords into a combined keyword when those keywords are adjacent to each other in the original text.
It tries to do this in as intelligent a manner as possible, keeping the single tokens that comprise a combination when one or more of the single tokens occur more often than the combination.
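As an illustration of the rule above, here is a minimal sketch, not the gem's actual implementation. The helper names `occurrences` and `keep_single_tokens?` are hypothetical, and counting matches with a case-insensitive regular expression is an assumption about how occurrences might be compared.

```ruby
# Hypothetical helper (not part of text_rank): counts how often the given
# tokens appear in `text`, in order and separated only by whitespace.
def occurrences(text, *tokens)
  pattern = tokens.map { |token| Regexp.escape(token) }.join('\s+')
  text.scan(/\b#{pattern}\b/i).length
end

# Hypothetical helper: true when at least one single token occurs more often
# than the combined keyword, i.e. the single tokens are worth keeping.
def keep_single_tokens?(text, tokens)
  combined_count = occurrences(text, *tokens)
  tokens.any? { |token| occurrences(text, token) > combined_count }
end

text = "town siege ended; the town rebuilt after the town siege"

occurrences(text, "town")                  # 3
occurrences(text, "town", "siege")         # 2
keep_single_tokens?(text, %w[town siege])  # true, "town" outnumbers "town siege"
```

When every single token occurs exactly as often as the combination, this sketch reports that the single tokens need not be kept.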
This filter operates on the original (non-filtered) text in order to more intelligently determine true text adjacency versus token adjacency (e.g. two tokens can be adjacent even though they appeared in the original text on separate lines with punctuation in between). However, because it operates on the original text, it may fail to find some combinations when the keyword tokens no longer exactly match the original text (e.g. if ASCII folding has occurred). The goal is to err on the side of caution: it is better to not suggest a combination than to suggest a bad combination.
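The difference between text adjacency and token adjacency can be sketched as follows. This is an assumption-laden illustration, not the gem's code: `adjacent_in_text?` is a hypothetical helper, and treating "adjacent" as "separated only by whitespace" is one plausible reading of the description above.

```ruby
# Hypothetical helper (not part of text_rank): true when `first` and `second`
# appear in `text` with nothing but whitespace between them.
def adjacent_in_text?(text, first, second)
  pattern = /\b#{Regexp.escape(first)}\s+#{Regexp.escape(second)}\b/i
  text.match?(pattern)
end

text = "The town was under siege. Peace, arts and envy followed."

adjacent_in_text?(text, "under", "siege")  # true, only whitespace in between
adjacent_in_text?(text, "siege", "peace")  # false, a period separates them
adjacent_in_text?(text, "peace", "arts")   # false, a comma separates them

# A folded token no longer matches the original text, so no combination
# is suggested: the cautious outcome described above.
adjacent_in_text?("a café nearby", "cafe", "nearby")  # false
```

In a token stream with punctuation stripped, "siege" and "peace" would look adjacent, so checking against the original text is what rules out such a combination in this sketch.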
Example

    CollapseAdjacent.new(ranks_to_collapse: 6, max_tokens_to_combine: 2).filter!(
      {
        "town" => 0.9818754334834477,
        "cities" => 0.9055017128817066,
        "siege" => 0.7411519524982207,
        "arts" => 0.6907977453782612,
        "envy" => 0.6692709808107252,
        "blessings" => 0.6442147897516214,
        "plagues" => 0.5972420789430091,
        "florish" => 0.3746092797528525,
        "devoured" => 0.36867321734332237,
        "anxieties" => 0.3367731719604189,
        "peace" => 0.2905352582752693,
        "inhabitants" => 0.12715120116732137,
        "cares" => 0.0697383057947685,
      },
      original_text: "cities blessings peace arts florish inhabitants devoured envy cares anxieties plagues town siege"
    )
    => {
      "town siege" => 0.9818754334834477,
      "cities blessings" => 0.9055017128817066,
      "arts florish" => 0.6907977453782612,
      "devoured envy" => 0.6692709808107252,
      "anxieties plagues" => 0.5972420789430091,
      "peace" => 0.2905352582752693,
      "inhabitants" => 0.12715120116732137,
      "cares" => 0.0697383057947685,
      "town siege" => 0.2365184450186848,
      "cities blessings" => 0.21272821337880285,
      "arts florish" => 0.146247479840506,
      "devoured envy" => 0.1424776818760168,
      "anxieties plagues" => 0.12821144722639122,
      "peace" => 0.07976303576999531,
      "inhabitants" => 0.03490786580297893,
      "cares" => 0.019145831086624026,
    }
Instance Method Summary
- #filter!(ranks, original_text:, **_) ⇒ Hash<String, Float>
  Perform the filter on the ranks.
- #initialize(**options) ⇒ CollapseAdjacent (constructor)
  A new instance of CollapseAdjacent.
Constructor Details
#initialize(**options) ⇒ CollapseAdjacent
Returns a new instance of CollapseAdjacent.
    # File 'lib/text_rank/rank_filter/collapse_adjacent.rb', line 66
    def initialize(**options)
      @options = options
    end
Instance Method Details
#filter!(ranks, original_text:, **_) ⇒ Hash<String, Float>
Perform the filter on the ranks.
    # File 'lib/text_rank/rank_filter/collapse_adjacent.rb', line 74
    def filter!(ranks, original_text:, **_)
      TokenCollapser.new(tokens: ranks, text: original_text, **@options).collapse
    end
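To make the flow of arguments concrete, the following self-contained sketch mirrors the delegation pattern shown above without loading the gem. `StubCollapser` and `SketchFilter` are hypothetical stand-ins introduced only for illustration. The stub merely records what it receives, whereas the real TokenCollapser performs the actual collapsing.

```ruby
# Hypothetical stand-in for TokenCollapser: it only records its arguments.
class StubCollapser
  def initialize(tokens:, text:, **options)
    @received = { tokens: tokens, text: text, options: options }
  end

  # Returns the recorded arguments instead of collapsed ranks.
  def collapse
    @received
  end
end

# Hypothetical filter with the same shape as CollapseAdjacent.
class SketchFilter
  def initialize(**options)
    @options = options # stored once, forwarded on every filter! call
  end

  def filter!(ranks, original_text:, **_)
    # Any extra keyword arguments are swallowed by **_ and not forwarded.
    StubCollapser.new(tokens: ranks, text: original_text, **@options).collapse
  end
end

result = SketchFilter.new(ranks_to_collapse: 6, max_tokens_to_combine: 2)
                     .filter!({ "town" => 0.98 }, original_text: "town siege", ignored: true)

result[:options] # the constructor options, without the :ignored keyword
```

The constructor options reach the collapser unchanged on every call, while unrelated keywords passed to `filter!` are dropped, which lets a filter like this be called with a shared set of keyword arguments.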