Class: WaybackArchiver::Archive
- Inherits:
-
Object
- Object
- WaybackArchiver::Archive
- Defined in:
- lib/wayback_archiver/archive.rb
Overview
Post URL(s) to Wayback Machine
Class Method Summary collapse
-
.crawl(source, hosts: [], concurrency: WaybackArchiver.concurrency, limit: WaybackArchiver.max_limit) {|archive_result| ... } ⇒ Array<ArchiveResult>
Send URLs to Wayback Machine by crawling the site.
-
.post(urls, concurrency: WaybackArchiver.concurrency, limit: WaybackArchiver.max_limit) {|archive_result| ... } ⇒ Array<ArchiveResult>
Send URLs to Wayback Machine.
-
.post_url(url) ⇒ ArchiveResult
Send URL to Wayback Machine.
Class Method Details
.crawl(source, hosts: [], concurrency: WaybackArchiver.concurrency, limit: WaybackArchiver.max_limit) {|archive_result| ... } ⇒ Array<ArchiveResult>
Send URLs to Wayback Machine by crawling the site.
78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 |
# File 'lib/wayback_archiver/archive.rb', line 78 def self.crawl(source, hosts: [], concurrency: WaybackArchiver.concurrency, limit: WaybackArchiver.max_limit) WaybackArchiver.logger.info "Request are sent with up to #{concurrency} parallel threads" posted_urls = Concurrent::Array.new pool = ThreadPool.build(concurrency) found_urls = URLCollector.crawl(source, hosts: hosts, limit: limit) do |url| pool.post do result = post_url(url) yield(result) if block_given? posted_urls << result unless result.errored? end end WaybackArchiver.logger.info "Crawling of #{source} finished, found #{found_urls.length} URL(s)" pool.shutdown pool.wait_for_termination WaybackArchiver.logger.info "#{posted_urls.length} URL(s) posted to Wayback Machine" posted_urls end |
.post(urls, concurrency: WaybackArchiver.concurrency, limit: WaybackArchiver.max_limit) {|archive_result| ... } ⇒ Array<ArchiveResult>
Send URLs to Wayback Machine.
26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 |
# File 'lib/wayback_archiver/archive.rb', line 26 def self.post(urls, concurrency: WaybackArchiver.concurrency, limit: WaybackArchiver.max_limit) WaybackArchiver.logger.info "Total URLs to be sent: #{urls.length}" WaybackArchiver.logger.info "Request are sent with up to #{concurrency} parallel threads" urls_queue = if limit == -1 urls else urls[0...limit] end posted_urls = Concurrent::Array.new pool = ThreadPool.build(concurrency) urls_queue.each do |url| pool.post do result = post_url(url) yield(result) if block_given? posted_urls << result unless result.errored? end end pool.shutdown pool.wait_for_termination WaybackArchiver.logger.info "#{posted_urls.length} URL(s) posted to Wayback Machine" posted_urls end |
.post_url(url) ⇒ ArchiveResult
Send URL to Wayback Machine.
104 105 106 |
# File 'lib/wayback_archiver/archive.rb', line 104 def self.post_url(url) WaybackArchiver.adapter.call(url) end |