Class: WaybackArchiver::Request

Inherits:
Object
Defined in:
lib/wayback_archiver/request.rb

Overview

Make HTTP requests

Defined Under Namespace

Classes: ClientError, Error, GETStruct, HTTPError, InvalidRedirectError, MaxRedirectError, ResponseError, ServerError, UnknownResponseCodeError

Constant Summary

MAX_REDIRECTS =

Maximum number of redirects to follow before MaxRedirectError is raised.

10
REQUEST_ERRORS =

Known low-level request errors, each mapped to the Request error class (ServerError or ClientError) that represents it.

{
  # server
  Timeout::Error => ServerError,
  OpenSSL::SSL::SSLError => ServerError,
  Net::HTTPBadResponse => ServerError,
  Zlib::Error => ServerError,
  # client
  SystemCallError => ClientError,
  SocketError => ClientError,
  IOError => ClientError
}.freeze
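The code that consumes REQUEST_ERRORS is not shown on this page. One plausible way to use such a mapping is sketched below; the helpers `map_request_error` and `with_mapped_errors` are hypothetical, and the error classes are local stand-ins. The sketch walks the exception's ancestor chain so that subclasses such as `Errno::ECONNREFUSED` (a `SystemCallError`) also match, which a plain hash lookup on `e.class` would miss.

```ruby
require 'net/http'
require 'openssl'
require 'socket'
require 'timeout'
require 'zlib'

# Stand-ins for the Request error classes (assumed hierarchy).
class Error < StandardError; end
class ServerError < Error; end
class ClientError < Error; end

# Same keys and values as REQUEST_ERRORS above.
ERROR_MAP = {
  Timeout::Error => ServerError,
  OpenSSL::SSL::SSLError => ServerError,
  Net::HTTPBadResponse => ServerError,
  Zlib::Error => ServerError,
  SystemCallError => ClientError,
  SocketError => ClientError,
  IOError => ClientError
}.freeze

# Hypothetical helper: return the mapped class for the most specific
# ancestor of the exception's class, or nil if none is mapped.
def map_request_error(exception)
  key = exception.class.ancestors.find { |klass| ERROR_MAP.key?(klass) }
  key && ERROR_MAP.fetch(key)
end

# Hypothetical wrapper: re-raise known errors as ServerError/ClientError
# (Ruby sets the original exception as #cause); re-raise others unchanged.
def with_mapped_errors
  yield
rescue StandardError => e
  mapped = map_request_error(e)
  raise unless mapped

  raise mapped, "#{e.class}: #{e.message}"
end

begin
  with_mapped_errors { raise Errno::ECONNREFUSED }
rescue ClientError => e
  puts e.message
end
```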

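Class Method Summary

  • .blank?(value) ⇒ Boolean

    Return whether a value is blank or not.

  • .build_redirect_uri(uri, response) ⇒ URI

    Builds a URI for a redirect response.

  • .build_response(uri, response) ⇒ Response

    Builds a Response object.

  • .build_uri(uri) ⇒ URI

    Builds a URI from a String or URI.

  • .get(uri, max_redirects: MAX_REDIRECTS, raise_on_http_error: false, follow_redirects: true) ⇒ Response

    Performs an HTTP GET request and returns the response.

  • .parse_body(response_body) ⇒ String

    Parses a response body, handling both regular and gzipped bodies.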

Class Method Details

.blank?(value) ⇒ Boolean

Return whether a value is blank or not.

Examples:

Returns true for nil.

Request.blank?(nil) # => true

Returns true for an empty string.

Request.blank?('') # => true

Returns true for a string with only spaces.

Request.blank?('  ') # => true

Parameters:

  • value (Object)

    the value to check; it must be nil, false, or respond to #strip (such as a String).

Returns:

  • (Boolean)

    whether the value is blank or not.



# File 'lib/wayback_archiver/request.rb', line 190

def self.blank?(value)
  return true unless value
  return true if value.strip.empty?

  false
end
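A standalone copy of the method body makes its behavior easy to check outside the gem. Because it calls `#strip`, any value other than nil or false must be a String (or respond to `#strip`); an Integer, for example, raises NoMethodError.

```ruby
# Standalone copy of Request.blank? for experimentation.
def blank?(value)
  return true unless value          # nil and false are blank
  return true if value.strip.empty? # whitespace-only strings are blank

  false
end

blank?(nil)      # => true
blank?('')       # => true
blank?("  \t\n") # => true
blank?('a ')     # => false
```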

.build_redirect_uri(uri, response) ⇒ URI

Builds a URI for a redirect response. A relative Location header is resolved against the URI that was requested.

Examples:

Build the redirect URI for example.com (assuming it responded with a redirect).

Request.build_redirect_uri('http://example.com', net_http_response)

Parameters:

  • uri (URI)

    the URI that was requested.

  • response (Net::HTTPResponse)

    the server response.

Returns:

  • (URI)

    the URI to redirect to.

Raises:

  • (InvalidRedirectError)

    if the redirect response has no Location header.



# File 'lib/wayback_archiver/request.rb', line 142

def self.build_redirect_uri(uri, response)
  location_header = response.header.fetch('location') do
    raise InvalidRedirectError, "No location header found on redirect when requesting #{uri}"
  end

  location = URI.parse(location_header)
  return build_uri(uri) + location_header if location.relative?

  location
end
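The sketch below reproduces the redirect-resolution logic without a network call by building a `Net::HTTPFound` response by hand. `resolve_redirect` is a hypothetical stand-in: it uses `URI.parse` instead of `Request.build_uri` (so `uri` must be a full URL), calls `fetch` on the response directly (`Net::HTTPResponse#header` returns the response itself), and raises ArgumentError in place of InvalidRedirectError.

```ruby
require 'net/http'
require 'uri'

# Hypothetical stand-in for Request.build_redirect_uri.
def resolve_redirect(uri, response)
  location_header = response.fetch('location') do
    raise ArgumentError, "No location header found on redirect when requesting #{uri}"
  end

  location = URI.parse(location_header)
  # Relative locations are merged against the requested URI (RFC 3986).
  return URI.parse(uri.to_s) + location_header if location.relative?

  location
end

# Build a 302 response by hand; no request is sent.
response = Net::HTTPFound.new('1.1', '302', 'Found')
response['location'] = '/new/page'
resolve_redirect('http://example.com/old/page', response).to_s
# => "http://example.com/new/page"
```

A path without a leading slash such as `next` resolves relative to the current directory, giving `http://example.com/old/next`.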

.build_response(uri, response) ⇒ Response

Builds a Response object.

Examples:

Build Response object for example.com

Request.build_response(uri, net_http_response)

Parameters:

  • uri (URI)

    that was requested.

  • response (Net::HTTPResponse)

    the server response.

Returns:

  • (Response)

    the response representation, with the body passed through parse_body.


# File 'lib/wayback_archiver/request.rb', line 127

def self.build_response(uri, response)
  Response.new(
    response.code,
    response.message,
    parse_body(response.body),
    uri.to_s
  )
end
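The Response class is not documented on this page, so the sketch below assumes it behaves like a Struct whose fields follow the order of the `Response.new` call above (code, message, body, uri). Both `Response` and `FakeHTTPResponse` are hypothetical stand-ins, and the body is passed through unchanged instead of through parse_body. Note that Net::HTTP reports status codes as Strings (for example `"200"`), and the method stores the URI as a String.

```ruby
require 'uri'

# Hypothetical stand-ins; field order inferred from Response.new above.
Response = Struct.new(:code, :message, :body, :uri)
FakeHTTPResponse = Struct.new(:code, :message, :body)

def build_response(uri, response)
  Response.new(
    response.code,
    response.message,
    response.body.to_s, # the real method calls parse_body here
    uri.to_s
  )
end

res = build_response(URI('http://example.com'), FakeHTTPResponse.new('200', 'OK', '<html></html>'))
res.code # => "200"
res.uri  # => "http://example.com"
```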

.build_uri(uri) ⇒ URI

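Build a URI. URI objects are returned unchanged; strings that do not start with http:// or https:// get http:// prepended before parsing.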

Examples:

Build URI for example.com

Request.build_uri('http://example.com')

Build URI for #<URI::HTTP example.com>

uri = URI.parse('http://example.com')
Request.build_uri(uri)

Parameters:

  • uri (URI, String)

    to build.

Returns:

  • (URI)

    the built URI.



# File 'lib/wayback_archiver/request.rb', line 161

def self.build_uri(uri)
  return uri if uri.is_a?(URI)

  uri = "http://#{uri}" unless uri =~ %r{^https?://}
  URI.parse(uri)
end
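A standalone copy of the method shows the three cases: a bare host gets an `http://` scheme, an `https://` string parses to `URI::HTTPS`, and an existing URI object is returned as the very same object.

```ruby
require 'uri'

# Standalone copy of Request.build_uri for experimentation.
def build_uri(uri)
  return uri if uri.is_a?(URI) # URI::Generic and its subclasses include URI

  uri = "http://#{uri}" unless uri =~ %r{^https?://}
  URI.parse(uri)
end

build_uri('example.com').to_s          # => "http://example.com"
build_uri('https://example.com').class # => URI::HTTPS

existing = URI.parse('http://example.com/path')
build_uri(existing).equal?(existing)   # => true
```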

.get(uri, max_redirects: MAX_REDIRECTS, raise_on_http_error: false, follow_redirects: true) ⇒ Response

Performs an HTTP GET request and returns the response.

Examples:

Get example.com

Request.get('example.com')

Get example.com and follow at most 3 redirects

Request.get('http://example.com', max_redirects: 3)

Get example.com and don’t follow redirects

Request.get('http://example.com', follow_redirects: false)

Parameters:

  • uri (String, URI)

    to retrieve.

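  • max_redirects (Integer) (defaults to: MAX_REDIRECTS)

    maximum number of redirects to follow (default: 10).

  • raise_on_http_error (Boolean) (defaults to: false)

    raise ResponseError when the server responds with an error status code, instead of returning the response (default: false).

  • follow_redirects (Boolean) (defaults to: true)

    follow redirects (default: true). When false, a redirect response is returned as-is.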

Returns:

  • (Response)

    the http response representation.

Raises:

  • (Error)

    super class of all exceptions that this method can raise

  • (ServerError)

    all server errors

  • (ClientError)

    all client errors

  • (HTTPError)

    all HTTP errors

  • (MaxRedirectError)

    too many redirects, subclass of HTTPError (raised regardless of the raise_on_http_error flag)

  • (ResponseError)

    server responded with a 4xx or 5xx HTTP status code, subclass of HTTPError (only raised if the raise_on_http_error flag is true)

  • (UnknownResponseCodeError)

    server responded with an unknown HTTP status code, subclass of HTTPError (raised regardless of the raise_on_http_error flag)

  • (InvalidRedirectError)

    server responded with a redirect that has no Location header, subclass of HTTPError (raised regardless of the raise_on_http_error flag when redirects are followed)
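Because every exception listed above descends from Error, a caller can handle the specific branches first and fall back to Error last. The sketch below uses stand-in classes in a hypothetical `Req` module that mirror the hierarchy described in the list; `classify_request` and its return symbols are an illustrative policy, not part of the library.

```ruby
# Stand-in hierarchy mirroring the Raises list (assumed).
module Req
  class Error < StandardError; end
  class ServerError < Error; end
  class ClientError < Error; end
  class HTTPError < Error; end
  class MaxRedirectError < HTTPError; end
  class ResponseError < HTTPError; end
  class UnknownResponseCodeError < HTTPError; end
  class InvalidRedirectError < HTTPError; end
end

# Hypothetical helper: Ruby uses the first matching rescue clause,
# so the catch-all Req::Error must come last.
def classify_request
  yield
  :ok
rescue Req::ServerError
  :retry_later           # e.g. timeouts, TLS failures
rescue Req::ClientError
  :check_network         # e.g. DNS or socket failures
rescue Req::HTTPError
  :give_up               # redirect and status-code problems
rescue Req::Error
  :unknown_request_error
end

classify_request { raise Req::MaxRedirectError } # => :give_up
```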



# File 'lib/wayback_archiver/request.rb', line 68

def self.get(
  uri,
  max_redirects: MAX_REDIRECTS,
  raise_on_http_error: false,
  follow_redirects: true
)
  uri = build_uri(uri)

  redirect_count = 0
  until redirect_count > max_redirects
    WaybackArchiver.logger.debug "Requesting #{uri}"

    http = Net::HTTP.new(uri.host, uri.port)
    if uri.scheme == 'https'
      http.use_ssl = true
      http.verify_mode = OpenSSL::SSL::VERIFY_NONE
    end

    request = Net::HTTP::Get.new(uri.request_uri)
    request['User-Agent'] = WaybackArchiver.user_agent

    result = perform_request(uri, http, request)
    response = result.response
    error = result.error

    raise error if error

    code = response.code
    WaybackArchiver.logger.debug "[#{code}, #{response.message}] Requested #{uri}"

    case HTTPCode.type(code)
    when :success
      return build_response(uri, response)
    when :redirect
      return build_response(uri, response) unless follow_redirects

      uri = build_redirect_uri(uri, response)
      redirect_count += 1
      next
    when :error
      if raise_on_http_error
        raise ResponseError, "Failed with response code: #{code} when requesting #{uri}"
      end

      return build_response(uri, response)
    else
      raise UnknownResponseCodeError, "Unknown HTTP response code #{code} when requesting #{uri}"
    end
  end

  raise MaxRedirectError, "Redirected too many times when requesting #{uri}"
end
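The loop condition `until redirect_count > max_redirects` means up to max_redirects redirects are followed before MaxRedirectError is raised. The sketch below isolates that counting logic; the network call is replaced by a hypothetical `fetch` lambda returning `[status, location]`, any 3xx status is treated as a redirect (the real code delegates that decision to HTTPCode.type, which is not shown), and TooManyRedirects stands in for MaxRedirectError.

```ruby
class TooManyRedirects < StandardError; end

# Hypothetical sketch of the redirect loop in Request.get.
def follow(url, fetch, max_redirects: 10)
  redirect_count = 0
  until redirect_count > max_redirects
    status, location = fetch.call(url)
    return [status, url] unless (300..399).cover?(status)

    url = location
    redirect_count += 1
  end

  raise TooManyRedirects, "Redirected too many times when requesting #{url}"
end

# A chain /0 -> /1 -> /2 -> /3, where /3 answers 200: three redirects.
chain = lambda do |url|
  n = url[/\d+\z/].to_i
  n < 3 ? [301, "/#{n + 1}"] : [200, nil]
end

follow('/0', chain, max_redirects: 3) # => [200, "/3"]

begin
  follow('/0', chain, max_redirects: 2)
rescue TooManyRedirects => e
  puts e.message # prints "Redirected too many times when requesting /3"
end
```

With `max_redirects: 0`, the first redirect response already raises, so 0 means "do not follow any redirect" (as opposed to follow_redirects: false, which returns the redirect response).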

.parse_body(response_body) ⇒ String

Parse a response body; handles both regular and gzipped response bodies.

Examples:

Return response body for response.

Request.parse_body(net_http_response.body)

Parameters:

  • response_body (String)

    the server response body.

Returns:

  • (String)

    the response body, decompressed if it was gzipped, or an empty string if the body is nil.



# File 'lib/wayback_archiver/request.rb', line 173

def self.parse_body(response_body)
  return '' unless response_body

  Zlib::GzipReader.new(StringIO.new(response_body)).read
rescue Zlib::GzipFile::Error => _e
  response_body
end
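A standalone copy of the method shows the fallback behavior: gzip data is decompressed, while anything without a gzip header makes `Zlib::GzipReader.new` raise `Zlib::GzipFile::Error`, so the body is returned unchanged. `Zlib.gzip` (Ruby 2.4+) is used only to produce test input.

```ruby
require 'stringio'
require 'zlib'

# Standalone copy of Request.parse_body for experimentation.
def parse_body(response_body)
  return '' unless response_body

  Zlib::GzipReader.new(StringIO.new(response_body)).read
rescue Zlib::GzipFile::Error # not gzip data: return the body as-is
  response_body
end

parse_body(Zlib.gzip('hello')) # => "hello"
parse_body('plain text')       # => "plain text"
parse_body(nil)                # => ""
```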