Class: Scanf::FormatSpecifier

Inherits:
Object
Defined in:
lib/scanf.rb

Overview

Technical notes

Rationale behind scanf for Ruby

The impetus for a scanf implementation in Ruby comes chiefly from the fact that existing pattern-matching operations, such as Regexp#match and String#scan, return all results as strings. These must be converted to integers or floats explicitly whenever numeric values are what is ultimately wanted.

Design of scanf for Ruby

scanf for Ruby is essentially a <format string>-to-<regular expression> converter.

When scanf is called, a FormatString object is generated from the format string (“%d%s…”) argument. The FormatString object breaks the format string down into atoms (“%d”, “%5f”, “blah”, etc.), and from each atom it creates a FormatSpecifier object, which it saves.

Each FormatSpecifier has a regular expression fragment and a “handler” associated with it. For example, the regular expression fragment associated with the format “%d” is “([-+]?\d+)”, and the handler associated with it is a wrapper around String#to_i. scanf itself calls FormatString#match, passing in the input string. FormatString#match iterates through its FormatSpecifiers; for each one, it matches the corresponding regular expression fragment against the string. If there’s a match, it sends the matched string to the handler associated with the FormatSpecifier.

Thus, to follow up the “%d” example: if “123” occurs in the input string when a FormatSpecifier consisting of “%d” is reached, the “123” will be matched against “([-+]?\d+)”, and the matched string will be rendered into an integer by a call to to_i.

The rendered match is then saved to an accumulator array, and the input string is reduced to the post-match substring. Thus the string is “eaten” from the left as the FormatSpecifiers are applied in sequence. (This is done to a duplicate string; the original string is not altered.)

As soon as a regular expression fragment fails to match the string, or when the FormatString object runs out of FormatSpecifiers, scanning stops and results accumulated so far are returned in an array.
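The loop described above can be sketched in a few lines. This is a simplified stand-alone illustration, not the library's code: the method name `mini_scanf`, its three-entry spec table, and the blanket `\s*` whitespace skipping are assumptions made for the example, and it does not load `scanf` at all.

```ruby
# Hypothetical mini-scanner: each spec maps to [regex fragment, handler].
def mini_scanf(input, format)
  specs = {
    "%d" => ['([-+]?\d+)',      ->(s) { s.to_i }],
    "%f" => ['([-+]?\d*\.\d+)', ->(s) { s.to_f }],
    "%s" => ['(\S+)',           ->(s) { s }]
  }

  rest = input.dup                 # work on a copy; the caller's string is untouched
  results = []                     # the accumulator array

  format.scan(/%[dfs]/).each do |spec|
    fragment, handler = specs.fetch(spec)
    m = Regexp.new('\A\s*' + fragment).match(rest)
    break unless m                 # first failed fragment stops the scan
    results << handler.call(m[1])  # convert the matched text
    rest = m.post_match            # "eat" the string from the left
  end

  results
end

mini_scanf("42 3.5 abc", "%d%f%s")  # => [42, 3.5, "abc"]
mini_scanf("42 abc", "%d%d%s")      # => [42]
```

The second call stops at the second `%d`, so only the results accumulated up to that point are returned.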

Constructor Details

#initialize(str) ⇒ FormatSpecifier

Returns a new instance of FormatSpecifier.



# File 'lib/scanf.rb', line 332

def initialize(str)
  @spec_string = str
  h = '[A-Fa-f0-9]'

  @re_string, @handler =
    case @spec_string

      # %[[:...:]]
    when /%\*?(\[\[:[a-z]+:\]\])/
      [ "(#{$1}+)", :extract_plain ]

      # %5[[:...:]]
    when /%\*?(\d+)(\[\[:[a-z]+:\]\])/
      [ "(#{$2}{1,#{$1}})", :extract_plain ]

      # %[...]
    when /%\*?\[([^\]]*)\]/
      yes = $1
      if /^\^/.match(yes) then no = yes[1..-1] else no = '^' + yes end
      [ "([#{yes}]+)(?=[#{no}]|\\z)", :extract_plain ]

      # %5[...]
    when /%\*?(\d+)\[([^\]]*)\]/
      yes = $2
      w = $1
      [ "([#{yes}]{1,#{w}})", :extract_plain ]

      # %i
    when /%\*?i/
      [ "([-+]?(?:(?:0[0-7]+)|(?:0[Xx]#{h}+)|(?:[1-9]\\d*)))", :extract_integer ]

      # %5i
    when /%\*?(\d+)i/
      n = $1.to_i
      s = "("
      if n > 1 then s += "[1-9]\\d{1,#{n-1}}|" end
      if n > 1 then s += "0[0-7]{1,#{n-1}}|" end
      if n > 2 then s += "[-+]0[0-7]{1,#{n-2}}|" end
      if n > 2 then s += "[-+][1-9]\\d{1,#{n-2}}|" end
      if n > 2 then s += "0[Xx]#{h}{1,#{n-2}}|" end
      if n > 3 then s += "[-+]0[Xx]#{h}{1,#{n-3}}|" end
      s += "\\d"
      s += ")"
      [ s, :extract_integer ]

      # %d, %u
    when /%\*?[du]/
      [ '([-+]?\d+)', :extract_decimal ]

      # %5d, %5u
    when /%\*?(\d+)[du]/
      n = $1.to_i
      s = "("
      if n > 1 then s += "[-+]\\d{1,#{n-1}}|" end
      s += "\\d{1,#{$1}})"
      [ s, :extract_decimal ]

      # %x
    when /%\*?[Xx]/
      [ "([-+]?(?:0[Xx])?#{h}+)", :extract_hex ]

      # %5x
    when /%\*?(\d+)[Xx]/
      n = $1.to_i
      s = "("
      if n > 3 then s += "[-+]0[Xx]#{h}{1,#{n-3}}|" end
      if n > 2 then s += "0[Xx]#{h}{1,#{n-2}}|" end
      if n > 1 then s += "[-+]#{h}{1,#{n-1}}|" end
      s += "#{h}{1,#{n}}"
      s += ")"
      [ s, :extract_hex ]

      # %o
    when /%\*?o/
      [ '([-+]?[0-7]+)', :extract_octal ]

      # %5o
    when /%\*?(\d+)o/
      [ "([-+][0-7]{1,#{$1.to_i-1}}|[0-7]{1,#{$1}})", :extract_octal ]

      # %a, %e, %f, %g (and their uppercase forms)
    when /%\*?[aefgAEFG]/
      [ '([-+]?(?:0[xX](?:\.\h+|\h+(?:\.\h*)?)[pP][-+]\d+|\d+(?![\d.])|\d*\.\d*(?:[eE][-+]?\d+)?))', :extract_float ]

      # %5a, %5e, %5f, %5g (and their uppercase forms)
    when /%\*?(\d+)[aefgAEFG]/
      [ '(?=[-+]?(?:0[xX](?:\.\h+|\h+(?:\.\h*)?)[pP][-+]\d+|\d+(?![\d.])|\d*\.\d*(?:[eE][-+]?\d+)?))' +
        "(\\S{1,#{$1}})", :extract_float ]

      # %5s
    when /%\*?(\d+)s/
      [ "(\\S{1,#{$1}})", :extract_plain ]

      # %s
    when /%\*?s/
      [ '(\S+)', :extract_plain ]

      # %c preceded by whitespace in the format (leading input whitespace is skipped)
    when /\s%\*?c/
      [ "\\s*(.)", :extract_plain ]

      # %c
    when /%\*?c/
      [ "(.)", :extract_plain ]

      # %5c (whitespace issues are handled by the count_*_space? methods)
    when /%\*?(\d+)c/
      [ "(.{1,#{$1}})", :extract_plain ]

      # %%
    when /%%/
      [ '(\s*%)', :nil_proc ]

      # literal characters
    else
      [ "(#{Regexp.escape(@spec_string)})", :nil_proc ]
    end

  @re_string = '\A' + @re_string
end
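To see what a field width does to the generated fragment, the `%5d` branch can be reproduced outside the class. The helper name `decimal_fragment` is hypothetical; its body copies the string-building steps of that branch and prepends the `\A` anchor the way the constructor's last line does.

```ruby
# Hypothetical helper mirroring the "%5d, %5u" branch of #initialize.
def decimal_fragment(width)
  n = width
  s = "("
  s += "[-+]\\d{1,#{n - 1}}|" if n > 1  # a sign uses up one column of the width
  s += "\\d{1,#{n}})"
  '\A' + s                              # every fragment is anchored at the start
end

decimal_fragment(5)                     # => "\\A([-+]\\d{1,4}|\\d{1,5})"

re = Regexp.new(decimal_fragment(5))
re.match("1234567")[1]                  # => "12345"
re.match("-1234567")[1]                 # => "-1234"
```

In both calls the capture is five characters long, with the sign counted against the width.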

Instance Attribute Details

#conversion ⇒ Object (readonly)

Returns the value of attribute conversion.



# File 'lib/scanf.rb', line 290

def conversion
  @conversion
end

#matched ⇒ Object (readonly)

Returns the value of attribute matched.



# File 'lib/scanf.rb', line 290

def matched
  @matched
end

#matched_string ⇒ Object (readonly)

Returns the value of attribute matched_string.



# File 'lib/scanf.rb', line 290

def matched_string
  @matched_string
end

#re_string ⇒ Object (readonly)

Returns the value of attribute re_string.



# File 'lib/scanf.rb', line 290

def re_string
  @re_string
end

Instance Method Details

#count_space? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/scanf.rb', line 328

def count_space?
  /(?:\A|\S)%\*?\d*c|%\d*\[/.match(@spec_string)
end
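The pattern in `count_space?` is dense, so it can help to run it against a few spec strings. The wrapper name `keeps_leading_space?` below is hypothetical; it applies the same regular expression to a plain string and coerces the result to a Boolean.

```ruby
# Hypothetical stand-alone copy of the count_space? test.
def keeps_leading_space?(spec_string)
  !!(/(?:\A|\S)%\*?\d*c|%\d*\[/.match(spec_string))
end

keeps_leading_space?("%c")      # => true
keeps_leading_space?(" %c")     # => false (whitespace precedes the %c)
keeps_leading_space?("%5c")     # => true
keeps_leading_space?("%[a-z]")  # => true
keeps_leading_space?("%d")      # => false
keeps_leading_space?("%s")      # => false
```

Specs for which the result is false have leading whitespace stripped from the input before matching, as the `unless count_space?` guard in `#match` shows.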

#letter ⇒ Object



# File 'lib/scanf.rb', line 470

def letter
  @spec_string[/%\*?\d*([a-z\[])/, 1]
end
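A quick way to check what `letter` extracts is to apply the same `String#[]` call to literal spec strings. The helper name `spec_letter` is hypothetical and exists only for this illustration.

```ruby
# Hypothetical helper applying the same extraction as #letter.
def spec_letter(spec_string)
  spec_string[/%\*?\d*([a-z\[])/, 1]
end

spec_letter("%5d")     # => "d"
spec_letter("%*s")     # => "s"
spec_letter("%[abc]")  # => "["
spec_letter("%X")      # => nil (the character class only covers a-z and "[")
spec_letter("abc")     # => nil (literal text has no conversion letter)
```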

#match(str) ⇒ Object



# File 'lib/scanf.rb', line 457

def match(str)
  @matched = false
  s = str.dup
  s.sub!(/\A\s+/,'') unless count_space?
  res = to_re.match(s)
  if res
    @conversion = send(@handler, res[1])
    @matched_string = @conversion.to_s
    @matched = true
  end
  res
end
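The whitespace handling and the anchored match can be tried in isolation. The sketch below is a hypothetical reduction of `#match` (the name `match_sketch` and the `skip_space:` keyword are not part of the library); it takes the fragment as an argument and leaves out the handler call and the instance state.

```ruby
# Hypothetical reduction of #match: strip leading whitespace, then apply
# the \A-anchored fragment to a copy of the input.
def match_sketch(str, re_string, skip_space: true)
  s = str.dup
  s.sub!(/\A\s+/, '') if skip_space
  Regexp.new(re_string, Regexp::MULTILINE).match(s)
end

res = match_sketch("   42 rest", '\A([-+]?\d+)')
res[1].to_i     # => 42
res.post_match  # => " rest"

# Without the stripping step the anchored fragment cannot match:
match_sketch("   42 rest", '\A([-+]?\d+)', skip_space: false)  # => nil
```

Returning the MatchData object is what gives the caller access to the post-match substring described in the overview.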

#mid_match? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/scanf.rb', line 478

def mid_match?
  return false unless @matched
  cc_no_width    = letter == '[' && !width
  c_or_cc_width  = (letter == 'c' || letter == '[') && width
  width_left     = c_or_cc_width && (matched_string.size < width)

  return width_left || cc_no_width
end
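The conditions in `mid_match?` are easier to follow with the instance state passed in as arguments. The function below is a hypothetical stand-alone rewrite (the name `mid_match_sketch` and its parameters are assumptions), and it presumes the specifier has already matched.

```ruby
# Hypothetical rewrite of the mid_match? logic with explicit inputs.
def mid_match_sketch(letter, width, matched_string)
  cc_no_width   = letter == '[' && !width
  c_or_cc_width = (letter == 'c' || letter == '[') && width
  width_left    = c_or_cc_width && (matched_string.size < width)

  width_left || cc_no_width
end

mid_match_sketch('c', 5, "abc")      # => true  (%5c with fewer than 5 characters matched)
mid_match_sketch('c', 5, "abcde")    # => false (the width is used up)
mid_match_sketch('[', nil, "abc")    # => true  (character class without a width)
mid_match_sketch('d', 5, "12")       # => false (only %c and %[ are considered)
```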

#to_re ⇒ Object



# File 'lib/scanf.rb', line 453

def to_re
  Regexp.new(@re_string, Regexp::MULTILINE)
end
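In Ruby, `Regexp::MULTILINE` makes `.` match a newline as well, which matters for fragments such as the `(.)` used for `%c`. A small comparison, using a hypothetical helper `anchored_re` that builds the regexp the same way `to_re` does:

```ruby
# Hypothetical helper: compile a fragment the way to_re does.
def anchored_re(re_string)
  Regexp.new(re_string, Regexp::MULTILINE)
end

anchored_re('\A(.)').match("\nx")[1]  # => "\n"
Regexp.new('\A(.)').match("\nx")      # => nil (without MULTILINE, "." skips newlines)
```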

#to_s ⇒ Object



# File 'lib/scanf.rb', line 324

def to_s
  @spec_string
end

#width ⇒ Object



# File 'lib/scanf.rb', line 474

def width
  @spec_string[/%\*?(\d+)/, 1]&.to_i
end
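The safe-navigation call means `width` yields `nil`, not `0`, when the spec carries no field width. The helper name `spec_width` below is hypothetical and simply applies the same expression to literal strings.

```ruby
# Hypothetical helper applying the same extraction as #width.
def spec_width(spec_string)
  spec_string[/%\*?(\d+)/, 1]&.to_i
end

spec_width("%5d")    # => 5
spec_width("%*10s")  # => 10
spec_width("%d")     # => nil
```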