Navigating the Parse Tree
Fundamentally, the StateMachine
presents an interface for iterating the
syntax tree as if it were merely a linear sequence of segments. Purely
syntactical values like LoopVal
s and TableVal
s are hidden from the
programmer. While the interface only exposes segments and elements, the
information hidden by StateMachine
provides an efficient means to search
the parse tree for specific segments.
For information on how to construct a parse tree programmatically, see the
document on Generating X12. The StateMachine
can be
accessed via the BuilderDsl#machine
method. For information about how
to construct a parse tree from an input stream, see Parsing X12.
Iterating Segments
Current Segment
When you want to access the current SegmentVal
, use the #segment
method. It returns an AbstractCursor
, which is a read-only pointer to the
current segment within the parse tree, wrapped by Either
. When the parse
tree is empty, a failure will be returned.
# Success
machine.segment
#=> Either.success(#<Zipper::AbstractCursor ...>)
# Failure
machine.segment
#=> Either.failure("not a segment")
You may be wondering why the SegmentVal
is wrapped by two wrappers. The
first, Either
, provides a manner to distinguish error return values from
normal return values. It is more sophisticated and less error-prone than using
conventions like returning nil
on failure, because it supports chaining,
detailed error information can be returned, and the risk of neglecting to test
for an error is mitigated.
This Either
value wraps an AbstractCursor
, which points to the
SegmentVal
via its #node
method. Because SegmentVal
does not
have information about its parent or siblings (it only is aware of its
#children
), returning only a SegmentVal
does not always provide enough
information. The AbstractCursor
, on the other hand, allows access to any
related AbstractVal
node.
# Success
machine.segment.map(&:node)
#=> Either.success(SegmentVal[IEA](...))
machine.segment.map(&:parent).map(&:node)
#=> Either.success(InterchangeVal[00501](...))
# Failure
machine.segment.map(&:node)
#=> Either.failure("not a segment")
For another example, the current segment identifier can be accessed like any
other method of SegmentVal
.
# When machine.segment fails, nothing is printed
machine.segment.tap do |s|
puts "Hello, #{s.node.id}"
end
Going Forward
The #next
method returns a StateMachine
positioned at the segment
immediately following the current segment. Optionally, you may specify how many
segments to advance, which defaults to 1
. You can check if the current segment
is the last segment using the #last?
method.
# Success
machine.last?
#=> false
machine.next
#=> Either.success(StateMachine[1](SegmentVal[GS](...)))
# Success
machine.next(3)
#=> Either.success(StateMachine[1](SegmentVal[BHT](...)))
# Failure
machine.last?
#=> true
machine.next
#=> Either.failure("cannot move to next after last segment")
Going Backward
The #prev
method returns a StateMachine
positioned at the segment
immediately preceeding the current segment. Optionally, you may specify how many
segments to rewind, which defaults to 1
. You can check if the current segment
is the first segment using the #first?
method.
# Success
machine.first?
#=> false
machine.prev
#=> Either.success(StateMachine[1](SegmentVal[GS](...)))
# Success
machine.prev(2)
#=> Either.success(StateMachine[1](SegmentVal[ISA](...)))
# Failure
machine.first?
#=> true
machine.prev
#=> Either.failure("cannot move to prev before first segment")
Accessing Elements
Elements can be accessed using the #element
method, which accepts up to
three numeric arguments and requires at least one. Like #segment
, the
return value is a AbstractCursor
wrapped by an Either
.
Simple Elements
To access the first element of the current segment, call #element(1)
. Notice
elements are counted starting at 1
, not 0
. Beware that #element
will
raise an ArgumentError
if you attempt to access the fifth element of a segment
whose declaration only indicates four elements, for instance.
# Success
machine.element(1).map(&:node)
#=> Either.success(Nn.value[ I16: Number of Included Functional Groups](1))
# Error
machine.element(3).map(&:node)
#=> IEA has only 2 elements (ArgumentError)
Composite Elements
If the first element of the current segment is a CompositeElementVal
, calling
#element(1)
will return the entire composite value. To access a specific
component, call #element(1, n)
which will return the nth SimpleElementVal
.
Beware that #element
will raise an ArgumentError
if you attempt, for
instance, to access the third component of a composite whose declaration only
indicates two components.
# Success
machine.element(5).map(&:node)
#=> Either.success(CompositeElementVal[C023: HEALTH CARE SERVICE LOCATION INFORMATION](...))
# Success
machine.element(5, 1).map(&:node)
#=> Either.success(AN.value[E1331: Place of Service Code](11))
# Error
machine.element(5, 4).map(&:node)
#=> CLM05 has only 3 components (ArgumentError)
Repeated Elements
When the first element of the current segment is a RepeatedElementVal
, calling
#element(1)
will return the entire sequence of element vals. To access a
specific occurrence of the element, call #element(1, n)
which will return the
nth occurrence if it exists. If the element is a repeating composite element,
an optional third argument can be given to select a specific component. Beware
that #element
will raise an ArgumentError
if, for instance, you try to
access the sixth occurrence when the element definition declares the element can
occur a maximum of five times.
Taking Bigger Steps
First Segment
You can position the StateMachine
at the first segment in the parse tree
by calling #first
. When there are no segments in the parse tree, this
method returns a failure. This will typically position the StateMachine
at the first ISA
segment.
machine.first.map(&:first?)
#=> Either.success(true)
Last Segment
Likewise, the #last
method will position the StateMachine
at the
first segment in the parse tree if there is one. When the parse tree is empty,
a failure is returned instead. This will typically position the StateMachine
at the last IEA
segment.
machine.last.map(&:last?)
#=> Either.success(true)
Searching for Segments
The #find
method performs an efficient context-sensitive search based
on the current position. Being context-sensitive places restrictions on which
segments are reachable from the current state, unlike iterating segments one
at a time. These restrictions prevent complicated problems, like mistakenly
finding an NM1
segment from the next interchange because there were no more
NM1
segments in the current interchange.
The next matching segment is returned, or a failure if no segments matched.
Searching for a segment that, according to the definition tree, cannot exist or
is not reachable will cause #find
to raise an exception. To be clear,
#find
only searches forward for segments, not backward.
Sibling Segments
Segments connected directly by a horizontal dashed black line are siblings and
are reachable using #find
. For instance, from the third NM1
, the N3
,
N4
, and REF
segments are reachable.
# From 2010AA NM1 right one segment to N3
machine.find(:N3)
#=> Either.success(StateMachine[1](N3(...)))
# Right two segments to N4
machine.find(:N4)
#=> Either.success(StateMachine[1](N4(...)))
# Right three segments to REF
machine.find(:REF)
#=> Either.success(StateMachine[1](REF(...)))
Likewise, N4
and REF
are reachable from N3
; however, the third NM1
is
not reachable from N3
because it preceeds N3
.
Uncle Segments
Segments that occur as siblings of an ancestor node are uncles (remember that
#find
only proceeds forward). Common uncle segments are GE
and IEA
,
which are analogous to "closing tags" for envelope structures. Other examples
include the IK5
and AK9
segments from the 999 Functional Acknowledgement
transaction set.
# From BHT ascend twice and move left to GE
machine.find(:GE)
# From PRV ascend four times and move left to IEA
machine.find(:IEA)
Uncles are relatively rare in X12 transaction sets, because most child structures, like loops, are not wrapped by segments at both ends. For instance there are no segments in Loop 2000A that follow the child loops.
Nephew Segments
Segments that occur as the first direct child of a sibling node are nephews.
The siblings that follow the first child are not directly reachable, but they
can be reached indirectly by chaining two calls to
#find
. For example, GS
is a nephew of ISA
, and ST
has two nephews
named NM1
; but BHT
is not a nephew of GS
because it is not the first
child of its parent node.
# From ST move left twice and down to NM1
machine.find(:NM1)
# From 2000A PRV move left and down to NM1
machine.find(:NM1)
# From 2300 CLM move left three times and down to NM1
machine.find(:NM1)
The first-child restriction prevents potential problems of ambiguity with
certain grammars. Consider the possibility that Table 1 could be defined to
have a 1100 PER
loop in addition to its two NM1
loops. From the ST
segment, without the restriction, there would be two possibly reachable PER
segments. One is the sibling of 1000A NM1
, and the other is the 1100 PER
that we made up. Because #find
returns the first matching segment, it
would return 1000A PER
if it occurred, otherwise it would return the 1100
PER
. The caller would have to check which one it was -- the necessity of this
test is not apparent without studying the grammar. Changing the grammar to add
Loop 1100 would break existing code. The first-child restriction solves these
problems.
Cousin Segments
Segments that occurr as the first child of a sibling of the parent node are
cousins of the current segment. Similar to the restriction on nephew segments,
siblings that follow the first child are not directly reachable. For example,
the second NM1
is a cousin of the first NM1
and PER
segments, the 2000AB
NM1
is a cousin of all segments in Loop 2010AA, and each HL
is a cousin of
all segments in Table 1.
# From 1000A PER to 2000A HL
machine.find(:HL)
# From 2000BA NM1 to 2000BB NM1
machine.find(:NM1)
You may have noticed, in some cases there are more than one cousin with the same
segment identifier -- there are three cousins of BHT
named HL
, for instance.
See Element Constraints for information on how to find a
specific occurrence of HL
segment based on its qualifier elements, or
Chaining Method Calls for details on iterating each
HL
segment, one-at-a-time.
Parent Segments
Internal knowledge of the underlying tree structure makes it possible to
rewind to the first segment of a parent structure, using the #parent
method. The parent may be the first segment of a loop, table, functional group,
or interchange, but never a transaction set, because they do not parent segments
directly.
# From PRV up two nodes, left one node, and down to ST
machine.parent
# From PER left one node to NM1
machine.parent
Traversing to the parent segment, unlike #find
, always traverses
backwards in the sequence of segments. The #parent
method can only
rewind to the segment defined by the grammar, so it will always find the same
segment from a given starting position.
Element Constraints
Finding the next segment by identifier alone is often not specific enough. For
example, there are several different NM1
segments that each have a different
meaning -- one is a provider, another is the insurance company, another is the
patient, etc. In the case of NM1
, the first element is a qualifier that
indicates the meaning. To find NM1*QC
"Patient Name", you can iterate the
reachable NM1
segments and stop when you find the occurrence whose first
element equals 41
or when there are no more NM1
occurrences,
# Find the NM1 occurrence with "41" in the first element
position = Either.success(machine)
searching = true
while searching and position.defined?
# Move position to the next NM1 segment
position = position.flatmap{|m| m.find(:NM1) }
# Check the constraint
position.tap do |nm1|
nm1.element(1).tap do |element|
# Stop the while loop if we found the match
searching = element.node != "QC"
end
end
end
# Success
searching
#=> false
position
#=> Either.success(StateMachine[1](SegmentVal[NM1](...)))
# Failure
searching
#=> true
position
#=> Either.failure("NM1 segment does not occur")
There is a much simpler and less error-prone way, however. The #find
method accepts a variable number of filter arguments. For instance, the above
filter can be accomplished by calling,
machine.find(:NM1, "QC")
Multiple constraints can be specified and #blank
or nil
should be used to
indicate a wildcard, if needed. For instance to find the NM1*PR
"Payer Name"
that has a certain organization name in element NM103
,
machine.find(:NM1, "PR", nil, "MEDICARE")
Syntactic Constraints
In addition to improving readability, #find
checks the validity of your
element constraints and raises an exception when you ask for something that is
guaranteed never to occur in a valid parse tree. For instance, if you called
#find(:NM1, "XX")
from a position where there are no reachable NM1
segments,
machine.find(:NM1, "XX")
#=> NM1 segment cannot be reached from the current state (ArgumentError)
Or from a position where every reachable NM1
segment is defined such that
"XX"
is not allowed,
machine.find(:NM1, "XX")
#=> "XX" is not allowed in NM101 (ArgumentError)
You can get a list of potentially reachable segments from the current position
by calling #successors
, which returns one InstructionTable
per
active state. That is, when the machine is in a deterministic state, a single
InstructionTable
will be returned. See the section on
Non-determinism for more information.
pp b.successors
[InstructionTable(
1: Instruction[REF: Subscriber Secon..](pop: 0, drop: 0),
2: Instruction[REF: Property and Cas..](pop: 0, drop: 0),
3: Instruction[PER: Property and Cas..](pop: 0, drop: 3),
4: Instruction[NM1: Subscriber Name ](pop: 1, drop: 0, push: LoopState),
5: Instruction[NM1: Payer Name ](pop: 1, drop: 0, push: LoopState),
6: Instruction[CLM: Claim Informatio..](pop: 1, drop: 2, push: LoopState),
7: Instruction[ HL: Subscriber Hiera..](pop: 2, drop: 0, push: LoopState),
8: Instruction[ HL: Billing Provider..](pop: 3, drop: 0, push: TableState),
9: Instruction[ HL: Subscriber Hiera..](pop: 3, drop: 0, push: TableState),
10: Instruction[ HL: Patient Hierachi..](pop: 3, drop: 0, push: TableState),
11: Instruction[ SE: Transaction Set ..](pop: 3, drop: 4, push: TableState),
12: Instruction[ ST](pop: 4, drop: 0, push: TransactionSetState),
13: Instruction[ GE: Functional Group..](pop: 4, drop: 2),
14: Instruction[ GS](pop: 5, drop: 0, push: FunctionalGroupState),
15: Instruction[IEA: Interchange Cont..](pop: 5, drop: 2),
16: Instruction[ISA](pop: 6, drop: 0, push: InterchangeState))]
Chaining Method Calls
The Either
datatype allows chaining via the #map
, #or
,
#flatmap
, and #tap
methods. The use of each method is
demonstrated in the following examples.
Map
The #map
method transforms one Either.success
into another
Either.success
. It leaves Either.failure
values unaltered, passing
them through.
result = machine.find(:REF).map do |ref|
# When a REF segment was found, this block is
# executed. The return value of this block is
# wrapped by Either.success. If a REF segment
# was not found, this block is not executed and
# the Either.failure is propogated by #map
ref.id
end
# Briefer syntax, implementing Symbol#to_proc
machine.find(:REF).map(&:id)
#=> Either.success(:REF)
machine.find(:REF).map(&:id).map(&:to_s)
#=> Either.success("REF")
machine.find(:REF).map(&:id).map(&:to_s).map(&:length)
#=> Either.success(3)
Flatmap
To transform one Either.success
into another Either
, which may be a success
or failure, use the #flatmap
method. Like #map
, it also leaves
Either.failure
values unaltered. This is an important technique to traverse
the parse tree. For instance, the following shows how to locate the "Billing
Provider Organization" of the first claim in the parse tree.
machine.first.
flatmap{|x| x.find(:GS) }.
flatmap{|x| x.find(:ST) }.
flatmap{|x| x.find(:HL, nil, nil, "20") }.
flatmap{|x| x.find(:NM1, "85") }.
flatmap{|x| x.element(3) }.
map(&:node)
#=> Either.success(AN.value[E1035: Billing Provider Last or Organizational Name](BEN KILDARE SERVICE))
You can use #flatmap
to iterate a sequence of segments with the same
identifier, or use #iterate
which does the same thing.
# Find the first HL segment
position = machine.first.
flatmap{|x| x.find(:GS) }.
flatmap{|x| x.find(:ST) }.
flatmap{|x| x.find(:HL) }
while position.defined?
position = position.flatmap do |hl|
# Process the HL segment
...
# Find the next HL segment
hl.find(:HL)
end
end
Note that the value returned by the block given to #flatmap
must return
an instance of Either
or a TypeError
will be raised. The Object#try
method is similar to #flatmap
in many ways.
Iterate
The #iterate
method is a convenience method that calls #find
repeatedly with the given arguments, yielding its result to a block.
# Process each ST in the first ISA envelope
machine.first.flatmap do |isa|
isa.iterate(:GS) do |gs|
gs.iterate(:ST) do |st|
# ...
end
end
end
Note that the search starts ahead of the current position, so the following
machine.first.flatmap{|m| m.iterate(:ISA) {|isa| ... }}
will never yield the
first ISA segment in the document, because it started searching after m
,
which is the first ISA.
Since #iterate
calls #find
, it enforces the same syntactic
constraints.
Side Effects
In cases where you do not want to transform the Either
value, but only need to
execute a side effect on Either.success
, the #tap
method is suitable.
On Either.success
values, it passes the wrapped value to the block and returns
the original value, and it passes Either.failure
values along without calling
the block.
# Record GS06 Group Control Number
machine.first.flatmap{|x| x.find(:GS) }.tap do |gs|
# The return value of this block is discarded
gs.element(6).tap do |control|
@numbers << control.node.to_s
end
end #=> Either.success(StateMachine[1](SegmentVal[GS](...)))
# Contrived example
machine.first.
flatmap{|x| x.find(:GS) }.tap{|x| puts "Hi, GS" }
flatmap{|x| x.find(:ST) }.tap{|x| puts "Hi, ST" }
#=> Either.success(StateMachine[1](SegmentVal[ST](...)))
The Object#tap
method is similar to Either#tap
except #tap
always calls the block, even on nil
, while #tap
does not call the block
on Either.failure
values.
Error Recovery
When a method call earlier in the chain returns a Either.failure
, the error
will be propogated through the chain unless it the #or
method is used to
recover from the error.
result = machine.find(:ST).map do |st|
st.class
end.or do |reason|
# This block is executed if #find failed. The
# block must return an instance of Either, or
# a TypeError will be raised
Either.success(FalseClass)
end
# Result is guaranteed to be Either.success because
# of the block given to #or. The wrapped value then
# is StateMachine or FalseClass
result.defined?
#=> true
Word of Caution
Beware that the #find
method only searches forward in the sequence
of segments. In some cases, you will need to save the current position to let
you restart another search from that position, rather than chaining successive
searches together.
For instance, in the X222 837P transaction set, there are sixteen different
consecutive DTP
segments in Loop 2300. While the X222 implementation guide
arranges them in what appears to be a sequence, there is no restriction on
the order in which the DTP
segments occur -- they all have the same position.
Thus DTP*439
"Accident Date" can follow or preceed DTP*096
"Discharge Date",
and one or both might not be present.
clm = machine.first.
flatmap{|x| x.find(:GS) }.
flatmap{|x| x.find(:ST) }.
flatmap{|x| x.find(:HL, nil, nil, nil, "0") }.
flatmap{|x| x.find(:CLM) }
clm. # Wrong: this assumes DTP*439 occurs before DTP*096
flatmap{|x| x.find(:DTP, "439") }.tap{|x| puts "accident ..." }
flatmap{|x| x.find(:DTP, "096") }.tap{|x| puts "discharge ..." }
# Correct: no order is assumed among the DTP segments
clm.flatmap{|x| x.find(:DTP, "439") }.tap{|x| puts "accident ..." }
clm.flatmap{|x| x.find(:DTP, "096") }.tap{|x| puts "accident ..." }
In general, siblings following the first segment in a purely syntactic node,
like a table, loop, or envelope structure cannot exist unless the syntactic node
exists -- and the syntactic node cannot exist unless an entry segment from the
definition of that node occurs. With few exceptions, the entry segment of a
syntactic node is the first segment in its definition. Therefore, the 2300 DTP
segments cannot exist unless the 2300 CLM
segment occurs; that is why it is
best to save the CLM
position and use it as a starting point.
Non-determinism
Certain sequences of input segments can be described by more than one parse
tree. Often these sequences are malformed. For instance, in an X222 837P
transaction, an HL
segment that has an empty HL03
qualifier could
potentially be the HL
segment describing the "Billing Provider Detail",
"Subscriber Detail", or "Patient Detail". In this case the parser will construct
three parse trees: one for each possibility. The parser will respond to
#deterministic?
with false
when it is in a non-deterministic state.
In a non-determistic state, methods that normally return a single node will
return Either.failure("non-deterministic state")
. These are #segment
,
#element
, and #zipper
. Traversal methods, however, like #next
#first
, #last
, #find
, and #parent
, operate on
each parse tree in parallel.
These traversal methods will position the parser on parallel segments within
each parse tree. To use the HL
example again, the parser would point to
each tree's version of the HL
segment, one named "Patient Detail", one named
"Bililng Provider Detail", and another named "Subscriber Detail". These segments
all have the same element values, but have a different meaning.