Ranges

Thus far the rules we have examined have one thing in common: the values they produce are fixed in size and known at compile time. However, grammars can also specify the repetition of elements. For example, consider the following grammar (loosely adapted from rfc7230):

chunk-ext      = *( ";" token )

The star operator in BNF notation means repetition; in this case, zero or more occurrences of the expression in parentheses. This production can be expressed using the function range_rule, which returns a rule allowing for a prescribed number of repetitions of a specified rule. The following rule matches the grammar for chunk-ext defined above:

constexpr auto chunk_ext_rule = range_rule(
    tuple_rule( squelch( delim_rule( ';' ) ), token_rule( alnum_chars ) ) );

This rule produces a range, a ForwardRange whose value type is the same as the value type of the rule passed to the function. In this case, the type is string_view because the tuple has one unsquelched element, the token_rule. The range can be iterated to produce results, without allocating memory for each element. The following code:

system::result< range< core::string_view > > rv = parse( ";johndoe;janedoe;end", chunk_ext_rule );

for( auto s : rv.value() )
    std::cout << s << "\n";

produces this output:

johndoe
janedoe
end
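To make the "without allocating memory for each element" point concrete, here is a minimal standalone sketch of how such a range could work: it stores only a view of the input and re-parses one element at a time during iteration. This is not the library's implementation. The names chunk_ext_view and alnum_prefix are hypothetical, std::isalnum stands in for alnum_chars, and the code assumes only the C++17 standard library.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of the library): length of the
// leading run of alphanumeric characters in s.
inline std::size_t alnum_prefix(std::string_view s)
{
    std::size_t n = 0;
    while (n < s.size() && std::isalnum(static_cast<unsigned char>(s[n])))
        ++n;
    return n;
}

// Hypothetical stand-in for a lazily evaluated range over
//     chunk-ext = *( ";" token )
// It holds a view of the input and never copies or allocates.
class chunk_ext_view
{
    std::string_view s_;

public:
    explicit chunk_ext_view(std::string_view s) : s_(s) {}

    class iterator
    {
        std::string_view rest_; // input not yet consumed
        std::string_view cur_;  // current token
        bool done_ = true;

        // Parse one ";" token element from the front of rest_.
        void advance()
        {
            if (rest_.size() < 2 || rest_[0] != ';')
            {
                done_ = true;
                return;
            }
            std::size_t n = alnum_prefix(rest_.substr(1));
            if (n == 0)
            {
                done_ = true;
                return;
            }
            cur_ = rest_.substr(1, n);   // the token, without the ";"
            rest_.remove_prefix(1 + n);
            done_ = false;
        }

    public:
        iterator() = default; // end iterator
        explicit iterator(std::string_view s) : rest_(s) { advance(); }

        std::string_view operator*() const
        {
            assert(!done_); // dereferencing the end iterator is an error
            return cur_;
        }
        iterator& operator++() { advance(); return *this; }

        // Simplification: iterators are only compared against end().
        bool operator!=(iterator const& other) const
        {
            return done_ != other.done_;
        }
    };

    iterator begin() const { return iterator(s_); }
    iterator end() const { return iterator(); }
};
```

Iterating chunk_ext_view( ";johndoe;janedoe;end" ) with a range-based for loop visits johndoe, janedoe, and end, each as a std::string_view into the original buffer. Unlike a real parse, this sketch reports no error for trailing input that fails to match; it simply stops.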

Sometimes a repetition is not so easily expressed using a single rule. Take, for example, the following grammar for a comma-delimited list of tokens, which must contain at least one element:

token-list    = token *( "," token )

We can express this using the overload of range_rule which accepts two parameters: the rule to use when performing the first match, and the rule to use for performing every subsequent match. Both overloads of the function have additional, optional parameters for specifying the minimum number of repetitions, or both the minimum and maximum number of repetitions. Since our list may not be empty, we pass 1 as the minimum number of repetitions. The following rule captures the token-list grammar:

constexpr auto token_list_rule = range_rule(
    token_rule( alnum_chars ),
    tuple_rule( squelch( delim_rule( ',' ) ), token_rule( alnum_chars ) ),
    1 );

The following code:

system::result< range< core::string_view > > rv = parse( "johndoe,janedoe,end", token_list_rule );

for( auto s : rv.value() )
    std::cout << s << "\n";

produces this output:

johndoe
janedoe
end
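The interplay of the first rule, the next rule, and the minimum and maximum counts can be illustrated with a small standalone sketch. It is not how the library implements range_rule. The function names are hypothetical, std::isalnum stands in for alnum_chars, and treating leftover input as a failure is a choice made for this example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper: consume a non-empty alphanumeric token from the
// front of s. Returns false and leaves s unchanged if there is none.
inline bool consume_token(std::string_view& s)
{
    std::size_t n = 0;
    while (n < s.size() && std::isalnum(static_cast<unsigned char>(s[n])))
        ++n;
    if (n == 0)
        return false;
    s.remove_prefix(n);
    return true;
}

// Hypothetical helper: consume "," followed by a token.
// Leaves s unchanged on failure.
inline bool consume_comma_token(std::string_view& s)
{
    std::string_view t = s;
    if (t.empty() || t.front() != ',')
        return false;
    t.remove_prefix(1);
    if (!consume_token(t))
        return false;
    s = t;
    return true;
}

// Count the elements of
//     token-list = token *( "," token )
// Returns the count if the whole input matches and the count lies in
// [min_count, max_count]; otherwise returns std::nullopt.
inline std::optional<std::size_t>
count_token_list(std::string_view s,
                 std::size_t min_count,
                 std::size_t max_count)
{
    assert(min_count <= max_count);
    std::size_t n = 0;
    if (n < max_count && consume_token(s))            // "first" rule
    {
        ++n;
        while (n < max_count && consume_comma_token(s)) // "next" rule
            ++n;
    }
    if (!s.empty() || n < min_count)
        return std::nullopt;
    return n;
}
```

With a minimum of 1, count_token_list( "johndoe,janedoe,end", 1, 10 ) yields 3, while an empty string yields no value. Lowering the maximum to 2 makes the same three-element input fail, because the third element is left unconsumed.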

In the next section we discuss the available rules which are specific to rfc3986.

More

These are the rules and compound rules provided by the library. For more details please see the corresponding reference sections.

Name             Description

dec_octet_rule   Match an integer from 0 to 255.
delim_rule       Match a character literal.
literal_rule     Match a character string exactly.
not_empty_rule   Turn a match of the empty string into an error.
optional_rule    Ignore a rule if parsing fails, leaving the input pointer unchanged.
range_rule       Match a repeated number of elements.
token_rule       Match a string of characters from a character set.
tuple_rule       Match a sequence of specified rules, in order.
unsigned_rule    Match an unsigned integer in decimal form.
variant_rule     Match one of a set of alternatives specified by rules.