Reference

StructuredData in general

StructuredData is the concept of organizing data in a special hierarchical data structure. First, we define the terms used in the following chapters.

StructuredData terminology

StructuredData

This is the concept of having data in a hierarchical structure. There is always a top node, and it is always a collection.

StructuredDataContainer

This is a StructuredData structure that contains a StructuredDataStore and, optionally, StructuredDataTypes.

StructuredDataStore

This is a StructuredData structure that holds your data.

StructuredDataTypes

This is a StructuredData structure that contains type declarations for a StructuredDataStore.

node

Either a collection or a scalar.

scalar

Either a boolean, integer, real or string. A scalar is a simple value with no references. It cannot be referenced and is always contained in a collection.

boolean

This is a scalar with only two possible values, true or false. Note that in SDpyshell these two values are written True and False, with an upper case first letter. In YAML, however, they are written true and false (all lower case).

integer

An integer number. The exact range of these numbers is not defined here; we only require that it covers at least -2**31 to +2**31.

real

A floating point number. We require floating point numbers according to the IEEE 754 standard.

string

A sequence of characters. Unicode characters are supported.

collection

Either a map or a list.

map

A data structure that maps mapkeys, which are always strings, to values which are always nodes. Note that each mapkey can only be present once in the map. Each mapkey is associated with exactly one node. However, two map keys may be associated with the same node.

mapkey

This is a key of a map. A mapkey is always a string.

list

A data structure that is a sequence of nodes. Note that the elements of the list keep the order in which you gave them, and that two elements of the list may be equal.

listindex

This is the index that identifies a member of a list. A listindex is always an integer.

key

Either a mapkey or a listindex.

keylist

A list of keys. A keylist is a reference to a node in a StructuredDataStore. It describes how to find the node when you start at the top of the StructuredDataStore. The first key identifies a node in the top collection. If this node is a collection, the second key identifies a node within it. If that node is again a collection, the third key identifies a node within it, and so on until you finally reach the referenced node.

path

This is a keylist converted to a string. Roughly speaking, mapkeys are joined with dots ‘.’, while listindices are enclosed in square brackets. A typical path looks like this: “abc.def[4].ghi”. For a precise definition of how paths are constructed, see paths.

pattern

A path that may also contain wildcards. By definition, all paths are also patterns.

wildcard

Special keys that match whole classes of keys in a StructuredData structure. “*” matches any single mapkey or listindex, while “**” matches a sequence of one or more mapkeys and listindices.

reference

Collections are never contained in other collections; they are only referenced. A collection may be referenced by more than one other collection.

link

A link is a reference to a collection that is already referenced somewhere else.
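
To make the keylist definition concrete, here is a minimal Python sketch of how a keylist could be followed from the top collection down to a node. It assumes the store has been loaded into plain nested dicts (maps) and lists; the function name `get_node` is hypothetical and not taken from SDpyshell.

```python
from typing import Any, Sequence, Union

Key = Union[str, int]


def get_node(store: Any, keylist: Sequence[Key]) -> Any:
    """Follow a keylist from the top collection down to the referenced node.

    Hypothetical helper: the store is assumed to be nested dicts and lists.
    """
    node = store
    for key in keylist:
        if isinstance(node, dict) and isinstance(key, str):
            node = node[key]  # a mapkey selects an entry of a map
        elif (isinstance(node, list) and isinstance(key, int)
              and not isinstance(key, bool)):
            node = node[key]  # a listindex selects an element of a list
        else:
            raise KeyError(f"key {key!r} cannot be applied to {type(node).__name__}")
    return node


store = {"item1": {"first": ["A", "B"], "third": [{"m": 1, "n": 2}]}}
print(get_node(store, ["item1", "first", 1]))       # B
print(get_node(store, ["item1", "third", 0, "n"]))  # 2
```

An empty keylist returns the top collection itself, which matches the idea that the top node has no path.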

Relation of StructuredData to Python data structures

You may skip this section if you are not familiar with Python.

Here is an overview of which terms of the StructuredData definition correspond to which Python data type:

StructuredData term   Python data type
-------------------   ----------------------------------
map                   dict where keys are always strings
list                  list
boolean               bool
integer               int
real                  float
string                str
collection            either dict or list
scalar                a bool, an int, a float or a str
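
One detail of this mapping is easy to get wrong in Python: `bool` is a subclass of `int`, so a boolean must be recognised before an integer. The following sketch classifies a Python value according to the table above; the function name `sd_term` is hypothetical and only illustrates the mapping.

```python
from typing import Any


def sd_term(value: Any) -> str:
    """Return the StructuredData term for a Python value (hypothetical helper)."""
    # bool must be tested before int: isinstance(True, int) is True in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "real"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        # A map requires string keys.
        if not all(isinstance(k, str) for k in value):
            raise TypeError("map keys must be strings")
        return "map"
    if isinstance(value, list):
        return "list"
    raise TypeError(f"{type(value).__name__} has no StructuredData equivalent")


for v in [True, 42, 1.5, "text", {"a": 1}, [1, 2]]:
    print(repr(v), "->", sd_term(v))
```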

Paths

The definition of StructuredData allows us to construct a unique path for each node. We construct a path like this:

We start at the top of the StructuredDataStore and move, key by key, towards the node we have selected. We collect the keys we encounter, in that order, in a list. This list of keys identifies the node. A path is simply a string representation of that list of keys.

Joining a keylist to a path

The rules for constructing a path from a list of keys are:

  • If the key is a list index, convert it to a string and enclose it in square brackets, e.g. index 9 becomes the string “[9]”.

  • If the key is a map key, it must be a string. Apply the escape rules to the string.

  • Combine all converted keys with the “.” character.

  • If the path contains the sequence “.[”, replace it with “[”.

Here are some examples:

list of keys       path
----------------   --------
“A” “B”            A.B
“A.B” “C”          A\.B.C
“A” 2 “C”          A[2].C
“A” “*” “C”        A.\*.C
“A” ANYKEY “C”     A.*.C

Note that “ANYKEY” is a special variable that represents the “*” wildcard as it is used in patterns. For more information on patterns, see patterns.
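
The joining rules above can be sketched in a few lines of Python. The sketch also applies the escape rules described in the next section, reading the fourth escape rule as applying when the whole key consists of backslashes followed by “*”, “**” or “#”. The names `escape_key`, `join_path` and the `Wildcard` class that stands in for ANYKEY, ANYKEYS and ROOTKEY are illustrative assumptions, not names taken from SDpyshell.

```python
import re
from typing import Sequence, Union


class Wildcard:
    """Hypothetical stand-in for the special keys ANYKEY, ANYKEYS and ROOTKEY."""

    def __init__(self, text: str) -> None:
        self.text = text

    def __repr__(self) -> str:
        return self.text


ANYKEY = Wildcard("*")
ANYKEYS = Wildcard("**")
ROOTKEY = Wildcard("#")

# A whole key made of zero or more backslashes followed by *, ** or #.
_SPECIAL = re.compile(r"\\*(?:\*\*?|#)")


def escape_key(key: str) -> str:
    """Apply the escape rules to a single mapkey."""
    if _SPECIAL.fullmatch(key):
        return "\\" + key  # "*" -> "\*", "#" -> "\#", "\*" -> "\\*"
    return key.replace(".", "\\.").replace("[", "\\[").replace("]", "\\]")


def join_path(keylist: Sequence[Union[str, int, Wildcard]]) -> str:
    """Join a keylist (or a pattern containing wildcards) into a string."""
    parts = []
    for key in keylist:
        if isinstance(key, Wildcard):
            parts.append(key.text)         # wildcards are written unescaped
        elif isinstance(key, int) and not isinstance(key, bool):
            parts.append(f"[{key}]")       # listindex -> "[n]"
        elif isinstance(key, str):
            parts.append(escape_key(key))  # mapkey -> escaped string
        else:
            raise TypeError(f"invalid key: {key!r}")
    return ".".join(parts).replace(".[", "[")


print(join_path(["A", 2, "C"]))       # A[2].C
print(join_path(["A", "*", "C"]))     # A.\*.C
print(join_path(["A", ANYKEY, "C"]))  # A.*.C
```

Keeping wildcards as a separate type, rather than as the strings “*” and “**”, is what lets a literal mapkey “*” and the ANYKEY wildcard produce different paths.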

Escape rules

The escape rules ensure that any list of map keys and list indices can be represented as a path and that this list can always be reconstructed from the path. The rules also ensure that a path cannot be confused with a pattern containing wildcards.

The escape rules are these:

  • If the key is “*”, change it to “\*”.

  • If the key is “**”, change it to “\**”.

  • If the key is “#”, change it to “\#”.

  • If the key starts with a sequence of “\” followed by either “*”, “**” or “#”, prepend a “\” character.

  • Replace all occurrences of “.” in the key with “\.”.

  • Replace all occurrences of “[” in the key with “\[”.

  • Replace all occurrences of “]” in the key with “\]”.

Here are some examples:

key        escaped key
--------   -----------
A.B        A\.B
A.B[5]C    A\.B\[5\]C
*          \*
**         \**
#          \#
\*         \\*
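
The claim that the keylist can always be reconstructed from the path can be illustrated with a sketch of the inverse operation. `unescape_key` and `split_path` are hypothetical names. The parser assumes well-formed paths as produced by the joining rules and does not handle mapkeys that end in a backslash. An unescaped “*”, “**” or “#” raises an error, reflecting the rule that a path cannot be confused with a pattern.

```python
import re
from typing import List, Union

# A whole escaped special key: one or more backslashes, then *, ** or #.
_ESCAPED_SPECIAL = re.compile(r"\\+(?:\*\*?|#)")
# The escape sequences produced for ".", "[" and "]" inside a mapkey.
_ESCAPED_CHAR = re.compile(r"\\([.\[\]])")


def unescape_key(raw: str) -> str:
    """Undo the escape rules for a single mapkey."""
    if raw in ("*", "**", "#"):
        # An unescaped special key can only appear in a pattern.
        raise ValueError(f"{raw!r} is a wildcard: this is a pattern, not a path")
    if _ESCAPED_SPECIAL.fullmatch(raw):
        return raw[1:]  # drop the backslash that escaping prepended
    return _ESCAPED_CHAR.sub(r"\1", raw)


def split_path(path: str) -> List[Union[str, int]]:
    """Reconstruct the keylist from a path string."""
    keys: List[Union[str, int]] = []
    i, n = 0, len(path)
    need_key = path != "" and not path.startswith("[")
    while i < n or need_key:
        if not need_key and path[i] == "[":
            end = path.index("]", i)
            keys.append(int(path[i + 1:end]))  # listindex
            i = end + 1
        else:
            start = i
            # A mapkey runs until an unescaped "." or "[".
            while i < n and path[i] not in ".[":
                escaped = path[i] == "\\" and i + 1 < n and path[i + 1] in ".[]"
                i += 2 if escaped else 1
            keys.append(unescape_key(path[start:i]))
            need_key = False
        if i < n and path[i] == ".":
            i += 1
            need_key = True  # a "." is always followed by a mapkey
    return keys


print(split_path(r"A\.B.C"))  # ['A.B', 'C']
print(split_path("A[2].C"))   # ['A', 2, 'C']
```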

Example

Here is an example of StructuredData (only the StructuredDataStore) formulated in YAML:

item1:
    first:
    - A
    - B
    second:
    - X
    - Y
    third:
    -   m: 1
        n: 2
    -   p: 10
        q: 11

If you are familiar with Python, this would be the same structure in Python:

{ "item1" : { "first":  ["A","B"],
              "second": ["X","Y"],
              "third":  [ {"m": 1, "n":2}, {"p":10, "q":11}]
            }
}

For the example of StructuredData shown above, the following table lists some paths and the data they point to:

path               data (in Python notation)
----------------   ----------------------------------------
item1.first        ["A", "B"]
item1.first[1]     "B"
item1.second[0]    "X"
item1.third        [{"m": 1, "n": 2}, {"p": 10, "q": 11}]
item1.third[0]     {"m": 1, "n": 2}
item1.third[0].m   1
item1.third[0].n   2
item1.third[1].q   11
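
The table can be reproduced by walking the example structure and generating the path of every node. The sketch below assumes the example has been loaded as Python dicts and lists; `iter_paths` and `to_path` are hypothetical names, and `to_path` deliberately skips the escape rules because no key in this example needs escaping.

```python
from typing import Any, Iterator, Tuple, Union

Key = Union[str, int]


def iter_paths(node: Any, prefix: Tuple[Key, ...] = ()) -> Iterator[Tuple[Tuple[Key, ...], Any]]:
    """Yield (keylist, node) for every node below `node`, depth first."""
    if isinstance(node, dict):
        children = node.items()
    elif isinstance(node, list):
        children = enumerate(node)
    else:
        return  # scalars have no children
    for key, child in children:
        keylist = prefix + (key,)
        yield keylist, child
        yield from iter_paths(child, keylist)


def to_path(keylist: Tuple[Key, ...]) -> str:
    """Join a keylist; assumes no mapkey needs escaping."""
    joined = ".".join(f"[{k}]" if isinstance(k, int) else k for k in keylist)
    return joined.replace(".[", "[")


data = {"item1": {"first": ["A", "B"],
                  "second": ["X", "Y"],
                  "third": [{"m": 1, "n": 2}, {"p": 10, "q": 11}]}}

paths = {to_path(keylist): value for keylist, value in iter_paths(data)}
print(paths["item1.third[0].m"])  # 1
print(paths["item1.first[1]"])    # B
```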

Patterns

In order to select a subset from a set of paths, we define patterns, also called path patterns where they could be confused with other kinds of patterns. In patterns we combine special keys with ordinary keys, so each path can also be considered a pattern. These are the special keys that can be used in patterns:

key name   string representation   meaning
--------   ---------------------   ---------------------------------------
ANYKEY     *                       matches any key
ANYKEYS    **                      matches one or more keys of any value
ROOTKEY    #                       used in type patterns for the root type

Patterns come in two flavours, type patterns and match patterns. For detailed information on type patterns see also StructuredDataTypes.

Here are the differences between both flavours:

flavour         allowed special keys   usage
-------------   --------------------   -----------------
type pattern    ROOTKEY, ANYKEY        type declarations
match pattern   ANYKEY, ANYKEYS        matching paths

Example

Here are some examples of match patterns:

Assume that we have the following set of paths:

item1
item1.first
item1.first.A
item1.first.B
item1.second
item1.second.X
item1.second.Y
item1.third
item1.third[0]
item1.third[1]
item1.third[0].m
item1.third[0].n
item1.third[1].p
item1.third[1].q

This is what some patterns match:

pattern            paths matched
----------------   ---------------------------------------------------
*                  item1
item1.*            item1.first, item1.second, item1.third
item1.second.*     item1.second.X, item1.second.Y
item1.*.*          item1.first.A, item1.first.B, item1.second.X,
                   item1.second.Y, item1.third[0], item1.third[1]
item1.third[1].*   item1.third[1].p, item1.third[1].q
item1.third.**     item1.third[0], item1.third[1], item1.third[0].m,
                   item1.third[0].n, item1.third[1].p, item1.third[1].q
*.second.*         item1.second.X, item1.second.Y
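
The matching behaviour in this table can be sketched with a small recursive matcher. This is an illustration under simplifying assumptions: `parse`, `matches` and `select` are hypothetical names, keys are assumed to contain no escaped characters, and the wildcards are represented by the plain strings “*” and “**”.

```python
import re
from typing import List, Union

Key = Union[str, int]

# A listindex "[n]" or a mapkey optionally preceded by "."; no escape handling.
_TOKEN = re.compile(r"\[(\d+)\]|\.?([^.\[\]]+)")


def parse(text: str) -> List[Key]:
    """Split a simple path or match pattern into keys."""
    return [int(index) if index else name for index, name in _TOKEN.findall(text)]


def matches(pattern: List[Key], keys: List[Key]) -> bool:
    """Return True if the keylist `keys` matches the match pattern."""
    if not pattern:
        return not keys
    head, rest = pattern[0], pattern[1:]
    if head == "**":  # ANYKEYS: consume one or more keys
        return any(matches(rest, keys[i:]) for i in range(1, len(keys) + 1))
    if not keys:
        return False
    if head == "*" or head == keys[0]:  # ANYKEY or an identical key
        return matches(rest, keys[1:])
    return False


PATHS = """item1 item1.first item1.first.A item1.first.B item1.second
item1.second.X item1.second.Y item1.third item1.third[0] item1.third[1]
item1.third[0].m item1.third[0].n item1.third[1].p item1.third[1].q""".split()


def select(pattern: str) -> List[str]:
    """Return the paths from PATHS that match `pattern`, in their original order."""
    return [p for p in PATHS if matches(parse(pattern), parse(p))]


print(select("item1.*"))  # ['item1.first', 'item1.second', 'item1.third']
```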

StructuredDataStore

A StructuredDataStore is basically StructuredData without type declarations. A StructuredDataStore is often embedded in a StructuredDataContainer together with StructuredDataTypes.

StructuredDataTypes

The concept of paths allows us to reference any part of a StructuredDataStore with a single string. The concept of patterns allows us to reference sets of paths and thereby subsets of the StructuredDataStore. For an introduction to patterns, see patterns. Here we use a special flavour of patterns called type patterns; for further details, see type patterns.

A StructuredDataTypes structure maps typepatterns, which are strings, to type declarations, which are either strings or maps. StructuredDataTypes is therefore itself StructuredData.

We can now check whether the types in a StructuredDataStore are consistent with the type declarations in StructuredDataTypes. For each path in the StructuredDataStore, we look for a matching pattern in StructuredDataTypes. If more than one pattern matches, the “best” matching pattern is selected. See typepattern matching for details.

If a pattern is found, the corresponding type declaration is checked against the node referenced by the path. An error is reported for each path whose node does not match its type declaration.

Differences to programming language type declarations

In statically typed programming languages without type inference, you have to declare types for all variables, parameters and functions. With StructuredData you can declare types partially: parts of the data may have no type declarations at all.

Typepatterns

Typepatterns are a flavour of patterns that are used for type declarations. The wildcard “**” (ANYKEYS) is not allowed here. The special path “#” (ROOTKEY) is used to declare the type of the top node since the top node has no path.

Here are some examples of typepatterns:

pattern   comment
-------   --------------------------------------------------
#         matches the top node
*         matches all elements of the top node
A         matches element “A” of the top node
A.B       matches element “B” of element “A” of the top node

Typepattern matching

During a typecheck, the program looks up a matching typepattern in StructuredDataTypes for each path. To speed up this process, not all typepatterns are examined, only those with the same length (number of keys) as the path. For this reason “**” is not allowed in typepatterns, since it would also match longer paths. The details of the typepattern matching algorithm matter when more than one typepattern would match the path: the algorithm determines which of the matching typepatterns is selected for the actual typecheck.

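The keys are compared position by position. At each position, a directly matching key in a typepattern has precedence over a wildcard; the wildcard is only tried if no typepattern with the direct key leads to a complete match. As soon as a matching typepattern is found, the remaining typepatterns are not searched.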

Here are some examples with a path, some typepatterns and an indicator which typepattern is found by the match algorithm:

path    typepatterns   matched
-----   ------------   -------
X.B.D   *.*.D          X
        *.B.C
        X.A.*

X.B.D   X.B.*
        X.B.D          X

X.B.D   X.*.*          X
        *.B.D
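
One way to read the selection rule is as a depth-first search over key positions that tries the direct key before the wildcard and backtracks when a branch cannot complete. The sketch below implements that reading and reproduces the three examples in the table. `select_typepattern` is a hypothetical name, and typepatterns are assumed to be plain dotted mapkeys without escapes or list indices.

```python
from typing import List, Optional, Sequence, Tuple


def split_typepattern(typepattern: str) -> Tuple[str, ...]:
    """Split a typepattern into keys; "#" (ROOTKEY) denotes the top node."""
    return () if typepattern == "#" else tuple(typepattern.split("."))


def select_typepattern(keys: Sequence[str], typepatterns: List[str]) -> Optional[str]:
    """Return the typepattern selected for the path `keys`, or None."""
    # Only typepatterns with as many keys as the path are examined.
    candidates = [(tp, split_typepattern(tp)) for tp in typepatterns]
    candidates = [c for c in candidates if len(c[1]) == len(keys)]

    def search(pos: int, cands: list) -> Optional[str]:
        if not cands:
            return None
        if pos == len(keys):
            return cands[0][0]  # the first complete match ends the search
        # A directly matching key takes precedence over the wildcard ...
        exact = [c for c in cands if c[1][pos] == keys[pos]]
        found = search(pos + 1, exact)
        if found is not None:
            return found
        # ... which is only tried if the direct key leads to no complete match.
        return search(pos + 1, [c for c in cands if c[1][pos] == "*"])

    return search(0, candidates)


print(select_typepattern(["X", "B", "D"], ["*.*.D", "*.B.C", "X.A.*"]))  # *.*.D
```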

Type declarations

This is the list of currently known type declarations. Note that the type declarations are written in YAML syntax here:

boolean

A boolean. A scalar of type boolean has only two possible values, true or false. Note that in SDpyshell these two values are written True and False. In YAML, however, they are written true and false (all lower case). This data type is represented with the string:

boolean

integer

An integer number. Note that the range of these numbers is not defined here. We assume that the range is at least -2**31 to +2**31. This data type is represented with the string:

integer

real

A floating point number. We assume floating point numbers according to the IEEE 754 standard.

This data type is represented with the string:

real

string

A sequence of characters. Unicode characters are supported.

This data type is represented with the string:

string

optional struct

This is a map where all map keys must be elements of the list provided in the type declaration.

This data type is represented as a map with just one key and a list as value. Here is its representation in YAML; the list may contain an arbitrary number of map keys:

optional_struct:
- map_key1
- map_key2

open struct

This is a map where all elements of the list provided in the type declaration must be present as map keys. The map may however, have other additional keys.

This data type is represented as a map with just one key and a list as value. Here is its representation in YAML; the list may contain an arbitrary number of map keys:

open_struct:
- map_key1
- map_key2

struct

This is a map where all elements of the list provided in the type declaration must be present as map keys. No other keys are allowed in the map than the elements of the list.

This data type is represented as a map with just one key and a list as value. Here is its representation in YAML; the list may contain an arbitrary number of map keys:

struct:
- map_key1
- map_key2

typed map

This is a map where each value must be of the type scalar_type. scalar_type is either “boolean”, “integer”, “real” or “string”.

This data type is represented as a map with just one key and a string as value. The value must be one of the strings “boolean”, “integer”, “real” or “string”. Here is a representation in YAML which requires that all map values must be integers:

typed_map: integer

map

This is a map with no further restrictions (apart from the requirement that map keys are strings).

This data type is represented with the string:

map

optional list

This is a list where all list elements must be elements of the list provided in the type declaration.

This data type is represented as a map with just one key and a list as value. Here is its representation in YAML; the list may contain an arbitrary number of values:

optional_list:
- value1
- value2

typed list

This is a list where each value must be of the type scalar_type. scalar_type is either “boolean”, “integer”, “real” or “string”.

This data type is represented as a map with just one key and a string as value. The value must be one of the strings “boolean”, “integer”, “real” or “string”. Here is a representation in YAML which requires that all list elements must be integers:

typed_list: integer

list

This is simply a list with no further restrictions.

This data type is represented with the string:

list
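
The declarations above can be summarised in a small checker. This is a sketch under stated assumptions, not the SDpyshell implementation: `check_type` is a hypothetical name, the store is assumed to be loaded into plain Python dicts and lists, and “real” is assumed to accept only Python floats (whether integers should also be accepted is not specified here).

```python
from typing import Any

SCALAR_CHECKS = {
    "boolean": lambda v: isinstance(v, bool),
    # bool is a subclass of int in Python, so it is excluded explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "real": lambda v: isinstance(v, float),  # assumption: ints are rejected
    "string": lambda v: isinstance(v, str),
}


def is_map(v: Any) -> bool:
    """A map is a dict whose keys are all strings."""
    return isinstance(v, dict) and all(isinstance(k, str) for k in v)


def check_type(node: Any, decl: Any) -> bool:
    """Return True if `node` satisfies the type declaration `decl`."""
    if isinstance(decl, str):
        if decl in SCALAR_CHECKS:
            return SCALAR_CHECKS[decl](node)
        if decl == "map":
            return is_map(node)
        if decl == "list":
            return isinstance(node, list)
        raise ValueError(f"unknown type declaration: {decl!r}")
    if not (isinstance(decl, dict) and len(decl) == 1):
        raise ValueError(f"invalid type declaration: {decl!r}")
    kind, arg = next(iter(decl.items()))
    if kind == "struct":           # exactly the listed keys
        return is_map(node) and set(node) == set(arg)
    if kind == "open_struct":      # at least the listed keys
        return is_map(node) and set(arg) <= set(node)
    if kind == "optional_struct":  # only keys from the list
        return is_map(node) and set(node) <= set(arg)
    if kind == "typed_map":        # every value has the given scalar type
        return is_map(node) and all(SCALAR_CHECKS[arg](v) for v in node.values())
    if kind == "typed_list":       # every element has the given scalar type
        return isinstance(node, list) and all(SCALAR_CHECKS[arg](v) for v in node)
    if kind == "optional_list":    # every element is one of the listed values
        return isinstance(node, list) and all(v in arg for v in node)
    raise ValueError(f"unknown type declaration: {kind!r}")


print(check_type({"A": "x", "B": "y"}, {"optional_struct": ["A", "B", "C"]}))  # True
print(check_type([True], {"typed_list": "integer"}))                          # False
```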

StructuredDataContainer

A StructuredDataContainer contains a StructuredDataStore and, optionally, StructuredDataTypes. When a StructuredDataContainer is stored in a file, it is stored in YAML format. Here is an example of such a file:

'**SDC-Metadata**':
    version: '1.0'
'**SDC-Store**':
    key1: 1
    key2:
        A: x
        B: y
    key3:
    - 1
    - 2
    - 3
    -   float: 1.23
'**SDC-Types**':
    '#':
        struct:
        - key1
        - key2
        - key3
    '*.key1': integer
    '*.key2':
        optional_struct:
        - A
        - B
        - C
    '*.key2.*': string
    '*.key3':
        typed_list: integer

A StructuredDataContainer consists of three parts, the metadata, the StructuredDataStore and the StructuredDataTypes.

metadata

This is meta information on the file. Currently it only contains the version number of the file format. It is everything below the key “**SDC-Metadata**”.

StructuredDataStore

This is the part of the file where the data is stored. It is everything below the key “**SDC-Store**”.

StructuredDataTypes

Here are the type declarations. Type declarations are explained in more detail in the section StructuredDataTypes above. In short, this part maps typepatterns to types. A typepattern is a string that identifies one or more positions in the store. The “#” is the root symbol; it is used to declare the type of the topmost part of the StructuredDataStore. The “*” characters are wildcards, similar to the “*” used in file systems; each one matches any single key at that position. Note that the store and the types may reside in two different files.