Reference ========= StructuredData in general ------------------------- StructuredData is the concept of organizing data in a special hierarchical data structure. First we have to define the terms used in the following chapters. .. _reference-StructuredData-terminology: StructuredData terminology ++++++++++++++++++++++++++ *StructuredData* This is the concept of having data in a hierarchical structure. There is always a top :ref:`node ` which is always a :ref:`collection `. *StructuredDataContainer* This is a StructuredData structure that contains a StructuredDataStore and StructuredDataTypes. *StructuredDataStore* This is a StructuredData structure that holds your data. *StructuredDataTypes* This is a StructuredData structure that contains type declarations for a StructuredDataStore. .. _reference-StructuredDataStore_terminology_node: *node* Either a :ref:`collection ` or a :ref:`scalar `. .. _reference-StructuredData-terminology-scalar: *scalar* Either a *boolean*, *integer*, *real* or *string*. A scalar is a simple value with no references. It cannot be referenced and is always contained in a collection. *boolean* This is a scalar with only two possible Values, True or False. Note that in SDpyshell these two values are *True* and *False* with an upper case first letter. In YAML however, the values are *true* and *false* (all in small caps). *integer* An integer number. Note that the range of these numbers is not defined here. We require that the range is at least -2**31 to +2**31. *real* A floating point number. We require floating point numbers according to the IEEE 754 standard. *string* A sequence of characters. Unicode characters are supported. .. _reference-StructuredData-terminology-collection: *collection* Either a :ref:`map ` or a :ref:`list `. .. _reference-StructuredData-terminology-map: *map* A data structure that maps mapkeys, which are always strings, to values which are always nodes. Note that each mapkey can only be present once in the map. Each mapkey is associated with exactly one node. However, two map keys may be associated with the same node. *mapkey* This is a key of a map. A mapkey is always a string. .. _reference-StructuredData-terminology-list: *list* A data structure that is a sequence of nodes. Note that the elements of the list have the order you gave them and that two elements of the list may be equal. *listindex* This is the index that identifies a member of a list. An listindex is always an integer. *key* Either a mapkey or a listindex. *keylist* A list of keys. A keylist is a reference to a node in a StructuredDataStore. It describes how to find the node when you start at the top of the StructuredDataStore. The first key is a identifies a node in the top collection. If this node is a collection, the second key identifies a node in this second collection. If this node is again a collection, the third key identifies a node in this third collection and so on until you finally reach the referenced node. *path* This is a keylist converted to a string. Basically mapkeys are concatenated with dots ‘.’ while listindices are concatenated after they are enclosed in square brackets. A typical path may look like this “abc.def[4].ghi”. For a precise definition of how paths are constructed see :ref:`paths `. *pattern* A path that may also contain :ref:`paths `. By definition, all paths are also patterns. .. _reference-StructuredData-terminology-wildcard: *wildcard* Special keys that match whole classes of keys in a StructuredData structure. "*" matches any mapkey and any listindex while "**" matches one or more mapkey and listindex. *reference* Collections are never contained in other collections, they are only referenced. It is possible that a collection is referenced by more than one other collection. *link* A link is a reference to a collection that is already referenced somewhere else. References and links ++++++++++++++++++++ One is tempted to see collections as containing other collections or scalars, but this is not true in the general case. Collections contain scalars but they actually never contain other collections, they only have *references* to them. The difference between containing an item and just having a reference is that in a "contains" relationship a collection can only be contained in one other collected where in a "reference" relationship a collection may be referenced by several other collections. "Reference" relationships even allow circles e.g. A referencing B referencing C referencing A. However, if a collection is referenced by only one other collection, it makes no practical difference if we see this as a "contains" or a "reference" relationship. If a collection is referenced by at least two other collections, we always have to see this as a "reference" relationship. In the context of StructuredData we call these cases "links" and distinguish them from the ordinary cases. When querying a StructuredData object links are not recognizable. However, if you apply a change to a collection it makes a big difference whether this collection is referenced at only one or more than other collection. In order to make the work with links easier SDpyshell provides some format parameters and functions that help you to detect links and see what other collections reference a given collection. Relation of Structured Data to python data structures +++++++++++++++++++++++++++++++++++++++++++++++++++++ You may skip this section if you are not familiar with python. Here is an overview on which terms of the StructuredData definition relate to which python data type: ==================== ================================== Structured Data term python data type ==================== ================================== map dict where keys are always strings list list boolean bool integer int real float string str collection either dict or list scalar an int, a float or a str ==================== ================================== .. _reference-Paths: Paths +++++ The definition of StructuredData allows to construct a unique path for each node. We construct a path like this: We start at the top of the StructuredData store and move, key by key towards the node we have selected. We collect the keys we encounter in that order in a list. It is now obvious that this list of keys identifies the node. A path is simply a string representation of that list of keys. Joining a keylist to a path ::::::::::::::::::::::::::: The rules to construct a path from a list of keys are like this: - If the key is a list index convert it to a string and enclose it in square brackets, e.g index 9 becomes the string "[9]". - If the key is a map key it must be a string. Apply :ref:`escape ` rules to the string. - Combine all converted keys with the "." character. - If the path contains the sequence ".[" replace it with "[". Here are some examples: ============== ======= list of keys path ============== ======= "A" "B" A.B "A.B" "C" A\\.B.C "A" 2 "C" A[2].C "A" "\*" "C" A\\*.C "A" ANYKEY "C" A.\*.C ============== ======= Note that "ANYKEY" is a special variable that represents the "*" wildcard as it is used in *patterns*, for more information on patterns see :ref:`patterns `. .. _reference-Escape_rules: Escape rules :::::::::::: The *escape* rules ensure that any list of map keys and list indices can be represented as a path :ref:`path ` and that this list can always be reconstructed from the path. The rules also ensure that a path can not be confused with a :ref:`pattern ` containing wildcards. The *escape* rules are these: - If the key is "\*" change it to "\\\*". - If the key is "\*\*" change it to "\\\*\*". - If the key is "#" change it to "\\#" - If the key starts with a sequence of "\\" followed by either "\*", "\*\*" or "#", prepend a "\\" character. - Replace all occurences of "." in the key with "\\.". - Replace all occurences of "[" in the key with "\\[". - Replace all occurences of "]" in the key with "\\]". Here are some examples: ======= =============== key escaped key ======= =============== A.B A\\.B A.B[5]C A\\.B\\[5\\]C \* \\\* \*\* \\\*\* # \\# \\\* \\\\\* ======= =============== Example ::::::: Here is an example of StructuredData (only the StructuredDataStore) formulated in YAML:: item1: first: - A - B second: - X - Y third: - m: 1 n: 2 - p: 10 q: 11 If you are familiar with python, this would be the same structure in python:: { "item1" : { "first": ["A","B"], "second": ["X","Y"], "third": [ {"m": 1, "n":2}, {"p":10, "q":11}] } } In the example of StructuredData shown above the following table shows some examples of paths and the data they point to: ================= ===================================== path data (in python notation) ================= ===================================== item1.first ["A","B"] item1.first[1] "B" item1.second[0] "X" item1.third [ {"m": 1, "n":2}, {"p":10, "q":11}] item1.third[0] {"m": 1, "n":2} item1.third[0].m 1 item1.third[0].n 2 item1.third[1].q 11 ================= ===================================== .. _reference-StructuredData-Patterns: Patterns ++++++++ In order to select a subset from a set of paths we define *patterns*, also called *path patterns* where it could be confused with other types of patterns. In patterns we combine special keys with ordinary keys. So each *path* can also be considered as a *pattern*. These are the special keys that can be used in patterns: ======== ===================== ======================================= key name string representation meaning ======== ===================== ======================================= ANYKEY \* matches any key ANYKEYS \*\* matches one or more keys of any value ROOTKEY # used in type patterns for the root type ======== ===================== ======================================= Patterns come in two flavours, *type patterns* and *match patterns*. For detailed information on *type patterns* see also :ref:`StructuredDataTypes `. Here are the differences between both flavours: ============= ==================== =================== flavour allowed special keys usage ============= ==================== =================== type pattern ROOTKEY ANYKEY type declarations match pattern ANYKEY ANYKEYS matching paths ============= ==================== =================== Example ::::::: Here are some examples for match patterns: Assume that we have the following set of paths:: item1 item1.first item1.first.A item1.first.B item1.second item1.second.X item1.second.Y item1.third item1.third[0] item1.third[1] item1.third[0].m item1.third[0].n item1.third[1].p item1.third[1].q This is what some patterns match: +-------------------+---------------------------------------+ | wildcard-path | paths matched | +===================+=======================================+ | \* | item1 | +-------------------+---------------------------------------+ | item1.\* | item1.first item1.second item1.third | +-------------------+---------------------------------------+ | item1.second.\* | item1.second.X item1.second.Y | +-------------------+---------------------------------------+ | item1.\*.\* | item1.first.A item1.first.B | | | item1.second.X item1.second.Y | | | item1.third[0] item1.third[1] | +-------------------+---------------------------------------+ | item1.third[1].\* | item1.third[1].p item1.third[1].q | +-------------------+---------------------------------------+ | item1.third.\*\* | item1.third[0] item1.third[1] | | | item1.third[0].m item1.third[0].n | | | item1.third[1].p item1.third[1].q | +-------------------+---------------------------------------+ | \*.second.\* | item1.second.X item1.second.Y | +-------------------+---------------------------------------+ .. _reference-StructuredDataStore: StructuredDataStore ------------------- A StructuredDataStore basically is StructuredData without type declarations. A StructuredDataStore is often embedded in a StructuredDataContainer together with :ref:`StructuredDataTypes `. .. _reference-StructuredDataTypes: StructuredDataTypes ------------------- The concept of :ref:`paths ` allows to reference any part in a StructuredDataStore with a single string. The concept of *patterns* allows to reference sets of paths and by this sub sets of the StructuredDataStore. For an introduction on *patterns* see :ref:`patterns `. Here we use a special flavour of patterns called *type patterns*, for further details on this see :ref:`type patterns `. A StructuredDataTypes structure maps patterns, which are strings, to type declarations which are simple scalars or nodes. By this StructuredDataTypes is itself StructuredData. We can now check the types of a StructuredDataStore if they are consistent with the type declarations in StructuredDataTypes. For all paths in the StructuredDataStore we check if we find a matching pattern in StructuredDataTypes. If more than one patterns match, the "best" matching pattern is selected. See also :ref:`matching typepatterns ` for details. If a pattern is found, the corresponding type declaration is checked with the node referenced by the path. We report an error for each path where the type declaration didn't match. Differences to programming language type declarations +++++++++++++++++++++++++++++++++++++++++++++++++++++ In statically typed programming languages without type inference you have to declare types for all variables and parameters and functions. With StructuredData you can define types *partially*. It is possible to have no type declarations for parts of the data. .. _reference-StructuredData-Typepatterns: Typepatterns ++++++++++++ Typepatterns are a flavour of :ref:`patterns ` that are used for type declarations. The wildcard "**" (ANYKEYS) is not allowed here. The special path "#" (ROOTKEY) is used to declare the type of the *top node* since the *top node* has no path. Here are some examples of typepatterns: ======= ===================================================== pattern comment ======= ===================================================== # matches the *top node* \* matches all elements of the *top node* A matches element "A" of the *top node* A.B matches element "B" of element "A" of the *top node* ======= ===================================================== .. _reference-Typepattern-matching: Typepattern matching ++++++++++++++++++++ During a typecheck the program tries for each path if it finds a matching typepattern in StructuredDataTypes. In order to speed up this process not all typepatterns are examined but only those who have the same length as the path. For this reason "**" is not allowed in typepatterns since it would also match longer paths. The details of the typepattern matching algorithm are important if more than one typepattern would match the path. The algorithm determines which of the matching typepatterns is selected for the actual typecheck. At each stage a directly matching key in a typepattern has precedence over a wildcard. If a matching typepattern is found, the other typepatterns are not searched. Here are some examples with a path, some typepatterns and an indicator which typepattern is found by the match algorithm: +-----------+--------------+------------+ | path | typepatterns | matched | +===========+==============+============+ | X.B.D | \*.\*.D | X | | +--------------+------------+ | | \*.B.C | | | +--------------+------------+ | | X.A.\* | | +-----------+--------------+------------+ | X.B.D | X.B.\* | | | +--------------+------------+ | | X.B.D | X | +-----------+--------------+------------+ | X.B.D | X.\*.\* | X | | +--------------+------------+ | | \*.B.D | | +-----------+--------------+------------+ Type declarations +++++++++++++++++ This is the list of currently known type declarations, note that we write the type declaration in `YAML `_ syntax here: boolean ::::::: A boolean. A scalar of type boolean has only two possible Values, True or False. Note that in SDpyshell these two values are *True* and *False*. In YAML however, the values or *true* and *false* (all in small caps). This data type is represented with the string:: boolean integer ::::::: An integer number. Note that the range of these numbers is not defined here. We assume that the range is at least -2**31 to +2**31. This data type is represented with the string:: integer real :::: A floating point number. We assume floating point numbers according to the IEEE 754 standard. This data type is represented with the string:: real string :::::: A sequence of characters. Unicode characters are supported. This data type is represented with the string:: string optional struct ::::::::::::::: This is a map where all map keys must be elements of the list provided in the type declaration. This data type is represented as a map with just one key and a list as value. Here is the representation of it in YAML, there can be an arbitrary number of map keys:: optional_struct: - map_key1 - map_key2 open struct ::::::::::: This is a map where all elements of the list provided in the type declaration must be present as map keys. The map may however, have other additional keys. This data type is represented as a map with just one key and a list as value. Here is the representation of it in YAML, there can be an arbitrary number of map keys:: open_struct: - map_key1 - map_key2 struct :::::: This is a map where all elements of the list provided in the type declaration must be present as map keys. No other keys are allowed in the map than the elements of the list. This data type is represented as a map with just one key and a list as value. Here is the representation of it in YAML, there can be an arbitrary number of map keys:: struct: - map_key1 - map_key2 typed map ::::::::: This is a map where each value must be of the type scalar_type. scalar_type is either "boolean", "integer", "real" or "string". This data type is represented as a map with just one key and a string as value. The value must be one of the strings "boolean", "integer", "real" or "string". Here is a representation in YAML which requires that all map values must be integers:: typed_map: integer map ::: This is a map with no further restrictions (aside from that map keys must be strings). This data type is represented with the string:: map optional list ::::::::::::: This is a list where all list elements must be elements of the list provided in the type declaration. This data type is represented as a map with just one key and a list as value. Here is the representation of it in YAML, there can be an arbitrary number of values:: optional_list: - value1 - value2 typed list :::::::::: This is a list where each value must be of the type scalar_type. scalar_type is either "boolean", "integer", "real" or "string". This data type is represented as a map with just one key and a string as value. The value must be one of the strings "boolean", "integer", "real" or "string". Here is a representation in YAML which requires that all list elements must be integers:: typed_list: integer list :::: This is simply a list with no further restrictions. This data type is represented with the string:: list .. _reference-StructuredDataContainer: StructuredDataContainer ----------------------- A StructuredDataContainer contains a StructuredDataStore and optionally StructuredDataTypes. When a StructuredDataContainer is stored in a file, it is stored in `YAML `_ format. Here is an example how such a file looks like:: '**SDC-Metadata**': version: '1.0' '**SDC-Store**': key1: 1 key2: A: x B: y key3: - 1 - 2 - 3 - float: 1.23 '**SDC-Types**': '#': struct: - key1 - key2 - key3 '*.key1': integer '*.key2': optional_struct: - A - B - C '*.key2.*': string '*.key3': typed_list: integer A StructuredDataContainer consists of three parts, the *metadata*, the *StructuredDataStore* and the *StructuredDataTypes*. metadata This is meta information on the file. Currently it only contains the version number of the file format. It is everything below the key "\*\*SDC-Metadata\*\*". StructuredDataStore This is the part of the file where the data is stored. It is everything below the key "\*\*SDC-Store\*\*". StructuredDataTypes Here are the type declarations. Type declarations are explained in more detail further below in this file. For now we just remember that type declarations consist of paths and types. A path is a string that identifies a position in the store. The "#" is the root symbol, it is used to define the type for the topmost part of the StructuredDataStore. The "*" characters are wildcards, similar to the "*" used in file systems, they match any string at that position. Note that the store and the types may reside in two different files.