Useful notes

This section contains useful tips to help with understanding function usage, ease shell interaction and introduce some basic characteristics of the dynamic virtual tables.

Understanding function examples

While reading each function’s description, you will notice examples like this:

>>> sql("select ifthenelse(1>0,'yes','no') as answer")
answer
------
yes

Because examples also serve as tests for madIS’s functions, they are coded and displayed as actual Python code. The string inside Python function “sql()” is an SQL query, having its results displayed in the lines following it. Query results in examples also include the column names for clarity. However in the interactive shell column names are not shown. In the interactive shell, the same as above example, would look like the following:

mterm> select ifthenelse(1>0,'yes','no') as answer;
yes
Query executed in 0 min. 0 sec 33 msec

Understanding function parameters

Named parameters

All types of functions (row, aggregate and virtual table) can take simple or named parameters. For simple parameters, ordering is important and should follow function’s definition. Named parameters can be placed anywhere in the parameters lists, with few exceptions that are clearly stated. Named parameters are strings (so they must be placed inside quotes) using the format paramname:paramvalue.

For example file function is declared as:

file(location[, formatting options])

where location is a simple parameter containing the file name or network resource to read from, and formatting options is a list of named parameters (eg. parameter dialect which accepts one of values tsv or csv). So using this function to import a csv file would look like this:

select * from file('myfile','dialect:csv');

By using inverted syntax (see SQL extensions), the query would look like this:

select * from (file 'myfile' dialect:csv);

Notice that non named parameters must be quoted. However named parameters quoting is optional. Also parameters are separated with spaces instead of commas. If a space character appears in a named parameter’s value field, quoting should be used. So importing a file that uses spaces as value delimiters, would look like following (using inverted syntax):

select * from (file 'myfile' 'delimiter: ');

An example of a function definition which contains named parameters is output:

output(query:None, file[, formatting options])

The definition above states, that the function output receives the named parameter query which has no default value (the value following the “:” character of a named parameter in a functions’ definition, represents the default value).

The query parameter

Named parameter query shown in the definition above, defines which SQL query the function output will execute and then write the query’s results in the given file. When using, in inverted syntax, a function that takes a query parameter (see SQL extensions) the query keyword should be omitted. Instead the actual query should be placed at the end of the statement. For example to output, in inverted syntax, some query’s result, the following could be used:

output 'outfile' dialect:tsv select * from table1;

In non inverted syntax, this is equivalent to:

select  * from output('query:select * from table1','outfile','dialect:tsv');

Inverting queries that include the query parameter, permits “pipelining” of many virtual table functions, while using an easy to read syntax. So to re-order a file and add line numbers you can do:

output 'outfile.csv' rowidvt select * from (file 'infile.csv') order by C2;

Let’s look at the details of this query. file function reads infile.csv and returns the data as a table. Returned data are then ordered by column C2. The result of the query, select * from (file ‘infile.csv’) order by C2 is the query parameter for rowidvt function. This function adds a rowid column to the query result that indicates the order of the row in the result (and not in the initial table). The result of rowidvt function is forwarded into output which writes the result in the outfile.csv file.

Understanding function return types

To understand how function return types works, make sure you understand how SQLite types work. The return type of the row and aggregate madIS functions refers to the storage class typically returned by the function. Virtual table functions return a table having a static (or dynamic) schema with column names and types affinity. To clarify the difference of the returned types of row/aggregate functions and virtual table functions consider the following example:

mterm> select * from (select kwnum('lol') as a) where a=='1';
Query executed in 0 min. 0 sec 17 msec

Here kwnum() function returns the number of keywords of the ‘lol’ string which is an integer. Although the first returned column (named a) is of integer type, the literal text ‘1’ in the where part is not converted to an integer, so the condition fails. In the following example, by using the built in SQLite function cast, an integer type affinity is suggested so the text literal ‘1’ is automatically converted to an integer and the condition is evaluated to true.

mterm> select * from (select cast(kwnum('lol') as int) as a) where a=='1';
1
Query executed in 0 min. 0 sec 21 msec

In conclusion, the returned values from row and aggregate functions, even multisets do not also imply data affinity.

Note

The same result could be reached using the virtual table functions, eg. typing or setschema.

Shell interaction

In madIS’s interactive shell mode some helpful shell commands and functions helping with complex query composition are supported. These can be listed in the terminal with the .help command.

Names function

Function names is a virtual table function that operates over a query (See Understanding function parameters) and returns the column names of the query result.

mterm> names select * from file('test.csv','header:t') ;
City|Region|Country|Population
Query executed in 0 min. 0 sec 24 msec

Function coltypes, has similar functionality, additionally presenting the column’s type affinity when this information is contained in its input.

mterm> coltypes select * from file('/home/meili/Desktop/test.csv','header:t') ;
City|text
Region|text
Country|text
Population|text

Empty schema exception

A price to pay for using virtual tables having dynamic schema creation, is that if no data are provided, for example an empty file to the file function or a query that returns no rows to virtual tables that accept input queries (eg. cache function), is an exception as a result:

mterm> select * from (cache select 5 as a where a!=5);
Madis SQLError: operator cache: Cannot initialise dynamic schema virtual table without data

To avoid this issue, especially in automatic flow execution where a flow crash is undesirable, the setschema function can be used. Setschema is virtual table function that enforces a given schema. In the case of non empty resultsets, it can be used to project, rename and typecast inner query columns, while in case of an empty resultset, it works as a static schema definition.

mterm> select * from (setschema 'a' cache select 5 as a where a!=5);
Query executed in 0 min. 0 sec 62 msec

In the example above, cache function has a possibility of producing an empty schema exception. Applying the setschema function with the desirable schema (column a with NONE type) this problem is avoided.

Other possible causes of empty schema exceptions are also multiset functions which are actually implemented through the expand virtual table function.

Efficiency in streaming virtual tables

Virtual table functions that work as streams do not need to materialise the whole input/output data streams before returning them, they can produce them on demand. So executing, for example a join, between streaming virtual tables for a huge volume of data is not a good idea, as the data will have to be re-produced over and over again. Using cache virtual table function on one of the streaming queries will significantly increase performance.