Useful notes¶
This section contains useful tips to help with understanding function usage, ease shell interaction and introduce some basic characteristics of the dynamic virtual tables.
Understanding function examples¶
While reading each function’s description, you will notice examples like this:
>>> sql("select ifthenelse(1>0,'yes','no') as answer")
answer
------
yes
Because examples also serve as tests for madIS’s functions, they are coded and displayed as actual Python code. The string inside Python function “sql()” is an SQL query, having its results displayed in the lines following it. Query results in examples also include the column names for clarity. However in the interactive shell column names are not shown. In the interactive shell, the same as above example, would look like the following:
mterm> select ifthenelse(1>0,'yes','no') as answer;
yes
Query executed in 0 min. 0 sec 33 msec
Understanding function parameters¶
Named parameters
All types of functions (row, aggregate and virtual table) can take simple or named parameters. For simple parameters, ordering is important and should follow function’s definition. Named parameters can be placed anywhere in the parameters lists, with few exceptions that are clearly stated. Named parameters are strings (so they must be placed inside quotes) using the format paramname:paramvalue.
For example file
function is declared as:
file(location[, formatting options])
where location is a simple parameter containing the file name or network resource to read from, and formatting options is a list of named parameters (eg. parameter dialect which accepts one of values tsv or csv). So using this function to import a csv file would look like this:
select * from file('myfile','dialect:csv');
By using inverted syntax (see SQL extensions), the query would look like this:
select * from (file 'myfile' dialect:csv);
Notice that non named parameters must be quoted. However named parameters quoting is optional. Also parameters are separated with spaces instead of commas. If a space character appears in a named parameter’s value field, quoting should be used. So importing a file that uses spaces as value delimiters, would look like following (using inverted syntax):
select * from (file 'myfile' 'delimiter: ');
An example of a function definition which contains named parameters is output
:
output(query:None, file[, formatting options])
The definition above states, that the function output receives the named parameter query which has no default value (the value following the “:” character of a named parameter in a functions’ definition, represents the default value).
The query parameter
Named parameter query shown in the definition above, defines which SQL query the function output will execute and then write the query’s results in the given file. When using, in inverted syntax, a function that takes a query parameter (see SQL extensions) the query keyword should be omitted. Instead the actual query should be placed at the end of the statement. For example to output, in inverted syntax, some query’s result, the following could be used:
output 'outfile' dialect:tsv select * from table1;
In non inverted syntax, this is equivalent to:
select * from output('query:select * from table1','outfile','dialect:tsv');
Inverting queries that include the query parameter, permits “pipelining” of many virtual table functions, while using an easy to read syntax. So to re-order a file and add line numbers you can do:
output 'outfile.csv' rowidvt select * from (file 'infile.csv') order by C2;
Let’s look at the details of this query. file
function reads infile.csv and returns the data as a table.
Returned data are then ordered by column C2.
The result of the query, select * from (file ‘infile.csv’) order by C2 is the query parameter for rowidvt
function.
This function adds a rowid column to the query result that indicates the order of the row in the result (and not in the initial table).
The result of rowidvt
function is forwarded into output
which writes the result in the outfile.csv file.
Understanding function return types¶
To understand how function return types works, make sure you understand how SQLite types work. The return type of the row and aggregate madIS functions refers to the storage class typically returned by the function. Virtual table functions return a table having a static (or dynamic) schema with column names and types affinity. To clarify the difference of the returned types of row/aggregate functions and virtual table functions consider the following example:
mterm> select * from (select kwnum('lol') as a) where a=='1';
Query executed in 0 min. 0 sec 17 msec
Here kwnum()
function returns the number of keywords of the ‘lol’ string which is an integer.
Although the first returned column (named a) is of integer type, the literal text ‘1’ in the where part is not converted to an integer,
so the condition fails. In the following example, by using the built in SQLite function cast, an integer type affinity
is suggested so the text literal ‘1’ is automatically converted to an integer and the condition is evaluated to true.
mterm> select * from (select cast(kwnum('lol') as int) as a) where a=='1';
1
Query executed in 0 min. 0 sec 21 msec
In conclusion, the returned values from row and aggregate functions, even multisets do not also imply data affinity.
Note
The same result could be reached using the virtual table functions, eg. typing
or setschema
.
Shell interaction¶
In madIS’s interactive shell mode some helpful shell commands and functions helping with complex query composition are supported. These can be listed in the terminal with the .help command.
Names function
Function names
is a virtual table function that operates over a query (See Understanding function parameters)
and returns the column names of the query result.
mterm> names select * from file('test.csv','header:t') ;
City|Region|Country|Population
Query executed in 0 min. 0 sec 24 msec
Function coltypes
, has similar functionality, additionally presenting the column’s
type affinity when this information is contained in its input.
mterm> coltypes select * from file('/home/meili/Desktop/test.csv','header:t') ;
City|text
Region|text
Country|text
Population|text
Empty schema exception¶
A price to pay for using virtual tables having dynamic schema creation, is that if no data are provided, for example an empty file
to the file
function or
a query that returns no rows to virtual tables that accept input queries (eg. cache
function), is an exception as a result:
mterm> select * from (cache select 5 as a where a!=5);
Madis SQLError: operator cache: Cannot initialise dynamic schema virtual table without data
To avoid this issue, especially in automatic flow execution where a flow crash is undesirable, the setschema
function can be used.
Setschema is virtual table function that enforces a given schema. In the case of non empty resultsets, it can be used to project,
rename and typecast inner query columns, while in case of an empty resultset, it works as a static schema definition.
mterm> select * from (setschema 'a' cache select 5 as a where a!=5);
Query executed in 0 min. 0 sec 62 msec
In the example above, cache function has a possibility of producing an empty schema exception. Applying the setschema function with the desirable schema (column a with NONE type) this problem is avoided.
Other possible causes of empty schema exceptions are also multiset functions which are actually implemented
through the expand
virtual table function.
Efficiency in streaming virtual tables¶
Virtual table functions that work as streams do not need to materialise the whole input/output data streams before returning them, they can produce them on demand. So executing, for example a join, between streaming virtual tables for a huge volume of data is not a good idea, as the data will have to be re-produced over and over again. Using cache virtual table function on one of the streaming queries will significantly increase performance.