Vtable functions list

cache function

cache(query:None) → [the same schema as input query]

Caches and indexes in memory the input query. Its output has exactly the same schema as the input query.

Examples:

>>> sql("select * from (cache select * from table1) order by a desc,b asc")
a  | b
------
10 | 1
7  | 8
5  | 6
3  | 4
1  | 2

>>> sql("select * from (cache select * from table1) where a=b")

>>> table2('''
... 5   1
... 5   2
... 5   3
... 2   3
... 2   2
... 2   1
... ''')

>>> sql("select * from (cache select * from table2) where b>=2 order by a desc,b desc")
a | b
-----
5 | 3
5 | 2
2 | 3
2 | 2

>>> sql("select * from (cache select * from table2) where 3>=a and 3<=b")
a | b
-----
2 | 3

clipboard function

clipboard()

Returns the contents of the system’s clipboard. If the clipboard’s contents are guessed to be a table, then it automatically splits the contents in its output.

h:if the ‘h’ option is provided to clipboard() function, the first row of the clipboard’s data is regarded as the schema of the data.
Returned table schema:
 Column names start from C1… , all column types are text

Examples:

>>> sql("select * from clipboard()")
C1   | C2                    | C3          | C4
------------------------------------------------------
1    | Zimbabwe              | 304.30      | 2009 est.
2    | Japan                 | 192.10      | 2009 est.
3    | Saint Kitts and Nevis | 185.00      | 2009 est.

clipout function

clipout([h:0, ]query:None)

Writes to clipboard the output of query. The clipboard contents will be tab delimited.

header option:

if an ‘h’ or h:1 option is found then it also exports the schema of the query.

Returned table schema:
 
  • return_value int
    Boolean value 1 indicating success. On failure an exception is thrown.

Examples:

>>> sql("clipout select 5,6")
return_value
------------
1

coltypes function

coltypes(query:None)

Returns the input query results column names and types.

Returned table schema:
 
  • column text
    Column name of input query schema
  • type text
    Type of column

Examples:

>>> sql("coltypes select 5 as vt")
column | type
-------------
vt     | None

Applying coltypes in the result of virtual table func:typing function in the same query

>>> sql("coltypes typing 'vt:int' select 5 as vt")
column | type
-------------
vt     | int

dirfiles function

dirfiles([rec:1, ]directory_name) → path_filename, filename

Returns the files name in a given directory. With the option ‘rec:1’ it returns the files under the provided directory and all its subdirectories.

This function is very usefull when used with the execprogram function to execute an external command for every filename.

Note

Dirfiles does not follow links.

Returned table schema:
 Column C1 is the full filename (path/filename) Column C2 is filename

Examples:

>>> sql("select c2 from dirfiles('.') where c2 like 'f%.py'")    
c2
-------
file.py
flow.py
>>> sql("select c2 from dirfiles('rec:1','.') where c2 like 'c%.py'")    
c2
------------
coltypes.py
clipout.py
cache.py
continue.py
clipboard.py

examplevt function

examplevt(arguments)

A minimal example of a virtual table. Returns all the arguments passed to it.

Returned table schema:
 Column names start from C1… , all column types are text

Examples:

>>> sql("select * from examplevt(1, '2', 'var3')")    
varname          | value
-------------------------------------------------------------
parsedargs       | (u'1', u'2', u'var3')
envar:tablename  | vt_773987998
envar:modulename | examplevt
...
envar:dbname     | temp
>>> sql("select * from (examplevt 'var1' 'var2' v1:test select 5)")    
varname          | value
--------------------------------------------------------------------
parsedargs       | (u'query:select 5', u'var1', u'var2', u'v1:test')
envar:tablename  | vt_1975870853
envar:modulename | examplevt
...
envar:dbname     | temp

exec function

exec(query:None[, path:None, variables])

Executes the input query. Gets the first column of the returned result and executes its rows content supposing it is an sql statement.

Path parameter sets the current working directory while executing the statements.

Variables are named parameters that set variables in execution environment. For example c:v named parameter sets the variable c in the new environment, initialized with current variable’s v value.

Returned table schema:
 
  • return_value int
    Boolean value 1 indicating success of the SQL statements flow execution. On failure an exception is thrown.

Examples:

This query executes the statements in quotes and returns successfully

>>> sql("exec select 'select 5'")
return_value
------------
1

Typical usage.

>>> sql("file 'testing/testtable.sql'")
C1
----------------------------------------
create table table1 (a,b,c);
insert into table1 values('James',10,2);
insert into table1 values('Mark',7,3);
insert into table1 values('Lila',74,1);

>>> sql("exec file 'testing/testtable.sql'")
return_value
------------
1

>>> sql("select * from table1")
a     | b  | c
--------------
James | 10 | 2
Mark  | 7  | 3
Lila  | 74 | 1

Nesting flows. Usage of path and variables parameters.

>>> sql("file 'testing/topflow.sql'")
C1
-----------------------------------------------------------------------------
var 'v' 5;
var 'tablename' 'internaltable';
var 'lastdate' from select '2008-01-01';
create table topflowvars as select * from getvars() where variable!='execdb';
exec 'tablename' 'c:v'  file 'internalflow.sql';

>>> sql("file 'testing/internalflow.sql'")
C1
------------------------------------------------------------------------------
create table %{tablename} as select * from getvars() where variable!='execdb';

>>> sql("select * from variables() where variable!='execdb'")
variable | value
----------------
flowname |

>>> sql("exec 'path:testing' file 'testing/topflow.sql'")
return_value
------------
1
>>> sql("select * from topflowvars")
variable  | value
-------------------------
flowname  | notset
lastdate  | 2008-01-01
tablename | internaltable
v         | 5

>>> sql("select * from internaltable")
variable  | value
-------------------------
flowname  | notset
c         | 5
tablename | internaltable

>>> sql("select * from variables() where variable!='execdb'")
variable | value
----------------
flowname |

Test files:

expand function

expand(query:None)

Executes the input query and returns the result expanding any multiset values returned. The returned result is produced iteratively.

Returned table schema:
 Same as input query schema expanded with multiset functions column naming. When as renaming function is used at a multiset function, if the multiset function returns only one column it is named according to the as value, else a positive integer (1,2…n) is appended to the column name indicating column index in the multiset function result.

Examples:

>>> table1('''
... James   10      2
... Mark    7       3
... Lila    74      1
... ''')
>>> sql("select ontop(1,c,a,b) from table1")
top1 | top2
-----------
Mark | 7
>>> sql("select ontop(1,c,b,c) as prefs from table1")
prefs1 | prefs2
---------------
7      | 3
>>> sql("select ontop(1,c,a) as nameontop from table1")
nameontop
---------
Mark

The explicit invocation of expand function won't affect the output since it is already automatically invoked because of the multiset function ontop.

>>> sql("expand expand select ontop(2,b,a) from table1")
top1
-----
Lila
James

file function

file(location[, formatting options])

Opens and returns a file or url as a table. The file’s format is defined through named options. location is defined through a URL, or a regular filename, can be given also as the named parameter url or file. If no named parameters are given the returned table has one column with each line of resource as a row or it assumes the dialect from the file ending (Files ending in .tsv, .csv, .json are assumed to be in the corresponding dialect).

Returned table schema:
 Columns are automatically named as C1, C2… or if header is set, columns are named by the resource first line value, and have the type text

Formatting options:

fast:

Default is 0 (false). Fast option speeds up the parsing of lines into values, exchanging accuracy for speed. It uses the delimiter option to split lines.

strict:
  • strict:1 (default), if a failure occurs, the current transaction will be cancelled and an error will be returned.
  • strict:0 , returns all data that succesfully parses.
  • strict:-1 , returns all input lines in which the parser finds a problem. In essence this works as a negative parser.

If no strict option is defined in fast:1 mode, then no strictness checking is applied at all, and an “Unknown error” will be returned if a problem occurs.

encoding:

A standar encoding name. (List of encodings)

compression:

t/f

Default is f (False)

compressiontype:
 

zip/gzip

Default is zip

Formatting options for CSV file types:

dialect:

tsv/csv/json

Formats field as tab/comma separated values with minimal quoting. JSON dialect uses a line oriented JSON based format.

File extensions that are recognised as a dialect (.tsv, .csv, .json) take precedence over a specified dialect parameter.

header:

t/f

Set the column names of the returned table

delimiter:

A string used to separate fields. It defaults to ‘,’

doublequote:

t/f

Controls how instances of quotechar appearing inside a field should be themselves be quoted. When True, the character is doubled. When False, the escapechar is used as a prefix to the quotechar. It defaults to True. On output, if doublequote is False and no escapechar is set, Error is raised if a quotechar is found in a field

escapechar:

A one-character string used by the writer to escape the delimiter if quoting is set to QUOTE_NONE and the quotechar if doublequote is False. On reading, the escapechar removes any special meaning from the following character. It defaults to None, which disables escaping

lineterminator:

The string used to terminate lines produced by the writer. It defaults to ‘\r\n’

quotechar:

A one-character string used to quote fields containing special characters, such as the delimiter or quotechar, or which contain new-line characters. It defaults to ‘”’.

quoting:

Controls when quotes should be generated by the writer and recognised by the reader. It can take on any of the QUOTE_* constants and defaults to QUOTE_MINIMAL. Possible values are QUOTE_ALL, QUOTE_NONE, QUOTE_MINIMAL, QUOTE_NONNUMERIC

skipinitialspace:
 

t/f

When True, whitespace immediately following the delimiter is ignored. The default is False

toj:

Num

When toj is defined, columns 0-Num are returned as normal, and all columns >Num are returned as a JSON list or JSON dict, depending on if the header is enabled.

useregexfilename:
 

t/f

When true, the provided filename is treated as a regex. This means that madiS will open the first file it finds to match that regex.

Examples:

>>> sql("select * from (file file:testing/colpref.csv dialect:csv) limit 3;")
C1     | C2    | C3         | C4
--------------------------------------
userid | colid | preference | usertype
agr    |       | 6617580.0  | agr
agr    | a0037 | 2659050.0  | agr
>>> sql("select * from (file file:testing/colpref.csv dialect:csv header:t) limit 3")
userid | colid | preference | usertype
--------------------------------------
agr    |       | 6617580.0  | agr
agr    | a0037 | 2659050.0  | agr
agr    | a0086 | 634130.0   | agr
>>> sql("select * from (file file:testing/colpref.zip header:t dialect:csv compression:t) limit 3;")
userid | colid | preference | usertype
--------------------------------------
agr    |       | 6617580.0  | agr
agr    | a0037 | 2659050.0  | agr
agr    | a0086 | 634130.0   | agr
>>> sql("select * from (file 'testing/colpref.tsv' delimiter:| ) limit 3;")
C1  | C2    | C3        | C4
-----------------------------
agr |       | 6617580.0 | agr
agr | a0037 | 2659050.0 | agr
agr | a0086 | 634130.0  | agr
>>> sql("select * from (file useregexfilename:True 'testing/col*.tsv' delimiter:| ) limit 3;")
C1  | C2    | C3        | C4
-----------------------------
agr |       | 6617580.0 | agr
agr | a0037 | 2659050.0 | agr
agr | a0086 | 634130.0  | agr
>>> sql("select * from (file 'testing/colpref.tsv.gz' delimiter:| compression:t compressiontype:gzip) limit 3;")
C1  | C2    | C3        | C4
-----------------------------
agr |       | 6617580.0 | agr
agr | a0037 | 2659050.0 | agr
agr | a0086 | 634130.0  | agr
>>> sql("select * from file('http://sites.google.com/site/stats202/data/test_data.csv?attredirects=0') limit 10;")
C1
-----------------
Age,Number,Start
middle,5,10
young,2,17
old,10,6
young,2,17
old,4,15
middle,5,15
young,3,13
old,5,8
young,7,9
>>> sql("select * from file('file:testing/GeoIPCountryCSV.zip','compression:t','dialect:csv') limit 4")
C1          | C2           | C3       | C4       | C5 | C6
----------------------------------------------------------------------
2.6.190.56  | 2.6.190.63   | 33996344 | 33996351 | GB | United Kingdom
3.0.0.0     | 4.17.135.31  | 50331648 | 68257567 | US | United States
4.17.135.32 | 4.17.135.63  | 68257568 | 68257599 | CA | Canada
4.17.135.64 | 4.17.142.255 | 68257600 | 68259583 | US | United States

flow function

flow(query:None)

Translates the input query results into sql statements if possible.

Returned table schema:
 
  • query text
    A complete sql query statement with the semicolon at the end

Note

Input query results must be sql statements separated with semicolons in the first place. Using in the input query the file() operator any file with sql statements can be divided in sql query statements. Multiline comments are considered as statements.

Examples:

>>> sql("select * from (flow file 'testing/testflow.sql') limit 1") 
query
-----------------------------------------------------------------------------------------------------------------------------------------------------------
/*====== countries: table of Country ISO codes , country names ===========*/
CREATE TABLE countries (
    country2 PRIMARY KEY UNIQUE,
    country_name
);
>>> sql("select * from (flow file 'testing/colpref.csv' limit 5) ")  
Traceback (most recent call last):
    ...
OperatorError: Madis SQLError:
Operator FLOW: Incomplete statement found : userid colid pr ... 41 416900.0 agr

Test files:

fromeav function

fromeav(query) → Relational table

Transforms the query input results to a relational table from an Entity-Attribute-Value (EAV) model. Examples:

>>> table1('''
... 1    name    James
... 1    city    Chicago
... 1    job    Programmer
... 1    age    35
... 2    name    Mark
... 2    city    London
... 2    job    Pilot
... 2    age    43
... 5    name    Lila
... 5    city    'New York'
... 5    job    Teacher
... 5    age    29
... ''')
>>> sql("fromeav select * from table1 where a = 1")
rid | row_id | name  | city    | job        | age
-------------------------------------------------
0   | 1      | James | Chicago | Programmer | 35
>>> sql("fromeav select * from table1 where a in (1,5)")
rid | row_id | name  | city     | job        | age
--------------------------------------------------
0   | 1      | James | Chicago  | Programmer | 35
1   | 5      | Lila  | New York | Teacher    | 29
>>> sql("fromeav select * from table1")
rid | row_id | name  | city     | job        | age
--------------------------------------------------
0   | 1      | James | Chicago  | Programmer | 35
1   | 2      | Mark  | London   | Pilot      | 43
2   | 5      | Lila  | New York | Teacher    | 29
>>> sql("fromeav select * from table1 where b in ('city', 'job')")
rid | row_id | city     | job
------------------------------------
0   | 1      | Chicago  | Programmer
1   | 2      | London   | Pilot
2   | 5      | New York | Teacher

hidden function

hidden(query) → query results

Executes the query, without returning any of its rows.

Returned table schema:
 Same as input query schema.

Examples:

>>> table1('''
... James   10      2
... Mark    7       3
... Lila    74      1
... ''')
>>> sql("hidden select * from table1")
a | b | c
---------

>>> sql("hidden select * from table1 order by c")
a | b | c
---------

jsonpipe function

jsonpipe(query:None[, lines:t])

Executes query as a shell command and returns the standard output lines as rows of one column table. Setting lines parameter to f the command output will be returned in one table row.

Returned table schema:
 
  • output text
    Output of shell command execution

Examples:

.. doctest::
>>> sql("pipe 'ls ./testing/*.csv' ")
C1
---------------------
./testing/colpref.csv
>>> sql("pipe wc ./testing/colpref.csv")
C1
---------------------------------
 19  20 463 ./testing/colpref.csv

mysql function

mysql(host, port, user, passwd, db, query:None)

Connects to an MySQL DB and returns the results of query.

Examples:

>>> sql("select * from (mysql h:127.0.0.1 port:3306 u:root p:rootpw db:mysql select 5 as num, 'test' as text);")
num | text
-----------
5   | test

oaiget function

oaiget(url, verb, metadataPrefix, ...)

Fetches data from an OAIPMH service, using resumption tokens to fetch large datasets.

  • If no verb is provided then verb is assumed to be ‘ListRecords’.
  • If no metadataPrefix is provided then verb is assumed to be ‘ListMetadataFormats’, which will list all metadata formats.
Returned table schema:
 Column C1 as text

Examples:

>>> sql("select * from oaiget('verb:ListRecords', 'metadataPrefix:ctxo')")    
Traceback (most recent call last):
...
OperatorError: Madis SQLError:
Operator OAIGET: An OAIPMH URL should be provided
>>> sql("select * from (oaiget verb:ListRecords metadataPrefix:ctxo 'http://oaiurl' )")    
Traceback (most recent call last):
...
OperatorError: Madis SQLError:
Operator OAIGET: <urlopen error [Errno -2] Name or service not known>

oracle function

oracle(jdbc_connection_string, user, passwd, query:None)

Connects to an Oracle DB and returns the results of query.

The requirements for this to work are:

  • Install JPype Python package:

    Homepage: http://jpype.sourceforge.net/

    In Debian based systems such as Ubuntu, install python-jpype using:

    sudo apt-get install python-jpype

  • Add the Oracle JDBC JAR (ojdbc5.jar) in directory madis/lib/jdbc/.

  • Have a JVM installed, and the JAVA_HOME environment variable set correctly.

Examples:

>>> sql("select * from (oracle jdbc:oracle:thin:@//127.0.0.1:6667/xe u:system p:password select 5 as num, 'test' as text);")
num | text
-----------
5   | test

ordered function

ordered(query) → query results

Ordered virtual table returns its data as they are. The main difference with unindexed, is that it signals to the SQLite engine that the results are ordered in whatever order SQLite prefers, so a possible group by on the results will happen incrementally.

Returned table schema:
 Same as input query schema.

Examples:

>>> table1('''
... James   10      2
... Mark    7       3
... Lila    74      1
... ''')

The following query is calculated incrementally

>>> sql("select a, count(*) from (ordered select * from table1) group by a")
a     | count(*)
----------------
James | 1
Mark  | 1
Lila  | 1

>>> sql("select * from (ordered select * from table1) order by c")
a     | b  | c
--------------
James | 10 | 2
Mark  | 7  | 3
Lila  | 74 | 1

Notice that the order by does not work as it should because igroup has
fooled the SQLite engine into believing that the order of the results are
in the correct order (they aren't).

output function

output formatting_options 'filename' query

Writes in filename the output of query formatted according to formatting options.

Returned table schema:
 
  • return_value int
    Boolean value 1 indicating success. On failure an exception is thrown.

Formatting options:

mode:
  • plain Default. The columns are concatened and written together.

  • tsv Writes data in a tab separated format. TSV mode is autoselected when the filename ends in “.tsv”.

  • csv Writes data in a comma separated format. CSV mode is autoselected when the filename ends in “.csv”.

  • json Writes data in a line separated JSON format. Header is always added. JSON mode is autoselected when the filename ends in “.JSON”.

  • db Writes data in a SQLite DB. DB mode is autoselected when the filename ends in “.db”.

    • If pagesize:xxxxx option is given, set new DBs page size to parameter given, if not inherit page size from parent DB.
  • gtable In gtable mode table is formatted as a google Data Table for visualisation.

  • gjson In gjson mode table is formatted in a json format accepted by google visualisation widgets.

If mode is not csv any given csv formatting options are ignored.

append:

t/f If true the output is append in the file, ignored in compression mode.

compression:

t/f If true the output will be compressed. Default compressor type is gz.

compressiontype:
 

gz/zip Selects between the two compression types.

split:

(number) It splits the input to many db or json files. Splitting only works when writting to a db or JSON. Splitting is done by using the first column of the input and it outputs all columns except the first one. If the split argument is greater than 1 then the output will always be splitted to the defined number of files. If the split argument is 1 or lower, then the output will only contain the parts of which a key were found on the first column of the input data.

Detailed description of additional output formating options can be found in file() function description.

Examples:

>>> table1('''
... James   10      2
... Mark    7       3
... Lila    74      1
... ''')
>>> sql("select * from table1")
a     | b  | c
--------------
James | 10 | 2
Mark  | 7  | 3
Lila  | 74 | 1
>>> sql("output file:../../tests/table1.csv delimiter:# header:t select a as name , b as age, c as rank from table1")
return_value
------------
1
>>> sql("file file:../../tests/table1.csv delimiter:# header:t")
name  | age | rank
------------------
James | 10  | 2
Mark  | 7   | 3
Lila  | 74  | 1

pipe function

pipe(query:None[, lines:t])

Executes query as a shell command and returns the standard output lines as rows of one column table. Setting lines parameter to f the command output will be returned in one table row.

Returned table schema:
 
  • output text
    Output of shell command execution

Examples:

.. doctest::
>>> sql("pipe 'ls ./testing/*.csv' ")
C1
---------------------
./testing/colpref.csv
>>> sql("pipe wc ./testing/colpref.csv")
C1
---------------------------------
 19  20 463 ./testing/colpref.csv

postgres function

postgres(host, port, user, passwd, db, query:None)

Connects to an PostgreSQL DB and returns the results of query.

Examples:

>>> sql("select * from (postgres h:127.0.0.1 port:5432 u:root p:rootpw db:testdb select 5 as num, 'test' as text);")
num | text
-----------
5   | test

queryplan function

queryplan(query) → Query plan

Returns the query plan of the input query.

Examples:

>>> sql("queryplan select 5")
operation     | paramone | paramtwo | databasename | triggerorview
------------------------------------------------------------------
SQLITE_SELECT | None     | None     | None         | None

range function

range([from:0[, to:10[, step:1]]])

Returns a range of integer numbers.

Returned table schema:
 
  • value int
    Number in range.

Note

The parameters can be given both named or unnamed. In unnamed mode parameter order is from,to,step.

Named parameters:

from:Number to begin from. Default is 0
to:Number to reach. Default is 10. The to number is not returned
step:Step to augment the returned numbers. Default is 1

Examples:

>>> sql("select * from range()")
C1
--
0
1
2
3
4
5
6
7
8
9

>>> sql("select * from range('from:1','to:11')")
C1
--
1
2
3
4
5
6
7
8
9
10

>>> sql("select * from range('from:2','to:15','step:3')")
C1
--
2
5
8
11
14

>>> sql("select * from range(1,10,2)")
C1
--
1
3
5
7
9

>>> sql("select * from range(5)")
C1
--
1
2
3
4
5

rowidvt function

rowidvt(query:None)

Returns the query input result adding rowid number of the result row.

Returned table schema:
 

Same as input query schema with addition of rowid column.

  • rowid int
    Input query result rowid.

Examples:

>>> table1('''
... James   10      2
... Mark    7       3
... Lila    74      1
... ''')
>>> sql("rowidvt select * from table1")
rowid | a     | b  | c
----------------------
1     | James | 10 | 2
2     | Mark  | 7  | 3
3     | Lila  | 74 | 1
>>> sql("rowidvt select * from table1 order by c")
rowid | a     | b  | c
----------------------
1     | Lila  | 74 | 1
2     | James | 10 | 2
3     | Mark  | 7  | 3

Note the difference with rowid table column.

>>> sql("select rowid,* from table1 order by c")
rowid | a     | b  | c
----------------------
3     | Lila  | 74 | 1
1     | James | 10 | 2
2     | Mark  | 7  | 3

sample function

sample(sample_size query) → samples rows from input

Returns a random sample_size set of rows.

Returned table schema:
 Same as input query schema.

Options:

size:Sample size

Examples:

>>> table1('''
... James   10      2
... Mark    7       3
... Lila    74      1
... ''')
>>> sql("sample '10' select * from table1")
a     | b  | c
--------------
James | 10 | 2
Mark  | 7  | 3
Lila  | 74 | 1

>>> sql("sample size:1 select * from table1") 
a     | b  | c
...

>>> sql("sample size:0 select * from table1")

setschema function

setschema(query:None, schemadefinition)

Returns the result of the input query with changed schema according to schemadefinition parameter. Parameter schemadefinition is text identical to schema definition between parenthesis of a CREATE TABLE SQL statement.

Can perform renaming, typecasting and projection on some columns of the input query result.

Note

This function can be used to avoid DynamicSchemaWithEmptyResultError caused by dynamic schema virtual tables on empty query input result.

Returned table schema:
 As defined at schemadefinition parameter.

Examples:

>>> sql("setschema 'col1 int,col2 text' select 5,6")
col1 | col2
-----------
5    | 6

>>> sql("select strsplitv(q) from (select 5 as q) where q!=5")    
Traceback (most recent call last):
...
DynamicSchemaWithEmptyResultError: Madis SQLError:
Operator EXPAND: Cannot initialise dynamic schema virtual table without data

>>> sql("setschema 'a,b' select strsplitv(q) from (select 5 as q) where q!=5")
a | b
-----

>>> sql("select * from (file file:testing/colpref.csv dialect:csv header:t) limit 3")
userid | colid | preference | usertype
--------------------------------------
agr    |       | 6617580.0  | agr
agr    | a0037 | 2659050.0  | agr
agr    | a0086 | 634130.0   | agr

The query below has constraints preference column to be less than an int value , but preference is text ( outcomes from file() are text), so an empty result is produced

>>> sql("select * from (select * from (file file:testing/colpref.csv dialect:csv header:t) limit 3) where cast(preference as int) <634130")

With setschema functions preference column is casted as float.

>>> sql("select * from (setschema 'type,colid , pref float, userid' select * from (file file:testing/colpref.csv dialect:csv header:t) limit 3) where pref<634131")
type | colid | pref     | userid
--------------------------------
agr  | a0086 | 634130.0 | agr

skpredict function

skpredict filename: “mymodel” select * from t;

Loads a predictive model trained by sktrain operator from file (stored from sktrain operator) and classifies the new data provided selected from the query. It returns a table with the new predictions

>>> table('''
... 0.0   4.4   0
... 2.1   2.2   2
... -2.1   4.4   0
... 2.1   2.2   0
... 0.0   4.4   2
... -4.2   4.4   2
... -4.2   4.4   1
... -2.1   -0.0   0
... 2.1   -0.0   0
... -2.1   -2.2   0
... -4.2   -0.0   2
... --- [0|Column names ---
... [1|C1 [2|C2 [3|C3
... ''')
>>> sql("skpredict filename:SVMmodel select C1,C2 from table;")
id  |  prediction  |  prediction_probability_per_class
-------------
0   |  0           |  [ 0.4101318   0.20131647  0.38855173]
1   |  0           |  [ 0.41863251  0.20180877  0.37955871]
2   |  2           |  [ 0.27520722  0.19621797  0.52857481]
3   |  0           |  [ 0.4149133   0.20182841  0.3832583 ]
4   |  0           |  [ 0.4101318   0.20131647  0.38855173]
5   |  2           |  [ 0.90338454  0.01203995  0.08457551]
6   |  2           |  [ 0.90338454  0.01203995  0.08457551]
7   |  0           |  [ 0.27481114  0.19661277  0.52857609]
8   |  0           |  [ 0.27504844  0.19632018  0.52863138]
9   |  0           |  [ 0.27491203  0.19661313  0.52847484]
10  |  2           |  [ 0.77210661  0.12397848  0.10391491]

sktrain function

slidingwindow function

slidingwindow(window) → query results

Returns the query input results annotated with the window id as an extra column. The window parameter defines the size of the window.

Returned table schema:
 Same as input query schema.

Examples:

>>> table1('''
... James   10
... Mark    7
... Lila    74
... Jane    44
... ''')
>>> sql("slidingwindow window:2 select * from table1")
wid | a     | b
----------------
0   | James | 10
1   | James | 10
1   | Mark  | 7
2   | Mark  | 7
2   | Lila  | 74
3   | Lila  | 74
3   | Jane  | 44
>>> sql("slidingwindow window:3 select * from table1")
wid | a     | b
----------------
0   | James | 10
1   | James | 10
1   | Mark  | 7
2   | James | 10
2   | Mark  | 7
2   | Lila  | 74
3   | Mark  | 7
3   | Lila  | 74
3   | Jane  | 44

sqlite function

sqlite(dbfilename, query:None)

Connects to an SQLite DB and returns the results of query.

Examples:

>>> sql("select * from (sqlite 'testdb.db' select 5 as num, 'test' as text);")
num | text
-----------
5   | test

stdinput function

stdinput() → standard input stream

Returns the standard input stream

Returned table schema:
 One column automatically named C1.
Examples::
>>> sql("select * from stdinput()")
c1
-------------
stdinputline1
stdinputline2
stdinputline3

timeslidingwindow function

timeslidingwindow(timewindow, timecolumn) → query results

Returns the query input results annotated with the window id as an extra column. The following arguments can be passed as parameters:

timewindow: It can be a numeric value that specifies the time length of the window (in seconds).

timecolumn: It is the index of the temporal column (starting from 0) For the moment, we assume that the data is ordered by the temporal column that

the user gives as input in ascending order.

Examples:

>>> table1('''

… “12.05.2010 00:00:00” … “12.05.2010 00:01:00” … “12.05.2010 00:02:00” … “12.05.2010 00:03:00” … “12.05.2010 00:04:00” … ‘’‘) >>> sql(“timeslidingwindow timewindow:180 timecolumn:0 select * from table1”) wid | a ————————- 0 | 12.05.2010 00:00:00 0 | 12.05.2010 00:01:00 0 | 12.05.2010 00:02:00 0 | 12.05.2010 00:03:00 1 | 12.05.2010 00:01:00 1 | 12.05.2010 00:02:00 1 | 12.05.2010 00:03:00 1 | 12.05.2010 00:04:00 >>> table1(‘’’ … “12.05.2010 00:00:00” … “12.05.2010 00:01:00” … “12.05.2010 00:01:00” … “12.05.2010 00:02:00” … “12.05.2010 00:03:00” … “12.05.2010 00:04:00” … “12.05.2010 00:05:00” … ‘’‘)

… ‘’‘) >>> sql(“timeslidingwindow timewindow:120 timecolumn:0 select * from table1”) wid | a ————————- 0 | 12.05.2010 00:00:00 0 | 12.05.2010 00:01:00 0 | 12.05.2010 00:01:00 0 | 12.05.2010 00:02:00 1 | 12.05.2010 00:01:00 1 | 12.05.2010 00:01:00 1 | 12.05.2010 00:02:00 1 | 12.05.2010 00:03:00 2 | 12.05.2010 00:02:00 2 | 12.05.2010 00:03:00 2 | 12.05.2010 00:04:00 3 | 12.05.2010 00:03:00 3 | 12.05.2010 00:04:00 3 | 12.05.2010 00:05:00

>>> table2('''
... "12/05/2010 00:00:00"
... "12/05/2010 00:01:00"
... "12/05/2010 00:02:00"
... ''')

… ‘’‘) >>> sql(“timeslidingwindow timewindow:180 timecolumn:0 select * from table2”) wid | a ————————- 0 | 12/05/2010 00:00:00 0 | 12/05/2010 00:01:00 0 | 12/05/2010 00:02:00

toeav function

toeav(query) → Entity Attribute Value table

Transforms the query input results to an Entity Attribute Value model table.

Returned table schema:
 ID, Attribute, Value

Examples:

>>> table1('''
... James   10      2
... Mark    7       3
... Lila    74      1
... ''')
>>> sql("toeav select * from table1")
rid   | colname | val
---------------------
James | b       | 10
James | c       | 2
Mark  | b       | 7
Mark  | c       | 3
Lila  | b       | 74
Lila  | c       | 1

unindexed function

unindexed(query) → query results

Returns the query input results without any change. UNINDEXED can be used as a barrier for SQLite’s optimizer, for debugging etc.

Returned table schema:
 Same as input query schema.

Examples:

>>> table1('''
... James   10      2
... Mark    7       3
... Lila    74      1
... ''')
>>> sql("unindexed select * from table1")
a     | b  | c
--------------
James | 10 | 2
Mark  | 7  | 3
Lila  | 74 | 1

>>> sql("unindexed select * from table1 order by c")
a     | b  | c
--------------
Lila  | 74 | 1
James | 10 | 2
Mark  | 7  | 3

Note the difference with rowid table column.

unionalldb function

unionalldb(db_filename)

This function returns the contents of a table that has been split using OUTPUT split functionality.

Its input are DBs with names such as:

dbname.0.db dbname.1.db …

It is assumed that inside each of above DBs, a table named as dbname will exist. All of these tables should have the same schema

If a start or end argument is present then unionalldb will start scanning from the start numbered part and end scanning at end numbered db part (without including the end numbered part).

Usage examples:

select * from (unionalldb ‘dbname’);

select * from (unionalldb start:1 end:4 ‘dbname’);

variables function

variables()

Returns the defined variables with their values.

Returned table schema:
 
  • variable text
    Variable name.
  • value text
    Variable value

Examples:

>>> sql("var 'env' 'testing' ")
var('env','testing')
--------------------
testing
>>> sql("variables")
variable | value
-------------------
flowname |
execdb   | :memory:
env      | testing

webtable function

webtable(url[, tableNumber])

Returns the result of the first or the tableNumber HTML table from the url.

Returned table schema:
 Column names same as HTML table column headers. If there are no headers columns are named as C1,C2….Cn

Examples:

>>> sql("select * from webtable('http://en.wikipedia.org/wiki/List_of_countries_by_public_debt',2) order by 2 desc limit 3")
Country | Public debt as % of GDP(CIA)[1] | Date1     | Gross government debt as % of GDP(IMF)[2] | Date2     | Region
-----------------------------------------------------------------------------------------------------------------------------
Belize  | 90.8                            | 2012 est. | 81.003                                    | 2012 est. | North America
Sudan   | 89.3                            | 2012 est. | 112.15                                    | 2012 est. | Africa
France  | 89.1                            | 2012 est. | 89.97                                     | 2012 est. | Europe

whilevt function

whilevt([from:0, [to:10, step:1, ]]query)

Returns a range of integer numbers while a query’s result is true.

Returned table schema:
 
  • value int
    Number in range.
from:

Number to begin from. Default is 0

to:

Number to reach. Default is 10. The to number is not returned

step:

Step to augment the returned numbers. Default is 1

Examples:

>>> sql("select * from range()")
C1
--
0
1
2
3
4
5
6
7
8
9

>>> sql("select * from range('from:1','to:11')")
C1
--
1
2
3
4
5
6
7
8
9
10

>>> sql("select * from range('from:2','to:15','step:3')")
C1
--
2
5
8
11
14

>>> sql("select * from range(1,10,2)")
C1
--
1
3
5
7
9

>>> sql("select * from range(5)")
C1
--
1
2
3
4
5

xmlparse function

xmlparse([root:None, strict:1, namespace:False, xmlprototype, ]query:None)

Parses an input xml stream. It starts parsing when it finds a root tag. A provided XML prototype fragment is used to create an schema, mapping from xml to a relational table. If multiple values are found for the same tag in the input stream, then all values are returned separated with a tab (use tab-set operators to process them).

If no XML prototype is provided, then a jdict of the data is returned. If no root tag is provided, then the output is a raw feed of {path:data} pairs without any row aggregation. Rootless mode is usefull when trying to find what root tag to use.

Is a root tag is provided then each returned row, contains a jdict of all the paths found below the specified root tag.

XML prototype:

XML prototype may be:

  • a fragment of XML which will be matched with the input data.
  • a jpack.
  • a jdict.

If a the characters “*” or “$” are provided as a value of any of these prototypes, then a full XML subtree of a path will be returned in the resulting data.

‘namespace’ or ‘ns’ option:
 

Include namespace information in the returned jdicts.

‘fast’ option (default 0):
 

Read input data in bulk. For some XML input files (having lots of small line lengths), it can speed up XML processing by up to 30%. The downside of this option, is that when an error occurs no last line information is returned, so use this option only when you are sure that the XML input is well formed.

  • fast:0 (default), parses the input stream in a conservative line by line way
  • fast:1 ,is the same as fast:0, but it doesn’t return Last line information in the case of an error
  • fast:2 ,in this mode XMLPARSER doesn’t convert HTML entities and doesn’t skip “<?xml version=…” lines
‘strict’ option:
 
  • strict:2 ,if a failure occurs, the current transaction will be cancelled. Additionally if a tag isn’t found in the xml prototype it will be regarded as failure.
  • strict:1 (default), if a failure occurs, the current transaction will be cancelled. Undeclared tags aren’t regarded as failure.
  • strict:0 , returns all data that succesfully parses. The difference with strict 1, is that strict 0 tries to restart the xml-parsing after the failures and doesn’t fail the transaction.
  • strict:-1 , returns all input lines in which the xml parser finds a problem. In essence this works as a negative xml parser.

For strict modes 0 and -1, the fast:0 mode is enforced.

Returned table schema:
 

Column names are named according to the schema of the provided xml prototype.

Examples:
>>> table1('''
... '<a><b>row1val1</b><b>row1val1b</b><b>row1val1c</b></a>'
... '<a>'
... '<b>'
... 'row2val1</b><c><d>row2val</d></c>'
... '</a>'
... ''')
>>> sql("select * from (xmlparse select * from table1)") 
C1
-------------------
{"a/b":"row1val1"}
{"a/b":"row1val1b"}
{"a/b":"row1val1c"}
{"a/b":"row2val1"}
{"a/c/d":"row2val"}
>>> sql("select jgroupunion(jdictkeys(c1)) from (xmlparse select * from table1)") 
jgroupunion(jdictkeys(c1))
--------------------------
["a/b","a/c/d"]
>>> sql('''select * from (xmlparse '["a/b","a/c/d"]' select * from table1)''') 
b                            | c_d
--------------------------------------
row1val1        row1val1b        row1val1c |
row2val1                     | row2val
>>> sql("select * from (xmlparse '<a><b>val1</b><b>val1</b><c><d>val2</d></c></a>' select * from table1)") 
b        | b1                  | c_d
----------------------------------------
row1val1 | row1val1b        row1val1c |
row2val1 |                     | row2val
>>> sql("select * from (xmlparse root:a '<t><a><b>val1</b><c><d>val2</d></c></a></t>' select * from table1)") 
b                            | c_d
--------------------------------------
row1val1        row1val1b        row1val1c |
row2val1                     | row2val
>>> table2('''
... '<a b="attrval1"><b>row1val1</b></a>'
... '<a>'
... '<b>'
... 'row2val1</b><c>asdf<d>row2val</d></c>'
... '</a>'
... ''')
>>> sql("select * from (xmlparse '<a b=\"v\"><b>v</b><c><d>v</d></c></a>' select * from table2)")
b        | b1       | c_d
-----------------------------
attrval1 | row1val1 |
         | row2val1 | row2val
>>> sql('''select * from (xmlparse  '["a/@/b","a/b","a/c/d"]' select * from table2)''')
b        | b1       | c_d
-----------------------------
attrval1 | row1val1 |
         | row2val1 | row2val
>>> sql('''select * from (xmlparse  '{"a/b":[1,2] ,"a/c/d":1}' select * from table2)''')
b        | b1 | c_d
-----------------------
row1val1 |    |
row2val1 |    | row2val
>>> sql('''select * from (xmlparse  '{"a/b":[1,2] ,"a/c":[1,"*"]}' select * from table2)''')
b        | b1 | c    | c_$
-----------------------------------------
row1val1 |    |      |
row2val1 |    | asdf | asdf<d>row2val</d>
>>> sql('''select * from (xmlparse  '["a/b", "a/c", "a/c/*"]' select * from table2)''')
b        | c    | c_$
------------------------------------
row1val1 |      |
row2val1 | asdf | asdf<d>row2val</d>
>>> sql('''select * from (xmlparse  root:a '{"a/b":"", "a":"*"}' select * from table2)''') 
b        | $
------------------------------------------------------
row1val1 | <b>row1val1</b>
row2val1 |
<b>
row2val1</b><c>asdf<d>row2val</d></c>
>>> sql("select * from (xmlparse '<a><b>v</b><c>*</c></a>' select * from table2)")
b        | c_$
-----------------------------
row1val1 |
row2val1 | asdf<d>row2val</d>
>>> sql("select * from (xmlparse root:a select * from table2)")
C1
-------------------------------------------------
{"a/@/b":"attrval1","a/b":"row1val1"}
{"a/b":"row2val1","a/c/d":"row2val","a/c":"asdf"}
>>> table2('''
... '<a b="attrval1"><b>row1val1</b></a>'
... '<a>'
... '</b>'
... 'row2val1</b><c><d>row2val</d></c>'
... '</a>'
... ''')
>>> sql("select * from (xmlparse strict:0 '<a b=\"v\"><b>v</b><c><d>v</d></c></a>' select * from table2)")
b        | b1       | c_d
-------------------------
attrval1 | row1val1 |
>>> table3('''
... '<a><b>row1val1</b></a>'
... '<a>'
... '<b np="np">'
... 'row2val1</b><c><d>row2val</d></c>'
... '</a>'
... ''')
>>> sql("select * from (xmlparse strict:2 '<a><b>val1</b><c><d>val2</d></c></a>' select * from table3)") 
Traceback (most recent call last):
...
OperatorError: Madis SQLError:
Operator XMLPARSE: Undeclared path in XML-prototype was found in the input data. The path is:
/b/@/np
The data to insert into path was:
np
Last input line was:
<b np="np">
>>> table4('''
... '<a><b>row1val1</b></a>'
... '<a><b>row1val2</b</a>'
... '<a><b np="np">row1val1</b></a>'
... '<a><b>row1val3/b></a>'
... '<a><b>row1val4</b></a>'
... ''')
>>> sql("select * from (xmlparse strict:-1 '<a><b>val1</b><c><d>val2</d></c></a>' select * from table4)")
C1
----------------------
<a><b>row1val2</b</a>
<a><b>row1val3/b></a>
>>> table5('''
... '<a><b><a><b>row1val1</b></a></b></a>'
... '<a><b>row2val1</b></a>'
... '<a><b>row3val1</b></a>'
... '<a><b>row4val1</b><c>row4val2</c>'
... '</a>'
... ''')
>>> sql('''select * from (xmlparse '["a/b", "a/c"]' select * from table5)''')
b        | c
-------------------
row1val1 |
row2val1 |
row3val1 |
row4val1 | row4val2