Vtable functions list¶
cache
function¶
-
cache
(query:None) → [the same schema as input query]¶
Caches and indexes in memory the input query. Its output has exactly the same schema as the input query.
Examples:
>>> sql("select * from (cache select * from table1) order by a desc,b asc")
a | b
------
10 | 1
7 | 8
5 | 6
3 | 4
1 | 2
>>> sql("select * from (cache select * from table1) where a=b")
>>> table2('''
... 5 1
... 5 2
... 5 3
... 2 3
... 2 2
... 2 1
... ''')
>>> sql("select * from (cache select * from table2) where b>=2 order by a desc,b desc")
a | b
-----
5 | 3
5 | 2
2 | 3
2 | 2
>>> sql("select * from (cache select * from table2) where 3>=a and 3<=b")
a | b
-----
2 | 3
clipboard
function¶
-
clipboard
()¶
Returns the contents of the system’s clipboard. If the clipboard’s contents are guessed to be a table, then it automatically splits the contents in its output.
h: | if the ‘h’ option is provided to clipboard() function, the first row of the clipboard’s data is regarded as the schema of the data. |
---|---|
Returned table schema: | |
Column names start from C1… , all column types are text |
Examples:
>>> sql("select * from clipboard()")
C1 | C2 | C3 | C4
------------------------------------------------------
1 | Zimbabwe | 304.30 | 2009 est.
2 | Japan | 192.10 | 2009 est.
3 | Saint Kitts and Nevis | 185.00 | 2009 est.
clipout
function¶
-
clipout
([h:0, ]query:None)¶
Writes to clipboard the output of query. The clipboard contents will be tab delimited.
header option: | if an ‘h’ or h:1 option is found then it also exports the schema of the query. |
---|---|
Returned table schema: | |
|
Examples:
>>> sql("clipout select 5,6")
return_value
------------
1
coltypes
function¶
-
coltypes
(query:None)¶
Returns the input query results column names and types.
Returned table schema: | |
---|---|
|
Examples:
>>> sql("coltypes select 5 as vt")
column | type
-------------
vt | None
Applying coltypes in the result of virtual table func:typing function in the same query
>>> sql("coltypes typing 'vt:int' select 5 as vt")
column | type
-------------
vt | int
dirfiles
function¶
-
dirfiles
([rec:1, ]directory_name) → path_filename, filename¶
Returns the files name in a given directory. With the option ‘rec:1’ it returns the files under the provided directory and all its subdirectories.
This function is very usefull when used with the execprogram function to execute an external command for every filename.
Note
Dirfiles does not follow links.
Returned table schema: | |
---|---|
Column C1 is the full filename (path/filename) Column C2 is filename |
Examples:
>>> sql("select c2 from dirfiles('.') where c2 like 'f%.py'")
c2
-------
file.py
flow.py
>>> sql("select c2 from dirfiles('rec:1','.') where c2 like 'c%.py'")
c2
------------
coltypes.py
clipout.py
cache.py
continue.py
clipboard.py
examplevt
function¶
-
examplevt
(arguments)¶
A minimal example of a virtual table. Returns all the arguments passed to it.
Returned table schema: | |
---|---|
Column names start from C1… , all column types are text |
Examples:
>>> sql("select * from examplevt(1, '2', 'var3')")
varname | value
-------------------------------------------------------------
parsedargs | (u'1', u'2', u'var3')
envar:tablename | vt_773987998
envar:modulename | examplevt
...
envar:dbname | temp
>>> sql("select * from (examplevt 'var1' 'var2' v1:test select 5)")
varname | value
--------------------------------------------------------------------
parsedargs | (u'query:select 5', u'var1', u'var2', u'v1:test')
envar:tablename | vt_1975870853
envar:modulename | examplevt
...
envar:dbname | temp
exec
function¶
-
exec
(query:None[, path:None, variables])¶
Executes the input query. Gets the first column of the returned result and executes its rows content supposing it is an sql statement.
Path parameter sets the current working directory while executing the statements.
Variables are named parameters that set variables in execution environment. For example c:v named parameter sets the variable c in the new environment, initialized with current variable’s v value.
Returned table schema: | |
---|---|
|
Examples:
This query executes the statements in quotes and returns successfully
>>> sql("exec select 'select 5'")
return_value
------------
1
Typical usage.
>>> sql("file 'testing/testtable.sql'")
C1
----------------------------------------
create table table1 (a,b,c);
insert into table1 values('James',10,2);
insert into table1 values('Mark',7,3);
insert into table1 values('Lila',74,1);
>>> sql("exec file 'testing/testtable.sql'")
return_value
------------
1
>>> sql("select * from table1")
a | b | c
--------------
James | 10 | 2
Mark | 7 | 3
Lila | 74 | 1
Nesting flows. Usage of path and variables parameters.
>>> sql("file 'testing/topflow.sql'")
C1
-----------------------------------------------------------------------------
var 'v' 5;
var 'tablename' 'internaltable';
var 'lastdate' from select '2008-01-01';
create table topflowvars as select * from getvars() where variable!='execdb';
exec 'tablename' 'c:v' file 'internalflow.sql';
>>> sql("file 'testing/internalflow.sql'")
C1
------------------------------------------------------------------------------
create table %{tablename} as select * from getvars() where variable!='execdb';
>>> sql("select * from variables() where variable!='execdb'")
variable | value
----------------
flowname |
>>> sql("exec 'path:testing' file 'testing/topflow.sql'")
return_value
------------
1
>>> sql("select * from topflowvars")
variable | value
-------------------------
flowname | notset
lastdate | 2008-01-01
tablename | internaltable
v | 5
>>> sql("select * from internaltable")
variable | value
-------------------------
flowname | notset
c | 5
tablename | internaltable
>>> sql("select * from variables() where variable!='execdb'")
variable | value
----------------
flowname |
Test files:
expand
function¶
-
expand
(query:None)¶
Executes the input query and returns the result expanding any multiset values returned. The returned result is produced iteratively.
Returned table schema: | |
---|---|
Same as input query schema expanded with multiset functions column naming. When as renaming function is used at a multiset function, if the multiset function returns only one column it is named according to the as value, else a positive integer (1,2…n) is appended to the column name indicating column index in the multiset function result. |
Examples:
>>> table1('''
... James 10 2
... Mark 7 3
... Lila 74 1
... ''')
>>> sql("select ontop(1,c,a,b) from table1")
top1 | top2
-----------
Mark | 7
>>> sql("select ontop(1,c,b,c) as prefs from table1")
prefs1 | prefs2
---------------
7 | 3
>>> sql("select ontop(1,c,a) as nameontop from table1")
nameontop
---------
Mark
The explicit invocation of expand function won't affect the output since it is already automatically invoked because of the multiset function ontop.
>>> sql("expand expand select ontop(2,b,a) from table1")
top1
-----
Lila
James
file
function¶
-
file
(location[, formatting options])¶
Opens and returns a file or url as a table. The file’s format is defined through named options. location is defined through a URL, or a regular filename, can be given also as the named parameter url or file. If no named parameters are given the returned table has one column with each line of resource as a row or it assumes the dialect from the file ending (Files ending in .tsv, .csv, .json are assumed to be in the corresponding dialect).
Returned table schema: | |
---|---|
Columns are automatically named as C1, C2… or if header is set, columns are named by the resource first line value, and have the type text |
Formatting options:
fast: | Default is 0 (false). Fast option speeds up the parsing of lines into values, exchanging accuracy for speed. It uses the delimiter option to split lines. |
---|---|
strict: |
If no strict option is defined in fast:1 mode, then no strictness checking is applied at all, and an “Unknown error” will be returned if a problem occurs. |
encoding: | A standar encoding name. (List of encodings) |
compression: | t/f Default is f (False) |
compressiontype: | |
zip/gzip Default is zip |
Formatting options for CSV file types:
dialect: | tsv/csv/json Formats field as tab/comma separated values with minimal quoting. JSON dialect uses a line oriented JSON based format. File extensions that are recognised as a dialect (.tsv, .csv, .json) take precedence over a specified dialect parameter. |
---|---|
header: | t/f Set the column names of the returned table |
delimiter: | A string used to separate fields. It defaults to ‘,’ |
doublequote: | t/f Controls how instances of quotechar appearing inside a field should be themselves be quoted. When True, the character is doubled. When False, the escapechar is used as a prefix to the quotechar. It defaults to True. On output, if doublequote is False and no escapechar is set, Error is raised if a quotechar is found in a field |
escapechar: | A one-character string used by the writer to escape the delimiter if quoting is set to QUOTE_NONE and the quotechar if doublequote is False. On reading, the escapechar removes any special meaning from the following character. It defaults to None, which disables escaping |
lineterminator: | The string used to terminate lines produced by the writer. It defaults to ‘\r\n’ |
quotechar: | A one-character string used to quote fields containing special characters, such as the delimiter or quotechar, or which contain new-line characters. It defaults to ‘”’. |
quoting: | Controls when quotes should be generated by the writer and recognised by the reader. It can take on any of the QUOTE_* constants and defaults to QUOTE_MINIMAL. Possible values are QUOTE_ALL, QUOTE_NONE, QUOTE_MINIMAL, QUOTE_NONNUMERIC |
skipinitialspace: | |
t/f When True, whitespace immediately following the delimiter is ignored. The default is False |
|
toj: | Num When toj is defined, columns 0-Num are returned as normal, and all columns >Num are returned as a JSON list or JSON dict, depending on if the header is enabled. |
useregexfilename: | |
t/f When true, the provided filename is treated as a regex. This means that madiS will open the first file it finds to match that regex. |
Examples:
>>> sql("select * from (file file:testing/colpref.csv dialect:csv) limit 3;")
C1 | C2 | C3 | C4
--------------------------------------
userid | colid | preference | usertype
agr | | 6617580.0 | agr
agr | a0037 | 2659050.0 | agr
>>> sql("select * from (file file:testing/colpref.csv dialect:csv header:t) limit 3")
userid | colid | preference | usertype
--------------------------------------
agr | | 6617580.0 | agr
agr | a0037 | 2659050.0 | agr
agr | a0086 | 634130.0 | agr
>>> sql("select * from (file file:testing/colpref.zip header:t dialect:csv compression:t) limit 3;")
userid | colid | preference | usertype
--------------------------------------
agr | | 6617580.0 | agr
agr | a0037 | 2659050.0 | agr
agr | a0086 | 634130.0 | agr
>>> sql("select * from (file 'testing/colpref.tsv' delimiter:| ) limit 3;")
C1 | C2 | C3 | C4
-----------------------------
agr | | 6617580.0 | agr
agr | a0037 | 2659050.0 | agr
agr | a0086 | 634130.0 | agr
>>> sql("select * from (file useregexfilename:True 'testing/col*.tsv' delimiter:| ) limit 3;")
C1 | C2 | C3 | C4
-----------------------------
agr | | 6617580.0 | agr
agr | a0037 | 2659050.0 | agr
agr | a0086 | 634130.0 | agr
>>> sql("select * from (file 'testing/colpref.tsv.gz' delimiter:| compression:t compressiontype:gzip) limit 3;")
C1 | C2 | C3 | C4
-----------------------------
agr | | 6617580.0 | agr
agr | a0037 | 2659050.0 | agr
agr | a0086 | 634130.0 | agr
>>> sql("select * from file('http://sites.google.com/site/stats202/data/test_data.csv?attredirects=0') limit 10;")
C1
-----------------
Age,Number,Start
middle,5,10
young,2,17
old,10,6
young,2,17
old,4,15
middle,5,15
young,3,13
old,5,8
young,7,9
>>> sql("select * from file('file:testing/GeoIPCountryCSV.zip','compression:t','dialect:csv') limit 4")
C1 | C2 | C3 | C4 | C5 | C6
----------------------------------------------------------------------
2.6.190.56 | 2.6.190.63 | 33996344 | 33996351 | GB | United Kingdom
3.0.0.0 | 4.17.135.31 | 50331648 | 68257567 | US | United States
4.17.135.32 | 4.17.135.63 | 68257568 | 68257599 | CA | Canada
4.17.135.64 | 4.17.142.255 | 68257600 | 68259583 | US | United States
flow
function¶
-
flow
(query:None)¶
Translates the input query results into sql statements if possible.
Returned table schema: | |
---|---|
|
Note
Input query results must be sql statements separated with semicolons in the first place. Using in the input query the file()
operator any file with sql statements can be divided in sql query statements. Multiline comments are considered as statements.
Examples:
>>> sql("select * from (flow file 'testing/testflow.sql') limit 1")
query
-----------------------------------------------------------------------------------------------------------------------------------------------------------
/*====== countries: table of Country ISO codes , country names ===========*/
CREATE TABLE countries (
country2 PRIMARY KEY UNIQUE,
country_name
);
>>> sql("select * from (flow file 'testing/colpref.csv' limit 5) ")
Traceback (most recent call last):
...
OperatorError: Madis SQLError:
Operator FLOW: Incomplete statement found : userid colid pr ... 41 416900.0 agr
Test files:
fromeav
function¶
-
fromeav
(query) → Relational table¶
Transforms the query input results to a relational table from an Entity-Attribute-Value (EAV) model. Examples:
>>> table1('''
... 1 name James
... 1 city Chicago
... 1 job Programmer
... 1 age 35
... 2 name Mark
... 2 city London
... 2 job Pilot
... 2 age 43
... 5 name Lila
... 5 city 'New York'
... 5 job Teacher
... 5 age 29
... ''')
>>> sql("fromeav select * from table1 where a = 1")
rid | row_id | name | city | job | age
-------------------------------------------------
0 | 1 | James | Chicago | Programmer | 35
>>> sql("fromeav select * from table1 where a in (1,5)")
rid | row_id | name | city | job | age
--------------------------------------------------
0 | 1 | James | Chicago | Programmer | 35
1 | 5 | Lila | New York | Teacher | 29
>>> sql("fromeav select * from table1")
rid | row_id | name | city | job | age
--------------------------------------------------
0 | 1 | James | Chicago | Programmer | 35
1 | 2 | Mark | London | Pilot | 43
2 | 5 | Lila | New York | Teacher | 29
>>> sql("fromeav select * from table1 where b in ('city', 'job')")
rid | row_id | city | job
------------------------------------
0 | 1 | Chicago | Programmer
1 | 2 | London | Pilot
2 | 5 | New York | Teacher
jsonpipe
function¶
-
jsonpipe
(query:None[, lines:t])¶
Executes query as a shell command and returns the standard output lines as rows of one column table. Setting lines parameter to f the command output will be returned in one table row.
Returned table schema: | |
---|---|
|
Examples:
.. doctest::
>>> sql("pipe 'ls ./testing/*.csv' ")
C1
---------------------
./testing/colpref.csv
>>> sql("pipe wc ./testing/colpref.csv")
C1
---------------------------------
19 20 463 ./testing/colpref.csv
mysql
function¶
-
mysql
(host, port, user, passwd, db, query:None)¶
Connects to an MySQL DB and returns the results of query.
Examples:
>>> sql("select * from (mysql h:127.0.0.1 port:3306 u:root p:rootpw db:mysql select 5 as num, 'test' as text);")
num | text
-----------
5 | test
oaiget
function¶
-
oaiget
(url, verb, metadataPrefix, ...)¶
Fetches data from an OAIPMH service, using resumption tokens to fetch large datasets.
- If no verb is provided then verb is assumed to be ‘ListRecords’.
- If no metadataPrefix is provided then verb is assumed to be ‘ListMetadataFormats’, which will list all metadata formats.
Returned table schema: | |
---|---|
Column C1 as text |
Examples:
>>> sql("select * from oaiget('verb:ListRecords', 'metadataPrefix:ctxo')")
Traceback (most recent call last):
...
OperatorError: Madis SQLError:
Operator OAIGET: An OAIPMH URL should be provided
>>> sql("select * from (oaiget verb:ListRecords metadataPrefix:ctxo 'http://oaiurl' )")
Traceback (most recent call last):
...
OperatorError: Madis SQLError:
Operator OAIGET: <urlopen error [Errno -2] Name or service not known>
oracle
function¶
-
oracle
(jdbc_connection_string, user, passwd, query:None)¶
Connects to an Oracle DB and returns the results of query.
The requirements for this to work are:
Install JPype Python package:
Homepage: http://jpype.sourceforge.net/
In Debian based systems such as Ubuntu, install python-jpype using:
sudo apt-get install python-jpype
Add the Oracle JDBC JAR (ojdbc5.jar) in directory madis/lib/jdbc/.
Have a JVM installed, and the JAVA_HOME environment variable set correctly.
Examples:
>>> sql("select * from (oracle jdbc:oracle:thin:@//127.0.0.1:6667/xe u:system p:password select 5 as num, 'test' as text);")
num | text
-----------
5 | test
ordered
function¶
-
ordered
(query) → query results¶
Ordered virtual table returns its data as they are. The main difference with unindexed, is that it signals to the SQLite engine that the results are ordered in whatever order SQLite prefers, so a possible group by on the results will happen incrementally.
Returned table schema: | |
---|---|
Same as input query schema. |
Examples:
>>> table1('''
... James 10 2
... Mark 7 3
... Lila 74 1
... ''')
The following query is calculated incrementally
>>> sql("select a, count(*) from (ordered select * from table1) group by a")
a | count(*)
----------------
James | 1
Mark | 1
Lila | 1
>>> sql("select * from (ordered select * from table1) order by c")
a | b | c
--------------
James | 10 | 2
Mark | 7 | 3
Lila | 74 | 1
Notice that the order by does not work as it should because igroup has
fooled the SQLite engine into believing that the order of the results are
in the correct order (they aren't).
output
function¶
-
output formatting_options 'filename' query
Writes in filename the output of query formatted according to formatting options.
Returned table schema: | |
---|---|
|
Formatting options:
mode: |
If mode is not csv any given csv formatting options are ignored. |
---|---|
append: | t/f If true the output is append in the file, ignored in compression mode. |
compression: | t/f If true the output will be compressed. Default compressor type is gz. |
compressiontype: | |
gz/zip Selects between the two compression types. |
|
split: | (number) It splits the input to many db or json files. Splitting only works when writting to a db or JSON. Splitting is done by using the first column of the input and it outputs all columns except the first one. If the split argument is greater than 1 then the output will always be splitted to the defined number of files. If the split argument is 1 or lower, then the output will only contain the parts of which a key were found on the first column of the input data. |
Detailed description of additional output formating options can be found in file()
function description.
Examples:
>>> table1('''
... James 10 2
... Mark 7 3
... Lila 74 1
... ''')
>>> sql("select * from table1")
a | b | c
--------------
James | 10 | 2
Mark | 7 | 3
Lila | 74 | 1
>>> sql("output file:../../tests/table1.csv delimiter:# header:t select a as name , b as age, c as rank from table1")
return_value
------------
1
>>> sql("file file:../../tests/table1.csv delimiter:# header:t")
name | age | rank
------------------
James | 10 | 2
Mark | 7 | 3
Lila | 74 | 1
pipe
function¶
-
pipe
(query:None[, lines:t])¶
Executes query as a shell command and returns the standard output lines as rows of one column table. Setting lines parameter to f the command output will be returned in one table row.
Returned table schema: | |
---|---|
|
Examples:
.. doctest::
>>> sql("pipe 'ls ./testing/*.csv' ")
C1
---------------------
./testing/colpref.csv
>>> sql("pipe wc ./testing/colpref.csv")
C1
---------------------------------
19 20 463 ./testing/colpref.csv
postgres
function¶
-
postgres
(host, port, user, passwd, db, query:None)¶
Connects to an PostgreSQL DB and returns the results of query.
Examples:
>>> sql("select * from (postgres h:127.0.0.1 port:5432 u:root p:rootpw db:testdb select 5 as num, 'test' as text);")
num | text
-----------
5 | test
queryplan
function¶
-
queryplan
(query) → Query plan¶
Returns the query plan of the input query.
Examples:
>>> sql("queryplan select 5")
operation | paramone | paramtwo | databasename | triggerorview
------------------------------------------------------------------
SQLITE_SELECT | None | None | None | None
range
function¶
-
range
([from:0[, to:10[, step:1]]])¶
Returns a range of integer numbers.
Returned table schema: | |
---|---|
|
Note
The parameters can be given both named or unnamed. In unnamed mode parameter order is from,to,step.
Named parameters:
from: | Number to begin from. Default is 0 |
---|---|
to: | Number to reach. Default is 10. The to number is not returned |
step: | Step to augment the returned numbers. Default is 1 |
Examples:
>>> sql("select * from range()")
C1
--
0
1
2
3
4
5
6
7
8
9
>>> sql("select * from range('from:1','to:11')")
C1
--
1
2
3
4
5
6
7
8
9
10
>>> sql("select * from range('from:2','to:15','step:3')")
C1
--
2
5
8
11
14
>>> sql("select * from range(1,10,2)")
C1
--
1
3
5
7
9
>>> sql("select * from range(5)")
C1
--
1
2
3
4
5
rowidvt
function¶
-
rowidvt
(query:None)¶
Returns the query input result adding rowid number of the result row.
Returned table schema: | |
---|---|
Same as input query schema with addition of rowid column.
|
Examples:
>>> table1('''
... James 10 2
... Mark 7 3
... Lila 74 1
... ''')
>>> sql("rowidvt select * from table1")
rowid | a | b | c
----------------------
1 | James | 10 | 2
2 | Mark | 7 | 3
3 | Lila | 74 | 1
>>> sql("rowidvt select * from table1 order by c")
rowid | a | b | c
----------------------
1 | Lila | 74 | 1
2 | James | 10 | 2
3 | Mark | 7 | 3
Note the difference with rowid table column.
>>> sql("select rowid,* from table1 order by c")
rowid | a | b | c
----------------------
3 | Lila | 74 | 1
1 | James | 10 | 2
2 | Mark | 7 | 3
sample
function¶
-
sample
(sample_size query) → samples rows from input¶
Returns a random sample_size set of rows.
Returned table schema: | |
---|---|
Same as input query schema. |
Options:
size: | Sample size |
---|
Examples:
>>> table1('''
... James 10 2
... Mark 7 3
... Lila 74 1
... ''')
>>> sql("sample '10' select * from table1")
a | b | c
--------------
James | 10 | 2
Mark | 7 | 3
Lila | 74 | 1
>>> sql("sample size:1 select * from table1")
a | b | c
...
>>> sql("sample size:0 select * from table1")
setschema
function¶
-
setschema
(query:None, schemadefinition)¶ Returns the result of the input query with changed schema according to schemadefinition parameter. Parameter schemadefinition is text identical to schema definition between parenthesis of a CREATE TABLE SQL statement.
Can perform renaming, typecasting and projection on some columns of the input query result.
Note
This function can be used to avoid DynamicSchemaWithEmptyResultError caused by dynamic schema virtual tables on empty query input result.
Returned table schema: | |
---|---|
As defined at schemadefinition parameter. |
Examples:
>>> sql("setschema 'col1 int,col2 text' select 5,6")
col1 | col2
-----------
5 | 6
>>> sql("select strsplitv(q) from (select 5 as q) where q!=5")
Traceback (most recent call last):
...
DynamicSchemaWithEmptyResultError: Madis SQLError:
Operator EXPAND: Cannot initialise dynamic schema virtual table without data
>>> sql("setschema 'a,b' select strsplitv(q) from (select 5 as q) where q!=5")
a | b
-----
>>> sql("select * from (file file:testing/colpref.csv dialect:csv header:t) limit 3")
userid | colid | preference | usertype
--------------------------------------
agr | | 6617580.0 | agr
agr | a0037 | 2659050.0 | agr
agr | a0086 | 634130.0 | agr
The query below has constraints preference column to be less than an int value , but preference is text ( outcomes from file()
are text), so an empty result is produced
>>> sql("select * from (select * from (file file:testing/colpref.csv dialect:csv header:t) limit 3) where cast(preference as int) <634130")
With setschema functions preference column is casted as float.
>>> sql("select * from (setschema 'type,colid , pref float, userid' select * from (file file:testing/colpref.csv dialect:csv header:t) limit 3) where pref<634131")
type | colid | pref | userid
--------------------------------
agr | a0086 | 634130.0 | agr
skpredict
function¶
skpredict filename: “mymodel” select * from t;
Loads a predictive model trained by sktrain operator from file (stored from sktrain operator) and classifies the new data provided selected from the query. It returns a table with the new predictions
>>> table(''' ... 0.0 4.4 0 ... 2.1 2.2 2 ... -2.1 4.4 0 ... 2.1 2.2 0 ... 0.0 4.4 2 ... -4.2 4.4 2 ... -4.2 4.4 1 ... -2.1 -0.0 0 ... 2.1 -0.0 0 ... -2.1 -2.2 0 ... -4.2 -0.0 2 ... --- [0|Column names --- ... [1|C1 [2|C2 [3|C3 ... ''') >>> sql("skpredict filename:SVMmodel select C1,C2 from table;") id | prediction | prediction_probability_per_class ------------- 0 | 0 | [ 0.4101318 0.20131647 0.38855173] 1 | 0 | [ 0.41863251 0.20180877 0.37955871] 2 | 2 | [ 0.27520722 0.19621797 0.52857481] 3 | 0 | [ 0.4149133 0.20182841 0.3832583 ] 4 | 0 | [ 0.4101318 0.20131647 0.38855173] 5 | 2 | [ 0.90338454 0.01203995 0.08457551] 6 | 2 | [ 0.90338454 0.01203995 0.08457551] 7 | 0 | [ 0.27481114 0.19661277 0.52857609] 8 | 0 | [ 0.27504844 0.19632018 0.52863138] 9 | 0 | [ 0.27491203 0.19661313 0.52847484] 10 | 2 | [ 0.77210661 0.12397848 0.10391491]
sktrain
function¶
slidingwindow
function¶
-
slidingwindow
(window) → query results¶
Returns the query input results annotated with the window id as an extra column. The window parameter defines the size of the window.
Returned table schema: | |
---|---|
Same as input query schema. |
Examples:
>>> table1('''
... James 10
... Mark 7
... Lila 74
... Jane 44
... ''')
>>> sql("slidingwindow window:2 select * from table1")
wid | a | b
----------------
0 | James | 10
1 | James | 10
1 | Mark | 7
2 | Mark | 7
2 | Lila | 74
3 | Lila | 74
3 | Jane | 44
>>> sql("slidingwindow window:3 select * from table1")
wid | a | b
----------------
0 | James | 10
1 | James | 10
1 | Mark | 7
2 | James | 10
2 | Mark | 7
2 | Lila | 74
3 | Mark | 7
3 | Lila | 74
3 | Jane | 44
sqlite
function¶
-
sqlite
(dbfilename, query:None)¶
Connects to an SQLite DB and returns the results of query.
Examples:
>>> sql("select * from (sqlite 'testdb.db' select 5 as num, 'test' as text);")
num | text
-----------
5 | test
stdinput
function¶
-
stdinput
() → standard input stream¶
Returns the standard input stream
Returned table schema: | |
---|---|
One column automatically named C1. |
- Examples::
>>> sql("select * from stdinput()") c1 ------------- stdinputline1 stdinputline2 stdinputline3
timeslidingwindow
function¶
-
timeslidingwindow
(timewindow, timecolumn) → query results¶
Returns the query input results annotated with the window id as an extra column. The following arguments can be passed as parameters:
timewindow: It can be a numeric value that specifies the time length of the window (in seconds).
timecolumn: It is the index of the temporal column (starting from 0) For the moment, we assume that the data is ordered by the temporal column that
the user gives as input in ascending order.
Examples:
>>> table1('''… “12.05.2010 00:00:00” … “12.05.2010 00:01:00” … “12.05.2010 00:02:00” … “12.05.2010 00:03:00” … “12.05.2010 00:04:00” … ‘’‘) >>> sql(“timeslidingwindow timewindow:180 timecolumn:0 select * from table1”) wid | a ————————- 0 | 12.05.2010 00:00:00 0 | 12.05.2010 00:01:00 0 | 12.05.2010 00:02:00 0 | 12.05.2010 00:03:00 1 | 12.05.2010 00:01:00 1 | 12.05.2010 00:02:00 1 | 12.05.2010 00:03:00 1 | 12.05.2010 00:04:00 >>> table1(‘’’ … “12.05.2010 00:00:00” … “12.05.2010 00:01:00” … “12.05.2010 00:01:00” … “12.05.2010 00:02:00” … “12.05.2010 00:03:00” … “12.05.2010 00:04:00” … “12.05.2010 00:05:00” … ‘’‘)
… ‘’‘) >>> sql(“timeslidingwindow timewindow:120 timecolumn:0 select * from table1”) wid | a ————————- 0 | 12.05.2010 00:00:00 0 | 12.05.2010 00:01:00 0 | 12.05.2010 00:01:00 0 | 12.05.2010 00:02:00 1 | 12.05.2010 00:01:00 1 | 12.05.2010 00:01:00 1 | 12.05.2010 00:02:00 1 | 12.05.2010 00:03:00 2 | 12.05.2010 00:02:00 2 | 12.05.2010 00:03:00 2 | 12.05.2010 00:04:00 3 | 12.05.2010 00:03:00 3 | 12.05.2010 00:04:00 3 | 12.05.2010 00:05:00
>>> table2(''' ... "12/05/2010 00:00:00" ... "12/05/2010 00:01:00" ... "12/05/2010 00:02:00" ... ''')… ‘’‘) >>> sql(“timeslidingwindow timewindow:180 timecolumn:0 select * from table2”) wid | a ————————- 0 | 12/05/2010 00:00:00 0 | 12/05/2010 00:01:00 0 | 12/05/2010 00:02:00
toeav
function¶
-
toeav
(query) → Entity Attribute Value table¶
Transforms the query input results to an Entity Attribute Value model table.
Returned table schema: | |
---|---|
ID, Attribute, Value |
Examples:
>>> table1('''
... James 10 2
... Mark 7 3
... Lila 74 1
... ''')
>>> sql("toeav select * from table1")
rid | colname | val
---------------------
James | b | 10
James | c | 2
Mark | b | 7
Mark | c | 3
Lila | b | 74
Lila | c | 1
unindexed
function¶
-
unindexed
(query) → query results¶
Returns the query input results without any change. UNINDEXED can be used as a barrier for SQLite’s optimizer, for debugging etc.
Returned table schema: | |
---|---|
Same as input query schema. |
Examples:
>>> table1('''
... James 10 2
... Mark 7 3
... Lila 74 1
... ''')
>>> sql("unindexed select * from table1")
a | b | c
--------------
James | 10 | 2
Mark | 7 | 3
Lila | 74 | 1
>>> sql("unindexed select * from table1 order by c")
a | b | c
--------------
Lila | 74 | 1
James | 10 | 2
Mark | 7 | 3
Note the difference with rowid table column.
unionalldb
function¶
-
unionalldb
(db_filename)¶
This function returns the contents of a table that has been split using OUTPUT split functionality.
Its input are DBs with names such as:
dbname.0.db dbname.1.db …
It is assumed that inside each of above DBs, a table named as dbname will exist. All of these tables should have the same schema
If a start or end argument is present then unionalldb will start scanning from the start numbered part and end scanning at end numbered db part (without including the end numbered part).
Usage examples:
select * from (unionalldb ‘dbname’);
select * from (unionalldb start:1 end:4 ‘dbname’);
variables
function¶
-
variables
()¶
Returns the defined variables with their values.
Returned table schema: | |
---|---|
|
Examples:
>>> sql("var 'env' 'testing' ")
var('env','testing')
--------------------
testing
>>> sql("variables")
variable | value
-------------------
flowname |
execdb | :memory:
env | testing
webtable
function¶
-
webtable
(url[, tableNumber])¶ Returns the result of the first or the tableNumber HTML table from the url.
Returned table schema: | |
---|---|
Column names same as HTML table column headers. If there are no headers columns are named as C1,C2….Cn |
Examples:
>>> sql("select * from webtable('http://en.wikipedia.org/wiki/List_of_countries_by_public_debt',2) order by 2 desc limit 3")
Country | Public debt as % of GDP(CIA)[1] | Date1 | Gross government debt as % of GDP(IMF)[2] | Date2 | Region
-----------------------------------------------------------------------------------------------------------------------------
Belize | 90.8 | 2012 est. | 81.003 | 2012 est. | North America
Sudan | 89.3 | 2012 est. | 112.15 | 2012 est. | Africa
France | 89.1 | 2012 est. | 89.97 | 2012 est. | Europe
whilevt
function¶
-
whilevt
([from:0, [to:10, step:1, ]]query)¶
Returns a range of integer numbers while a query’s result is true.
Returned table schema: | |
---|---|
|
|
from: | Number to begin from. Default is 0 |
to: | Number to reach. Default is 10. The to number is not returned |
step: | Step to augment the returned numbers. Default is 1 |
Examples:
>>> sql("select * from range()")
C1
--
0
1
2
3
4
5
6
7
8
9
>>> sql("select * from range('from:1','to:11')")
C1
--
1
2
3
4
5
6
7
8
9
10
>>> sql("select * from range('from:2','to:15','step:3')")
C1
--
2
5
8
11
14
>>> sql("select * from range(1,10,2)")
C1
--
1
3
5
7
9
>>> sql("select * from range(5)")
C1
--
1
2
3
4
5
xmlparse
function¶
-
xmlparse
([root:None, strict:1, namespace:False, xmlprototype, ]query:None)¶
Parses an input xml stream. It starts parsing when it finds a root tag. A provided XML prototype fragment is used to create an schema, mapping from xml to a relational table. If multiple values are found for the same tag in the input stream, then all values are returned separated with a tab (use tab-set operators to process them).
If no XML prototype is provided, then a jdict of the data is returned. If no root tag is provided, then the output is a raw feed of {path:data} pairs without any row aggregation. Rootless mode is usefull when trying to find what root tag to use.
Is a root tag is provided then each returned row, contains a jdict of all the paths found below the specified root tag.
XML prototype: | XML prototype may be:
If a the characters “*” or “$” are provided as a value of any of these prototypes, then a full XML subtree of a path will be returned in the resulting data. |
---|---|
‘namespace’ or ‘ns’ option: | |
Include namespace information in the returned jdicts. |
|
‘fast’ option (default 0): | |
Read input data in bulk. For some XML input files (having lots of small line lengths), it can speed up XML processing by up to 30%. The downside of this option, is that when an error occurs no last line information is returned, so use this option only when you are sure that the XML input is well formed.
|
|
‘strict’ option: | |
For strict modes 0 and -1, the fast:0 mode is enforced. |
|
Returned table schema: | |
Column names are named according to the schema of the provided xml prototype. |
- Examples:
>>> table1(''' ... '<a><b>row1val1</b><b>row1val1b</b><b>row1val1c</b></a>' ... '<a>' ... '<b>' ... 'row2val1</b><c><d>row2val</d></c>' ... '</a>' ... ''')
>>> sql("select * from (xmlparse select * from table1)") C1 ------------------- {"a/b":"row1val1"} {"a/b":"row1val1b"} {"a/b":"row1val1c"} {"a/b":"row2val1"} {"a/c/d":"row2val"}
>>> sql("select jgroupunion(jdictkeys(c1)) from (xmlparse select * from table1)") jgroupunion(jdictkeys(c1)) -------------------------- ["a/b","a/c/d"]
>>> sql('''select * from (xmlparse '["a/b","a/c/d"]' select * from table1)''') b | c_d -------------------------------------- row1val1 row1val1b row1val1c | row2val1 | row2val
>>> sql("select * from (xmlparse '<a><b>val1</b><b>val1</b><c><d>val2</d></c></a>' select * from table1)") b | b1 | c_d ---------------------------------------- row1val1 | row1val1b row1val1c | row2val1 | | row2val
>>> sql("select * from (xmlparse root:a '<t><a><b>val1</b><c><d>val2</d></c></a></t>' select * from table1)") b | c_d -------------------------------------- row1val1 row1val1b row1val1c | row2val1 | row2val
>>> table2(''' ... '<a b="attrval1"><b>row1val1</b></a>' ... '<a>' ... '<b>' ... 'row2val1</b><c>asdf<d>row2val</d></c>' ... '</a>' ... ''') >>> sql("select * from (xmlparse '<a b=\"v\"><b>v</b><c><d>v</d></c></a>' select * from table2)") b | b1 | c_d ----------------------------- attrval1 | row1val1 | | row2val1 | row2val
>>> sql('''select * from (xmlparse '["a/@/b","a/b","a/c/d"]' select * from table2)''') b | b1 | c_d ----------------------------- attrval1 | row1val1 | | row2val1 | row2val
>>> sql('''select * from (xmlparse '{"a/b":[1,2] ,"a/c/d":1}' select * from table2)''') b | b1 | c_d ----------------------- row1val1 | | row2val1 | | row2val
>>> sql('''select * from (xmlparse '{"a/b":[1,2] ,"a/c":[1,"*"]}' select * from table2)''') b | b1 | c | c_$ ----------------------------------------- row1val1 | | | row2val1 | | asdf | asdf<d>row2val</d>
>>> sql('''select * from (xmlparse '["a/b", "a/c", "a/c/*"]' select * from table2)''') b | c | c_$ ------------------------------------ row1val1 | | row2val1 | asdf | asdf<d>row2val</d>
>>> sql('''select * from (xmlparse root:a '{"a/b":"", "a":"*"}' select * from table2)''') b | $ ------------------------------------------------------ row1val1 | <b>row1val1</b> row2val1 | <b> row2val1</b><c>asdf<d>row2val</d></c>
>>> sql("select * from (xmlparse '<a><b>v</b><c>*</c></a>' select * from table2)") b | c_$ ----------------------------- row1val1 | row2val1 | asdf<d>row2val</d>
>>> sql("select * from (xmlparse root:a select * from table2)") C1 ------------------------------------------------- {"a/@/b":"attrval1","a/b":"row1val1"} {"a/b":"row2val1","a/c/d":"row2val","a/c":"asdf"}
>>> table2(''' ... '<a b="attrval1"><b>row1val1</b></a>' ... '<a>' ... '</b>' ... 'row2val1</b><c><d>row2val</d></c>' ... '</a>' ... ''') >>> sql("select * from (xmlparse strict:0 '<a b=\"v\"><b>v</b><c><d>v</d></c></a>' select * from table2)") b | b1 | c_d ------------------------- attrval1 | row1val1 |
>>> table3(''' ... '<a><b>row1val1</b></a>' ... '<a>' ... '<b np="np">' ... 'row2val1</b><c><d>row2val</d></c>' ... '</a>' ... ''') >>> sql("select * from (xmlparse strict:2 '<a><b>val1</b><c><d>val2</d></c></a>' select * from table3)") Traceback (most recent call last): ... OperatorError: Madis SQLError: Operator XMLPARSE: Undeclared path in XML-prototype was found in the input data. The path is: /b/@/np The data to insert into path was: np Last input line was: <b np="np">
>>> table4(''' ... '<a><b>row1val1</b></a>' ... '<a><b>row1val2</b</a>' ... '<a><b np="np">row1val1</b></a>' ... '<a><b>row1val3/b></a>' ... '<a><b>row1val4</b></a>' ... ''') >>> sql("select * from (xmlparse strict:-1 '<a><b>val1</b><c><d>val2</d></c></a>' select * from table4)") C1 ---------------------- <a><b>row1val2</b</a> <a><b>row1val3/b></a>
>>> table5(''' ... '<a><b><a><b>row1val1</b></a></b></a>' ... '<a><b>row2val1</b></a>' ... '<a><b>row3val1</b></a>' ... '<a><b>row4val1</b><c>row4val2</c>' ... '</a>' ... ''') >>> sql('''select * from (xmlparse '["a/b", "a/c"]' select * from table5)''') b | c ------------------- row1val1 | row2val1 | row3val1 | row4val1 | row4val2