.. _vtable-functions-list: Vtable functions list ================================================= .. automodule:: functions.vtable.__init__ :mod:`cache` function --------------------- .. module:: functions.vtable.cache .. function:: cache(query:None) -> [the same schema as input query] Caches and indexes in memory the input query. Its output has exactly the same schema as the input query. Examples: .. doctest:: >>> table1(''' ... 1 2 ... 3 4 ... 5 6 ... 7 8 ... 10 1 ... ''') >>> sql("select * from (cache select * from table1) order by a desc,b asc") a | b ------ 10 | 1 7 | 8 5 | 6 3 | 4 1 | 2 >>> sql("select * from (cache select * from table1) where a=b") >>> table2(''' ... 5 1 ... 5 2 ... 5 3 ... 2 3 ... 2 2 ... 2 1 ... ''') >>> sql("select * from (cache select * from table2) where b>=2 order by a desc,b desc") a | b ----- 5 | 3 5 | 2 2 | 3 2 | 2 >>> sql("select * from (cache select * from table2) where 3>=a and 3<=b") a | b ----- 2 | 3 .. seealso:: * :ref:`tutcache` :mod:`clipboard` function ------------------------- .. module:: functions.vtable.clipboard .. function:: clipboard() Returns the contents of the system's clipboard. If the clipboard's contents are guessed to be a table, then it automatically splits the contents in its output. :h: if the 'h' option is provided to *clipboard()* function, the first row of the clipboard's data is regarded as the schema of the data. :Returned table schema: Column names start from C1... , all column types are text Examples: >>> sql("select * from clipboard()") C1 | C2 | C3 | C4 ------------------------------------------------------ 1 | Zimbabwe | 304.30 | 2009 est. 2 | Japan | 192.10 | 2009 est. 3 | Saint Kitts and Nevis | 185.00 | 2009 est. :mod:`clipout` function ----------------------- .. module:: functions.vtable.clipout .. function:: clipout([h:0], query:None) Writes to clipboard the output of *query*. The clipboard contents will be tab delimited. :header option: if an 'h' or h:1 option is found then it also exports the schema of the query. :Returned table schema: - *return_value* int Boolean value 1 indicating success. On failure an exception is thrown. Examples: >>> sql("clipout select 5,6") return_value ------------ 1 :mod:`coltypes` function ------------------------ .. module:: functions.vtable.coltypes .. function:: coltypes(query:None) Returns the input query results column names and types. :Returned table schema: - *column* text Column name of input query *schema* - *type* text Type of column Examples: >>> sql("coltypes select 5 as vt") column | type ------------- vt | None Applying coltypes in the result of virtual table func:`typing` function in the same query >>> sql("coltypes typing 'vt:int' select 5 as vt") column | type ------------- vt | int .. doctest:: :hide: >>> sql("select * from (coltypes typing 'text' select '10' ) as a, (coltypes typing 'int' select '10' ) as b where a.column=b.column") column | type | column | type ----------------------------- '10' | text | '10' | int :mod:`dirfiles` function ------------------------ .. module:: functions.vtable.dirfiles .. function:: dirfiles([rec:1], directory_name) -> path_filename, filename Returns the files name in a given directory. With the option 'rec:1' it returns the files under the provided directory and all its subdirectories. This function is very usefull when used with the *execprogram* function to execute an external command for every filename. .. note:: *Dirfiles* does not follow links. :Returned table schema: Column C1 is the full filename (path/filename) Column C2 is filename Examples: >>> sql("select c2 from dirfiles('.') where c2 like 'f%.py'") # doctest:+ELLIPSIS +NORMALIZE_WHITESPACE c2 ------- file.py flow.py >>> sql("select c2 from dirfiles('rec:1','.') where c2 like 'c%.py'") # doctest:+ELLIPSIS +NORMALIZE_WHITESPACE c2 ------------ coltypes.py clipout.py cache.py continue.py clipboard.py :mod:`examplevt` function ------------------------- .. module:: functions.vtable.examplevt .. function:: examplevt(arguments) A minimal example of a virtual table. Returns all the arguments passed to it. :Returned table schema: Column names start from C1... , all column types are text Examples: >>> sql("select * from examplevt(1, '2', 'var3')") # doctest:+ELLIPSIS varname | value ------------------------------------------------------------- parsedargs | (u'1', u'2', u'var3') envar:tablename | vt_773987998 envar:modulename | examplevt ... envar:dbname | temp >>> sql("select * from (examplevt 'var1' 'var2' v1:test select 5)") # doctest:+ELLIPSIS varname | value -------------------------------------------------------------------- parsedargs | (u'query:select 5', u'var1', u'var2', u'v1:test') envar:tablename | vt_1975870853 envar:modulename | examplevt ... envar:dbname | temp :mod:`exec` function -------------------- .. module:: functions.vtable.exec .. function:: exec(query:None,[path:None,variables]) Executes the input query. Gets the first column of the returned result and executes its rows content supposing it is an sql statement. *Path* parameter sets the current working directory while executing the statements. *Variables* are named parameters that set variables in execution environment. For example *c:v* named parameter sets the variable *c* in the new environment, initialized with current variable's *v* value. :Returned table schema: - *return_value* int Boolean value 1 indicating success of the SQL statements flow execution. On failure an exception is thrown. .. toadd See also variables.. LINK , file??? Examples: .. doctest:: :hide: >>> settestdb('../../tests/temp.db3') This query executes the statements in quotes and returns successfully .. doctest:: >>> sql("exec select 'select 5'") return_value ------------ 1 Typical usage. .. doctest:: >>> sql("file 'testing/testtable.sql'") C1 ---------------------------------------- create table table1 (a,b,c); insert into table1 values('James',10,2); insert into table1 values('Mark',7,3); insert into table1 values('Lila',74,1); >>> sql("exec file 'testing/testtable.sql'") return_value ------------ 1 >>> sql("select * from table1") a | b | c -------------- James | 10 | 2 Mark | 7 | 3 Lila | 74 | 1 Nesting flows. Usage of *path* and variables parameters. .. doctest:: >>> sql("file 'testing/topflow.sql'") C1 ----------------------------------------------------------------------------- var 'v' 5; var 'tablename' 'internaltable'; var 'lastdate' from select '2008-01-01'; create table topflowvars as select * from getvars() where variable!='execdb'; exec 'tablename' 'c:v' file 'internalflow.sql'; >>> sql("file 'testing/internalflow.sql'") C1 ------------------------------------------------------------------------------ create table %{tablename} as select * from getvars() where variable!='execdb'; >>> sql("select * from variables() where variable!='execdb'") variable | value ---------------- flowname | >>> sql("exec 'path:testing' file 'testing/topflow.sql'") return_value ------------ 1 >>> sql("select * from topflowvars") variable | value ------------------------- flowname | notset lastdate | 2008-01-01 tablename | internaltable v | 5 >>> sql("select * from internaltable") variable | value ------------------------- flowname | notset c | 5 tablename | internaltable >>> sql("select * from variables() where variable!='execdb'") variable | value ---------------- flowname | .. doctest:: :hide: >>> import os >>> os.remove('../../tests/temp.db3') Test files: - :download:`testtable.sql <../../functions/vtable/testing/testtable.sql>` - :download:`topflow.sql <../../functions/vtable/testing/topflow.sql>` - :download:`internalflow.sql <../../functions/vtable/testing/internalflow.sql>` :mod:`expand` function ---------------------- .. module:: functions.vtable.expand .. function:: expand(query:None) Executes the input query and returns the result expanding any multiset values returned. The returned result is produced iteratively. :Returned table schema: Same as input query schema expanded with multiset functions column naming. When *as* renaming function is used at a multiset function, if the multiset function returns only one column it is named according to the *as* value, else a positive integer (1,2...n) is appended to the column name indicating column index in the multiset function result. Examples:: >>> table1(''' ... James 10 2 ... Mark 7 3 ... Lila 74 1 ... ''') >>> sql("select ontop(1,c,a,b) from table1") top1 | top2 ----------- Mark | 7 >>> sql("select ontop(1,c,b,c) as prefs from table1") prefs1 | prefs2 --------------- 7 | 3 >>> sql("select ontop(1,c,a) as nameontop from table1") nameontop --------- Mark The explicit invocation of expand function won't affect the output since it is already automatically invoked because of the multiset function ontop. >>> sql("expand expand select ontop(2,b,a) from table1") top1 ----- Lila James .. doctest:: :hide: >>> table2(''' ... Fibi 40 ... Monika 5 ... Soula 17 ... ''') >>> sql("select * from (select ontop(1,c,a,b) from table1) as a,(select ontop(1,c,a,b) from table1) as b,(select ontop(2,b,a,b) from table2) as c where a.top2=b.top2 and a.top2>> sql("select * from (select ontop(3,c,a,b) from table1) as a,(select ontop(3,c,a,b) from table1) as b,(select ontop(2,b,a,b) from table2) as c") top1 | top2 | top1 | top2 | top1 | top2 ------------------------------------------ Mark | 7 | Mark | 7 | Fibi | 40 Mark | 7 | Mark | 7 | Soula | 17 Mark | 7 | James | 10 | Fibi | 40 Mark | 7 | James | 10 | Soula | 17 Mark | 7 | Lila | 74 | Fibi | 40 Mark | 7 | Lila | 74 | Soula | 17 James | 10 | Mark | 7 | Fibi | 40 James | 10 | Mark | 7 | Soula | 17 James | 10 | James | 10 | Fibi | 40 James | 10 | James | 10 | Soula | 17 James | 10 | Lila | 74 | Fibi | 40 James | 10 | Lila | 74 | Soula | 17 Lila | 74 | Mark | 7 | Fibi | 40 Lila | 74 | Mark | 7 | Soula | 17 Lila | 74 | James | 10 | Fibi | 40 Lila | 74 | James | 10 | Soula | 17 Lila | 74 | Lila | 74 | Fibi | 40 Lila | 74 | Lila | 74 | Soula | 17 :mod:`file` function -------------------- .. module:: functions.vtable.file .. function:: file(location, [formatting options]) Opens and returns a file or url as a table. The file's format is defined through named options. *location* is defined through a URL, or a regular filename, can be given also as the named parameter *url* or *file*. If no named parameters are given the returned table has one column with each line of resource as a row or it assumes the dialect from the file ending (Files ending in .tsv, .csv, .json are assumed to be in the corresponding dialect). :Returned table schema: Columns are automatically named as *C1, C2...* or if header is set, columns are named by the resource first line value, and have the type *text* Formatting options: :fast: Default is 0 (false). Fast option speeds up the parsing of lines into values, exchanging accuracy for speed. It uses the delimiter option to split lines. :strict: - strict:1 (default), if a failure occurs, the current transaction will be cancelled and an error will be returned. - strict:0 , returns all data that succesfully parses. - strict:-1 , returns all input lines in which the parser finds a problem. In essence this works as a negative parser. If no strict option is defined in fast:1 mode, then no strictness checking is applied at all, and an "Unknown error" will be returned if a problem occurs. :encoding: A standar encoding name. (`List of encodings `_) :compression: *t/f* Default is *f* (False) :compressiontype: *zip/gzip* Default is *zip* Formatting options for CSV file types: :dialect: *tsv/csv/json* Formats field as tab/comma separated values with minimal quoting. *JSON* dialect uses a line oriented *JSON* based format. File extensions that are recognised as a dialect (.tsv, .csv, .json) take precedence over a specified dialect parameter. :header: *t/f* Set the column names of the returned table :delimiter: A string used to separate fields. It defaults to ',' :doublequote: *t/f* Controls how instances of quotechar appearing inside a field should be themselves be quoted. When True, the character is doubled. When False, the escapechar is used as a prefix to the quotechar. It defaults to True. On output, if doublequote is False and no escapechar is set, Error is raised if a quotechar is found in a field :escapechar: A one-character string used by the writer to escape the delimiter if quoting is set to QUOTE_NONE and the quotechar if doublequote is False. On reading, the escapechar removes any special meaning from the following character. It defaults to None, which disables escaping :lineterminator: The string used to terminate lines produced by the writer. It defaults to '\\r\\n' :quotechar: A one-character string used to quote fields containing special characters, such as the delimiter or quotechar, or which contain new-line characters. It defaults to '"'. :quoting: Controls when quotes should be generated by the writer and recognised by the reader. It can take on any of the QUOTE_* constants and defaults to QUOTE_MINIMAL. Possible values are QUOTE_ALL, QUOTE_NONE, QUOTE_MINIMAL, QUOTE_NONNUMERIC :skipinitialspace: *t/f* When True, whitespace immediately following the delimiter is ignored. The default is False :toj: *Num* When toj is defined, columns 0-Num are returned as normal, and all columns >Num are returned as a JSON list or JSON dict, depending on if the *header* is enabled. :useregexfilename: *t/f* When true, the provided filename is treated as a regex. This means that madiS will open the first file it finds to match that regex. Examples:: >>> sql("select * from (file file:testing/colpref.csv dialect:csv) limit 3;") C1 | C2 | C3 | C4 -------------------------------------- userid | colid | preference | usertype agr | | 6617580.0 | agr agr | a0037 | 2659050.0 | agr >>> sql("select * from (file file:testing/colpref.csv dialect:csv header:t) limit 3") userid | colid | preference | usertype -------------------------------------- agr | | 6617580.0 | agr agr | a0037 | 2659050.0 | agr agr | a0086 | 634130.0 | agr >>> sql("select * from (file file:testing/colpref.zip header:t dialect:csv compression:t) limit 3;") userid | colid | preference | usertype -------------------------------------- agr | | 6617580.0 | agr agr | a0037 | 2659050.0 | agr agr | a0086 | 634130.0 | agr >>> sql("select * from (file 'testing/colpref.tsv' delimiter:| ) limit 3;") C1 | C2 | C3 | C4 ----------------------------- agr | | 6617580.0 | agr agr | a0037 | 2659050.0 | agr agr | a0086 | 634130.0 | agr >>> sql("select * from (file useregexfilename:True 'testing/col*.tsv' delimiter:| ) limit 3;") C1 | C2 | C3 | C4 ----------------------------- agr | | 6617580.0 | agr agr | a0037 | 2659050.0 | agr agr | a0086 | 634130.0 | agr >>> sql("select * from (file 'testing/colpref.tsv.gz' delimiter:| compression:t compressiontype:gzip) limit 3;") C1 | C2 | C3 | C4 ----------------------------- agr | | 6617580.0 | agr agr | a0037 | 2659050.0 | agr agr | a0086 | 634130.0 | agr >>> sql("select * from file('http://sites.google.com/site/stats202/data/test_data.csv?attredirects=0') limit 10;") C1 ----------------- Age,Number,Start middle,5,10 young,2,17 old,10,6 young,2,17 old,4,15 middle,5,15 young,3,13 old,5,8 young,7,9 >>> sql("select * from file('file:testing/GeoIPCountryCSV.zip','compression:t','dialect:csv') limit 4") C1 | C2 | C3 | C4 | C5 | C6 ---------------------------------------------------------------------- 2.6.190.56 | 2.6.190.63 | 33996344 | 33996351 | GB | United Kingdom 3.0.0.0 | 4.17.135.31 | 50331648 | 68257567 | US | United States 4.17.135.32 | 4.17.135.63 | 68257568 | 68257599 | CA | Canada 4.17.135.64 | 4.17.142.255 | 68257600 | 68259583 | US | United States :mod:`flow` function -------------------- .. module:: functions.vtable.flow .. function:: flow(query:None) Translates the input query results into sql statements if possible. :Returned table schema: - *query* text A complete sql query statement with the semicolon at the end .. note:: Input query results must be sql statements separated with semicolons in the first place. Using in the input query the :func:`~functions.vtable.file.file` operator any file with sql statements can be divided in sql query statements. Multiline comments are considered as statements. Examples: .. doctest:: >>> sql("select * from (flow file 'testing/testflow.sql') limit 1") # doctest: +NORMALIZE_WHITESPACE query ----------------------------------------------------------------------------------------------------------------------------------------------------------- /*====== countries: table of Country ISO codes , country names ===========*/ CREATE TABLE countries ( country2 PRIMARY KEY UNIQUE, country_name ); >>> sql("select * from (flow file 'testing/colpref.csv' limit 5) ") #doctest:+ELLIPSIS +NORMALIZE_WHITESPACE Traceback (most recent call last): ... OperatorError: Madis SQLError: Operator FLOW: Incomplete statement found : userid colid pr ... 41 416900.0 agr Test files: - :download:`testflow.sql <../../functions/vtable/testing/testflow.sql>` - :download:`colpref.csv <../../functions/vtable/testing/colpref.csv>` :mod:`fromeav` function ----------------------- .. module:: functions.vtable.fromeav .. function:: fromeav(query) -> Relational table Transforms the query input results to a relational table from an Entity-Attribute-Value (EAV) model. Examples:: >>> table1(''' ... 1 name James ... 1 city Chicago ... 1 job Programmer ... 1 age 35 ... 2 name Mark ... 2 city London ... 2 job Pilot ... 2 age 43 ... 5 name Lila ... 5 city 'New York' ... 5 job Teacher ... 5 age 29 ... ''') >>> sql("fromeav select * from table1 where a = 1") rid | row_id | name | city | job | age ------------------------------------------------- 0 | 1 | James | Chicago | Programmer | 35 >>> sql("fromeav select * from table1 where a in (1,5)") rid | row_id | name | city | job | age -------------------------------------------------- 0 | 1 | James | Chicago | Programmer | 35 1 | 5 | Lila | New York | Teacher | 29 >>> sql("fromeav select * from table1") rid | row_id | name | city | job | age -------------------------------------------------- 0 | 1 | James | Chicago | Programmer | 35 1 | 2 | Mark | London | Pilot | 43 2 | 5 | Lila | New York | Teacher | 29 >>> sql("fromeav select * from table1 where b in ('city', 'job')") rid | row_id | city | job ------------------------------------ 0 | 1 | Chicago | Programmer 1 | 2 | London | Pilot 2 | 5 | New York | Teacher :mod:`hidden` function ---------------------- .. module:: functions.vtable.hidden .. function:: hidden(query) -> query results Executes the query, without returning any of its rows. :Returned table schema: Same as input query schema. Examples:: >>> table1(''' ... James 10 2 ... Mark 7 3 ... Lila 74 1 ... ''') >>> sql("hidden select * from table1") a | b | c --------- >>> sql("hidden select * from table1 order by c") a | b | c --------- :mod:`jsonpipe` function ------------------------ .. module:: functions.vtable.jsonpipe .. function:: jsonpipe(query:None[,lines:t]) Executes *query* as a shell command and returns the standard output lines as rows of one column table. Setting *lines* parameter to *f* the command output will be returned in one table row. :Returned table schema: - *output* text Output of shell command execution Examples:: .. doctest:: >>> sql("pipe 'ls ./testing/*.csv' ") C1 --------------------- ./testing/colpref.csv >>> sql("pipe wc ./testing/colpref.csv") C1 --------------------------------- 19 20 463 ./testing/colpref.csv .. doctest:: :hide: >>> sql("pipe wc nonexistingfile") #doctest:+ELLIPSIS +NORMALIZE_WHITESPACE Traceback (most recent call last): ... OperatorError: Madis SQLError: Operator PIPE: Command 'wc nonexistingfile' failed to execute because: wc: nonexistingfile: No such file or directory :mod:`mysql` function --------------------- .. module:: functions.vtable.mysql .. function:: mysql(host, port, user, passwd, db, query:None) Connects to an MySQL DB and returns the results of query. Examples: >>> sql("select * from (mysql h:127.0.0.1 port:3306 u:root p:rootpw db:mysql select 5 as num, 'test' as text);") num | text ----------- 5 | test :mod:`oaiget` function ---------------------- .. module:: functions.vtable.oaiget .. function:: oaiget(url, verb, metadataPrefix,...) Fetches data from an OAIPMH service, using resumption tokens to fetch large datasets. - If no *verb* is provided then *verb* is assumed to be 'ListRecords'. - If no *metadataPrefix* is provided then *verb* is assumed to be 'ListMetadataFormats', which will list all metadata formats. :Returned table schema: Column C1 as text Examples: >>> sql("select * from oaiget('verb:ListRecords', 'metadataPrefix:ctxo')") # doctest:+ELLIPSIS +NORMALIZE_WHITESPACE Traceback (most recent call last): ... OperatorError: Madis SQLError: Operator OAIGET: An OAIPMH URL should be provided >>> sql("select * from (oaiget verb:ListRecords metadataPrefix:ctxo 'http://oaiurl' )") # doctest:+ELLIPSIS +NORMALIZE_WHITESPACE Traceback (most recent call last): ... OperatorError: Madis SQLError: Operator OAIGET: :mod:`oracle` function ---------------------- .. module:: functions.vtable.oracle .. function:: oracle(jdbc_connection_string, user, passwd, query:None) Connects to an Oracle DB and returns the results of query. The requirements for this to work are: - Install JPype Python package: Homepage: http://jpype.sourceforge.net/ In Debian based systems such as Ubuntu, install python-jpype using: sudo apt-get install python-jpype - Add the Oracle JDBC JAR (ojdbc5.jar) in directory madis/lib/jdbc/. - Have a JVM installed, and the JAVA_HOME environment variable set correctly. Examples: >>> sql("select * from (oracle jdbc:oracle:thin:@//127.0.0.1:6667/xe u:system p:password select 5 as num, 'test' as text);") num | text ----------- 5 | test :mod:`ordered` function ----------------------- .. module:: functions.vtable.ordered .. function:: ordered(query) -> query results Ordered virtual table returns its data as they are. The main difference with unindexed, is that it signals to the SQLite engine that the results are ordered in whatever order SQLite prefers, so a possible group by on the results will happen incrementally. :Returned table schema: Same as input query schema. Examples:: >>> table1(''' ... James 10 2 ... Mark 7 3 ... Lila 74 1 ... ''') The following query is calculated incrementally >>> sql("select a, count(*) from (ordered select * from table1) group by a") a | count(*) ---------------- James | 1 Mark | 1 Lila | 1 >>> sql("select * from (ordered select * from table1) order by c") a | b | c -------------- James | 10 | 2 Mark | 7 | 3 Lila | 74 | 1 Notice that the order by does not work as it should because igroup has fooled the SQLite engine into believing that the order of the results are in the correct order (they aren't). :mod:`output` function ---------------------- .. module:: functions.vtable.output .. function:: output formatting_options 'filename' query Writes in *filename* the output of *query* formatted according to *formatting* options. :Returned table schema: - *return_value* int Boolean value 1 indicating success. On failure an exception is thrown. Formatting options: .. toadd html In html mode table is formatted as an html table TODO ???? :mode: - plain *Default*. The columns are concatened and written together. - tsv Writes data in a tab separated format. *TSV* mode is autoselected when the filename ends in ".tsv". - csv Writes data in a comma separated format. *CSV* mode is autoselected when the filename ends in ".csv". - json Writes data in a line separated JSON format. Header is always added. *JSON* mode is autoselected when the filename ends in ".JSON". - db Writes data in a SQLite DB. *DB* mode is autoselected when the filename ends in ".db". - If pagesize:xxxxx option is given, set new DBs page size to parameter given, if not inherit page size from parent DB. - gtable In gtable mode table is formatted as a google Data Table for visualisation. - gjson In gjson mode table is formatted in a json format accepted by google visualisation widgets. If *mode* is not *csv* any given csv formatting options are ignored. :append: t/f If true the output is append in the file, ignored in compression mode. :compression: t/f If true the output will be compressed. Default compressor type is *gz*. :compressiontype: gz/zip Selects between the two compression types. :split: (number) It splits the input to many *db* or *json* files. Splitting only works when writting to a *db* or *JSON*. Splitting is done by using the first column of the input and it outputs all columns except the first one. If the *split* argument is greater than *1* then the output will always be splitted to the defined number of files. If the split argument is 1 or lower, then the output will only contain the parts of which a key were found on the first column of the input data. Detailed description of additional output formating options can be found in :func:`~functions.vtable.file.file` function description. Examples: >>> table1(''' ... James 10 2 ... Mark 7 3 ... Lila 74 1 ... ''') >>> sql("select * from table1") a | b | c -------------- James | 10 | 2 Mark | 7 | 3 Lila | 74 | 1 >>> sql("output file:../../tests/table1.csv delimiter:# header:t select a as name , b as age, c as rank from table1") return_value ------------ 1 >>> sql("file file:../../tests/table1.csv delimiter:# header:t") name | age | rank ------------------ James | 10 | 2 Mark | 7 | 3 Lila | 74 | 1 :mod:`pipe` function -------------------- .. module:: functions.vtable.pipe .. function:: pipe(query:None[,lines:t]) Executes *query* as a shell command and returns the standard output lines as rows of one column table. Setting *lines* parameter to *f* the command output will be returned in one table row. :Returned table schema: - *output* text Output of shell command execution Examples:: .. doctest:: >>> sql("pipe 'ls ./testing/*.csv' ") C1 --------------------- ./testing/colpref.csv >>> sql("pipe wc ./testing/colpref.csv") C1 --------------------------------- 19 20 463 ./testing/colpref.csv .. doctest:: :hide: >>> sql("pipe wc nonexistingfile") #doctest:+ELLIPSIS +NORMALIZE_WHITESPACE Traceback (most recent call last): ... OperatorError: Madis SQLError: Operator PIPE: Command 'wc nonexistingfile' failed to execute because: wc: nonexistingfile: No such file or directory :mod:`postgres` function ------------------------ .. module:: functions.vtable.postgres .. function:: postgres(host, port, user, passwd, db, query:None) Connects to an PostgreSQL DB and returns the results of query. Examples: >>> sql("select * from (postgres h:127.0.0.1 port:5432 u:root p:rootpw db:testdb select 5 as num, 'test' as text);") num | text ----------- 5 | test :mod:`queryplan` function ------------------------- .. module:: functions.vtable.queryplan .. function:: queryplan(query) -> Query plan Returns the query plan of the input query. Examples:: >>> sql("queryplan select 5") operation | paramone | paramtwo | databasename | triggerorview ------------------------------------------------------------------ SQLITE_SELECT | None | None | None | None :mod:`range` function --------------------- .. module:: functions.vtable.range .. function:: range([from:0,[to:10,[step:1]]]) Returns a range of integer numbers. :Returned table schema: - *value* int Number in range. .. note:: The parameters can be given both named or unnamed. In unnamed mode parameter order is from,to,step. Named parameters: :from: Number to begin from. Default is 0 :to: Number to reach. Default is 10. The *to* number is not returned :step: Step to augment the returned numbers. Default is 1 Examples:: >>> sql("select * from range()") C1 -- 0 1 2 3 4 5 6 7 8 9 >>> sql("select * from range('from:1','to:11')") C1 -- 1 2 3 4 5 6 7 8 9 10 >>> sql("select * from range('from:2','to:15','step:3')") C1 -- 2 5 8 11 14 >>> sql("select * from range(1,10,2)") C1 -- 1 3 5 7 9 >>> sql("select * from range(5)") C1 -- 1 2 3 4 5 :mod:`rowidvt` function ----------------------- .. module:: functions.vtable.rowidvt .. function:: rowidvt(query:None) Returns the query input result adding rowid number of the result row. :Returned table schema: Same as input query schema with addition of rowid column. - *rowid* int Input *query* result rowid. Examples:: >>> table1(''' ... James 10 2 ... Mark 7 3 ... Lila 74 1 ... ''') >>> sql("rowidvt select * from table1") rowid | a | b | c ---------------------- 1 | James | 10 | 2 2 | Mark | 7 | 3 3 | Lila | 74 | 1 >>> sql("rowidvt select * from table1 order by c") rowid | a | b | c ---------------------- 1 | Lila | 74 | 1 2 | James | 10 | 2 3 | Mark | 7 | 3 Note the difference with rowid table column. >>> sql("select rowid,* from table1 order by c") rowid | a | b | c ---------------------- 3 | Lila | 74 | 1 1 | James | 10 | 2 2 | Mark | 7 | 3 :mod:`sample` function ---------------------- .. module:: functions.vtable.sample .. function:: sample(sample_size query) -> samples rows from input Returns a random sample_size set of rows. :Returned table schema: Same as input query schema. Options: :size: Sample size Examples:: >>> table1(''' ... James 10 2 ... Mark 7 3 ... Lila 74 1 ... ''') >>> sql("sample '10' select * from table1") a | b | c -------------- James | 10 | 2 Mark | 7 | 3 Lila | 74 | 1 >>> sql("sample size:1 select * from table1") # doctest:+ELLIPSIS +NORMALIZE_WHITESPACE a | b | c ... >>> sql("sample size:0 select * from table1") :mod:`setschema` function ------------------------- .. module:: functions.vtable.setschema .. function:: setschema(query:None,schemadefinition) Returns the result of the input *query* with changed schema according to *schemadefinition* parameter. Parameter *schemadefinition* is text identical to schema definition between parenthesis of a CREATE TABLE SQL statement. Can perform renaming, typecasting and projection on some columns of the input *query* result. .. note:: This function can be used to avoid DynamicSchemaWithEmptyResultError caused by dynamic schema virtual tables on empty query input result. .. toadd link. :Returned table schema: As defined at *schemadefinition* parameter. Examples:: >>> sql("setschema 'col1 int,col2 text' select 5,6") col1 | col2 ----------- 5 | 6 >>> sql("select strsplitv(q) from (select 5 as q) where q!=5") #doctest:+ELLIPSIS +NORMALIZE_WHITESPACE Traceback (most recent call last): ... DynamicSchemaWithEmptyResultError: Madis SQLError: Operator EXPAND: Cannot initialise dynamic schema virtual table without data >>> sql("setschema 'a,b' select strsplitv(q) from (select 5 as q) where q!=5") a | b ----- >>> sql("select * from (file file:testing/colpref.csv dialect:csv header:t) limit 3") userid | colid | preference | usertype -------------------------------------- agr | | 6617580.0 | agr agr | a0037 | 2659050.0 | agr agr | a0086 | 634130.0 | agr The query below has constraints preference column to be less than an int value , but preference is text ( outcomes from :func:`~functions.vtable.file.file` are *text*), so an empty result is produced >>> sql("select * from (select * from (file file:testing/colpref.csv dialect:csv header:t) limit 3) where cast(preference as int) <634130") With setschema functions preference column is casted as float. >>> sql("select * from (setschema 'type,colid , pref float, userid' select * from (file file:testing/colpref.csv dialect:csv header:t) limit 3) where pref<634131") type | colid | pref | userid -------------------------------- agr | a0086 | 634130.0 | agr :mod:`skpredict` function ------------------------- .. module:: functions.vtable.skpredict .. function: skpredict(args,query:None) skpredict filename: "mymodel" select * from t; Loads a predictive model trained by sktrain operator from file (stored from sktrain operator) and classifies the new data provided selected from the query. It returns a table with the new predictions >>> table(''' ... 0.0 4.4 0 ... 2.1 2.2 2 ... -2.1 4.4 0 ... 2.1 2.2 0 ... 0.0 4.4 2 ... -4.2 4.4 2 ... -4.2 4.4 1 ... -2.1 -0.0 0 ... 2.1 -0.0 0 ... -2.1 -2.2 0 ... -4.2 -0.0 2 ... --- [0|Column names --- ... [1|C1 [2|C2 [3|C3 ... ''') >>> sql("skpredict filename:SVMmodel select C1,C2 from table;") id | prediction | prediction_probability_per_class ------------- 0 | 0 | [ 0.4101318 0.20131647 0.38855173] 1 | 0 | [ 0.41863251 0.20180877 0.37955871] 2 | 2 | [ 0.27520722 0.19621797 0.52857481] 3 | 0 | [ 0.4149133 0.20182841 0.3832583 ] 4 | 0 | [ 0.4101318 0.20131647 0.38855173] 5 | 2 | [ 0.90338454 0.01203995 0.08457551] 6 | 2 | [ 0.90338454 0.01203995 0.08457551] 7 | 0 | [ 0.27481114 0.19661277 0.52857609] 8 | 0 | [ 0.27504844 0.19632018 0.52863138] 9 | 0 | [ 0.27491203 0.19661313 0.52847484] 10 | 2 | [ 0.77210661 0.12397848 0.10391491] :mod:`sktrain` function ----------------------- .. module:: functions.vtable.sktrain .. function: sktrain(args,query:None) :NOTE: The operator requires the following packages: 'numpy', 'scipy', 'sklearn' numpy & scipy: https://www.scipy.org/scipylib/download.html sklearn: http://scikit-learn.org/stable/install.html Fits data from specific database relations into cross-validated predictive models. A supervised algorithm initialized by initstr is trained on the selected data and returns its predictions for each sample (either for Regression or Classification problems). The algorithm implements the validation step via cross-validation and extra parameters for the training can be provided as well. The model is also stored in disk for future use. (see skpredict operator) Returns: a table schema with the model's classification (predicted labels). In case user inserts the initstr parameter "probability=True", the table consists of two more columns: the probability of each prediction and one list with the probabilities for each sample to belong to each class (useful for evaluation metrics, ie: ROC curves). Parameters: :initstr(with optional parameters): Initialization string (from scikit-learn api, ie: DecisionTreeClassifier(max_depth=3) :classname: The Column name for the response variable we want to classify/predict :cv: k for k-fold cross validation Examples: >>> table(''' ... 0.0 4.4 0 ... 2.1 2.2 2 ... -2.1 4.4 0 ... 2.1 2.2 0 ... 0.0 4.4 2 ... -4.2 4.4 2 ... -4.2 4.4 1 ... -2.1 -0.0 0 ... 2.1 -0.0 0 ... -2.1 -2.2 0 ... -4.2 -0.0 2 ... --- [0|Column names --- ... [1|C1 [2|C2 [3|C3 ... ''') >>> sql("sktrain filename:SVMmodel initstr:SVC(kernel='linear') classname:C3 cv:10 select * from table;") sktrain filename:SVMmodel initstr:SVC(kernel='linear') cv:10 select * from table; //10-fold cross-validation id | predicted_label | prediction_probability | probs_per_class ------------------------------------------------------------------------------------------------------------------ 0 | 2 | 0.0 | [0.0, 0.0, 0.0] 1 | 0 | 0.0 | [0.0, 0.0, 0.0] 2 | 2 | 0.0 | [0.0, 0.0, 0.0] 3 | 0 | 0.410210360487 | [0.41021036048685278, 0.14907264577206564, 0.44071699374108164] 4 | 0 | 0.548051122534 | [0.54805112253403776, 0.14785556444024275, 0.30409331302571929] 5 | 1 | 0.193336225736 | [0.38875643772373958, 0.19333622573639794, 0.4179073365398624] 6 | 2 | 0.0 | [0.0, 0.0, 0.0] 7 | 0 | 0.416031694023 | [0.41603169402299173, 0.18204494673933225, 0.40192335923767586] 8 | 0 | 0.448463699747 | [0.44846369974736427, 0.1393806568854721, 0.41215564336716359] 9 | 2 | 0.216144116096 | [0.61342034424348868, 0.17043553966069536, 0.21614411609581588] 10 | 0 | 0.52171544466 | [0.52171544465978703, 0.20100090883455271, 0.27728364650566051] NOTE about cross-validation on classification/regression tasks: For integer/None inputs, if the estimator is a classifier and y is either binary or multiclass, StratifiedKFold() is used. Otherwise (like regression tasks), KFold() is used. Stratified cross-validation: Each set contains approximately the same percentage of samples of each target class as the complete set. Thus, it is ensured that relative class frequencies is approximately preserved in each train and validation fold. ------------------------------ :mod:`slidingwindow` function ----------------------------- .. module:: functions.vtable.slidingwindow .. function:: slidingwindow(window) -> query results Returns the query input results annotated with the window id as an extra column. The window parameter defines the size of the window. :Returned table schema: Same as input query schema. Examples:: >>> table1(''' ... James 10 ... Mark 7 ... Lila 74 ... Jane 44 ... ''') >>> sql("slidingwindow window:2 select * from table1") wid | a | b ---------------- 0 | James | 10 1 | James | 10 1 | Mark | 7 2 | Mark | 7 2 | Lila | 74 3 | Lila | 74 3 | Jane | 44 >>> sql("slidingwindow window:3 select * from table1") wid | a | b ---------------- 0 | James | 10 1 | James | 10 1 | Mark | 7 2 | James | 10 2 | Mark | 7 2 | Lila | 74 3 | Mark | 7 3 | Lila | 74 3 | Jane | 44 :mod:`sqlite` function ---------------------- .. module:: functions.vtable.sqlite .. function:: sqlite(dbfilename, query:None) Connects to an SQLite DB and returns the results of query. Examples: >>> sql("select * from (sqlite 'testdb.db' select 5 as num, 'test' as text);") num | text ----------- 5 | test :mod:`stdinput` function ------------------------ .. module:: functions.vtable.stdinput .. function:: stdinput() -> standard input stream Returns the standard input stream :Returned table schema: One column automatically named C1. Examples:: >>> sql("select * from stdinput()") c1 ------------- stdinputline1 stdinputline2 stdinputline3 :mod:`timeslidingwindow` function --------------------------------- .. module:: functions.vtable.timeslidingwindow .. function:: timeslidingwindow(timewindow, timecolumn) -> query results Returns the query input results annotated with the window id as an extra column. The following arguments can be passed as parameters: timewindow: It can be a numeric value that specifies the time length of the window (in seconds). timecolumn: It is the index of the temporal column (starting from 0) For the moment, we assume that the data is ordered by the temporal column that the user gives as input in ascending order. Examples:: >>> table1(''' ... "12.05.2010 00:00:00" ... "12.05.2010 00:01:00" ... "12.05.2010 00:02:00" ... "12.05.2010 00:03:00" ... "12.05.2010 00:04:00" ... ''') >>> sql("timeslidingwindow timewindow:180 timecolumn:0 select * from table1") wid | a ------------------------- 0 | 12.05.2010 00:00:00 0 | 12.05.2010 00:01:00 0 | 12.05.2010 00:02:00 0 | 12.05.2010 00:03:00 1 | 12.05.2010 00:01:00 1 | 12.05.2010 00:02:00 1 | 12.05.2010 00:03:00 1 | 12.05.2010 00:04:00 >>> table1(''' ... "12.05.2010 00:00:00" ... "12.05.2010 00:01:00" ... "12.05.2010 00:01:00" ... "12.05.2010 00:02:00" ... "12.05.2010 00:03:00" ... "12.05.2010 00:04:00" ... "12.05.2010 00:05:00" ... ''') ... ''') >>> sql("timeslidingwindow timewindow:120 timecolumn:0 select * from table1") wid | a ------------------------- 0 | 12.05.2010 00:00:00 0 | 12.05.2010 00:01:00 0 | 12.05.2010 00:01:00 0 | 12.05.2010 00:02:00 1 | 12.05.2010 00:01:00 1 | 12.05.2010 00:01:00 1 | 12.05.2010 00:02:00 1 | 12.05.2010 00:03:00 2 | 12.05.2010 00:02:00 2 | 12.05.2010 00:03:00 2 | 12.05.2010 00:04:00 3 | 12.05.2010 00:03:00 3 | 12.05.2010 00:04:00 3 | 12.05.2010 00:05:00 >>> table2(''' ... "12/05/2010 00:00:00" ... "12/05/2010 00:01:00" ... "12/05/2010 00:02:00" ... ''') ... ''') >>> sql("timeslidingwindow timewindow:180 timecolumn:0 select * from table2") wid | a ------------------------- 0 | 12/05/2010 00:00:00 0 | 12/05/2010 00:01:00 0 | 12/05/2010 00:02:00 :mod:`toeav` function --------------------- .. module:: functions.vtable.toeav .. function:: toeav(query) -> Entity Attribute Value table Transforms the query input results to an Entity Attribute Value model table. :Returned table schema: ID, Attribute, Value Examples:: >>> table1(''' ... James 10 2 ... Mark 7 3 ... Lila 74 1 ... ''') >>> sql("toeav select * from table1") rid | colname | val --------------------- James | b | 10 James | c | 2 Mark | b | 7 Mark | c | 3 Lila | b | 74 Lila | c | 1 :mod:`unindexed` function ------------------------- .. module:: functions.vtable.unindexed .. function:: unindexed(query) -> query results Returns the query input results without any change. UNINDEXED can be used as a barrier for SQLite's optimizer, for debugging etc. :Returned table schema: Same as input query schema. Examples:: >>> table1(''' ... James 10 2 ... Mark 7 3 ... Lila 74 1 ... ''') >>> sql("unindexed select * from table1") a | b | c -------------- James | 10 | 2 Mark | 7 | 3 Lila | 74 | 1 >>> sql("unindexed select * from table1 order by c") a | b | c -------------- Lila | 74 | 1 James | 10 | 2 Mark | 7 | 3 Note the difference with rowid table column. :mod:`unionalldb` function -------------------------- .. module:: functions.vtable.unionalldb .. function:: unionalldb(db_filename) This function returns the contents of a table that has been split using OUTPUT split functionality. Its input are DBs with names such as: dbname.0.db dbname.1.db ... It is assumed that inside each of above DBs, a table named as *dbname* will exist. All of these tables should have the same schema If a *start* or *end* argument is present then *unionalldb* will start scanning from the *start* numbered part and end scanning at *end* numbered db part (without including the *end* numbered part). Usage examples: select * from (unionalldb 'dbname'); select * from (unionalldb start:1 end:4 'dbname'); :mod:`variables` function ------------------------- .. module:: functions.vtable.variables .. function:: variables() Returns the defined variables with their values. :Returned table schema: - *variable* text Variable name. - *value* text Variable value .. toadd See also variables.. Examples: >>> sql("var 'env' 'testing' ") var('env','testing') -------------------- testing >>> sql("variables") variable | value ------------------- flowname | execdb | :memory: env | testing :mod:`webtable` function ------------------------ .. module:: functions.vtable.webtable .. function:: webtable(url[,tableNumber]) Returns the result of the first or the *tableNumber* HTML table from the *url*. :Returned table schema: Column names same as HTML table column headers. If there are no headers columns are named as C1,C2....Cn Examples: >>> sql("select * from webtable('http://en.wikipedia.org/wiki/List_of_countries_by_public_debt',2) order by 2 desc limit 3") Country | Public debt as % of GDP(CIA)[1] | Date1 | Gross government debt as % of GDP(IMF)[2] | Date2 | Region ----------------------------------------------------------------------------------------------------------------------------- Belize | 90.8 | 2012 est. | 81.003 | 2012 est. | North America Sudan | 89.3 | 2012 est. | 112.15 | 2012 est. | Africa France | 89.1 | 2012 est. | 89.97 | 2012 est. | Europe :mod:`whilevt` function ----------------------- .. module:: functions.vtable.whilevt .. function:: whilevt([from:0,[to:10, step:1]], query) Returns a range of integer numbers while a query's result is true. :Returned table schema: - *value* int Number in range. :from: Number to begin from. Default is 0 :to: Number to reach. Default is 10. The *to* number is not returned :step: Step to augment the returned numbers. Default is 1 Examples:: >>> sql("select * from range()") C1 -- 0 1 2 3 4 5 6 7 8 9 >>> sql("select * from range('from:1','to:11')") C1 -- 1 2 3 4 5 6 7 8 9 10 >>> sql("select * from range('from:2','to:15','step:3')") C1 -- 2 5 8 11 14 >>> sql("select * from range(1,10,2)") C1 -- 1 3 5 7 9 >>> sql("select * from range(5)") C1 -- 1 2 3 4 5 :mod:`xmlparse` function ------------------------ .. module:: functions.vtable.xmlparse .. function:: xmlparse([root:None, strict:1, namespace:False, xmlprototype], query:None) Parses an input xml stream. It starts parsing when it finds a root tag. A provided XML prototype fragment is used to create an schema, mapping from xml to a relational table. If multiple values are found for the same tag in the input stream, then all values are returned separated with a tab (use tab-set operators to process them). If no XML prototype is provided, then a jdict of the data is returned. If no *root* tag is provided, then the output is a raw feed of {path:data} pairs without any row aggregation. Rootless mode is usefull when trying to find what *root* tag to use. Is a *root* tag is provided then each returned row, contains a jdict of all the paths found below the specified *root* tag. :XML prototype: XML prototype may be: - a fragment of XML which will be matched with the input data. - a jpack. - a jdict. If a the characters **"*"** or **"$"** are provided as a value of any of these prototypes, then a full XML subtree of a path will be returned in the resulting data. :'namespace' or 'ns' option: Include namespace information in the returned jdicts. :'fast' option (default 0): Read input data in bulk. For some XML input files (having lots of small line lengths), it can speed up XML processing by up to 30%. The downside of this option, is that when an error occurs no last line information is returned, so use this option only when you are sure that the XML input is well formed. - fast:0 (default), parses the input stream in a conservative line by line way - fast:1 ,is the same as fast:0, but it doesn't return *Last line* information in the case of an error - fast:2 ,in this mode XMLPARSER doesn't convert HTML entities and doesn't skip ">> table1(''' ... 'row1val1row1val1brow1val1c' ... '' ... '' ... 'row2val1row2val' ... '' ... ''') >>> sql("select * from (xmlparse select * from table1)") # doctest: +NORMALIZE_WHITESPACE C1 ------------------- {"a/b":"row1val1"} {"a/b":"row1val1b"} {"a/b":"row1val1c"} {"a/b":"row2val1"} {"a/c/d":"row2val"} >>> sql("select jgroupunion(jdictkeys(c1)) from (xmlparse select * from table1)") # doctest: +NORMALIZE_WHITESPACE jgroupunion(jdictkeys(c1)) -------------------------- ["a/b","a/c/d"] >>> sql('''select * from (xmlparse '["a/b","a/c/d"]' select * from table1)''') # doctest: +NORMALIZE_WHITESPACE b | c_d -------------------------------------- row1val1 row1val1b row1val1c | row2val1 | row2val >>> sql("select * from (xmlparse 'val1val1val2' select * from table1)") # doctest: +NORMALIZE_WHITESPACE b | b1 | c_d ---------------------------------------- row1val1 | row1val1b row1val1c | row2val1 | | row2val >>> sql("select * from (xmlparse root:a 'val1val2' select * from table1)") # doctest: +NORMALIZE_WHITESPACE b | c_d -------------------------------------- row1val1 row1val1b row1val1c | row2val1 | row2val >>> table2(''' ... 'row1val1' ... '' ... '' ... 'row2val1asdfrow2val' ... '' ... ''') >>> sql("select * from (xmlparse 'vv' select * from table2)") b | b1 | c_d ----------------------------- attrval1 | row1val1 | | row2val1 | row2val >>> sql('''select * from (xmlparse '["a/@/b","a/b","a/c/d"]' select * from table2)''') b | b1 | c_d ----------------------------- attrval1 | row1val1 | | row2val1 | row2val >>> sql('''select * from (xmlparse '{"a/b":[1,2] ,"a/c/d":1}' select * from table2)''') b | b1 | c_d ----------------------- row1val1 | | row2val1 | | row2val >>> sql('''select * from (xmlparse '{"a/b":[1,2] ,"a/c":[1,"*"]}' select * from table2)''') b | b1 | c | c_$ ----------------------------------------- row1val1 | | | row2val1 | | asdf | asdfrow2val >>> sql('''select * from (xmlparse '["a/b", "a/c", "a/c/*"]' select * from table2)''') b | c | c_$ ------------------------------------ row1val1 | | row2val1 | asdf | asdfrow2val >>> sql('''select * from (xmlparse root:a '{"a/b":"", "a":"*"}' select * from table2)''') # doctest: +NORMALIZE_WHITESPACE b | $ ------------------------------------------------------ row1val1 | row1val1 row2val1 | row2val1asdfrow2val >>> sql("select * from (xmlparse 'v*' select * from table2)") b | c_$ ----------------------------- row1val1 | row2val1 | asdfrow2val >>> sql("select * from (xmlparse root:a select * from table2)") C1 ------------------------------------------------- {"a/@/b":"attrval1","a/b":"row1val1"} {"a/b":"row2val1","a/c/d":"row2val","a/c":"asdf"} >>> table2(''' ... 'row1val1' ... '' ... '' ... 'row2val1row2val' ... '' ... ''') >>> sql("select * from (xmlparse strict:0 'vv' select * from table2)") b | b1 | c_d ------------------------- attrval1 | row1val1 | >>> table3(''' ... 'row1val1' ... '' ... '' ... 'row2val1row2val' ... '' ... ''') >>> sql("select * from (xmlparse strict:2 'val1val2' select * from table3)") #doctest:+ELLIPSIS +NORMALIZE_WHITESPACE Traceback (most recent call last): ... OperatorError: Madis SQLError: Operator XMLPARSE: Undeclared path in XML-prototype was found in the input data. The path is: /b/@/np The data to insert into path was: np Last input line was: >>> table4(''' ... 'row1val1' ... 'row1val2' ... 'row1val1' ... 'row1val3/b>' ... 'row1val4' ... ''') >>> sql("select * from (xmlparse strict:-1 'val1val2' select * from table4)") C1 ---------------------- row1val2 row1val3/b> >>> table5(''' ... 'row1val1' ... 'row2val1' ... 'row3val1' ... 'row4val1row4val2' ... '' ... ''') >>> sql('''select * from (xmlparse '["a/b", "a/c"]' select * from table5)''') b | c ------------------- row1val1 | row2val1 | row3val1 | row4val1 | row4val2