Virtual table concept in madIS¶
Virtual table can be any piece of code that returns its results as list of lists. It will be used in madSQL in the FROM part of a query.
Example:
select * from file('data.tsv');
In this case data are produced from external sources. Many madis virtual tables take a query as an input, execute it and returns its results transformed. (See Query parameter). In that case the vtable “call” could be inverted in the query. (see SQL extensions).
Example:
rowidvt select * from table1 limit 3;
Creating a virtual table function¶
Virtual table fuctions are the most advanced operator in madIS. Existing virtual table functions can be organized by increasing implementation complexity in three categories:
- Single boolean output vt functions with query input: Virtual table functions that take an input, do sth and return 1 or 0. Example table fuctions are
exec
- Iterative output: Vts that take input and returns many data (iterator of list).
Its call of the iterator retuns a list with each row data. In this category falls
file
operator, as well asexpand
,rowidvt
, etc.
- Advanced: Advanced vt’s that need the index capability of virtual tables. In this category falls only
cache
functions. These functions must implement virtual tables from scratch as described in
All virtual table functions are created in XXXX folder. Every virtual table function must be single module. Detailed documentation of virtual table API could be found in APSW documentation. In madIS there are used mostly two specific types of Virtual table functions, that helper libraries have been implemented to simplify their implementation. So implementing a virtual table function, is feasible either by following the APSW API and adding the flags for madIS, either by using the helper libraries, as explained below. To continue reading, have a look to the Virtual table function API in APSW and play with toggle function and a virtual table call. Example:
mterm> toggle tracing;
tracing is now: True
Query executed in 0 min. 0 sec 4 msec
mterm> select * from (file 'data.csv');
executetrace(u"create virtual table temp.vt_144176563 using file('data.csv','automatic_vtable:1')")
executetrace(u'select * from (SELECT * FROM vt_144176563 );',None)
...
executetrace(u'drop table temp.vt_144176563;')
Query executed in 0 min. 0 sec 14 msec
mterm> toggle vtdebug;
executetrace(u"SELECT toggle('vtdebug');",None)
vtdebug is now: True
Query executed in 0 min. 0 sec 4 msec
mterm> select * from (file 'data.csv');
....
Toggle is a row function that toggles the value of madIS global settings
Important flags for madIS VT functions¶
registered=True # set this flag to register in madis the function, if set to false the module is not loaded.
external_stream=True # Set this flag if no query parameter can be used.
Impementing single boolean output virtual table function¶
The
def function() from madis.vtout import SourceNtoOne
- class SourceNtoOne:
- def __init__(self,func,boolargs=None,nonstringargs=None,needsescape=None,notsplit=None,connectionhandler=False,retalways=False):
To create iterative VT’s the madIS vtiterable could be used that hide some of the implementation details, or it can be build from scratch as in APSW, documentation. (http://apsw.googlecode.com/svn/publish/vtable.html) If vtiterable is not used, no standard parameter parsing will be available.
- MUST BE SET TO USE VT in madis:
One VT - one file in vtable folder registered=True ( as in all operators) external_stream=True (if the operator does not accept query parameter)
Source function must be implemented!!
#Cursor Iterator
class TableCursor: # INTERFACE
def __init__(self): # when called for first time, it must probably identify the schema, if it is not dynamic
pass
def __iter__(self):
pass
def next(self):
pass
def close(self):
pass
class TableVT: #INTERFACE
def __init__(self,envdict,largs,dictargs): #DO NOT DO ANYTHING HEAVY it, will be called before querying the table
pass
def getdescription(self):
"""
must return a list of tuples (column name, columntype), if type not available leave it empty or None eg.(col,)
example:
[('column1','int'),('column2',),('column3','text')]
"""
return schemalist
def open(self):
"""
After the first time it is called getdescription function must return table instance schema
Must returns VT cursor
"""
return TableCursor()
def destroy(self): # Called when Table instance is destroyed (on drop statement), OPTIONAL
pass
-
TableVT.
__init__
(self, envdict, largs, dictargs)¶
Parameters:
envdict: Dictionary with enviromental variables (correspond to APSW parameters of VTModule.Create) :tablename: name of the table that will be created. :db: instance of the connection to the database :dbname: database name :modulename: The string name under which the module was registered??????? largs: table function list arguments dictargs: dictionary table function named arguments eg. type:int -> dictargs[‘type’]=’int’
- def Source():
- return SourceVT(TableVT,boolargs,nonstringargs,needsescape,notsplit)
- class SourceVT:
- def __init__(self,table,boolargs=None,nonstringargs=None,needsescape=None,staticschema=False,notsplit=None):
Parameters
table: Class implementing TableVT interface boolargs: List of parameter names that will be chnged to True/False if they are accordingly t/f nonstringargs: Dictionary of parameter names that will be tranformed according to their value , eg. string input to dialect object {‘dialect’:{‘line’:line(),’tsv’:tsv(),’csv’:defaultcsv()}} needsescape: List of parameter names that that include escape sequences eg. n, t etc. staticschema: True if TableVT.open does not need to be executed before getdescription can be called notsplit: List of parameter names that should not be splitted in named parameters eg. [‘http’] for do not splitting http://server.com