mirror of
https://github.com/valitydev/redash.git
synced 2024-11-08 01:48:54 +00:00
95 lines
3.3 KiB
ReStructuredText
95 lines
3.3 KiB
ReStructuredText
Query Execution Model
|
|
#####################
|
|
|
|
Introduction
|
|
============
|
|
|
|
The first datasource which was used with re:dash was Redshift. Because
|
|
we had billions of records in Redshift, and some queries were costly to
|
|
re-run, from the get go there was the idea of caching query results in
|
|
re:dash.
|
|
|
|
This was to relieve stress from the Redshift cluster and also to improve
|
|
user experience.
|
|
|
|
How queries get executed and cached in re:dash?
|
|
===============================================
|
|
|
|
Server
|
|
------
|
|
|
|
To make sure each query is executed only once at any giving time, we
|
|
translate the query to a ``query hash``, using the following code:
|
|
|
|
.. code:: python
|
|
|
|
COMMENTS_REGEX = re.compile("/\*.*?\*/")
|
|
|
|
def gen_query_hash(sql):
|
|
sql = COMMENTS_REGEX.sub("", sql)
|
|
sql = "".join(sql.split()).lower()
|
|
return hashlib.md5(sql.encode('utf-8')).hexdigest()
|
|
|
|
When query execution is done, the result gets stored to
|
|
``query_results`` table. Also we check for all queries in the
|
|
``queries`` table that have the same query hash and update their
|
|
reference to the query result we just saved
|
|
(`code <https://github.com/EverythingMe/redash/blob/master/redash/models.py#L235>`__).
|
|
|
|
Client
|
|
------
|
|
|
|
The client (UI) will execute queries in two scenarios:
|
|
|
|
1. (automatically) When opening a query page of a query that doesn't
|
|
have a result yet.
|
|
2. (manually) When the user clicks on "Execute".
|
|
|
|
In each case the client does a POST request to ``/api/query_results``
|
|
with the following parameters: ``query`` (the query text),
|
|
``data_source_id`` (data source to execute the query with) and ``ttl``.
|
|
|
|
When loading a cached result, ``ttl`` will be the one set to the query
|
|
(if it was set). This is a relic from previous versions, and I'm not
|
|
sure if it's really used anymore, as usually we will fetch query result
|
|
using its id.
|
|
|
|
When loading a non cached result, ``ttl`` will be 0 which will "force"
|
|
the server to execute the query.
|
|
|
|
As a response to ``/api/query_results`` the server will send either the
|
|
query results (in case of a cached query) or job id of the currently
|
|
executing query. When job id received the client will start polling on
|
|
this id, until a query result received (this is encapsulated in
|
|
``Query`` and ``QueryResult`` services).
|
|
|
|
Ideas on how to implement query parameters
|
|
==========================================
|
|
|
|
Client side only implementation
|
|
-------------------------------
|
|
|
|
(This was actually implemented in. See pull request `#363 <https://github.com/EverythingMe/redash/pull/363>`__ for details.)
|
|
|
|
The basic idea of how to implement parametized queries is to treat the
|
|
query as a template and merge it with parameters taken from query string
|
|
or UI (or both).
|
|
|
|
When the caching facility isn't required (with queries that return in a
|
|
reasonable time frame) the implementation can be completly client side
|
|
and the backend can be "blind" to the parameters - it just receives the
|
|
final query to execute and returns result.
|
|
|
|
As one improvement over this, we can let the UI/user specify the TTL
|
|
value when making the request to ``/api/query_results``, in which case
|
|
caching will be availble too, while not having to make the server aware
|
|
of the parameters.
|
|
|
|
Hybrid
|
|
------
|
|
|
|
Another option, will be to store the list of possible parameters for a
|
|
query, with their default/optional values. In such case, the server can
|
|
prefetch all the options and cache them to provide faster results to the
|
|
client.
|