14.2.18. crate_anon.common.sql

crate_anon/common/sql.py


Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk).

This file is part of CRATE.

CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with CRATE. If not, see <https://www.gnu.org/licenses/>.


Low-level SQL manipulation functions.

These are about the manipulation of SQL as text (e.g. for query building assistance for researchers, or for interpreting SQL data types in data dictionaries), not about a higher-level approach like SQLAlchemy.

class crate_anon.common.sql.ColumnId(db: str = '', schema: str = '', table: str = '', column: str = '')[source]

Represents a database column.

__init__(db: str = '', schema: str = '', table: str = '', column: str = '') None[source]
Parameters:
  • db – database name

  • schema – schema name

  • table – table name

  • column – column name

property column: str

Returns the column part.

property db: str

Returns the database part.

property has_table_and_column: bool

Do we know about a table and a column?

property is_valid: bool

Do we know about a table and a column, at least?

property schema: str

Returns the schema part.

property schema_id: SchemaId

Returns a SchemaId for the schema of our column.

property table: str

Returns the table part.

property table_id: TableId

Returns a TableId for our table.

class crate_anon.common.sql.IndexCreationInfo(index_name: str, column: str | List[str], unique: bool = False)[source]
__init__(index_name: str, column: str | List[str], unique: bool = False) None
column: str | List[str]

Column name(s) to index

index_name: str

Name of the index

unique: bool = False

Make a unique index?

class crate_anon.common.sql.JoinInfo(table: str, join_type: str = 'INNER JOIN', join_condition: str = '')[source]

Object to represent a SQL join condition in a simple way.

__init__(table: str, join_type: str = 'INNER JOIN', join_condition: str = '') None[source]
Parameters:
  • table – table to be joined in

  • join_type – join method, e.g. "INNER JOIN"

  • join_condition – join condition, e.g. "ON x = y"

class crate_anon.common.sql.ReflectedColumnInfo(column: Column, override_comment: str | None = None, crate_annotation: str | None = None, values_info: str | None = None)[source]

Provides information about a column reflected from a database, with optional additional information from a CRATE data dictionary, +/- a description of values in that column (for researcher reports).

__init__(column: Column, override_comment: str | None = None, crate_annotation: str | None = None, values_info: str | None = None) None
property comment: str

The database comment, if present, or another that has been supplied.

property crate_annotation_str: str

Human-oriented version for report.

get_column_source_description(with_fk: bool = True) str[source]

Returns a description of where the column is from, used as a suffix for data dictionary comment generation.

Parameters:

with_fk – Include foreign key descriptions (helpful because CRATE doesn’t reproduce FK relationships in the destination DDL).

property values_info_str: str

Human-oriented version for report.

class crate_anon.common.sql.SchemaId(db: str = '', schema: str = '')[source]

Represents a database schema. This is a bit complex:

__init__(db: str = '', schema: str = '') None[source]
Parameters:
  • db – database name

  • schema – schema name

column_id(table: str, column: str) ColumnId[source]

Returns a ColumnId combining this schema and the specified table/column.

Parameters:
  • table – name of the table

  • column – name of the column

property db: str

Returns the database part.

classmethod from_schema_tag(tag: str) SchemaId[source]

Returns a SchemaId from a tag of the form db.schema.

identifier(grammar: SqlGrammar) str[source]

Returns an SQL identifier for this schema using the specified SQL grammar, quoting it if need be.

Parameters:

grammarcardinal_pythonlib.sql.sql_grammar.SqlGrammar

is_blank() bool[source]

Is this a blank/nonfunctional schema, with no database or schema part?

is_present() bool[source]

Is this a blank/nonfunctional schema, with no database or schema part?

property schema: str

Returns the schema part.

property schema_tag: str

String suitable for encoding the SchemaId e.g. in a single HTML form. Takes the format database.schema.

The __init__() function has already checked the assumption of no '.' characters in either part.

table_id(table: str) TableId[source]

Returns a TableId combining this schema and the specified table.

Parameters:

table – name of the table

class crate_anon.common.sql.TableId(db: str = '', schema: str = '', table: str = '')[source]

Represents a database table.

__init__(db: str = '', schema: str = '', table: str = '') None[source]
Parameters:
  • db – database name

  • schema – schema name

  • table – table name

column_id(column: str) ColumnId[source]

Returns a ColumnId combining this table and the specified column.

Parameters:

column – name of the column

database_schema_part(grammar: SqlGrammar) str[source]

Returns an SQL identifier for this table’s database/schema (without the table part) using the specified SQL grammar, quoting it if need be.

Parameters:

grammarcardinal_pythonlib.sql.sql_grammar.SqlGrammar

property db: str

Returns the database part.

identifier(grammar: SqlGrammar) str[source]

Returns an SQL identifier for this table using the specified SQL grammar, quoting it if need be.

Parameters:

grammarcardinal_pythonlib.sql.sql_grammar.SqlGrammar

property schema: str

Returns the schema part.

property schema_id: SchemaId

Returns a SchemaId for the schema of our table.

property table: str

Returns the table part.

table_part(grammar: SqlGrammar) str[source]

Returns an SQL identifier for this table’s table name (only) using the specified SQL grammar, quoting it if need be.

Parameters:

grammarcardinal_pythonlib.sql.sql_grammar.SqlGrammar

class crate_anon.common.sql.TransactionSizeLimiter(session: Session, max_rows_before_commit: int | None = None, max_bytes_before_commit: int | None = None)[source]

Class to allow us to limit the size of database transactions.

__init__(session: Session, max_rows_before_commit: int | None = None, max_bytes_before_commit: int | None = None) None[source]
Parameters:
  • session – SQLAlchemy database Session

  • max_rows_before_commit – how many rows should we insert before triggering a COMMIT? None for no limit.

  • max_bytes_before_commit – how many bytes should we insert before triggering a COMMIT? None for no limit.

commit() None[source]

Performs a database COMMIT and resets our counters.

(Measures some timing information, too.)

notify(n_rows: int, n_bytes: int, force_commit: bool = False) None[source]

Use this function to notify the limiter of data that you’ve inserted into the database. If the total number of rows or bytes exceeds a limit that we’ve set, this will trigger a COMMIT.

Parameters:
  • n_rows – number of rows inserted

  • n_bytes – number of bytes inserted

  • force_commit – force a COMMIT?

class crate_anon.common.sql.ViewMaker(viewname: str, engine: Engine, basetable: str, existing_to_lower: bool = False, rename: Dict[str, str] | None = None, userobj: Any | None = None, enforce_same_n_rows_as_base: bool = True, insert_basetable_columns: bool = True)[source]

View-building assistance class.

__init__(viewname: str, engine: Engine, basetable: str, existing_to_lower: bool = False, rename: Dict[str, str] | None = None, userobj: Any | None = None, enforce_same_n_rows_as_base: bool = True, insert_basetable_columns: bool = True) None[source]
Parameters:
  • viewname – name of the view

  • engine – SQLAlchemy database Engine

  • basetable – name of the single base table that this view draws from

  • existing_to_lower – translate column names to lower case in the view?

  • rename – optional dictionary mapping from_name: to_name to translate column names in the view

  • userobj – optional object (e.g. argparse.Namespace, dictionary…), not used by this class, and purely to store information for others’ benefit

  • enforce_same_n_rows_as_base – ensure that the view produces the same number of rows as its base table?

  • insert_basetable_columns – start drafting the view by including all columns from the base table?

add_from(element: str) None[source]

Add an element to the FROM clause of the draft view’s SQL statement.

add_select(element: str) None[source]

Add an element to the SELECT clause of the the draft view’s SQL (meaning: add e.g. a result column).

add_where(element: str) None[source]

Add an element to the WHERE clause of the draft view’s SQL statement.

create_view(engine: Engine) None[source]

Creates the view.

Whether we act or just print is conditional on previous calls to set_print_not_execute().

If enforce_same_n_rows_as_base is set, check the number of rows returned matches the base table.

Parameters:

engine – SQLAlchemy database Engine

drop_view(engine: Engine) None[source]

Drops the view.

Whether we act or just print is conditional on previous calls to set_print_not_execute().

Parameters:

engine – SQLAlchemy database Engine

get_index_request_dict() Dict[str, List[str]][source]

Returns all our recorded index requests, as a dictionary mapping each table name to a list of column names to be indexed. See request_index().

get_lookup_tables() List[str][source]

Returns all lookup tables that we have recorded. See record_lookup_table().

get_sql() str[source]

Returns the view-creation SQL.

record_lookup_table(table: str) None[source]

Keep a record of a lookup table. The framework may wish to suppress these from a data dictionary later (e.g. create a view, suppress the messier raw data). See get_lookup_tables().

Parameters:

table – table name

record_lookup_table_keyfield(table: str, keyfield: str | Iterable[str]) None[source]

Makes a note that a table is a lookup table, and its key field(s) should be indexed. See get_lookup_tables(), get_index_request_dict().

Parameters:
  • table – table name

  • keyfield – field name, or iterable (e.g. list) of them

record_lookup_table_keyfields(table_keyfield_tuples: Iterable[Tuple[str, str | Iterable[str]]]) None[source]

Make a note of a whole set of lookup table / key field groups. See record_lookup_table_keyfield().

Parameters:

table_keyfield_tuples – iterable (e.g. list) of tuples of the format tablename, keyfield. Each will be passed to record_lookup_table_keyfield().

request_index(table: str, column: str) None[source]

Note a request that a specific column be indexed. The framework can use the ViewMaker to keep a note of these requests, and then add index hints to a data dictionary if it wishes. See get_index_request_dict().

Parameters:
  • table – table name

  • column – column name

class crate_anon.common.sql.WhereCondition(column_id: ColumnId | None = None, op: str = '', datatype: str = '', value_or_values: Any | None = None, raw_sql: str = '', from_table_for_raw_sql: TableId | None = None)[source]

Ancillary class for building SQL WHERE expressions from our web forms.

The essence of it is WHERE column op value_or_values.

__init__(column_id: ColumnId | None = None, op: str = '', datatype: str = '', value_or_values: Any | None = None, raw_sql: str = '', from_table_for_raw_sql: TableId | None = None) None[source]
Parameters:
  • column_idColumnId for the column

  • op – operation (e.g. =, <, <=, etc.)

  • datatype – data type string that must match values in our querybuilder.js; see source code. We use this to know how to build SQL literal values. (Not terribly elegant, but it works; SQL injection isn’t a particular concern because we let our users run any SQL they want and ensure the connection is made read-only.)

  • value_or_valuesNone, single value, or list of values. Which is appropriate depends on the operation. For example, IS NULL takes no value; = takes one; IN takes many.

  • raw_sql – override any thinking we might wish to do, and just return this raw SQL

  • from_table_for_raw_sql – if we are using raw SQL, provide a TableId for the relevant table here

property column_id: ColumnId

Returns the ColumnId provided at creation.

sql(grammar: SqlGrammar) str[source]

Returns the WHERE clause (without WHERE itself!) for our condition, in the specified SQL grammar. Some examples might be:

  • somecol = 3

  • othercol IN (6, 7, 8)

  • thirdcol IS NOT NULL

  • textcol LIKE '%paracetamol%'

  • MATCH (fulltextcol AGAINST 'paracetamol') (MySQL)

  • CONTAINS(fulltextcol, 'paracetamol') (SQL Server)

Parameters:

grammarcardinal_pythonlib.sql.sql_grammar.SqlGrammar

property table_id: TableId

Returns a TableId:

  • for raw SQL, our from_table_for_raw_sql attribute

  • otherwise, the table ID extracted from our column_id attribute

table_str(grammar: SqlGrammar) str[source]

Returns the table identifier in the specified SQL grammar.

Parameters:

grammarcardinal_pythonlib.sql.sql_grammar.SqlGrammar

crate_anon.common.sql.add_columns(engine: Engine, table: Table, columns: List[Column]) None[source]

Adds columns to a table.

Whether we act or just print is conditional on previous calls to set_print_not_execute().

Parameters:
  • engine – SQLAlchemy database Engine

  • table – SQLAlchemy Table object

  • columns – SQLAlchemy Column objects to add to the table

Behaviour of different database systems:

This function therefore operates one at a time.

SQLAlchemy doesn’t provide a shortcut for this.

crate_anon.common.sql.add_indexes(engine: Engine, table: Table, index_info_list: Iterable[IndexCreationInfo]) None[source]

Adds indexes to a table.

Whether we act or just print is conditional on previous calls to set_print_not_execute().

Parameters:
  • engine – SQLAlchemy database Engine

  • table – SQLAlchemy Table object

  • index_info_list – Index(es) to create: list of IndexCreationInfo objects.

crate_anon.common.sql.assert_view_has_same_num_rows(engine: Engine, basetable: str, viewname: str) None[source]

Ensures that a view gives the same number of rows as a table. (For use in situations where this should hold; views don’t have to do this in general!)

Parameters:
  • engine – SQLAlchemy database Engine

  • basetable – name of the table that this view should have a 1:1 relationship to

  • viewname – view name

Raises:

AssertionError

crate_anon.common.sql.coltype_length_if_text(column_type: str, dialect: str) int | None[source]

Find the length of an sql text column type.

Parameters:
  • column_type – SQL column type as a string, e.g. "VARCHAR(50)"

  • dialect – the SQL dialect the column type is from

Returns:

length of the column or None if it’s not a text column.

crate_anon.common.sql.columns_to_table_column_hierarchy(columns: List[ColumnId], sort: bool = True) List[Tuple[TableId, List[ColumnId]]][source]

Converts a list of column IDs :param columns: list of ColumnId objects :param sort: sort by table, and column within table?

Returns:

a list of tuples, each table, columns, where table is a TableId and columns is a list of ColumnId

crate_anon.common.sql.create_view(engine: Engine, viewname: str, select_sql: str) None[source]

Creates a view.

Whether we act or just print is conditional on previous calls to set_print_not_execute().

Parameters:
  • engine – SQLAlchemy database Engine

  • viewname – view name

  • select_sql – SQL SELECT statement for this view

crate_anon.common.sql.decorate_index_name(idxname: str, tablename: str | None = None, engine: Engine | None = None) str[source]

Amend the name of a database index. Specifically, this is because SQLite (which we won’t use much, but do use for testing!) won’t accept two indexes with the same names applying to different tables.

Parameters:
  • idxname – The original index name.

  • tablename – The name of the table.

  • engine – The SQLAlchemy engine, from which we obtain the dialect.

Returns:

The index name, amended if necessary.

crate_anon.common.sql.drop_columns(engine: Engine, table: Table, column_names: Iterable[str]) None[source]

Drops columns from a table.

Whether we act or just print is conditional on previous calls to set_print_not_execute().

Parameters:
  • engine – SQLAlchemy database Engine

  • table – SQLAlchemy Table object

  • column_names – names of columns to drop

Columns are dropped one by one.

crate_anon.common.sql.drop_indexes(engine: Engine, table: Table, index_names: Iterable[str]) None[source]

Drops indexes from a table.

Whether we act or just print is conditional on previous calls to set_print_not_execute().

Parameters:
  • engine – SQLAlchemy database Engine

  • table – SQLAlchemy Table object

  • index_names – names of indexes to drop

crate_anon.common.sql.drop_view(engine: Engine, viewname: str, quiet: bool = False) None[source]

Drops a view.

Whether we act or just print is conditional on previous calls to set_print_not_execute().

Parameters:
  • engine – SQLAlchemy database Engine

  • viewname – view name

  • quiet – don’t announce this to the Python log

crate_anon.common.sql.dumb_make_identifier(database: str | None = None, schema: str | None = None, table: str | None = None, column: str | None = None) str[source]

Makes an SQL-style identifier by joining all the parts with ., without bothering to quote them.

Parameters:
  • database – database name

  • schema – schema name

  • table – table name

  • column – column name

Returns:

a string as above in the order “database, schema, table, column”, but omitting any missing parts

crate_anon.common.sql.ensure_columns_present(engine: Engine, tablename: str, column_names: Iterable[str]) None[source]

Ensure all these columns are present in a table, or raise an exception.

Operates in case-insensitive fashion.

Parameters:
  • engine – SQLAlchemy database Engine

  • tablename – name of the table

  • column_names – names of required columns

Raises:

ValueError

crate_anon.common.sql.escape_percent_for_python_dbapi(sql: str) str[source]

Escapes % by converting it to %%. Use this for SQL within Python where % characters are used for argument placeholders.

crate_anon.common.sql.escape_percent_in_literal(sql: str) str[source]

Escapes % by converting it to \%. Use this for LIKE clauses.

crate_anon.common.sql.escape_quote_in_literal(s: str) str[source]

Escape '. We could use '' or \'. Let’s use \. for consistency with percent escaping.

crate_anon.common.sql.escape_sql_string_literal(s: str) str[source]

Escapes SQL string literal fragments against quotes and parameter substitution.

crate_anon.common.sql.escape_sql_string_or_int_literal(s: str | int) str[source]

Converts an integer or a string into an SQL literal (with single quotes and escaping in the case of a string).

crate_anon.common.sql.execute(engine: Engine, sql: str) None[source]

Executes plain SQL in a transaction.

Whether we act or just print is conditional on previous calls to set_print_not_execute().

Parameters:
  • engine – SQLAlchemy database Engine

  • sql – raw SQL to execute (or print)

crate_anon.common.sql.format_sql_for_print(sql: str) str[source]

Very simple SQL formatting.

Remove blank lines and trailing spaces from an SQL statement. Converts tabs to spaces.

crate_anon.common.sql.get_column_fk_description(c: Column) str[source]

Standardized description of a column’s foreign keys.

Parameters:

c – SQLAlchemy Column

crate_anon.common.sql.get_column_names(engine: Engine, tablename: str, to_lower: bool = False, sort: bool = False) List[str][source]

Reads columns names afresh from the database, for a specific table (in case metadata is out of date).

Parameters:
  • engine – SQLAlchemy database Engine

  • tablename – name of the table

  • to_lower – convert view names to lower case?

  • sort – sort view names?

Returns:

list of column names

crate_anon.common.sql.get_first_from_table(parsed: ParseResults, match_db: str = '', match_schema: str = '', match_table: str = '') TableId[source]

Given a set of parsed results from a SELECT statement, returns the db, schema, table tuple representing the first table in the FROM clause.

Optionally, the match may be constrained with the match* parameters.

Parameters:
  • parsed – a pyparsing.ParseResults result

  • match_db – optional database name to constrain the result to

  • match_schema – optional schema name to constrain the result to

  • match_table – optional table name to constrain the result to

Returns:

a TableId, which will be empty in case of failure

crate_anon.common.sql.get_index_names(engine: Engine, tablename: str, to_lower: bool = False, sort: bool = False) List[str][source]

Reads index names from the database, for a specific table.

Parameters:
  • engine – SQLAlchemy database Engine

  • tablename – name of the table

  • to_lower – convert index names to lower case?

  • sort – sort index names?

Returns:

list of index names

crate_anon.common.sql.get_table_names(engine: Engine, to_lower: bool = False, sort: bool = False) List[str][source]

Returns all table names for the database.

Parameters:
  • engine – SQLAlchemy database Engine

  • to_lower – convert table names to lower case?

  • sort – sort table names?

Returns:

list of table names

crate_anon.common.sql.get_view_names(engine: Engine, to_lower: bool = False, sort: bool = False) List[str][source]

Returns all view names for the database.

Parameters:
  • engine – SQLAlchemy database Engine

  • to_lower – convert view names to lower case?

  • sort – sort view names?

Returns:

list of view names

crate_anon.common.sql.is_sql_column_type_textual(column_type: str, min_length: int = 1) bool[source]

Does an SQL column type look textual?

Parameters:
  • column_type – SQL column type as a string, e.g. "VARCHAR(50)"

  • min_length – what’s the minimum string length we’ll say “yes” to?

Returns:

is it a textual column (of the minimum length or more)?

Note:

  • For SQL Server’s NVARCHAR(MAX), crate_anon.crateweb.research.research_db_info._schema_query_microsoft() returns “NVARCHAR(-1)”

crate_anon.common.sql.make_identifier(grammar: SqlGrammar, database: str | None = None, schema: str | None = None, table: str | None = None, column: str | None = None) str[source]

Makes an SQL identifier by quoting its elements according to the style of the specific SQL grammar, and then joining them with ..

Parameters:
  • grammarcardinal_pythonlib.sql.sql_grammar.SqlGrammar

  • database – database name

  • schema – schema name

  • table – table name

  • column – column name

Returns:

a string as above in the order “database, schema, table, column”, but omitting any missing parts

crate_anon.common.sql.make_string_literal(s: str) str[source]

Converts a Python string into an SQL single-quoted (and escaped) string literal.

crate_anon.common.sql.matches_fielddef(table: str, field: str, fielddef: str | List[str]) bool[source]

Does the table/field name match the wildcard-based field definition?

Parameters:
  • table – table name

  • field – fieldname

  • fielddeffnmatch-style pattern (e.g. "system_table.*" or "*.nhs_number"), or list of them

crate_anon.common.sql.matches_tabledef(table: str, tabledef: str | List[str]) bool[source]

Does the table name match the wildcard-based table definition?

Parameters:
  • table – table name

  • tabledeffnmatch-style pattern (e.g. "patient_address_table_*"), or list of them

crate_anon.common.sql.parser_add_from_tables(parsed: ParseResults, join_info_list: List[JoinInfo], grammar: SqlGrammar) ParseResults[source]

Takes a parsed SQL statement of the form

SELECT a, b, c
FROM sometable
WHERE conditions;

and adds one or more join columns, e.g. JoinInfo("othertable", "INNER JOIN", "ON table.key = othertable.key"), to give

SELECT a, b, c
FROM sometable
INNER JOIN othertable ON table.key = othertable.key
WHERE conditions;

Presupposes that there at least one table already in the FROM clause.

Parameters:
  • parsed – a pyparsing.ParseResults result

  • join_info_list – list of JoinInfo objects

  • grammarcardinal_pythonlib.sql.sql_grammar.SqlGrammar

Returns:

a pyparsing.ParseResults result

crate_anon.common.sql.parser_add_result_column(parsed: ParseResults, column: str, grammar: SqlGrammar) ParseResults[source]

Takes a parsed SQL statement of the form

SELECT a, b, c
FROM sometable
WHERE conditions;

and adds a result column, e.g. d, to give

SELECT a, b, c, d
FROM sometable
WHERE conditions;

Presupposes that there is at least one column already in the SELECT statement.

Parameters:
  • parsed – a pyparsing.ParseResults result

  • column – column name

  • grammarcardinal_pythonlib.sql.sql_grammar.SqlGrammar

Returns:

a pyparsing.ParseResults result

crate_anon.common.sql.set_distinct(sql: str, grammar: SqlGrammar, action: str = 'set', formatted: bool = True, debug: bool = False, debug_verbose: bool = False) str[source]

Takes an SQL statement (as a string) and modifies its DISTINCT status.

Parameters:
  • sql – SQL statment as text

  • grammarcardinal_pythonlib.sql.sql_grammar.SqlGrammar

  • action – one of "set", "clear", "toggle"; see set_distinct_within_parsed()

  • formatted – pretty-format the result?

  • debug – show debugging information to the Python log

  • debug_verbose – be verbose when debugging

Returns:

the modified SQL statment, as a string

crate_anon.common.sql.set_distinct_within_parsed(p: ParseResults, action: str = 'set') None[source]

Modifies (in place) the DISTINCT status of a parsed SQL statement.

Parameters:
  • p – a pyparsing.ParseResults result

  • action"set" to turn DISTINCT on; "clear" to turn it off; or "toggle" to toggle it.

crate_anon.common.sql.set_print_not_execute(print_not_execute: bool) None[source]

Sets a nasty global flag: should we print DDL, rather than executing it, when we issue DDL commands from this module?

Parameters:

print_not_execute – print (not execute)?

crate_anon.common.sql.split_db_schema_table(db_schema_table: str) TableId[source]

Converts a simple SQL-style identifier string into a TableId.

Parameters:

db_schema_table – one of: database.schema.table, schema.table, table

Returns:

a TableId

Raises:

ValueError

crate_anon.common.sql.split_db_schema_table_column(db_schema_table_col: str) ColumnId[source]

Converts a simple SQL-style identifier string into a ColumnId.

Parameters:

db_schema_table_col – one of: database.schema.table.column, schema.table.column, table.column, column

Returns:

a ColumnId

Raises:

ValueError

crate_anon.common.sql.sql_fragment_cast_to_int(expr: str, big: bool = True, dialect: Dialect | None = None, viewmaker: ViewMaker | None = None) str[source]

Takes an SQL expression and coerces it to an integer. For Microsoft SQL Server.

Parameters:
  • expr – starting SQL expression

  • big – use BIGINT, not INTEGER?

  • dialect – optional sqlalchemy.engine.interfaces.Dialect. If None and we have a viewmaker, use the viewmaker’s dialect. Otherwise, assume SQL Server.

  • viewmaker – optional ViewMaker

Returns:

modified SQL expression

Notes

Conversion to INT:

Note that the regex-like expression supported by LIKE is extremely limited.

SQL Server does not support a REGEXP command directly.

So the best bet is to have the LIKE clause check for a non-integer:

CASE
    WHEN something LIKE '%[^0-9]%' THEN NULL
    ELSE CAST(something AS BIGINT)
END

… which doesn’t deal with spaces properly, but there you go. Could also strip whitespace left/right:

CASE
    WHEN LTRIM(RTRIM(something)) LIKE '%[^0-9]%' THEN NULL
    ELSE CAST(something AS BIGINT)
END

That only works for positive integers.

LTRIM/RTRIM are not ANSI SQL. Nor are unusual LIKE clauses; see https://stackoverflow.com/questions/712580/list-of-special-characters-for-sql-like-clause

The other, for SQL Server 2012 or higher, is TRY_CAST:

TRY_CAST(something AS BIGINT)

… which returns NULL upon failure; see https://msdn.microsoft.com/en-us/library/hh974669.aspx

Therefore, our method is as follows:

  • If the database supports TRY_CAST, use that.

  • Otherwise if we’re using SQL Server, use a CASE/CAST construct.

  • Otherwise, raise ValueError as we don’t know what to do.

crate_anon.common.sql.toggle_distinct(sql: str, grammar: SqlGrammar, formatted: bool = True, debug: bool = False, debug_verbose: bool = False) str[source]

Takes an SQL statement and toggles its DISTINCT status.

Parameters:
  • sql – SQL statment as text

  • grammarcardinal_pythonlib.sql.sql_grammar.SqlGrammar

  • formatted – pretty-format the result?

  • debug – show debugging information to the Python log

  • debug_verbose – be verbose when debugging

Returns:

the modified SQL statment, as a string

crate_anon.common.sql.translate_sql_qmark_to_percent(sql: str) str[source]

This function translates SQL using ? placeholders to SQL using %s placeholders, without breaking literal '?' or '%', e.g. inside string literals.

Notes

I prefer ?, because % is used in LIKE clauses, and the databases we’re using like it.

So:

  • We use %s when using cursor.execute() directly, via Django.

  • We use ? when talking to users, and cardinal_pythonlib.sql.sql_grammar.SqlGrammar objects, so that the visual appearance matches what they expect from their database.