.. crate_anon/docs/source/misc/python_from_sql_server.rst .. Copyright (C) 2015, University of Cambridge, Department of Psychiatry. Created by Rudolf Cardinal (rnc1001@cam.ac.uk). . This file is part of CRATE. . CRATE is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. . CRATE is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. . You should have received a copy of the GNU General Public License along with CRATE. If not, see . .. _pandas: https://pandas.pydata.org/ Calling Python from SQL Server ------------------------------ Prerequisites ~~~~~~~~~~~~~ - SQL Server, obviously. See https://www.microsoft.com/sql-server/sql-server-downloads. - SQL Server 2017 Machine Learning Services; see https://docs.microsoft.com/en-us/sql/advanced-analytics/install/sql-machine-learning-services-windows-install?view=sql-server-2017. - Once you have SQL Server 2017 installed, run "SQL Server Installation Center". - If it appears to be missing run it as e.g. ``C:\SQLServer2017Media\Developer_ENU\setup.exe``. - Choose :menuselection:`Installation --> New SQL Server stand-alone installation or add features to an existing installation`. - If you think you don't have the media that it asks for because you used an online installer... re-run that installer; be reminded that it saved everything (by default) to ``C:\SQLServer2017Media``; provide that as the location! In this situation, the folder you need is e.g. ``C:\SQLServer2017Media\Developer_ENU`` (for an English Developer edition). - Click through. When you get to a choice of performing a new installation or adding features to an existing instance, choose the latter. - Add ``Machine Learning Services / Python``, and while we're at it, ``Machine Learning Services / R``. And since it sounds interesting, ``Full-Text and Semantic Extractions for Search``. You probably want to add them as "instance features". - Accept some license terms and put the coffee on. Basic communication with Python ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Check Python is installed and SQL Server can talk to it; see https://docs.microsoft.com/en-us/sql/advanced-analytics/tutorials/quickstart-python-verify?view=sql-server-2017 - Run SQL Server Management Studio, connect to your server, and try the following script: .. code-block:: sql EXECUTE sp_execute_external_script @language = N'Python', @script = N' import sys print(sys.version) '; GO - If you get this: .. code-block:: none Msg 39023, Level 16, State 1, Procedure sp_execute_external_script, Line 1 [Batch Start Line 0] 'sp_execute_external_script' is disabled on this instance of SQL Server. Use sp_configure 'external scripts enabled' to enable it. then, as per https://docs.microsoft.com/en-us/sql/database-engine/configure-windows/external-scripts-enabled-server-configuration-option?view=sql-server-2017: #. Execute .. code-block:: sql sp_configure 'external scripts enabled', 1; RECONFIGURE WITH OVERRIDE; #. Restart SQL Server (via Windows Services). .. tip:: As and when you restart SQL Server, restart "SQL Server Launchpad (MSSQLSERVER)". - Note the version of Python. In my case the result was this: .. code-block:: none STDOUT message(s) from external script: 3.5.2 |Continuum Analytics, Inc.| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)] so this is a custom version of Python 3.5.2. Install some external Python code ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ It looks like we can't have arbitrary multiple virtual environments. But there is one virtual environment. This is implied by https://docs.microsoft.com/en-us/sql/advanced-analytics/python/install-additional-python-packages-on-sql-server?view=sql-server-2017. So to install something using ``pip``, run an **administrative**-authority command prompt (because we are going to be modifying files that live within the ``C:\Program Files`` tree), then: .. code-block:: bat cd C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\Scripts pip install --upgrade pip pip install cardinal_pythonlib You may need to stop SQL Server first [the service named "SQL Server (MSSQLSERVER)"] to prevent further "access denied" errors. But if you have used an administrative command prompt *and* stopped the SQL Server first, I don't know why it complains; just repeat the command that failed. Now try this: .. code-block:: sql EXECUTE sp_execute_external_script @language = N'Python', @script = N' from cardinal_pythonlib.psychiatry.drugs import ( drug_name_to_generic, drug_names_to_generic ) # You cannot do "import *"; it says "import * only allowed at module level" # Watch out: no unescaped apostrophes within the Python code! print(drug_name_to_generic("UNKNOWN")) print(drug_name_to_generic("UNKNOWN", unknown_to_default=True)) print(drug_names_to_generic([ "citalopram", "Citalopram", "Cipramil", "Celexa", "olanzepine", # typo "dextroamphetamine", "amitryptyline", ])) '; GO See what packages are installed locally ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ See https://docs.microsoft.com/en-us/sql/advanced-analytics/tutorials/quickstart-python-verify?view=sql-server-2017. Specifically: .. code-block:: sql EXECUTE sp_execute_external_script @language =N'Python', @script=N' import pip for i in pip.get_installed_distributions(): print(i) '; GO You should see packages that you installed above. Structured data flow ~~~~~~~~~~~~~~~~~~~~ Let's go beyond stdout and have data flow from an SQL Server table to Python, and back from Python to a result set (or another table). See https://docs.microsoft.com/en-us/sql/advanced-analytics/tutorials/quickstart-python-inputs-and-outputs?view=sql-server-2017. The basic messages are as follows: - By default, SQL Server translates the parameter called ``@input_data_1`` into a Python variable called ``InputDataSet``, runs the Python, and then translates the Python variable called ``OutputDataSet`` into an SQL result set, according to a schema that you specify with the ``WITH RESULT SETS`` clause. - The data format on the Python side is a pandas_ data frame. - So the basic setup is: .. code-block:: sql EXECUTE sp_execute_external_script @language = N'Python' , @script = N' # PYTHON CODE from somewhere import somefunc OutputDataSet = somefunc(InputDataSet) ' , @input_data_1 = N' -- SOURCE SQL SELECT * FROM sometable; ' WITH RESULT SETS ( -- DEFINE OUTPUT DATA FORMAT HERE ( -- first (and in this case only) result set definition [first_column] INT NOT NULL, [second_column] NVARCHAR(MAX) -- etc. ) ); - The ``WITH RESULT SETS`` syntax is described at https://docs.microsoft.com/en-us/sql/t-sql/language-elements/execute-transact-sql?view=sql-server-2017. - You can rename the input/output parameters if you wish. Example to find two antidepressants "episodes" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - We'll use an algorithm from our core Python assistance library, described at https://cardinalpythonlib.readthedocs.io/. - Create a dummy table and some **test data**: .. code-block:: sql USE rnctestdb; -- or whatever it's called -- DROP TABLE dummy_drug_data; CREATE TABLE dummy_drug_data ( brcid VARCHAR(255) NOT NULL, -- or INT, etc.... generic_drug VARCHAR(100) NOT NULL, document_date DATE NOT NULL ); INSERT INTO dummy_drug_data (brcid, generic_drug, document_date) VALUES -- Bob: mixture switch; should pick mirtaz -> sert ('Bob', 'venlafaxine', '2018-01-01'), ('Bob', 'mirtazapine', '2018-01-01'), ('Bob', 'venlafaxine', '2018-02-01'), ('Bob', 'mirtazapine', '2018-02-01'), ('Bob', 'venlafaxine', '2018-03-01'), ('Bob', 'sertraline', '2018-03-02'), ('Bob', 'venlafaxine', '2018-04-01'), ('Bob', 'sertraline', '2018-05-01'), ('Bob', 'sertraline', '2018-06-01'), -- Alice: two consecutive switches; should pick the first, c -> f -- ... goes second in the data; should be sorted to first ('Alice', 'citalopram', '2018-01-01'), ('Alice', 'citalopram', '2018-02-01'), ('Alice', 'fluoxetine', '2018-03-01'), ('Alice', 'fluoxetine', '2018-04-01'), ('Alice', 'mirtazapine', '2018-05-01'), ('Alice', 'mirtazapine', '2018-06-01'), -- Chloe: courses just too short; should give nothing ('Chloe', 'fluoxetine', '2018-01-01'), ('Chloe', 'fluoxetine', '2018-01-27'), ('Chloe', 'venlafaxine', '2018-02-01'), ('Chloe', 'venlafaxine', '2018-01-27'), -- Dave: courses just long enough ('Dave', 'fluoxetine', '2018-01-01'), ('Dave', 'fluoxetine', '2018-01-28'), ('Dave', 'venlafaxine', '2018-02-01'), ('Dave', 'venlafaxine', '2018-02-28'), -- Elsa: courses overlap; invalid ('Elsa', 'citalopram', '2018-01-01'), ('Elsa', 'citalopram', '2018-02-05'), ('Elsa', 'mirtazapine', '2018-02-01'), ('Elsa', 'mirtazapine', '2018-02-28'), -- Fred: courses overlap, same day; invalid ('Fred', 'citalopram', '2018-01-01'), ('Fred', 'citalopram', '2018-02-01'), ('Fred', 'mirtazapine', '2018-02-01'), ('Fred', 'mirtazapine', '2018-02-28'), -- Grace: multiple potentials; should pick 'citalopram' -> 'fluoxetine' ('Grace', 'citalopram', '2018-01-01'), ('Grace', 'citalopram', '2018-01-28'), ('Grace', 'fluoxetine', '2018-02-01'), ('Grace', 'venlafaxine', '2018-02-02'), ('Grace', 'fluoxetine', '2018-02-28'), ('Grace', 'venlafaxine', '2018-03-01'), ('Grace', 'mirtazapine', '2018-04-01'), ('Grace', 'mirtazapine', '2018-04-28'); GO - Debug the input: .. code-block:: sql USE rnctestdb; -- or whatever it's called EXECUTE sp_execute_external_script @language = N'Python' , @script = N' # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Start Python # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ print(repr(InputDataSet)) # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # End Python # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ' , @input_data_1 = N'SELECT * FROM dummy_drug_data;' ; - Discover that dates are not acceptable... .. code-block:: none Unsupported input data type in column 'document_date'. Supported types: bit, tinyint, smallint, int, bigint, uniqueidentifier, real, float, char, varchar, nchar, nvarchar, varbinary. SqlSatelliteCall error: Unsupported input data type in column 'document_date'. Supported types: bit, tinyint, smallint, int, bigint, uniqueidentifier, real, float, char, varchar, nchar, nvarchar, varbinary. See https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/sp-execute-external-script-transact-sql?view=sql-server-2017, only that suggests that ``DATE`` and ``DATETIME`` should be OK. Clearly they aren't. - Check that data is coming in once converted via ``CAST``: .. code-block:: sql USE rnctestdb; -- or whatever it's called EXECUTE sp_execute_external_script @language = N'Python' , @script = N' # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Start Python # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ print(repr(InputDataSet)) # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # End Python # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ' , @input_data_1 = N' -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- Start source SQL -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SELECT brcid, generic_drug, CAST(document_date AS VARCHAR(10)) AS document_date FROM dummy_drug_data; -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- End source SQL -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ '; - **Final query.** Process it properly, including converting that column back to a date on the Python side. We'll also create it as a stored procedure called ``py_generate_two_antidepressant_episodes``, which will allow us to view the result directly or stash it back into a table: .. code-block:: sql USE rnctestdb; -- or whatever it's called DROP PROCEDURE IF EXISTS [dbo].[py_generate_two_antidepressant_episodes]; GO CREATE PROCEDURE [dbo].[py_generate_two_antidepressant_episodes] AS BEGIN EXECUTE sp_execute_external_script @language = N'Python' , @script = N' # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Start Python # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # Imports import logging import sys from cardinal_pythonlib.psychiatry.treatment_resistant_depression import ( two_antidepressant_episodes, ) import pandas as pd # Constants governing our algorithm COURSE_LENGTH_DAYS = 28 EXPECT_RESPONSE_BY_DAYS = 56 SYMPTOM_ASSESSMENT_TIME_DAYS = 180 # Make Python log output go to stdout as well as stderr SHOW_LOG_OUTPUT = True VERBOSE = True if SHOW_LOG_OUTPUT: loglevel = logging.DEBUG if VERBOSE else logging.INFO rootlog = logging.getLogger() rootlog.setLevel(loglevel) handler = logging.StreamHandler(sys.stdout) handler.setLevel(loglevel) formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s") handler.setFormatter(formatter) rootlog.addHandler(handler) # Convert date-as-text columns to proper dates InputDataSet["document_date"] = pd.to_datetime( InputDataSet["document_date"], format="%Y-%m-%d" ) # Run our algorithm of interest result = two_antidepressant_episodes( patient_drug_date_df = InputDataSet, # data in here patient_colname = "brcid", drug_colname = "generic_drug", date_colname = "document_date", course_length_days = COURSE_LENGTH_DAYS, expect_response_by_days = EXPECT_RESPONSE_BY_DAYS, symptom_assessment_time_days = SYMPTOM_ASSESSMENT_TIME_DAYS ) # We cannot send dates back out to SQL, so convert to standard text format: for datecolname in ["drug_a_first", "drug_a_second", "drug_b_first", "drug_b_second", "expect_response_to_b_by", "end_of_symptom_period"]: result[datecolname] = result[datecolname].dt.strftime("%Y-%m-%d") # Give the final result set its expected name OutputDataSet = result # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ # End Python # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ' , @input_data_1 = N' -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- Start source SQL -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ SELECT brcid, generic_drug, CAST(document_date AS VARCHAR(10)) AS document_date -- YYYY-MM-DD FROM dummy_drug_data; -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- End source SQL -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ' WITH RESULT SETS ( ( -- Python uses Unicode strings, therefore NVARCHAR. [patient_id] NVARCHAR(255) NOT NULL, [drug_a_name] NVARCHAR(255), [drug_a_first] DATE, [drug_a_second] DATE, [drug_b_name] NVARCHAR(255), [drug_b_first] DATE, [drug_b_second] DATE, [expect_response_to_b_by] DATE, [end_of_symptom_period] DATE ) ) END; GO - Create a table to receive results: .. code-block:: sql USE rnctestdb; -- or whatever it's called -- DROP TABLE two_antidepressant_results; CREATE TABLE two_antidepressant_results ( [patient_id] NVARCHAR(255) NOT NULL, [drug_a_name] NVARCHAR(255), [drug_a_first] DATE, [drug_a_second] DATE, [drug_b_name] NVARCHAR(255), [drug_b_first] DATE, [drug_b_second] DATE, [expect_response_to_b_by] DATE, [end_of_symptom_period] DATE ); - If we want to view the results, we can do this: .. code-block:: sql USE rnctestdb; -- or whatever it's called EXEC [dbo].[py_generate_two_antidepressant_episodes] - If we want to stash the results, we can do this: .. code-block:: sql USE rnctestdb; -- or whatever it's called -- DELETE FROM two_antidepressant_results INSERT INTO two_antidepressant_results EXEC [dbo].[py_generate_two_antidepressant_episodes]