Welcome to i2bmi’s documentation!
-
i2bmi.assign_comorbidities(df, column_code, column_version, columns_id, verbose=False)
Assign Elixhauser/Charlson comorbidities and comorbidity scores from a diagnosis dataframe
Parameters: - df (pandas.DataFrame) – Input dataframe containing diagnosis codes
- column_code (str) – name of column containing diagnosis code.
- column_version (int or str) – if int, the ICD version (9 or 10); if str, the name of the column containing the ICD version (9 or 10)
- columns_id (list of str) – list of names of columns to be used as identifier
Returns: - df_long (pandas.DataFrame) – long-form dataframe showing the mapping from ICD codes to comorbidity systems
- df_wide (pandas.DataFrame) – wide-form dataframe showing comorbidities and comorbidity score per identifier
Examples
>>> df_diagnosis_long,df_diagnosis_wide = assign_comorbidities(df_diagnosis,'ICD_CODE','ICD_VERSION',['ID'])
>>> df_diagnosis_long,df_diagnosis_wide = assign_comorbidities(df_diagnosis,'ICD_CODE',9,['MRN','CSN'])
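The core of this kind of mapping (code → comorbidity flag → weighted score, in long and wide form) can be sketched in pandas. This is not i2bmi's implementation: `toy_assign_comorbidities`, `TOY_MAP` and `TOY_WEIGHTS` are hypothetical, and the two-entry mapping matches ICD-10 codes on their first three characters purely for illustration. The real Elixhauser and Charlson definitions cover far more codes, handle both ICD versions, and use their own weights.

```python
import pandas as pd

# Hypothetical, heavily truncated mapping from 3-character ICD-10 prefixes to
# comorbidity labels; the real Elixhauser/Charlson definitions are much larger.
TOY_MAP = {"I50": "chf", "E11": "diabetes"}
# Hypothetical weights, used only to show how a score can be summed.
TOY_WEIGHTS = {"chf": 1, "diabetes": 1}


def toy_assign_comorbidities(df, column_code, columns_id):
    """Return (long, wide) frames mapping diagnosis codes to comorbidities."""
    long = df.copy()
    # Drop the dot (e.g. 'I50.9' -> 'I509') and match on the first 3 characters.
    prefix = long[column_code].astype(str).str.replace(".", "", regex=False).str[:3]
    long["comorbidity"] = prefix.map(TOY_MAP)
    long = long.dropna(subset=["comorbidity"])  # keep only mapped codes

    # One row per identifier, one 0/1 column per comorbidity.
    wide = long.assign(flag=1).pivot_table(
        index=columns_id, columns="comorbidity", values="flag",
        aggfunc="max", fill_value=0,
    )
    wide.columns.name = None
    # Weighted sum of the comorbidity flags as a simple score.
    wide["score"] = sum(wide[c] * w for c, w in TOY_WEIGHTS.items() if c in wide.columns)
    return long, wide


df_dx = pd.DataFrame({"ID": [1, 1, 2, 3], "ICD_CODE": ["I50.9", "E11.9", "E11.65", "J45"]})
dx_long, dx_wide = toy_assign_comorbidities(df_dx, "ICD_CODE", ["ID"])
```

Note that in this sketch identifiers with no mapped code (ID 3 above) drop out of the wide frame; a fuller version would reindex to keep them with a score of 0.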
-
i2bmi.boxcox(df, invert=None)
Forward and inverse Box-Cox transformation
Parameters: - df (pandas.DataFrame) – Input dataframe with numeric columns to be Box-Cox transformed
- invert (dict) – transformation dict returned by a previous forward call; if provided, the inverse transformation is performed instead.
Returns: - pandas.DataFrame – Box-Cox-transformed input dataframe
- dict – information about the forward transformation, which can be passed as invert to undo it: a dict mapping each Box-Cox-transformed column name (str) to a dict with keys ‘min’ and ‘lmbda’
Examples
>>> Transformed_DataFrame,Transformation_dict = boxcox(DataFrame)
>>> Inverse_Transformed_DataFrame = boxcox(Transformed_DataFrame,invert=Transformation_dict)
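A forward/inverse pair with this shape can be sketched with SciPy's `stats.boxcox` and `special.inv_boxcox`. This is an illustration under assumptions, not i2bmi's code: `boxcox_sketch` is a hypothetical name, and the shift `x - min + 1` (which makes every value at least 1, since Box-Cox requires strictly positive input) is only a guess at how the stored ‘min’ is used.

```python
import pandas as pd
from scipy import special, stats


def boxcox_sketch(df, invert=None):
    """Forward Box-Cox when invert is None; otherwise undo a previous forward call."""
    out = df.copy()
    if invert is None:
        info = {}
        for col in out.columns:
            col_min = out[col].min()
            # Assumed shift: make the smallest value exactly 1 (Box-Cox needs x > 0).
            shifted = out[col].to_numpy(dtype=float) - col_min + 1
            transformed, lmbda = stats.boxcox(shifted)  # lambda fitted by max likelihood
            out[col] = transformed
            info[col] = {"min": col_min, "lmbda": lmbda}
        return out, info
    for col, params in invert.items():
        # Invert the power transform, then undo the assumed shift.
        restored = special.inv_boxcox(out[col].to_numpy(dtype=float), params["lmbda"])
        out[col] = restored + params["min"] - 1
    return out


df_num = pd.DataFrame({"a": [1.0, 2.0, 5.0, 10.0, 50.0], "b": [-3.0, 0.0, 4.0, 9.0, 30.0]})
df_bc, bc_info = boxcox_sketch(df_num)
df_back = boxcox_sketch(df_bc, invert=bc_info)
```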
-
i2bmi.cohort_comparison(df, groups, include=[], p_thres=0.01, test_cat=<function _chi2>, test_cont=<function _ks>)
Generates a cohort comparison table
Parameters: - df (pandas.DataFrame) – pandas dataframe in the form of [samples x features] where features include group(s) to be compared
- groups (str or list of str) – name(s) of column(s) to be used as groups for comparison. If a str, the function compares rows with True vs. False in that column.
- include (list of str) – list of features to be compared. If empty list, all features will be compared.
- p_thres (float) – p-value threshold for significance
- test_cat (function) – statistical test for comparing categorical or boolean variables. The only built-in option is _chi2.
- test_cont (function) – statistical test for comparing continuous (numeric) variables. Built-in options are _anova and _ks.
Returns: cohort comparison table
Return type: pandas.DataFrame
Examples
>>> cohort_comparison(processed_dataframe,'In-hospital mortality')
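For a single boolean grouping column, the comparison logic can be sketched with SciPy's `chi2_contingency` (categorical/boolean features) and `ks_2samp` (continuous features), the tests presumably behind the `_chi2` and `_ks` defaults. `compare_cohorts_sketch` and its output columns (medians for continuous features, proportions for boolean ones) are hypothetical; i2bmi's actual table layout may differ.

```python
import pandas as pd
from scipy import stats


def compare_cohorts_sketch(df, group, p_thres=0.01):
    """Compare every other column between rows where df[group] is True vs. False."""
    mask = df[group].astype(bool)
    rows = []
    for col in df.columns.drop(group):
        a, b = df.loc[mask, col], df.loc[~mask, col]
        if df[col].dtype == bool:
            # Chi-squared test on the group x feature contingency table.
            p = stats.chi2_contingency(pd.crosstab(df[group], df[col]))[1]
            summary = (a.mean(), b.mean())  # proportion True in each cohort
        else:
            # Two-sample Kolmogorov-Smirnov test on the value distributions.
            p = stats.ks_2samp(a, b).pvalue
            summary = (a.median(), b.median())
        rows.append({"feature": col, "True": summary[0], "False": summary[1],
                     "p": p, "significant": p < p_thres})
    return pd.DataFrame(rows).set_index("feature")


df_cohort = pd.DataFrame({
    "died": [True] * 5 + [False] * 5,
    "age": [80, 82, 85, 90, 88, 30, 35, 40, 33, 38],
    "male": [True, True, False, True, False, False, True, False, False, True],
})
comparison = compare_cohorts_sketch(df_cohort, "died")
```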
-
i2bmi.dataframe_summary(df, column_item, column_value, stripchars='+-<> ')
Characterize a long-form dataframe of longitudinal variables, e.g. for mapping purposes
Parameters: - df (pandas.DataFrame) – dataframe containing longitudinal variables in long form
- column_item (str) – name of column indicating measurement type
- column_value (str) – name of column indicating measurement result
- stripchars (str) – characters to remove prior to converting to numeric
Returns: Summary of the input dataframe: number of measurements, % numeric, quantiles (0.1, 0.25, 0.5, 0.75, 0.9), and the 20 most common results
Return type: pandas.DataFrame
Examples
>>> dataframe_summary(dataframe_laboratoryresults,'LAB_TEST','LAB_RESULT_VALUE')
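A per-measurement summary of this kind can be sketched with a pandas groupby. This is a hypothetical `summarize_sketch`, not i2bmi's code; it assumes `stripchars` are removed from both ends of each result only (so `'<135'` parses as 135), which also means a leading minus sign is stripped.

```python
import pandas as pd


def summarize_sketch(df, column_item, column_value, stripchars="+-<> "):
    """Per-item count, % numeric, quantiles and most common raw results."""
    raw = df[column_value].astype(str)
    # Assumption: stripchars are removed from both ends only (e.g. '<5' -> '5').
    numeric = pd.to_numeric(raw.str.strip(stripchars), errors="coerce")
    work = pd.DataFrame({"item": df[column_item], "raw": raw, "num": numeric})
    grouped = work.groupby("item")

    out = pd.DataFrame({
        "count": grouped.size(),
        "pct_numeric": 100 * work["num"].notna().groupby(work["item"]).mean(),
    })
    # Quantiles of the values that could be parsed as numbers.
    out = out.join(grouped["num"].quantile([0.1, 0.25, 0.5, 0.75, 0.9]).unstack())
    # Top 20 raw results, formatted as 'value (count)'.
    out["top_values"] = grouped["raw"].agg(
        lambda s: "; ".join(f"{k} ({v})" for k, v in s.value_counts().head(20).items())
    )
    return out


df_labs = pd.DataFrame({
    "LAB_TEST": ["Na", "Na", "Na", "K", "K"],
    "LAB_RESULT_VALUE": ["140", "<135", "hemolyzed", "4.1", "4.1"],
})
lab_summary = summarize_sketch(df_labs, "LAB_TEST", "LAB_RESULT_VALUE")
```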
-
i2bmi.jupyter_widen()
Increases the width of Jupyter cells to use more of the real estate available in the browser
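A common way to do this in the classic Jupyter Notebook is to inject a CSS rule that widens the `.container` element. The sketch below assumes that technique; i2bmi's implementation may differ, and `widen_css` and `widen_notebook` are hypothetical names. The rule has no effect in JupyterLab, which uses different markup.

```python
def widen_css(width_pct=100):
    """Build a <style> snippet that widens classic Jupyter Notebook cells."""
    # '.container' wraps the cells in the classic Notebook UI.
    return f"<style>.container {{ width: {width_pct}% !important; }}</style>"


def widen_notebook(width_pct=100):
    """Inject the CSS into the running notebook (requires IPython)."""
    # Imported lazily so widen_css() also works outside a notebook.
    from IPython.display import HTML, display
    display(HTML(widen_css(width_pct)))
```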
-
i2bmi.onehotify(df, sep='|')
Wrapper for one-hot encoding all (explicitly) categorical columns in a dataframe
Parameters: - df (pandas.DataFrame) – Input dataframe with categorical columns to be one-hot encoded. Some pre-processing may be required as this function does not group low-frequency categories.
- sep (str) – Separator. The returned dataframe will contain columns formatted as variable name followed by separator followed by category name.
Returns: Input dataframe, but with categorical columns split out into one-hot encoded columns
Return type: pandas.DataFrame
Examples
>>> onehotify(dataframe_demographics)
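Assuming "explicitly categorical" means columns with pandas' `category` dtype, the wrapper can be approximated with `pandas.get_dummies`, whose `prefix_sep` argument produces the variable-name/separator/category naming described above. `onehotify_sketch` is a hypothetical name, not i2bmi's code.

```python
import pandas as pd


def onehotify_sketch(df, sep="|"):
    """One-hot encode every 'category'-dtype column; other columns pass through."""
    # Assumption: only columns explicitly cast to 'category' are encoded.
    cat_cols = df.select_dtypes(include="category").columns.tolist()
    # New columns are named '<variable><sep><category>', e.g. 'sex|F'.
    return pd.get_dummies(df, columns=cat_cols, prefix_sep=sep)


df_demo = pd.DataFrame({"age": [50, 60, 70], "sex": pd.Categorical(["F", "M", "F"])})
encoded = onehotify_sketch(df_demo)
```

As with the original, rare categories are not grouped, so each distinct category becomes its own column.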
-
i2bmi.performance_metrics(y_true, y_score)
Generate a performance metrics dataframe with threshold as index
Parameters: - y_true (list-like or pandas.Series) – true y labels
- y_score (list-like or pandas.Series) – predicted probability
Returns: performance_metrics dataframe
Return type: pandas.DataFrame
Examples
>>> performance_metrics(y_train,y_train_pred)
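One way to build such a table is to treat every distinct score as a threshold and compute confusion-matrix metrics at each one. `performance_metrics_sketch` and its columns (sensitivity, specificity, PPV, NPV) are assumptions for illustration; the metrics i2bmi actually reports may differ.

```python
import numpy as np
import pandas as pd


def performance_metrics_sketch(y_true, y_score):
    """Confusion-matrix metrics at every distinct score, indexed by threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    rows = {}
    for t in np.unique(y_score):
        pred = y_score >= t  # positive prediction at or above the threshold
        tp = np.sum(pred & y_true)
        fp = np.sum(pred & ~y_true)
        fn = np.sum(~pred & y_true)
        tn = np.sum(~pred & ~y_true)
        rows[t] = {
            "sensitivity": tp / (tp + fn) if tp + fn else np.nan,
            "specificity": tn / (tn + fp) if tn + fp else np.nan,
            "ppv": tp / (tp + fp) if tp + fp else np.nan,
            "npv": tn / (tn + fn) if tn + fn else np.nan,  # undefined if nothing is negative
        }
    return pd.DataFrame.from_dict(rows, orient="index").rename_axis("threshold")


metrics = performance_metrics_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```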
-
i2bmi.plot_calibration(y_true, y_score, figpath=None)
Calibration plot
Parameters: - y_true (list-like or pandas.Series) – true y labels
- y_score (list-like or pandas.Series) – predicted probability
- figpath (str) – path for saving figure
Return type: None
Examples
>>> plot_calibration(y_train,y_train_pred)
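A calibration plot typically bins the predicted probabilities and compares, per bin, the mean prediction with the observed fraction of positives. The sketch below computes those points with NumPy using equal-width bins; `calibration_points` and `n_bins` are hypothetical, and the binning that i2bmi uses is not documented here.

```python
import numpy as np


def calibration_points(y_true, y_score, n_bins=10):
    """Mean predicted probability and observed positive rate per equal-width bin."""
    y_true = np.asarray(y_true, dtype=float)
    y_score = np.asarray(y_score, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Digitizing against the inner edges maps scores in [0, 1] to bins 0..n_bins-1.
    bins = np.digitize(y_score, edges[1:-1])
    mean_pred, frac_pos = [], []
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():  # skip empty bins
            mean_pred.append(y_score[in_bin].mean())
            frac_pos.append(y_true[in_bin].mean())
    return np.array(mean_pred), np.array(frac_pos)


mean_pred, frac_pos = calibration_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], n_bins=2)
```

Plotting `frac_pos` against `mean_pred` alongside the diagonal y = x gives the usual calibration curve; points below the diagonal indicate over-prediction.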
-
i2bmi.plot_prc(y_true, y_score, figpath=None)
Precision-recall curve plot
Parameters: - y_true (list-like or pandas.Series) – true y labels
- y_score (list-like or pandas.Series) – predicted probability
- figpath (str) – path for saving figure
Return type: None
Examples
>>> plot_prc(y_train,y_train_pred)
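The points behind a precision-recall curve can be computed by sweeping a threshold over the distinct scores. The sketch also reports average precision, defined as the sum of precision weighted by each step's increase in recall (the definition used by scikit-learn's `average_precision_score`). `pr_points` is a hypothetical name; whether i2bmi annotates its plot with this summary is not stated.

```python
import numpy as np


def pr_points(y_true, y_score):
    """Precision and recall at each distinct score, plus average precision."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    precision, recall = [], []
    # Sweep thresholds from the highest score down so recall is non-decreasing.
    for t in np.unique(y_score)[::-1]:
        pred = y_score >= t
        tp = np.sum(pred & y_true)
        precision.append(tp / np.sum(pred))
        recall.append(tp / np.sum(y_true))
    precision, recall = np.array(precision), np.array(recall)
    # Average precision: precision weighted by each step's increase in recall.
    ap = np.sum(np.diff(np.r_[0.0, recall]) * precision)
    return precision, recall, ap


precision, recall, ap = pr_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```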
-
i2bmi.plot_roc(y_true, y_score, figpath=None)
Receiver operating characteristic (ROC) curve plot
Parameters: - y_true (list-like or pandas.Series) – true y labels
- y_score (list-like or pandas.Series) – predicted probability
- figpath (str) – path for saving figure
Return type: None
Examples
>>> plot_roc(y_train,y_train_pred)
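The ROC curve plots the true positive rate against the false positive rate as the threshold decreases, and the area under it (AUC) can be approximated with the trapezoidal rule. `roc_points` is a hypothetical helper that computes the curve with NumPy; it is not part of i2bmi.

```python
import numpy as np


def roc_points(y_true, y_score):
    """False/true positive rates over all thresholds, plus trapezoidal AUC."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    # Start above the highest score so the curve begins at (0, 0).
    thresholds = np.r_[np.inf, np.unique(y_score)[::-1]]
    tpr = np.array([np.mean(y_score[y_true] >= t) for t in thresholds])
    fpr = np.array([np.mean(y_score[~y_true] >= t) for t in thresholds])
    # Trapezoidal rule over the (fpr, tpr) points, which are sorted by fpr.
    auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
    return fpr, tpr, auc


fpr, tpr, auc = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```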
-
i2bmi.plot_temporal(series_value, series_time, num_bins=20, figpath=None)
Triplet plot for characterizing longitudinal variables
Parameters: - series_value (pandas.Series) – variable value
- series_time (pandas.Series) – variable documentation datetime
- num_bins (int) – number of bins for all subplots
- figpath (str) – path for saving figure
Return type: None
Examples
>>> plot_temporal(df['VALUE'],df['TIME'],num_bins=30,figpath='./figure.png')
-
i2bmi.
plot_threshold
(y_true, y_score, figpath=None)¶ Threshold plot
Parameters: - y_true (list-like or pandas.Series) – true y labels
- y_score (list-like or pandas.Series) – predicted probability
- figpath (str) – path for saving figure
Return type: None
Examples
>>> plot_threshold(y_train,y_train_pred)
-
i2bmi.quantile(n)
Wrapper for pandas quantile for use in groupby
Parameters: n (float) – quantile to compute, between 0 and 1
Returns: quantile function that can be used in a groupby
Return type: function
Examples
The series on which the returned quantile function is applied must be numeric.
>>> DataFrame.groupby('MEASURE_NAME').agg({'VALUE':['size',quantile(0.25)]})
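Such a wrapper is essentially a closure over `n`. Giving the inner function a readable `__name__` matters because pandas uses a function's name as the column label when a list of aggregations is passed to `agg`. `quantile_sketch` is a hypothetical stand-in, not i2bmi's code.

```python
import pandas as pd


def quantile_sketch(n):
    """Return a groupby-aggregation function computing the n-th quantile."""
    def _quantile(series):
        return series.quantile(n)

    # pandas labels the output column with the function's name.
    _quantile.__name__ = f"quantile_{n}"
    return _quantile


df_meas = pd.DataFrame({"MEASURE_NAME": ["a"] * 4 + ["b"] * 2,
                        "VALUE": [1, 2, 3, 4, 10, 20]})
q_summary = df_meas.groupby("MEASURE_NAME").agg({"VALUE": ["size", quantile_sketch(0.5)]})
```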
-
i2bmi.standardize(df, invert=None)
Forward and inverse standardization transformation (mean=0, std=1)
Parameters: - df (pandas.DataFrame) – Input dataframe with numeric columns to be standardized
- invert (dict) – used to perform inverse transformation.
Returns: - pandas.DataFrame – standardized input dataframe
- dict – information about the forward transformation, which can be passed as invert to undo it: a dict mapping each standardized column name (str) to a dict with keys ‘std’ and ‘mean’
Examples
>>> Transformed_DataFrame,Transformation_dict = standardize(DataFrame)
>>> Inverse_Transformed_DataFrame = standardize(Transformed_DataFrame,invert=Transformation_dict)
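The forward/inverse pattern can be sketched by storing each column's mean and standard deviation on the way in and reusing them on the way back. `standardize_sketch` is hypothetical, and it assumes the sample standard deviation (pandas' default `ddof=1`); i2bmi may use a different convention.

```python
import pandas as pd


def standardize_sketch(df, invert=None):
    """Standardize columns to mean 0 / std 1, or undo a previous call."""
    out = df.copy()
    if invert is None:
        info = {}
        for col in out.columns:
            # pandas' .std() is the sample standard deviation (ddof=1).
            mean, std = out[col].mean(), out[col].std()
            out[col] = (out[col] - mean) / std
            info[col] = {"mean": mean, "std": std}
        return out, info
    for col, params in invert.items():
        # Reverse the scaling, then the centering.
        out[col] = out[col] * params["std"] + params["mean"]
    return out


df_raw = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [10.0, 0.0, 5.0, 5.0]})
df_z, z_info = standardize_sketch(df_raw)
df_restored = standardize_sketch(df_z, invert=z_info)
```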
-
i2bmi.value_counts(n)
Wrapper for pandas value_counts for use in groupby
Parameters: n (int) – Number of most common responses
Returns: value_counts function that can be used in a groupby
Return type: function
Examples
>>> DataFrame.groupby('MEASURE_NAME').agg({'VALUE':['size',value_counts(20)]})
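As with the quantile wrapper, this can be sketched as a closure over `n` with a readable `__name__`. `value_counts_sketch` is hypothetical, and returning the top values as a single string is an assumption chosen because a string is always a safe scalar result for a groupby aggregation; i2bmi's wrapper may return a different structure.

```python
import pandas as pd


def value_counts_sketch(n):
    """Return a groupby-aggregation function listing the n most common values."""
    def _value_counts(series):
        counts = series.value_counts().head(n)
        # Summarize as 'value: count; ...' so the result is a single scalar.
        return "; ".join(f"{value}: {count}" for value, count in counts.items())

    _value_counts.__name__ = f"value_counts_{n}"
    return _value_counts


df_resp = pd.DataFrame({"MEASURE_NAME": ["a"] * 6 + ["b"] * 2,
                        "VALUE": ["x", "x", "x", "y", "y", "z", "p", "p"]})
vc_summary = df_resp.groupby("MEASURE_NAME").agg({"VALUE": ["size", value_counts_sketch(2)]})
```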