Common Module Documentation

The common module provides generic functions to support a Delta Lakehouse implementation

alter_delta_table_properties(spark, ...)

This function alters the properties of a Delta table in a database.

create_aggregate_dataframe(spark, dataframe, ...)

This function aggregates a dataframe based on the aggregation definition provided

create_anti_merge_predicate(spark, ...)

Creates the anti-merge key based on the 'merge_predicate' provided and logs the function's start and end time using 'post_la_data'.
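As an illustration, a predicate builder of this kind might negate the equality join across the key columns. This is a hypothetical sketch only: it assumes `merge_predicate` is a list of key column names, and it omits the `spark` parameter and the `post_la_data` logging the real function performs.

```python
def create_anti_merge_predicate(merge_predicate):
    # Negate the full equality join so the predicate matches rows
    # that do NOT satisfy the merge key (illustrative sketch only).
    equality = " AND ".join(f"target.{col} = source.{col}" for col in merge_predicate)
    return f"NOT ({equality})"
```

For example, `create_anti_merge_predicate(["id", "region"])` yields a condition that excludes rows matched on both keys.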

create_catalog_not_exists(spark, catalogName)

A function to create a Databricks Unity Catalog within the Databricks Workspace

create_database_not_exists(spark, databaseName)

A function to create a Hive database within the Spark workspace if it does not already exist

create_delta_table(spark, table_def, ...)

This function creates a Delta table in the Delta Lakehouse and returns the schema to be used when creating a Synapse serverless view

create_folder_structure(spark, lake_file_struct)

A function to create the folder depth structure for querying nested files, based on the lake file structure of the data being queried

create_merge_dict(spark, columns)

Create a dictionary of the form {"column1": "source.column1", "column2": "source.column2", ...}, where "column1", "column2", etc. are the column names supplied in 'columns'.

create_merge_predicate(spark, merge_predicate)

Creates the merge key based on the 'merge_predicate' provided and logs the function's start and end time using 'post_la_data'.
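A merge key of this kind is typically an equality condition joined with AND across the key columns. The sketch below is an assumption about the output format, not the module's actual implementation; the `spark` parameter and `post_la_data` logging are omitted.

```python
def create_merge_predicate(merge_predicate):
    # Build the ON condition for a Delta MERGE: one equality clause
    # per key column, combined with AND (illustrative sketch only).
    return " AND ".join(f"target.{col} = source.{col}" for col in merge_predicate)
```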

create_mount(spark, mounting_point, ...[, ...])

A function to create a mount point within the Databricks Cluster

create_path_not_exists(spark, path)

A function to create a folder path in the datalake

create_type_two_condition(spark, type_two_keys)

Creates the Type 2 condition based on the 'type_two_keys' provided and logs the function's start and end time using 'post_la_data'.
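For Slowly Changing Dimension Type 2 handling, the condition usually fires when any tracked attribute differs between source and target, signalling that the current record should be closed out. The following is a hypothetical sketch of that shape, again omitting `spark` and the logging call.

```python
def create_type_two_condition(type_two_keys):
    # True when any tracked attribute has changed between target and
    # source rows (illustrative sketch of an SCD Type 2 change check).
    return " OR ".join(f"target.{col} <> source.{col}" for col in type_two_keys)
```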

delete_files_in_folder(spark, mount_point, ...)

A function to recursively delete all files in a folder within a mount point

delta_merge(spark, dataframe, partition_by, ...)

This function performs a merge operation for Delta tables

do_vacuum_and_optimize(spark, delta_db, ...)

A function to check whether a table has accumulated enough versions, or is old enough, to warrant an optimize and vacuum
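The decision logic behind such a check might look like the sketch below. The threshold names and values are assumptions for illustration; the real function would read the version count and last-optimize time from the table's Delta history via Spark.

```python
from datetime import datetime, timedelta

def should_vacuum_and_optimize(version_count, last_optimized,
                               min_versions=10, max_age_days=7):
    # Hypothetical maintenance check: due either when the table has
    # accumulated many versions or when it has not been optimized
    # for longer than max_age_days.
    too_many_versions = version_count >= min_versions
    too_old = datetime.now() - last_optimized > timedelta(days=max_age_days)
    return too_many_versions or too_old
```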

drop_all_tables(spark, schema)

A function to drop all hive tables within a hive database/schema

generate_delta_load_rowsOutput(spark, ...)

A function to return the number of output rows from a Delta table operation

generate_delta_load_stats(spark, schema, ...)

A function to return the statistics from a Delta table insert/update/delete operation; always returns the most recent history entry, excluding optimize operations

generate_expectation_meta_behaviour(spark, ge_df)

This function generates expectation metadata to control how the calling pipeline behaves on expectation failure

generate_expectation_results(spark, ...[, ...])

A function to generate expectation results defined in a YAML definition

generate_sql_expectation_table_validation_result_queries(...)

Generate queries for an expectation set

get_all_tables(spark, schema)

A function to return all Hive tables in the hive database

get_delta_floor_pred_for_sql(spark, ...[, ...])

A function to get the floor value for a data element in the data lake in incremental data loading strategies

get_partition_key_vals_for_pruning(spark, ...)

A function to generate the partition-key string used when merging into a partitioned table
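A pruning string of this kind typically restricts the merge scan to the partitions actually touched by the incoming data. The helper below is an assumed shape for illustration, not the module's implementation, and handles a single string-typed partition column only.

```python
def partition_pruning_predicate(partition_col, values):
    # Build an IN-list predicate so the MERGE only scans the
    # partitions present in the source batch (illustrative sketch).
    quoted = ", ".join(f"'{v}'" for v in values)
    return f"target.{partition_col} IN ({quoted})"
```

Appending such a predicate to the merge condition lets Delta skip untouched partitions entirely.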

optimize_table(spark, delta_db, delta_table)

A function to optimize a delta table

read_data_lake_query(spark, query[, using])

A function to read data from the data lake using a Hive SQL query; the data must be available as a Hive table

read_datalake_file(spark, mounting_point, ...)

A function to read a file, or set of files, in the data lake

repartition_delta_table(spark, schema, ...)

A function to repartition a delta table

rollback_delta_table_by_version(spark, ...)

A function to roll back a Delta table to a previous version

sql_data_type_replace(type_string)

This function replaces unsupported SQL types when generating a schema for SQL serverless views
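Such a replacement is usually a simple lookup from Spark type names to serverless-compatible SQL types. The mapping below is purely illustrative; the actual type choices in the module may differ.

```python
def sql_data_type_replace(type_string):
    # Map Spark types with no direct serverless SQL equivalent to a
    # supported type; pass through anything already supported.
    # The target types here are assumptions for illustration.
    replacements = {
        "string": "varchar(8000)",
        "double": "float",
        "long": "bigint",
        "timestamp": "datetime2",
    }
    return replacements.get(type_string.lower(), type_string)
```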

truncate_all_tables(spark, schema)

A function to truncate all of the Hive tables within a hive database/schema

vacumm_and_optimize_all_tables(spark, delta_db)

A function to vacuum and optimize all delta tables in a specific hive database/schema

vacuum_table(spark, delta_db, delta_table[, ...])

A function to vacuum a Delta table

jinga_reference(config_mount, input)

This function evaluates the input string and applies Jinja macros found in the macro_functions.j2 file in the config/04-macros folder in the data lake

write_file_to_datalake(spark, data, ...)

A function to write files to the data lake