lakehouse.databricks.common.delta_merge
- lakehouse.databricks.common.delta_merge(spark, dataframe, partition_by, merge_type, merge_predicate, type_2_keys, delta_file_path, delta_db, dest_table, table_def=None, identity_col=None, type_1_column_exclusions=[], truncate_dest=False, **kwargs)
Performs a merge operation on Delta tables, writing the source DataFrame into the destination table according to the selected merge type.
Parameters
- spark : spark context
Spark context object passed from the calling Spark instance.
- dataframe : dataframe
The source DataFrame to be merged into the destination table.
- partition_by : array
Array of columns used in the partitioning strategy for the destination table.
- merge_type : string
The type of merge the function will perform. Supported merge types include:
- overwrite (default) : overwrites the destination with the source table
- append : appends the data from the source table to the destination table
- overwrite-by-key : deletes all rows from the destination where they exist in the source, then appends the source to the destination
- type-1 : inserts new rows into the destination where the key(s) in source and destination do not match; updates rows in the destination where the key(s) in source and destination match
- type-1-identity : inserts new rows into the destination where the identity key in source and destination do not match; updates rows in the destination where the identity key in source and destination matches, excluding the identity key from the update fields
- type-2 : inserts new rows into the destination where the identity key in source and destination do not match; inserts new rows into the destination where the identity key matches but the type-2 keys do not match, invalidating and expiring the prior valid version of the record; updates rows in the destination where both the identity key and the type-2 keys match
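The type-1 and type-2 behaviors above hinge on how the merge condition and the type-2 change-detection condition are assembled from the key arrays. A minimal sketch of that assembly (the helper names and the `src`/`dest` aliases are hypothetical illustrations, not part of this library):

```python
def build_merge_condition(merge_predicate):
    """Build the ON clause for a merge from an array of key columns.

    Hypothetical helper -- illustrates the mechanics only: every merge
    key must match between source (src) and destination (dest).
    """
    return " AND ".join(f"dest.{c} = src.{c}" for c in merge_predicate)


def build_type_2_change_condition(type_2_keys):
    """Build the condition that detects a changed record for a type-2 merge.

    Any type-2 key differing means the prior valid version of the record
    must be invalidated and expired, and a new version inserted.
    """
    return " OR ".join(f"dest.{c} <> src.{c}" for c in type_2_keys)
```

For example, `build_merge_condition(["customer_id"])` yields `"dest.customer_id = src.customer_id"`, which a Delta merge could use as its match condition.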
- merge_predicate : array
An array of columns used as merge keys when performing the merge.
- type_2_keys : array
An array of columns used as type 2 merge keys when performing the merge.
- delta_file_path : string
Location in the Delta Lakehouse where the destination table is/will be stored.
- delta_db : string
Name of the database/schema where the table will be created.
- dest_table : string
Name of the Delta table to be created.
- table_def : object, default=None
JSON object that describes the schema of the table to be created. Defaults to None.
- identity_col : string, default=None
Identity column in the table being merged. Defaults to None.
- type_1_column_exclusions : list, default=[]
Columns to be excluded from updates when performing an update operation on the destination table. Defaults to [].
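The `identity_col` and `type_1_column_exclusions` parameters plausibly interact when the update field set is built: the identity key is never overwritten, and excluded columns are skipped. A sketch of that logic (the helper name is hypothetical, not this library's actual code):

```python
def build_update_set(source_columns, identity_col=None, type_1_column_exclusions=()):
    """Map destination columns to source values for the UPDATE branch of a
    merge, skipping the identity column and any explicitly excluded columns.

    Hypothetical helper -- sketches how identity_col and
    type_1_column_exclusions likely shape the update fields.
    """
    excluded = set(type_1_column_exclusions)
    if identity_col is not None:
        excluded.add(identity_col)
    return {c: f"src.{c}" for c in source_columns if c not in excluded}
```

For example, with columns `["id", "name", "created_at"]`, `identity_col="id"`, and `type_1_column_exclusions=["created_at"]`, only `name` would be updated.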
- truncate_dest : bool, default=False
Determines whether the destination table is truncated prior to the merge operation. Defaults to False.
Returns
- dataframe :
The source DataFrame that was used in the delta merge operation.