lakehouse.databricks.common.delta_merge

lakehouse.databricks.common.delta_merge(spark, dataframe, partition_by, merge_type, merge_predicate, type_2_keys, delta_file_path, delta_db, dest_table, table_def=None, identity_col=None, type_1_column_exclusions=[], truncate_dest=False, **kwargs)

This function performs a merge operation on Delta tables.

Parameters

spark : spark context

Spark context object passed from the calling Spark instance.

dataframe : dataframe

the source dataframe to be merged into the destination

partition_by : array

array of columns used in the partition strategy for the destination table

merge_type : string

the type of merge that the function will perform; supported merge types include:

overwrite : default : overwrites the destination with the source table

append : appends the data from the source table to the destination table

overwrite-by-key : deletes all rows from the destination where they exist in the source, then appends the source to the destination

type-1 : inserts new rows into the destination where the key(s) in source and destination do not match; updates rows in the destination where the key(s) in source and destination match

type-1-identity : inserts new rows into the destination where the identity key in source and destination do not match; updates rows in the destination where the identity key in source and destination match, excluding the identity key from the update fields

type-2 : inserts new rows into the destination where the identity key in source and destination do not match; inserts new rows into the destination where the identity key matches but the type-2 keys do not match, invalidating and expiring the prior valid version of the record; updates rows in the destination where the identity key matches and the type-2 keys match
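The type-2 rules above are standard slowly-changing-dimension (Type 2) versioning. As a plain-Python sketch of the decision logic only (the function, field names, and the `is_current` flag here are illustrative, not part of this library's API):

```python
def type_2_decisions(dest_rows, src_rows, identity_col, type_2_keys):
    """Classify source rows against the current destination rows,
    mirroring the type-2 merge rules described above:
      - no identity match             -> insert as a new record
      - identity match, keys differ   -> expire the prior version, insert new
      - identity match, keys match    -> update the current version in place
    """
    # index the currently-valid destination rows by identity key
    current = {r[identity_col]: r for r in dest_rows if r.get("is_current", True)}
    actions = []
    for src in src_rows:
        dest = current.get(src[identity_col])
        if dest is None:
            actions.append(("insert", src))
        elif any(src[k] != dest[k] for k in type_2_keys):
            actions.append(("expire", dest))   # invalidate prior valid version
            actions.append(("insert", src))    # append the new current version
        else:
            actions.append(("update", src))
    return actions
```

In the real merge these three outcomes would be expressed as `whenNotMatchedInsert` / `whenMatchedUpdate` clauses on a Delta table, with validity timestamps marking expired versions.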

merge_predicate : array

an array of columns to be used as merge keys when performing the merge

type_2_keys : array

an array of columns to be used as type-2 merge keys when performing the merge
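Key arrays like these are typically joined into a single equality predicate before being handed to a Delta merge. A minimal sketch of that step, assuming `source`/`target` aliases (the helper name is hypothetical):

```python
def build_merge_condition(merge_predicate, src="source", dest="target"):
    """Join the merge-key columns into one SQL equality predicate,
    e.g. ["id", "region"] ->
    "target.id = source.id AND target.region = source.region"."""
    return " AND ".join(f"{dest}.{c} = {src}.{c}" for c in merge_predicate)
```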

delta_file_path : string

location in the Delta Lakehouse where the destination table is/will be stored

delta_db : string

name of the db/schema where the table will be created

dest_table : string

name of the delta table to be created

table_def : object, default=None

JSON object that describes the schema of the table to be created; defaults to None

identity_col : string, default=None

identity column in the table being merged; defaults to None

type_1_column_exclusions : list, default=[]

columns to be excluded from updating when performing an update operation on the destination table; defaults to []

truncate_dest : bool, default=False

determines whether the destination table is truncated prior to the merge operation

Returns

dataframe :

the source dataframe that was used in the delta merge operation
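As an illustration of the overwrite-by-key merge type described under merge_type, here is a plain-Python model of its delete-then-append semantics (the real function operates on a Delta table via Spark; this helper is hypothetical):

```python
def overwrite_by_key(dest_rows, src_rows, keys):
    """Model of overwrite-by-key: delete every destination row whose
    key tuple appears in the source, then append all source rows."""
    src_keys = {tuple(r[k] for k in keys) for r in src_rows}
    kept = [r for r in dest_rows if tuple(r[k] for k in keys) not in src_keys]
    return kept + list(src_rows)
```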