Thought Leadership

Utilizing Snowflake’s new ARRAY_DISTINCT function

Teresa Kovich

Principal Consultant

November 17, 2022
Featured image for “Utilizing Snowflake’s new ARRAY_DISTINCT function”

As a Snowflake Services Elite Partner, DAS42 remains on top of Snowflake’s weekly releases to ensure our teams are delivering quality solutions with the most up to date functionality. The most recently published 6.36 release included an ARRAY_DISTINCT function. According to the documentation, this function takes in an array and returns a new array without any duplicate values. This can be used alongside ARRAY_CAT to combine two arrays and return an array containing only the distinct values.

SELECT
    ARRAY_CAT(
        ARRAY_CONSTRUCT('A', 'B', 'C'),
        ARRAY_CONSTRUCT('A', 'C', 'C', 'D', null)
    ) pre_distinct,
    ARRAY_DISTINCT(
        ARRAY_CAT(
            ARRAY_CONSTRUCT('A', 'B', 'C'),
            ARRAY_CONSTRUCT('A', 'C', 'C', 'D', null)
        )
    ) with_distinct
;
DAS42 Image

Before this new release, to get an array with only distinct values you would have to flatten out  the array and extract the distinct values only to put them back into an array.

SELECT ARRAY_AGG(DISTINCT array."VALUE") AS ARR_COL 
FROM TABLE(FLATTEN( 
  ARRAY_CAT(ARRAY_CONSTRUCT('A', 'B'), 
            ARRAY_CONSTRUCT('A', 'C')) 
)) array;
DAS42 Image

It is important to note when using ARRAY_DISTINCT that objects must be identical to be considered duplicates.

SELECT ARRAY_DISTINCT([{'A': 1, 'B': 2}, {'A': 1, 'B': 2}, {'A': 1, 'B': 3}]);
DAS42 Image

As you can see {‘A’: 1, ‘B’: 3} is not considered a duplicate because B has a value of 3 and not 2. 

This function considers null values to be known values. When nulls are inputted into ARRAY_DISTINCT they are returned as “undefined”.

SELECT ARRAY_DISTINCT(['A', 'A', 'B', null, null]);
DAS42 Image

ARRAY_DISTINCT is similar to ARRAY_AGG in that both functions output a single array. However, ARRAY_AGG’s inputs are non-array values and it is an aggregating function, meaning it takes inputs from multiple rows and provides a consolidated output. Another major difference is the way that null values are handled. When using the ARRAY_AGG with FLATTEN method, nulls are removed from the returned array.

When Snowflake releases new functionality, they always have their customer and partner needs in mind, offering new functionality that optimizes processes and reduces time to deployment. Check out our recent blog post about the Snowpark for Python release for another example of Snowflake’s continuous improvement.

Download our step-by-step walkthrough to help you better understand the array datatype in Snowflake, the major functions and constants used to create arrays, and the benefit of the ARRAY_DISTINCT function.