As a Snowflake Elite Services Partner, we keep an eye on new releases to ensure we’re providing our clients with quality services and solutions. One of Snowflake’s most recent releases, 7.1, introduced several exciting new capabilities and functions, one of which really caught our attention: MIN_BY / MAX_BY.
What are the new MIN_BY / MAX_BY functions?
With these new functions, you can quickly return the value(s) in one column conditionally based on another column’s maximum or minimum value. Answering this question has always been possible with often complicated subqueries or Common Table Expressions (CTEs). With MIN_BY and MAX_BY, we get a clean function that performs the same task with much less code. We recommend reviewing Snowflake’s documentation for more details, but an overview of the syntax to use in your SELECT statement appears below.
MIN_BY(<col_to_return>, <col_containing_minimum> [, <maximum_number_of_values_to_return> ] )
MAX_BY(<col_to_return>, <col_containing_maximum> [, <maximum_number_of_values_to_return> ] )
If you’re replicating these queries, note the schema: TPCH_SF100. Learn more about the process here.
Let’s assume we want to return the order key from the order with the highest total price. Without the new MIN_BY / MAX_BY functions, we would have to use a subquery or a CTE to return our answer. Before this release, you may have answered the question with a CTE:
WITH order_price as (
SELECT max(o_totalprice) as total_price
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.ORDERS )
SELECT o_orderkey as o_orderkey_cte
FROM order_price
LEFT JOIN SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.ORDERS as o
ON order_price.total_price = o.o_totalprice;
The result returned from this query is:
Now, let’s return the same answer with the MAX_BY function. Using far fewer lines of code, our query and result are now:
SELECT max_by(o_orderkey, o_totalprice)
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.ORDERS;
One important point to remember is that this function is not deterministic. That is, if your <col_containing_minimum> or <col_containing_maximum> has multiple records with the same minimum or maximum value, the function may return any one of the corresponding values from the <col_to_return>. To overcome this, the function’s optional parameter allows you to return up to a certain number of values.
A bug with the optional parameter prevents users from invoking the function when the column you seek to return has a NOT NULL constraint applied. We are communicating with Snowflake support on this and will provide an update when it is resolved. Below is a further explanation by way of an example, so that you know exactly when you can and cannot use this function until the bug is fixed.
Create a table:
CREATE or REPLACE TABLE TEST.TEST.T (
A VARCHAR(500) ,
B VARCHAR(500) NOT NULL,
C TIMESTAMP_NTZ(9) NOT NULL
);
Using the column (B) with NOT NULL fails:
SELECT max_by(B, C, 1) FROM TEST.TEST.T;
Using the column (A) without the constraint succeeds:
SELECT max_by(A, C, 1) FROM TEST.TEST.T;
MIN_BY / MAX_BY – The bottom line
Utilizing Snowflake’s new MIN_BY / MAX_BY functions, we produced the same result using half as much code. These improvements towards cleaner, simpler code are far more readable and can save engineers and analysts substantial time. With more updates on the horizon, the opportunities for optimizing your data are near endless.
Curious to hear more about how DAS42 can help you make the most of your Snowflake instance? Reach out at hello@das42.com, and we’d be happy to help!