Leveraging Snowflake’s New AI Features for a Nuanced Sentiment Analysis at Scale

Published on July 24, 2024

/

Leveraging Snowflake’s New AI Features for a Nuanced Sentiment Analysis at Scale

Published on July 24, 2024 | 1 mins read

From Basic Polarity to In-Depth Text Categorization

When we think of “sentiment analysis”, what most commonly comes to mind is probably something along the lines of measuring the positive or negative attitudes of freetext feedback, such as satisfaction surveys or product or service reviews. In short, ideally, “LOVE!!!! This is the best tool I’ve ever used” would score 1, “it’s fine” could be a 0, and “if I could give 0 stars, I would. Absolute garbage” would be a -1.

For a large number of use cases, this is exactly what’s needed and gets the job done, and Snowflake Cortex’s SENTIMENT function will make jobs simpler and allow analysts and data scientists to do that analysis directly in Snowflake, streamlining workflows. For example, this script assesses the positive or negative sentiment of incoming Jira tickets:

with jira as (
    select distinct
        description, id
    from 
        internal_db.jira.issue
)

select 
    id as ticket_id,
    SNOWFLAKE.CORTEX.SENTIMENT(description) as desc_sentiment
from 
     jira;

However, when you’re looking to do a deeper analysis of sentiment, not just using positive/negative but a further sub-categorization of specific characteristics of text, look no further than Snowflake Cortex COMPLETE.

In the example below, I want to assess certain qualities of product reviews. In this script, I begin with a table ‘reviews’ with two freetext columns, ‘rev_title’ and ‘rev_content’. I first select distinct on these elements, since if there happen to be any identical reviews, I only need to assess that content once:

with rev as (
    select distinct
      rev_title,
      rev_content
    from reviews
),

Then, I create a text prompt for my large language model, giving it instructions on how to respond specifically that will allow me to programmatically parse its response. I feed it my questions and also the text of each unique review.

rev_prompt as (
  select distinct
      *,
      concat('You are a tool that only responds "yes" or "no" to each of prompt questions. Analyze the sentiment of this content, which consists of reviews for consumer products, and answer with only "yes" or "no" for each question, providing no other context. Your prompt questions are: is this content informational? is this content helpful? is this content apologetic? is this content sarcastic? is this content threatening? does this content provoke fear? does this content provoke amusement? does this content tell a story? The content context is: Title: ', rev_title, 'content: ', rev_content) as response_prompt
  from rev
),

Then I run this prompt through Snowflake Cortex’s COMPLETE function, specifying which large language model I want to use. You can learn more about supported LLMs, read about cost considerations, and confirm which are available for your Snowflake instance’s cloud and region. I definitely suggest tweaking your prompt and experimenting with different models to find a result that’s optimal, i.e. consistent, similar enough to how a trained human would assess it, and easy to parse! I found that llama3-8b gave consistent results while not using as many tokens – as much spend – as the models Snowflake classifies as “large”.

Notice that my prompt is explicit about the format in which I want my results, even repeating that instruction in multiple ways to increase compliance. I have it formatted to answer multiple questions per row, using a single prompt – but as I add questions, I may get less reliable answers. For my use case, I’ve found a balance of “bang for my buck”. Slight variations in prompt (or context) will result in different answers, so keep tweaking if you aren’t getting the result you want.

rev_complete as (
  select 
    rev_title,
    rev_content,
    lower(snowflake.cortex.complete(
            'llama3-8b',
            rev_prompt.response_prompt))
            as response
  from 
    rev_prompt
)

Finally, I parse the response:

select 
    rev_title,
    rev_content,
    trim(substring(split_part(response, '? ',2),1,3)) as informational,
    trim(substring(split_part(response, '? ',3),1,3)) as helpful,
    trim(substring(split_part(response, '? ',4),1,3)) as apologetic,
    trim(substring(split_part(response, '? ',5),1,3)) as sarcastic,
    trim(substring(split_part(response, '? ',5),1,3)) as threatening,
    trim(substring(split_part(response, '? ',6),1,3)) as fear_provoking,
    trim(substring(split_part(response, '? ',8),1,3)) as amusement_provoking,
    trim(substring(split_part(response, '? ',7),1,3)) as tells_a_story
from rev_complete

And just like that, each review is categorized as having or not having each of the eight identified characteristics. I can use this information to identify customer satisfaction trends, the same way I would with the SENTIMENT function, but I can also identify good reviews to feature and get a more nuanced understanding of how customers are responding to a given product.

There are so many ways to use Snowflake Cortex’s COMPLETE function, and this example shows the tool’s scalability, allowing you to get answers to multiple questions and multiple pieces of content by running a single query.

DAS42: Your Partner in Leveraging Snowflake’s Innovations

Need help understanding the latest Snowflake features and how to use them? DAS42, a Snowflake Elite Services Partner, can help you unlock “the art of the possible” with your data, optimize your existing workflows, and migrate from legacy platforms into a holistic, secure, and customized cloud-based solution.

Get Started Today

DAS42 is a premier data and analytics consultancy with a modern point of view. We specialize in solving some of the most complex business challenges for the world’s most successful companies. As a Snowflake Elite Partner, DAS42 crafts customized strategies that create a single source of truth and enable enhanced and faster decision-making. DAS42 has a presence across the U.S. with primary offices in New York City and Denver. Connect with us at das42.com and stay updated on LinkedIn. Join us today on our journey to help you realize the possibilities of transforming your business through data and analytics.