The Snowflake DSA-C02 exam preparation guide is designed to provide candidates with the information they need about the SnowPro Advanced: Data Scientist exam. It includes an exam summary, sample questions, a practice test, and the exam objectives, along with guidance on interpreting those objectives, so candidates can assess the types of questions and answers that may appear on the Snowflake Certified SnowPro Advanced: Data Scientist exam.
All candidates are encouraged to review the DSA-C02 objectives and sample questions provided in this preparation guide. The Snowflake SnowPro Advanced: Data Scientist certification is mainly targeted at candidates who want to build their career in the Snowflake advanced data science domain and demonstrate their expertise. We suggest using the practice exam listed in this certification guide to get familiar with the exam environment and to identify the knowledge areas that need more work before you take the actual Snowflake SnowPro Advanced: Data Scientist exam.
Snowflake DSA-C02 Exam Summary:
Snowflake SnowPro Advanced: Data Scientist Syllabus:
Section: Data Science Concepts (Weight: 15-20%)
- Define machine learning concepts for data science workloads.
  - Machine Learning
    - Supervised learning
    - Unsupervised learning
- Outline machine learning problem types.
  - Supervised Learning
    1. Structured Data
       - Linear regression
       - Binary classification
       - Multi-class classification
       - Time-series forecasting
    2. Unstructured Data
       - Image classification
       - Segmentation
  - Unsupervised Learning
    - Clustering
    - Association models
- Summarize the machine learning lifecycle.
  - Data collection
  - Data visualization and exploration
  - Feature engineering
  - Training models
  - Model deployment
  - Model monitoring and evaluation (e.g., model explainability, precision, recall, accuracy, confusion matrix)
  - Model versioning
- Define statistical concepts for data science (see the sketch below).
  - Normal versus skewed distributions (e.g., mean, outliers)
  - Central limit theorem
  - Z and T tests
  - Bootstrapping
  - Confidence intervals
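Bootstrapping and confidence intervals lend themselves to a short worked example. Below is a minimal NumPy sketch (using synthetic data, purely for illustration) of a bootstrapped 95% confidence interval for a sample mean:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(loc=100, scale=15, size=500)  # synthetic sample, for illustration only

# Resample with replacement many times and collect the statistic of interest.
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(10_000)
])

# The 2.5th and 97.5th percentiles of the bootstrap distribution
# bound a 95% confidence interval for the mean.
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```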

Section: Data Pipelining (Weight: 15-20%)
- Enrich data by consuming data sharing sources (see the sketch below).
  - Snowflake Marketplace
  - Direct Sharing
  - Shared database considerations
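Once a share or Marketplace listing is mounted as a database, it is queried like any local (read-only) database. A minimal Snowpark for Python sketch, where the connection parameters and the WEATHER_DB/ANALYTICS table names are placeholders:

```python
from snowflake.snowpark import Session

# Connection parameters are placeholders; fill in your own account details.
session = Session.builder.configs({
    "account": "<account_identifier>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "<warehouse>",
}).create()

# A mounted share behaves like a read-only local database: no cloning,
# no Time Travel, and the provider controls object availability.
weather = session.table("WEATHER_DB.PUBLIC.DAILY_OBSERVATIONS")  # hypothetical shared table

# Enrich local data by joining against the shared source.
sales = session.table("ANALYTICS.PUBLIC.SALES")  # hypothetical local table
enriched = sales.join(weather, sales["DATE"] == weather["OBS_DATE"])
enriched.show()
```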
- Build a data science pipeline (see the sketch below).
  - Automation of data transformation with streams and tasks
  - Python User-Defined Functions (UDFs)
  - Python User-Defined Table Functions (UDTFs)
  - Python stored procedures
  - Integration with machine learning platforms (e.g., connectors, ML partners, etc.)
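A stream-plus-task pair is the usual way to automate incremental transformation. The sketch below issues the DDL through the Snowpark session created earlier; all object names (raw_events, clean_events, transform_wh) are hypothetical:

```python
# Track new rows arriving in the source table.
session.sql("""
    CREATE OR REPLACE STREAM raw_events_stream ON TABLE raw_events
""").collect()

# The task wakes on a schedule; the WHEN clause skips runs with no new rows.
session.sql("""
    CREATE OR REPLACE TASK transform_events_task
      WAREHOUSE = transform_wh
      SCHEDULE = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('RAW_EVENTS_STREAM')
    AS
      INSERT INTO clean_events
      SELECT event_id, payload:amount::FLOAT AS amount
      FROM raw_events_stream
""").collect()

# Tasks are created suspended; resume to start the pipeline.
session.sql("ALTER TASK transform_events_task RESUME").collect()
```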

Section: Data Preparation and Feature Engineering (Weight: 30-35%)
- Prepare and clean data in Snowflake (see the sketch below).
  - Use Snowpark for Python and SQL
    - Aggregate
    - Joins
    - Identify critical data
    - Remove duplicates
    - Remove irrelevant fields
    - Handle missing values
    - Data type casting
    - Sampling data
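A minimal Snowpark cleaning sketch, reusing the session from earlier; the ORDERS table and its columns are assumptions:

```python
from snowflake.snowpark.functions import col
from snowflake.snowpark.types import FloatType

orders = session.table("RAW.PUBLIC.ORDERS")  # hypothetical source table

clean = (
    orders
    .drop("INTERNAL_NOTES")                                   # remove irrelevant fields
    .drop_duplicates("ORDER_ID")                              # remove duplicates
    .na.fill({"DISCOUNT": 0.0})                               # handle missing values
    .with_column("AMOUNT", col("AMOUNT").cast(FloatType()))   # data type casting
    .filter(col("AMOUNT") > 0)                                # keep only valid rows
)

# Sample ~10% of rows for quick iteration; pushdown keeps the work in Snowflake.
sample = clean.sample(frac=0.1)
sample.show()
```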
- Perform exploratory data analysis in Snowflake (see the sketch below).
  - Snowpark and SQL
    - Identify initial patterns (i.e., data profiling)
    - Connect external machine learning platforms and/or notebooks (e.g., Jupyter)
  - Use Snowflake native statistical functions to analyze and calculate descriptive data statistics.
    - Window functions
    - MIN/MAX/AVG/STDDEV
    - VARIANCE
    - TOP <n>
    - Approximation/high-performance functions
  - Linear regression
    - Find the slope and intercept
    - Verify the dependencies between the dependent and independent variables
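Descriptive statistics and even a simple linear fit can be pushed down to Snowflake with native functions such as STDDEV, APPROX_COUNT_DISTINCT, REGR_SLOPE, and REGR_INTERCEPT. A sketch with hypothetical table and column names:

```python
# Descriptive statistics computed entirely inside Snowflake.
stats = session.sql("""
    SELECT MIN(amount)                        AS min_amount,
           MAX(amount)                        AS max_amount,
           AVG(amount)                        AS avg_amount,
           STDDEV(amount)                     AS sd_amount,
           VARIANCE(amount)                   AS var_amount,
           APPROX_COUNT_DISTINCT(customer_id) AS approx_customers
    FROM analytics.public.orders
""").to_pandas()
print(stats)

# Slope and intercept of AMOUNT regressed on BASKET_SIZE, without
# moving any rows out of Snowflake.
fit = session.sql("""
    SELECT REGR_SLOPE(amount, basket_size)     AS slope,
           REGR_INTERCEPT(amount, basket_size) AS intercept
    FROM analytics.public.orders
""").collect()[0]
print(fit["SLOPE"], fit["INTERCEPT"])
```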
- Perform feature engineering on Snowflake data (see the sketch below).
  - Preprocessing
    - Scaling data
    - Encoding
    - Normalization
  - Data transformations
    - DataFrames (i.e., pandas, Snowpark)
    - Derived features (e.g., average spend)
  - Binarizing data
    - Binning continuous data into intervals
    - Label encoding
    - One-hot encoding
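A compact pandas/scikit-learn sketch covering scaling, label encoding, one-hot encoding, binning, and a derived average-spend feature; the column names (AMOUNT, REGION, CHANNEL, CUSTOMER_ID) are assumptions:

```python
import pandas as pd
from sklearn.preprocessing import KBinsDiscretizer, LabelEncoder, StandardScaler

# Pull the prepared Snowpark DataFrame into pandas for local feature work.
pdf = clean.to_pandas()  # `clean` comes from the earlier preparation sketch

# Scaling/normalization of a numeric column.
pdf["AMOUNT_SCALED"] = StandardScaler().fit_transform(pdf[["AMOUNT"]])

# Label encoding for a categorical column (hypothetical).
pdf["REGION_CODE"] = LabelEncoder().fit_transform(pdf["REGION"])

# One-hot encoding for nominal categories.
pdf = pd.get_dummies(pdf, columns=["CHANNEL"], prefix="CHANNEL")

# Bin continuous values into 5 ordinal intervals.
binner = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile")
pdf["AMOUNT_BIN"] = binner.fit_transform(pdf[["AMOUNT"]])

# Derived feature: average spend per customer.
pdf["AVG_SPEND"] = pdf.groupby("CUSTOMER_ID")["AMOUNT"].transform("mean")
```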
- Visualize and interpret the data to present a business case (see the sketch below).
  - Statistical summaries
    - Snowsight with SQL
    - Streamlit
    - Interpret open-source graph libraries
    - Identify data outliers
  - Common types of visualization formats
    - Bar charts
    - Scatterplots
    - Heat maps
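A minimal Streamlit sketch for presenting a statistical summary and flagging outliers; the table and column names are hypothetical, and the app assumes the Snowpark `session` from the earlier sketches:

```python
# Run with: streamlit run app.py
import streamlit as st

st.title("Order Amount Overview")

# Statistical summary rendered as a bar chart.
summary = session.sql("""
    SELECT region, AVG(amount) AS avg_amount
    FROM analytics.public.orders
    GROUP BY region
""").to_pandas()
st.bar_chart(summary, x="REGION", y="AVG_AMOUNT")

# A quick outlier check: rows more than 3 standard deviations from the mean.
outliers = session.sql("""
    SELECT *
    FROM analytics.public.orders
    QUALIFY ABS(amount - AVG(amount) OVER ()) > 3 * STDDEV(amount) OVER ()
""").to_pandas()
st.dataframe(outliers)
```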

Section: Model Development (Weight: 15-20%)
- Connect data science tools directly to data in Snowflake (see the sketch below).
  - Connecting Python to Snowflake
    - Snowpark
    - Python connector with pandas support
    - Spark connector
  - Snowflake best practices
    - One platform, one copy of data, many workloads
    - Enrich datasets using the Snowflake Marketplace
    - External tables
    - External functions
    - Zero-copy cloning for training snapshots
    - Data governance
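Snowpark was shown above; the sketch below uses the Snowflake Python connector with its pandas support instead. The credentials and the ORDERS table are placeholders:

```python
import snowflake.connector

# Connection parameters are placeholders; fill in your own account details.
conn = snowflake.connector.connect(
    account="<account_identifier>",
    user="<user>",
    password="<password>",
    warehouse="<warehouse>",
    database="ANALYTICS",
    schema="PUBLIC",
)

cur = conn.cursor()
cur.execute("SELECT * FROM orders LIMIT 10000")  # hypothetical table
df = cur.fetch_pandas_all()  # Arrow-backed bulk fetch into a pandas DataFrame
cur.close()
conn.close()
```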
- Train a data science model (see the sketch below).
  - Hyperparameter tuning
  - Optimization metric selection (e.g., log loss, AUC, RMSE)
  - Partitioning
    - Cross-validation
    - Train/validation/holdout sets
  - Downsampling/upsampling
  - Training with Python stored procedures
  - Training outside Snowflake through external functions
  - Training with Python User-Defined Table Functions (UDTFs)
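A minimal scikit-learn sketch tying several of these bullets together: a train/holdout partition, cross-validation inside a grid search, and AUC as the optimization metric. The `pdf` frame and CHURNED target are assumptions carried over from the feature-engineering sketch:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical feature matrix / target from the engineered pandas frame.
X = pdf.drop(columns=["CHURNED"])
y = pdf["CHURNED"]

# Train/holdout partition; cross-validation happens inside the search.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hyperparameter tuning with AUC as the optimization metric; class_weight
# addresses imbalance, similar in spirit to down-/upsampling.
search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced"),
    param_grid={"n_estimators": [100, 300], "max_depth": [5, 10, None]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
```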
- Validate a data science model (see the sketch below).
  - ROC curve/confusion matrix
    - Calculate the expected payout of the model
  - Regression problems
    - Residuals plot
    - Interpret graphics with context
  - Model metrics
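Validation on the untouched holdout set, continuing the training sketch above; the payoff amounts in the expected-payout calculation are illustrative assumptions:

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

# Score the holdout partition with the best estimator from the search.
proba = search.best_estimator_.predict_proba(X_hold)[:, 1]
preds = (proba >= 0.5).astype(int)

print(confusion_matrix(y_hold, preds))
print("AUC:", roc_auc_score(y_hold, proba))

# Expected payout: weight each confusion-matrix cell by a business value.
# The payoff numbers below are illustrative assumptions.
tn, fp, fn, tp = confusion_matrix(y_hold, preds).ravel()
payout = tp * 50 - fp * 10 - fn * 100  # retained customer vs. wasted offer vs. missed churn
print("Expected payout:", payout)
```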
- Interpret a model (see the sketch below).
  - Feature impact
  - Partial dependence plots
  - Confidence intervals
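Feature impact and partial dependence, continuing with the same fitted `search` object; AMOUNT_SCALED is a hypothetical feature name:

```python
from sklearn.inspection import PartialDependenceDisplay, permutation_importance

# Feature impact via permutation importance on the holdout partition.
impact = permutation_importance(
    search.best_estimator_, X_hold, y_hold, n_repeats=10, random_state=42
)
for name, score in zip(X_hold.columns, impact.importances_mean):
    print(f"{name}: {score:.4f}")

# Partial dependence of the prediction on one (hypothetical) feature.
PartialDependenceDisplay.from_estimator(
    search.best_estimator_, X_hold, features=["AMOUNT_SCALED"]
)
```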

Section: Model Deployment (Weight: 15-20%)
- Move a data science model into production (see the sketch below).
  - Use an externally hosted model
    - External functions
    - Pre-built models
  - Deploy a model in Snowflake
    - Vectorized/scalar Python User-Defined Functions (UDFs)
    - Pre-built models
    - Storing predictions
    - Stage commands
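One way to deploy inside Snowflake is a vectorized Python UDF that loads a pickled model from a stage and scores whole batches per call. A sketch with hypothetical stage, file, and table names:

```python
import pandas as pd
from snowflake.snowpark.functions import call_udf, col, pandas_udf
from snowflake.snowpark.types import FloatType, PandasSeriesType

# Stage the pickled model (stage and file names are hypothetical).
session.sql("CREATE STAGE IF NOT EXISTS model_stage").collect()
session.file.put("model.pkl", "@model_stage", auto_compress=False)

# Vectorized UDF: each invocation receives a pandas Series (a batch of rows),
# so the model loads once per batch and scores in bulk.
@pandas_udf(
    name="score_amount",
    input_types=[PandasSeriesType(FloatType())],
    return_type=PandasSeriesType(FloatType()),
    packages=["scikit-learn", "pandas"],
    imports=["@model_stage/model.pkl"],
    is_permanent=True,
    stage_location="@model_stage",
    replace=True,
)
def score_amount(amount: pd.Series) -> pd.Series:
    import os, pickle, sys
    # Imported stage files land in the UDF's import directory.
    import_dir = sys._xoptions["snowflake_import_directory"]
    with open(os.path.join(import_dir, "model.pkl"), "rb") as f:
        model = pickle.load(f)  # assumes a single-feature model, for illustration
    return pd.Series(model.predict_proba(amount.to_frame())[:, 1])

# Store predictions back into a table.
scored = session.table("analytics.public.orders").with_column(
    "CHURN_SCORE", call_udf("score_amount", col("AMOUNT"))
)
scored.write.save_as_table("analytics.public.order_scores", mode="overwrite")
```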
- Determine the effectiveness of a model and retrain if necessary (see the sketch below).
  - Metrics for model evaluation
    - Data drift/model decay
      - Data distribution comparisons
        - Do the data used for predictions look similar to the training data?
        - Do the same data points give the same predictions once the model is deployed?
    - Area under the curve
    - Accuracy, precision, recall
    - User-Defined Functions (UDFs)
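A simple distribution-comparison drift check using a Kolmogorov-Smirnov test; the snapshot/score tables and the 0.01 threshold are assumptions:

```python
from scipy.stats import ks_2samp

# Training-time feature distribution (hypothetical snapshot table).
train_amt = session.table("analytics.public.train_snapshot") \
    .select("AMOUNT").to_pandas()["AMOUNT"]

# The same feature over the last week of live scoring data.
live_amt = session.sql("""
    SELECT amount
    FROM analytics.public.order_scores
    WHERE scored_at > DATEADD(day, -7, CURRENT_TIMESTAMP())
""").to_pandas()["AMOUNT"]

# A small p-value suggests the live distribution has drifted away from
# training, so retraining should be considered.
stat, p_value = ks_2samp(train_amt, live_amt)
if p_value < 0.01:
    print(f"Drift detected (KS={stat:.3f}); schedule retraining.")
```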
- Outline model lifecycle and validation tools (see the sketch below).
  - Streams and tasks
  - Metadata tagging
  - Model versioning with partner tools
  - Automation of model retraining
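Metadata tagging and retraining automation can both be handled with Snowflake DDL, while model versioning itself is typically delegated to partner MLOps tools. A sketch through the same Snowpark session, with hypothetical object names:

```python
# Tag the deployed UDF with a version label for lineage and governance.
session.sql("CREATE TAG IF NOT EXISTS model_version").collect()
session.sql("""
    ALTER FUNCTION score_amount(FLOAT)
    SET TAG model_version = 'v1.2.0'
""").collect()

# A weekly task that calls a (hypothetical) retraining stored procedure.
session.sql("""
    CREATE OR REPLACE TASK retrain_churn_model
      WAREHOUSE = ml_wh
      SCHEDULE = 'USING CRON 0 3 * * 1 UTC'
    AS
      CALL retrain_churn_model_sp()
""").collect()
session.sql("ALTER TASK retrain_churn_model RESUME").collect()
```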
