Feature Store API Overview

  • Last updated on: 8/26/2024
  • Required snowflake-ml-python version: >=1.6.1

This notebook provides an overview of the Feature Store APIs. It demonstrates how to manage a Feature Store, feature views, and entities, and how to retrieve features and generate training datasets. The goal is a quick walkthrough of the most common APIs. For the full list of APIs, refer to the API Reference page.

Note: there may be a delay before the newest snowflake-ml-python package is available in the Snowflake Conda channel. To install the latest snowflake-ml-python package, which includes all the components used in this notebook, follow the install instructions here.

Set up connection and test dataset

Let's start by setting up our test environment. We will create a session and a schema. The schema FS_DEMO_SCHEMA (created as SNOWFLAKE_FEATURE_STORE_NOTEBOOK_DEMO in the output below) will be used as the Feature Store and is cleaned up at the end of the demo. Fill in connection_parameters with your Snowflake connection information. Follow this guide for more details about how to connect to Snowflake.
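A minimal sketch of the schema step, assuming session is a Snowpark Session (for example one built with Session.builder.configs(connection_parameters).create()). The helper name create_demo_schema is hypothetical and not part of the original notebook.

```python
def create_demo_schema(session, database: str, schema: str):
    """Create the schema that will back the Feature Store.

    `session` is assumed to be a snowflake.snowpark.Session; only its
    sql(...).collect() interface is used here.
    """
    # Note: CREATE OR REPLACE drops any existing schema with the same name,
    # so point this only at a throwaway demo schema.
    statement = f"CREATE OR REPLACE SCHEMA {database}.{schema}"
    return session.sql(statement).collect()
```

The returned rows carry the status message printed by the notebook. The database name would typically come from session.get_current_database().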

[1]
[2]
[Row(status='Schema SNOWFLAKE_FEATURE_STORE_NOTEBOOK_DEMO successfully created.')]

We have prepared some examples, which you can find in our open source repo. Each example contains the source dataset and the feature view and entity definitions used in this demo. ExampleHelper (included in snowflake-ml-python) sets everything up through simple APIs, so you don't have to worry about the details.

[3]

We can quickly look at the newly generated source tables.

[4]
"REGTEST_DB".SNOWFLAKE_FEATURE_STORE_NOTEBOOK_DEMO.citibike_trips:

Manage features in Feature Store

Now we're ready to create a Feature Store. The sections below showcase how to create a Feature Store, entities, feature views and how to work with them.

Initialize a Feature Store

First, we create a new Feature Store (or connect to an existing one).
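A sketch of what this step can look like, assuming snowflake-ml-python is installed and session is an open Snowpark Session. The wrapper name open_feature_store is hypothetical; FeatureStore and CreationMode come from snowflake.ml.feature_store.

```python
def open_feature_store(session, database: str, schema: str, warehouse: str):
    """Create the Feature Store in `schema` if needed, otherwise connect to it."""
    # Imported lazily so this helper can be defined before the package is installed.
    from snowflake.ml.feature_store import CreationMode, FeatureStore

    return FeatureStore(
        session=session,
        database=database,
        name=schema,  # the schema acts as the Feature Store
        default_warehouse=warehouse,
        # Create the Feature Store when absent; reuse it when it already exists.
        creation_mode=CreationMode.CREATE_IF_NOT_EXIST,
    )
```

The later sketches in this notebook assume the returned object is bound to a variable named fs.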

[5]

Create entities

Before we can create feature views, we need to create entities. The cell below registers the entities that are pre-defined for this example, and loaded by helper.load_entities().
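The registration loop can be sketched as follows, assuming fs is a FeatureStore and entities is the list returned by helper.load_entities(). The function name register_entities is hypothetical.

```python
def register_entities(fs, entities):
    """Register each Entity with the Feature Store and return the registered names."""
    registered = []
    for entity in entities:
        fs.register_entity(entity)
        registered.append(str(entity.name))
    return registered
```

After registering, fs.list_entities().show() prints a table like the one in the cell output.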

[6]
--------------------------------------------------------------------------------
|"NAME"          |"JOIN_KEYS"         |"DESC"                     |"OWNER"     |
--------------------------------------------------------------------------------
|END_STATION_ID  |["END_STATION_ID"]  |The id of an end station.  |REGTEST_RL  |
|TRIP_ID         |["TRIP_ID"]         |The id of a trip.          |REGTEST_RL  |
--------------------------------------------------------------------------------

You can retrieve a registered entity by name from the Feature Store.

[7]

Create feature views

Next, we can register feature views. Feature views are also predefined in our repository. You can find the definitions here.
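A minimal sketch of registration, assuming fs is a FeatureStore and draft_feature_views is a list of unregistered FeatureView objects loaded from the example. The function name and the default version "1.0" are illustrative choices.

```python
def register_feature_views(fs, draft_feature_views, version: str = "1.0"):
    """Register every draft feature view under `version`; return the registered views."""
    registered = []
    for feature_view in draft_feature_views:
        # register_feature_view returns the registered (materialized) FeatureView.
        registered.append(
            fs.register_feature_view(feature_view=feature_view, version=version)
        )
    return registered
```

Calling fs.list_feature_views().show() afterwards lists the registered views with their versions.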

[8]
----------------------------------------------------------------------------------
|"NAME"     |"VERSION"  |"DESC"                                 |"REFRESH_FREQ"  |
----------------------------------------------------------------------------------
|F_STATION  |1.0        |Station features refreshed every day.  |1 day           |
|F_TRIP     |1.0        |Static trip features                   |NULL            |
----------------------------------------------------------------------------------

Note that you can specify feature view versions and attach descriptive comments in the “DESC” field to make search and discovery of features easier.

Add feature view versions

We can also add a new version to a feature view by registering it under the same name as an existing feature view but with a different version.

[9]
/var/folders/kw/c3pzglr908q2p0w5w9vzhy0m0000gn/T/ipykernel_22612/1306387384.py:2: UserWarning: You must call register_feature_view() to make it effective. Or use update_feature_view(desc=<new_value>).
  fv.desc = f'{fv.name}/2.0 with new desc.'
----------------------------------------------------------------------------------
|"NAME"     |"VERSION"  |"DESC"                                 |"REFRESH_FREQ"  |
----------------------------------------------------------------------------------
|F_STATION  |1.0        |Station features refreshed every day.  |1 day           |
|F_STATION  |2.0        |F_STATION/2.0 with new desc.           |1 day           |
|F_TRIP     |1.0        |Static trip features                   |NULL            |
|F_TRIP     |2.0        |F_TRIP/2.0 with new desc.              |NULL            |
----------------------------------------------------------------------------------

Update feature views

After a feature view is registered, it is materialized in the Snowflake backend. You can still update some metadata of a registered feature view with update_feature_view. The cell below updates the desc of a managed feature view. See the API reference page for the full list of metadata that can be updated.
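As an illustration, the description update can be wrapped like this, assuming fs is a FeatureStore. The wrapper name update_description is hypothetical; update_feature_view and its desc keyword are the ones named in the notebook's warning message.

```python
def update_description(fs, name: str, version: str, desc: str):
    """Update the description of a registered feature view and return the updated view."""
    return fs.update_feature_view(name=name, version=version, desc=desc)
```

For the output shown in this section, the call would be update_description(fs, "f_station", "1.0", "Updated desc for f_station.").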

[10]
----------------------------------------------------------------------------------------------
|"NAME"     |"VERSION"  |"DESC"                        |"REFRESH_FREQ"  |"SCHEDULING_STATE"  |
----------------------------------------------------------------------------------------------
|F_STATION  |1.0        |Updated desc for f_station.   |1 day           |ACTIVE              |
|F_STATION  |2.0        |F_STATION/2.0 with new desc.  |1 day           |ACTIVE              |
----------------------------------------------------------------------------------------------

Operate feature views

For managed feature views, you can suspend, resume, or manually refresh the backend pipelines. A managed feature view is an automated feature pipeline that computes the features on a given schedule. You create a managed feature view by setting the refresh_freq. In contrast, a static feature view is created when refresh_freq is set to None.
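The suspend and resume operations can be sketched as a single toggle, assuming fs is a FeatureStore and feature_view is a registered FeatureView. The function name and the explicit ValueError for static feature views are choices made for this sketch, not library behavior.

```python
def set_pipeline_active(fs, feature_view, active: bool):
    """Resume (active=True) or suspend (active=False) a managed feature view."""
    # A static feature view (refresh_freq is None) has no scheduled pipeline.
    if feature_view.refresh_freq is None:
        raise ValueError("static feature view has no pipeline to suspend or resume")
    if active:
        return fs.resume_feature_view(feature_view)
    return fs.suspend_feature_view(feature_view)
```

The SCHEDULING_STATE column of fs.list_feature_views() reflects the change, as in the tables that follow.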

[11]
----------------------------------------------------------------------------------------------
|"NAME"     |"VERSION"  |"DESC"                        |"REFRESH_FREQ"  |"SCHEDULING_STATE"  |
----------------------------------------------------------------------------------------------
|F_STATION  |1.0        |Updated desc for f_station.   |1 day           |SUSPENDED           |
|F_STATION  |2.0        |F_STATION/2.0 with new desc.  |1 day           |ACTIVE              |
|F_TRIP     |1.0        |Static trip features          |NULL            |NULL                |
|F_TRIP     |2.0        |F_TRIP/2.0 with new desc.     |NULL            |NULL                |
----------------------------------------------------------------------------------------------

[12]
----------------------------------------------------------------------------------------------
|"NAME"     |"VERSION"  |"DESC"                        |"REFRESH_FREQ"  |"SCHEDULING_STATE"  |
----------------------------------------------------------------------------------------------
|F_STATION  |1.0        |Updated desc for f_station.   |1 day           |ACTIVE              |
|F_STATION  |2.0        |F_STATION/2.0 with new desc.  |1 day           |ACTIVE              |
|F_TRIP     |1.0        |Static trip features          |NULL            |NULL                |
|F_TRIP     |2.0        |F_TRIP/2.0 with new desc.     |NULL            |NULL                |
----------------------------------------------------------------------------------------------

[13]
----------------------------------------------------------------------------------------------------------------------
|"NAME"         |"STATE"    |"REFRESH_START_TIME"              |"REFRESH_END_TIME"                |"REFRESH_ACTION"  |
----------------------------------------------------------------------------------------------------------------------
|F_STATION$1.0  |SUCCEEDED  |2024-08-06 09:41:17.171000-07:00  |2024-08-06 09:41:17.547000-07:00  |INCREMENTAL       |
|F_STATION$1.0  |SUCCEEDED  |2024-08-06 09:42:56.835000-07:00  |2024-08-06 09:42:57.612000-07:00  |INCREMENTAL       |
|F_STATION$1.0  |SUCCEEDED  |2024-08-06 09:43:34.390000-07:00  |2024-08-06 09:43:34.884000-07:00  |INCREMENTAL       |
|F_STATION$1.0  |SUCCEEDED  |2024-08-06 09:44:10.294000-07:00  |2024-08-06 09:44:10.860000-07:00  |INCREMENTAL       |
----------------------------------------------------------------------------------------------------------------------

The cell below manually refreshes a feature view, which triggers feature computation on the latest source data. You can check the refresh history with get_refresh_history(); compared with the previous get_refresh_history() output, it now contains an additional entry.
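A short sketch of this refresh-then-inspect pattern, assuming fs is a FeatureStore and feature_view is a registered managed FeatureView. The helper name is hypothetical.

```python
def refresh_and_get_history(fs, feature_view):
    """Trigger a manual refresh, then return the refresh history as a DataFrame."""
    fs.refresh_feature_view(feature_view)
    return fs.get_refresh_history(feature_view)
```

Calling .show() on the result prints the history table.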

[14]
----------------------------------------------------------------------------------------------------------------------
|"NAME"         |"STATE"    |"REFRESH_START_TIME"              |"REFRESH_END_TIME"                |"REFRESH_ACTION"  |
----------------------------------------------------------------------------------------------------------------------
|F_STATION$1.0  |SUCCEEDED  |2024-08-06 09:41:17.171000-07:00  |2024-08-06 09:41:17.547000-07:00  |INCREMENTAL       |
|F_STATION$1.0  |SUCCEEDED  |2024-08-06 09:42:56.835000-07:00  |2024-08-06 09:42:57.612000-07:00  |INCREMENTAL       |
|F_STATION$1.0  |SUCCEEDED  |2024-08-06 09:43:34.390000-07:00  |2024-08-06 09:43:34.884000-07:00  |INCREMENTAL       |
|F_STATION$1.0  |SUCCEEDED  |2024-08-06 09:44:10.294000-07:00  |2024-08-06 09:44:10.860000-07:00  |INCREMENTAL       |
|F_STATION$1.0  |SUCCEEDED  |2024-08-06 09:44:48.016000-07:00  |2024-08-06 09:44:48.449000-07:00  |INCREMENTAL       |
----------------------------------------------------------------------------------------------------------------------

Retrieve values from a feature view

You can read the feature values of a registered feature view with read_feature_view().

[15]
------------------------------------------------------------------------
|"END_STATION_ID"  |"F_COUNT"  |"F_AVG_LATITUDE"  |"F_AVG_LONGTITUDE"  |
------------------------------------------------------------------------
|505               |483        |40.74901271       |-73.98848395        |
|161               |429        |40.72917025       |-73.99810231        |
|347               |440        |40.72873888       |-74.00748842        |
|466               |425        |40.74395411       |-73.99144871        |
|459               |456        |40.746745         |-74.007756          |
|247               |241        |40.73535398       |-74.00483090999998  |
|127               |481        |40.73172428       |-74.00674436        |
|2000              |121        |40.70255088       |-73.98940236        |
|514               |272        |40.76087502       |-74.00277668        |
|195               |219        |40.70905623       |-74.01043382        |
------------------------------------------------------------------------

Generate training data

We can easily generate training data from the Feature Store and output it either as a Dataset object or as a Snowpark DataFrame. The cell below creates a spine DataFrame by randomly sampling some entity keys from the source table. generate_dataset() then creates a Dataset object by populating spine_df with the corresponding feature values from the selected feature views.

[16]

Use generate_dataset() to output a Dataset object.

[17]

Convert the dataset to a pandas DataFrame and look at the first 10 rows.
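The two steps above (generate a Dataset, then preview it in pandas) can be sketched together, assuming fs is a FeatureStore, spine_df is a Snowpark DataFrame of entity keys, and feature_views is a list of registered FeatureView objects. The helper name, the dataset name argument, and the default version are illustrative.

```python
def dataset_head(fs, spine_df, feature_views, name: str, version: str = "1.0", n: int = 10):
    """Generate a Dataset from the spine and return its first `n` rows as pandas."""
    dataset = fs.generate_dataset(
        name=name,
        spine_df=spine_df,
        features=feature_views,
        version=version,
    )
    # Dataset.read exposes readers; to_pandas() loads the materialized data locally.
    return dataset.read.to_pandas().head(n)
```

Loading into pandas pulls the data to the client, so this suits previews rather than large datasets.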

[18]

A Dataset object materializes its data in Parquet files on internal stages. Alternatively, you can use generate_training_set() to output the training data as a DataFrame.
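A sketch of the DataFrame route, under the same assumptions (fs is a FeatureStore, spine_df holds entity keys, feature_views are registered). The wrapper name training_frame is hypothetical.

```python
def training_frame(fs, spine_df, feature_views):
    """Join feature values onto the spine and return a Snowpark DataFrame."""
    return fs.generate_training_set(spine_df=spine_df, features=feature_views)
```

Unlike the Dataset route, nothing is written to a stage here; calling .show() on the result prints a preview of the joined rows.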

[19]
------------------------------------------------------------------------
|"END_STATION_ID"  |"F_COUNT"  |"F_AVG_LATITUDE"  |"F_AVG_LONGTITUDE"  |
------------------------------------------------------------------------
|195               |219        |40.70905623       |-74.01043382        |
|398               |69         |40.69165183       |-73.9999786         |
|329               |361        |40.72043411       |-74.01020609        |
|498               |368        |40.74854862       |-73.98808416        |
|319               |252        |40.71336124       |-74.00937622        |
|369               |265        |40.73224119       |-74.00026394        |
|459               |456        |40.746745         |-74.007756          |
|311               |228        |40.7172274        |-73.98802084        |
|480               |242        |40.76669671       |-73.99061728        |
|127               |481        |40.73172428       |-74.00674436        |
------------------------------------------------------------------------

Delete feature views

Feature views can be deleted via delete_feature_view().

Warning: Deleting a feature view may break downstream dependencies for other feature views or models that depend on the feature view being deleted.
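A sketch of deleting every registered feature view, assuming fs is a FeatureStore and that list_feature_views() exposes the NAME and VERSION columns shown in the earlier outputs. The function name is hypothetical.

```python
def delete_all_feature_views(fs):
    """Delete every registered feature view; return the (name, version) pairs removed."""
    deleted = []
    # Collect the listing first so deletions do not affect the iteration.
    for row in fs.list_feature_views().collect():
        feature_view = fs.get_feature_view(row["NAME"], row["VERSION"])
        fs.delete_feature_view(feature_view)
        deleted.append((row["NAME"], row["VERSION"]))
    return deleted
```

Afterwards fs.list_feature_views() should come back empty, as in the cell output.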

[20]
----------------------
|"NAME"  |"VERSION"  |
----------------------
|        |           |
----------------------

Delete entities

You can delete an entity with delete_entity(). Note that it first checks whether any feature views are registered on the entity; if any are, the deletion fails.
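A sketch of removing all entities, assuming fs is a FeatureStore, all dependent feature views have already been deleted, and list_entities() exposes the NAME column shown earlier. The function name is hypothetical.

```python
def delete_all_entities(fs):
    """Delete every registered entity and return the names removed."""
    deleted = []
    for row in fs.list_entities().collect():
        fs.delete_entity(row["NAME"])
        deleted.append(row["NAME"])
    return deleted
```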

[21]
-------------------------------------------
|"NAME"  |"JOIN_KEYS"  |"DESC"  |"OWNER"  |
-------------------------------------------
|        |             |        |         |
-------------------------------------------

Cleanup Feature Store (experimental)

Currently, we provide an experimental API that deletes all entities and feature views in a Feature Store for easy cleanup. If dryrun is set to True (the default), fs._clear() only prints the objects that would be deleted. If dryrun is set to False, it performs the deletion.
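One cautious way to use this is to require an explicit confirmation before the destructive call. The sketch below assumes fs is a FeatureStore; the wrapper name and its confirm flag are hypothetical, and _clear() is the experimental private method described above, so it may change between releases.

```python
def clear_feature_store(fs, confirm: bool = False):
    """Preview the cleanup by default; delete everything only when confirm=True."""
    # dryrun=True only reports what would be deleted; dryrun=False deletes it.
    fs._clear(dryrun=not confirm)
```

Running clear_feature_store(fs) first and reviewing the printed objects before calling it again with confirm=True keeps the deletion deliberate.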

[22]
/tmp/snowml/snowflake/ml/feature_store/feature_store.py:190: UserWarning: It will clear ALL feature views and entities in this Feature Store. Make sure your role has sufficient access to all feature views and entities. Insufficient access to some feature views or entities will leave Feature Store in an incomplete state.
  return f(self, *args, **kargs)

Clean up notebook

[23]
[Row(status='SNOWFLAKE_FEATURE_STORE_NOTEBOOK_DEMO successfully dropped.')]