Creating and Documenting Snowplow Schemas with ChatGPT

Jake McCambley
5 min read · Feb 8, 2024


In web analytics, capturing nuanced user interactions is crucial for improving user experience and making informed product decisions. Snowplow Analytics is a robust platform for tracking custom events and offers deep insight into user behavior. Recently I used ChatGPT to help create a Snowplow schema for data collection, write a JSDoc type definition to bring that schema into our codebase, and draft a README that translates the jargon-heavy schema into a format easily consumed by end users of this data. At the end of this process I had three major wins:

  1. Useful data: I had a robust schema for tracking user behavior that would allow my team to gather more information about our homepage than ever before.
  2. Type safety: I created a type-safe implementation that let me add this data collection quickly at multiple touchpoints of our expansive application.
  3. Effective communication: I was able to show team members who don’t have the time or context to dive into code how this data could be used most effectively.

The following is an anonymized example of how to get these same wins.

Photo by Madara Parma on Unsplash

1. Writing a Snowplow Schema Based on Things to Track

To kickstart your schema creation, identify the user interactions you wish to monitor, such as clicks, form submissions, or navigation events. For instance, if tracking interactions with an e-commerce search bar, you might want to capture the search context, action type, and the search query text.

Prompt:

I need a Snowplow schema to track users’ behaviors on our e-commerce platform. Our website features a search bar prominently displayed on both the homepage and the products page. Users can interact with the search bar by typing queries, selecting filters, or clicking the search button.

I need to track information regarding the following things:

- The context in which the search was initiated (either the homepage or the products page).

- The type of action performed by the user (typing a query, selecting a filter, or clicking the search button).

- The search query text entered by the user.

Could you please generate a Snowplow schema that captures these user interactions?

Example Snowplow Schema Output:

{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema for tracking user interactions with a generic e-commerce search bar",
  "self": {
    "vendor": "com.generic",
    "name": "ecommerce_search_interaction",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "search_context": {
      "type": "string",
      "enum": ["homepage", "products_page"],
      "description": "Where the search was initiated."
    },
    "action_type": {
      "type": "string",
      "enum": ["type_query", "select_filter", "click_search"],
      "description": "The type of action the user performed."
    },
    "query_text": {
      "type": "string",
      "description": "The search query entered by the user."
    }
  },
  "additionalProperties": true
}
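As a rough sketch of how this schema is referenced at tracking time: a Snowplow self-describing event points to its schema with an Iglu URI assembled from the self block, in the form iglu:vendor/name/format/version. The helper names below (toIgluUri, checkPayload) are illustrative and not part of any Snowplow library, and the hand-rolled check is only a stand-in for real JSON Schema validation.

```javascript
// Only the parts of the schema above that this sketch needs.
const searchSchema = {
  self: {
    vendor: "com.generic",
    name: "ecommerce_search_interaction",
    format: "jsonschema",
    version: "1-0-0",
  },
  properties: {
    search_context: { type: "string", enum: ["homepage", "products_page"] },
    action_type: {
      type: "string",
      enum: ["type_query", "select_filter", "click_search"],
    },
    query_text: { type: "string" },
  },
};

// Hypothetical helper: build the Iglu URI that identifies the schema.
function toIgluUri({ vendor, name, format, version }) {
  return `iglu:${vendor}/${name}/${format}/${version}`;
}

// Hypothetical helper: list problems with a payload, checking only string
// types and enum membership for properties declared in the schema.
function checkPayload(schema, payload) {
  const problems = [];
  for (const [key, value] of Object.entries(payload)) {
    const rule = schema.properties[key];
    if (!rule) continue; // the schema allows additional properties
    if (rule.type === "string" && typeof value !== "string") {
      problems.push(`${key} must be a string`);
    } else if (rule.enum && !rule.enum.includes(value)) {
      problems.push(`${key} must be one of: ${rule.enum.join(", ")}`);
    }
  }
  return problems;
}

console.log(toIgluUri(searchSchema.self));
// iglu:com.generic/ecommerce_search_interaction/jsonschema/1-0-0

console.log(
  checkPayload(searchSchema, {
    search_context: "checkout",
    action_type: "click_search",
  })
);
// [ 'search_context must be one of: homepage, products_page' ]
```

A check like this is handy in unit tests, since a payload that fails validation in the Snowplow pipeline is otherwise only discovered after the event has been sent.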

2. Writing a JSDoc Type Interface That Describes the Schema Payload

Documenting the expected data structure with JSDoc provides clarity on how to implement tracking correctly, ensuring data consistency with the schema.

My preferred approach is to create a closure for each self-describing event tracked by Snowplow. In this case, I might create a trackSearchEvents helper function that encloses trackSelfDescribingEvent, and then type the single argument of the enclosing function with the JSDoc output. Removing the boilerplate that would otherwise surround every call to the native Snowplow function is an easy way to reduce the potential for errors. Keep it DRY. Depending on your IDE, you’ll get great IntelliSense tips, and depending on your linting, you’ll get great inline errors if mistakes are made.

Prompt:

I have a Snowplow schema for tracking user interactions with our e-commerce search bar. Now, I need to create a JSDoc type interface that describes the structure of the schema payload. The schema captures information such as the search context (homepage or products page), the action type (type_query, select_filter, click_search), and the search query text entered by the user.

Please generate a JSDoc type interface that accurately describes the schema payload provided below. The interface should include annotations for each property, specifying the data type, possible values (if applicable), and a brief description of each field.

Here is the Snowplow schema:

{ schema }

Please generate the corresponding JSDoc type interface based on this schema.

Example JSDoc Type Interface:

/**
 * @typedef {Object} EcommerceSearchInteraction
 * @property {'homepage' | 'products_page'} search_context - Indicates where the search was initiated.
 * @property {'type_query' | 'select_filter' | 'click_search'} action_type - The type of action performed by the user.
 * @property {string} [query_text] - The search query text entered by the user. Optional.
 */

Any time a payload is passed to a function typed with this interface, such as a wrapper around Snowplow’s trackSelfDescribingEvent function, engineers will see helpful IntelliSense hints, and useful errors if properties are missing or invalid.
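Here is a minimal sketch of the closure approach. It assumes the call shape of the Snowplow browser tracker's trackSelfDescribingEvent, which in the v3 JavaScript tracker takes an object of the form { event: { schema, data } }. The tracker function is passed in as an argument so the example runs without the tracker installed. createSearchTracker and the stand-in tracker are hypothetical names used only for illustration.

```javascript
/**
 * @typedef {Object} EcommerceSearchInteraction
 * @property {'homepage' | 'products_page'} search_context
 * @property {'type_query' | 'select_filter' | 'click_search'} action_type
 * @property {string} [query_text]
 */

// Iglu URI derived from the schema's vendor, name, format, and version.
const SEARCH_SCHEMA_URI =
  "iglu:com.generic/ecommerce_search_interaction/jsonschema/1-0-0";

/**
 * Hypothetical factory: encloses the tracker function so callers only
 * ever supply the typed payload.
 * @param {(args: { event: { schema: string, data: object } }) => void} trackSelfDescribingEvent
 * @returns {(payload: EcommerceSearchInteraction) => void}
 */
function createSearchTracker(trackSelfDescribingEvent) {
  return function trackSearchEvents(payload) {
    trackSelfDescribingEvent({
      event: { schema: SEARCH_SCHEMA_URI, data: payload },
    });
  };
}

// Stand-in for the real tracker function (an assumption for this demo):
// it records calls instead of sending them to a collector.
const sent = [];
const trackSearchEvents = createSearchTracker((args) => {
  sent.push(args);
});

trackSearchEvents({
  search_context: "homepage",
  action_type: "click_search",
  query_text: "running shoes",
});

console.log(JSON.stringify(sent[0], null, 2));
```

In application code the factory would receive the real trackSelfDescribingEvent from the tracker package instead of the stand-in. With type checking turned on for JavaScript (for example a // @ts-check comment at the top of the file), a misspelled action_type is flagged in the editor before the event is ever sent.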

3. Using ChatGPT to Create a README

A README tailored for non-engineer stakeholders can translate technical schema details into actionable insights, explaining what data is collected and its potential uses.

Prompt:

Generate a readme file that describes what this schema tracks. Include the following sections.

## Overview
### Events tracked
### Important definitions

Additional section ideas to add to the prompt:

### Groupings
### Where do these events occur
### Example data
### Schema

Example README Content:

Overview

This README outlines the tracking of interactions with the search bar on our e-commerce platform, aiming to refine search functionality and user satisfaction.

Events Tracked

  • Search Context: Distinguishes whether the search is initiated from the homepage or the products page.
  • Action Type: Specifies the user’s interaction, including typing queries, selecting filters, or executing searches.

Important Definitions

  • Search Context: Indicates the website section where the search begins, possible values being homepage and products_page.
  • Action Type: Describes the interaction type, providing insights into user search preferences and behaviors.
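One way to keep definitions like these in sync with the schema is to generate a first draft of them from the schema itself, then let ChatGPT or a teammate polish the wording. A small sketch, assuming the property names and descriptions from the example schema; renderDefinitions is a hypothetical helper, not something Snowplow provides.

```javascript
// Property definitions copied from the example schema.
const properties = {
  search_context: {
    enum: ["homepage", "products_page"],
    description: "Where the search was initiated.",
  },
  action_type: {
    enum: ["type_query", "select_filter", "click_search"],
    description: "The type of action the user performed.",
  },
  query_text: {
    description: "The search query entered by the user.",
  },
};

// Hypothetical helper: one README bullet per schema property,
// listing the allowed values when the property has an enum.
function renderDefinitions(props) {
  return Object.entries(props)
    .map(([name, { description, enum: allowed }]) => {
      const values = allowed ? ` Possible values: ${allowed.join(", ")}.` : "";
      return `- ${name}: ${description}${values}`;
    })
    .join("\n");
}

console.log(renderDefinitions(properties));
```

Because the bullets are derived from the schema, a property that exists in the schema cannot be silently left out of the documentation.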

4. (Bonus) Using Linear to Attach This README to a Feature Ticket

This README can be pasted into any rich text editor to share useful documentation with stakeholders. In my case, I’m fortunate enough to be on a team that uses Linear for issue tracking and documentation. Pasting this into the relevant issue, or as a document attached to a project, allows my team to stay up to date on how we track useful data across our site.

By following these steps to create a Snowplow schema, document it with JSDoc, and compile a README for broader team understanding, you bridge the gap between technical implementation and strategic product insights. The bonus step of integrating this documentation into Linear streamlines collaboration and keeps everyone aligned on project goals and metrics.

Jake McCambley

Learning to code by teaching others — Living at the cross section of tech, music, and the outdoors — Currently studying Web Development at Practicum by Yandex.