Exploring using Athena and QuickSight

Introduction

This lab introduces you to AWS Glue, Amazon Athena, and Amazon QuickSight. AWS Glue is a fully managed data catalog and ETL service. Amazon Athena queries data; and Amazon QuickSight provides visualization of the data you import.

Steps

  • Prerequisites
  • Getting Started
  • Query Data with Amazon Athena
  • Build an Amazon QuickSight Dashboard
  • Set up QuickSight
  • Create QuickSight Charts
  • Create QuickSight Parameters
  • Create a QuickSight Filter
  • Add Calculated Fields
  • Amazon QuickSight ML-Insights (Optional)
  • Athena Workgroups to Control Query Access and Costs (Optional)
  • Workflow setup to separate workloads
  • Explore the features of workgroups
  • Managing Query Usage and Cost
  • Cost Allocation Tags  

Prerequisites

The DMS Lab and Glue ETL lab is a prerequisite for this lab.

Getting Started

In this lab, you will complete the following tasks:

  • Query data and create a view with Amazon Athena
  • Athena Workgroups to Control Query Access and Costs
  • Build a dashboard with Amazon QuickSight

Query Data with Amazon Athena

  • In the AWS services console, search for Athena. Make sure to choose Sydney region.quick
  • If you are using Athena first time, click on “Get Started” button in introduction screen.
  • Setup the S3 buckets to store the query results. For this, Navigate to S3 console, click on bucket created as part of student lab (for e.g: ). Inside bucket, click on “create folder”. Name folder as “athenaquery”. Click on “Save”.
  • Note down the path of S3 folder created above and save it.
  • In Athena Console, click on Settings.
  • Copy paste the folder path as shown below:
  • In the Query Editor, select your newly created database e.g., “ticketdata”.
  • Click the table named “parquet_sporting_event_ticket” to inspect the fields.

The type for fields id, sporting_event_id and ticketholder_id should be (double).

Next, we will query across tables parquet_sporting_event, parquet_sport_team, and parquet_sport location.

  • Copy the following SQL syntax into the New Query 1 tab and click Run Query.
SELECT 
    e.id AS event_id,
    e.sport_type_name AS sport,
    e.start_date_time AS event_date_time,
    h.name AS home_team,
    a.name AS away_team,
    l.name AS location,
    l.city
FROM parquet_sporting_event e,
    parquet_sport_team h,
    parquet_sport_team a,
    parquet_sport_location l
WHERE 
    e.home_team_id = h.id 
    AND e.away_team_id = a.id 
    AND e.location_id = l.id;

The results appear beneath the query window.

  • As shown above Click Create and then select Create view from query
  • Name the view sporting_event_info and click Create. Your new view is created
  • Copy the following SQL syntax into the New Query 3 tab and click on Save Query.
SELECT t.id AS ticket_id,
    e.event_id,
    e.sport,
    e.event_date_time,
    e.home_team,
    e.away_team,
    e.location,
    e.city,
    t.seat_level,
    t.seat_section,
    t.seat_row,
    t.seat,
    t.ticket_price,
    p.full_name AS ticketholder
FROM sporting_event_info e,
    parquet_sporting_event_ticket t,
    parquet_person p
WHERE 
    t.sporting_event_id = e.event_id
    AND t.ticketholder_id = p.id

  • Click on Save as button Give this query a name: create_view_sporting_event_ticket_info and some description and then, click on Save.

Back to the query editor, you will see the query name changed. Now, click on Run Query.

The results appear beneath the query window.

  • As shown above Click Create view from query.
  • Name the view “sporting_event_ticket_info” and click Create.

  • Copy the following SQL syntax into the New Query 3 tab.
SELECT 
  sport, 
  count(distinct location) as locations, 
  count(distinct event_id) as events, 
  count(*) as tickets,
  avg(ticket_price) as avg_ticket_price
FROM sporting_event_ticket_info 
GROUP BY 1
ORDER BY 1;

Click on Save Query and give this query name: analytics_sporting_event_ticket_info and some description and then, click on Save.

The name of the New Query 3 will be changed to one assigned in previous step. Click on Run Query.

You query returns two results in approximately five seconds. The query scans 25 MB of data, which prior to converting to parquet, would have been 1.59GB of CSV files.

The purpose of saving the queries is to have clear distinction between the results of the queries running on one view. Otherwise, your query results will be saved under “Unsaved” folder within the S3 bucket location provided to Athena to store query results. Please navigate to S3 bucket to observe these changes, as shown below:

Build an Amazon QuickSight Dashboard

Set up QuickSight
  • In the AWS services console, search for QuickSight.
  • Click Sign up for QuickSight.
  • For account type, choose Standard. If you plan to complete the Bonus Exercise, please choose Enterprise Version
  • Click Continue.
  • On the Create your QuickSight account page, fill out your name and email address.
  • Select region and the check boxes to enable autodiscovery, Amazon Athena, and Amazon S3.
  • Click Choose S3 buckets and select your DMS bucket (e.g., “dmslab-student-").
  • Click Finish.
  • On the QuickSight landing page, on the top right corner, click on “Manage QuickSight.
  • Choose “Security and Permissions” and under “QuickSight access to AWS Services, click on “Add or Remove” button.
  • If you will observe there is an unchecked box against S3 buckets for “dmslab-student-“, please check the box.
  • Select Amazon S3. Select the S3 bucket created as part of student lab (for e.g: dmslab-student-dmslabs3bucket-xg1hdyq60ibs) which will have all the folders for your source data.
  • Then, click on Finish
  • You will observe that now there is a check mark in the checkbox for Amazon S3. This confirms that QuickSight has required permissions. Then, click on “Update”.
  • Navigate to QuickSight landing page by clicking on the QuickSight logo on the top left. On the top right corner, click Manage Data.
  • Click New Data Set.
  • On the Create a Data Set page, select Athena as the data source.
  • For Data source name, type “ticketdata-qs” and click Validate connection.
  • Click Create data source.
  • In the Database drop-down list, select the database name you created in the AWS Glue lab.
  • Choose the “sporting_event_ticket_info” table and click Select.
  • To finish data set creation, choose the option Import to SPICE for quicker analytics and click Visualize.
  • You will now be taken to the QuickSight Visualize interface where you can start building your dashboard.

Note: The SPICE dataset will take a few minutes to be built, but you can continue to create some charts on the underlying data.

Create QuickSight Charts

In this section we will take you through some of the different chart types.

  • In the Fields list, click the ticket_price column to populate the chart.
  • Click the expand icon in corner of ticket_price field and select format as currency to show numbers in dollar amount.
  • You can add new visual and keep building your dashboard by clicking Add button at top left corner of screen. In the Visual types area, choose the Vertical bar chart icon. This layout requires a value for the X-axis. In Fields list, select the “event_date_time” field and you should see the visualization update. For Y-axis, select “ticket_price” from the Field list.
  • Add new Visual and you can drag and move other visuals to adjust space in dashboard. In the Fields list, click and drag the seat_level field to the Group/Color box in the Field wells pane. You can also use the slider below the x axis to fit all of the data.

Let’s build on this one step further by changing the chart type to “Clustered bar combo chart” and adding in the ticketholder field for the Lines. 5. In the Visual types area, choose the Clustered bar combo chart icon. 6. In the Fields list, click and drag the ticketholder field to the Lines box in the Field wells pane. 7. In the Field wells pane, click the Lines box and choose Count Distinct for Aggregate. You can then see the y-axis update on the right-hand side.

  • Click on insight icon on the left tabs section and explore insight information in simple English.

Feel free to experiment with other chart types and different fields to get a sense of the data.

Create QuickSight Parameters

In the next section we are going to create some parameters with controls for the dashboard, then assign these to a filter for all the visuals.

  • In the left navigation menu, select Parameters.
  • Click Create one to create a new parameter with a Name.
  • Name: EventFrom.
  • Data type: Datetime.
  • Default value select the value from calendar as start date available in your graph for event_date_time. For example, 2020-01-01 00:00.
  • Click Create, and then close the Parameter Added dialog box.
  • Create another parameter with the following attributes:
    • Name: EventTo
    • Data type: Datetime
    • For Default value, select the value from calendar as end date available in your graph for event_date_time. For example, 2021-01-01 00:00
    • Click Create
  • In next window, you can select any option to perform any operation with the parameter. Alternatively, you can click the drop-down menu for the EventFrom parameter and choose Add control.
  • Click on Control
  • For Display name, specify Event From and click Add.
  • Repeat the process to add a control for EventTo with display name Event To. You should now be able to see and expand the Controls section above the chart.

Create a QuickSight Filter

To complete the process, we will wire up a filter to these controls for all visuals.

  • In the left navigation menu, choose Filter.
  • Click the plus icon (+) to add a filter for the field event_date_time.
  • Click this filter to edit the properties.
  • Choose to make this filter apply to All visuals.
  • For Filter type, choose Time range and Between.
  • Select option Use Parameter.
  • For Start date parameter, choose EventFrom.
  • For End date parameter, choose EventTo.
  • Click Apply.

Add Calculated Fields

In the next section, you will learn, how to add calculated fields for “day of week” and “hour of day” to your dataset and a new scatter plot for these two dependent variables.

  • Click the Add button on the top left and select Add a calculated field.
  • For Calculated field name type event_day_of_week.
  • For Formula, type extract("WD",{event_date_time}) Note: extract returns a specified portion of a date value. Requesting a time-related portion of a date that doesn’t contain time information returns 0. WD: This returns the day of the week as an integer, with Sunday as 1.
  • Click Create.
  • Add another calculated field with the following attributes: a. Calculated field name: “event_hour_of_day” b. Formula: extract(“HH”,{event_date_time}) Note: HH: This returns the hour portion of the date.
  • Click Add button in the top left and choose Add visual.
  • For field type, select the scatter plot.
  • In the Fields list, select and drag the following attributes to the Field wells pane to set the graph attributes: event_hour_of_day” event_hour_of_day”
    • X-axis: event_hour_of_day
    • Y-axis: event_day_of_week
    • Size: ticket_price Since now you have completed your dashboard then you can publish it by clicking on top right corner of screen.

A dashboard is a read-only snapshot of an analysis that you can share with other Amazon QuickSight users for reporting purposes. In Dashboard other users can still play with visuals and data but that will not modify dataset.

You can share an analysis with one or more other users with whom you want to collaborate on creating visuals. Analysis provide other uses to write and modify data set.