UF1: Unified data with Redshift
- Log in to SAP and call transaction RSO2 to create a generic data-source for your SAP sales order tables.
- Enter a desired name for your extractor (for example, ZSEPM_SALES) and click Create.
- In the next screen, select New_Hier_Root as the application component and update the description fields with the desired texts.
- Enter the table SEPM_SDDL_SO_JOI in the view/table field as the data-source to extract the sales order header and item details.
- Click “Generic Delta” in the menu and choose “CHANGED_AT” as the field to filter changed records. This lets you extract only delta (changed) records after the initial load. Click Save.
- Validate your data-source using the Data-Source Test Extraction option from the menu.
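Conceptually, the generic delta works like this: after the initial full load, only records whose CHANGED_AT value is later than the pointer saved from the previous extraction are selected. A minimal Python sketch of that selection (the sample records and values are hypothetical):

```python
from datetime import datetime

# Hypothetical sales order records; CHANGED_AT drives the generic delta.
records = [
    {"sales_order_id": "0001", "CHANGED_AT": datetime(2021, 1, 10)},
    {"sales_order_id": "0002", "CHANGED_AT": datetime(2021, 2, 5)},
    {"sales_order_id": "0003", "CHANGED_AT": datetime(2021, 3, 1)},
]

def delta_extract(records, last_pointer):
    """Return only records changed after the previous extraction pointer."""
    return [r for r in records if r["CHANGED_AT"] > last_pointer]

# The initial load took everything up to Feb 1; the delta picks up later changes only.
delta = delta_extract(records, datetime(2021, 2, 1))
```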
Creating Source Datastore in SAP Data Services
- Create a new project by choosing New from the SAP Data Services Designer application menu and provide a desired name for your project.
- From the Data Services object library, create a new datastore with a desired name for the source datastore and choose SAP Applications as the Datastore Type from the drop-down menu.
- Enter your SAP application server name and login credentials, update the SAP client in the Advanced tab, and click OK.
- Right-click ODP Objects under your new datastore and choose Import By Name to import the data-source “ZSEPM_SALES” you created in SAP.
- Click Next and assign the project name to the project created in step one.
- Choose the “Changed-data capture (CDC)” option, leave the field name blank, and click Import to import the SAP ODP object into the datastore.
Once you import the datasource, you will see the datasource under the “ODP objects” tree list in the Datastores.
➡️ You can use the SAP Landscape Transformation (SLT) Replication Server based extractor as your source datastore. See the SAP Data Services and SAP LT Server for near real-time replication lab for an SLT-based extractor.
Creating the Redshift tables
- Create the target table in Redshift using the Redshift Query Editor or a SQL client with the DDL script below. See the Amazon Redshift Lab for steps to create an Amazon Redshift cluster.
CREATE TABLE SALESRS.SALESORDER (
    -- Illustrative column list and data types, inferred from the fields queried
    -- later in this post; adjust them to match your extractor output. Later
    -- queries reference sapsales.salesorder, so use a consistent schema name.
    SALES_ORDER_KEY VARCHAR(32),
    SALES_ORDER_ID VARCHAR(10),
    BUYER_ID VARCHAR(10),
    BUYER_NAME VARCHAR(80),
    ITEM_PRODUCT_ID VARCHAR(10),
    ITEM_GROSS_AMOUNT DECIMAL(15,2),
    ITEM_DELIVERY_DATE TIMESTAMP,
    HEADER_NOTE_TEXT VARCHAR(500),
    ITEM_NOTE_TEXT VARCHAR(500),
    CREATED_AT TIMESTAMP,
    CHANGED_AT TIMESTAMP
);
Creating the Target Datastore in SAP Data Services
- From the Data Services object library, create a new datastore with a desired name for the target datastore and choose “Amazon Redshift” as the Datastore Type.
- Click ODBC connection setup to configure the connection to your Redshift cluster.
- Next to the Data Source Name field, click ODBC Admin and then click Configure under the System DSN tab.
- Enter the Amazon Redshift server, port, database name, and user credentials for the Standard auth type.
- Validate the connection by clicking Test, then click OK.
- Choose the new data source name you created for your target datastore and click OK.
- From your new datastore, right-click and choose the “Import By Name” option to import the Amazon Redshift database table definition.
Creating data flow in SAP Data services
- From the SAP Data Services project area, right-click to create a “New Batch Job” with a desired job name.
- Click the new job and drag the data flow option from the SAP Data Services palette to create a new data flow with a desired name.
Configure the workflow in SAP Data Services
- Click the data flow created in the previous step to open the data flow workspace.
- Build the flow by dragging and dropping the source ODP datasource (example: ZSEPM_SALES) from the datastore.
- Drag the Query object from the Transforms tab to the data flow workspace and connect it to the ODP object.
- Drag the target Redshift SALESORDER table from the target datastore to the data flow workspace and connect the Query object to it.
Ingest data to Amazon Redshift
- Double-click the source datastore in the workflow and set “Initial load” to Yes for the initial load.
- Click the Query transformation and map the source fields to the target fields using the “Map to Output” option.
- Now execute the sales order job to ingest the data into Amazon Redshift, and verify the data loaded in the target Redshift database.
You can transform the data on the way to the target using the Query transformation. Here we add a regular expression to remove any special characters from the dataset before ingesting it into the Amazon Redshift target. For example, to remove special characters from HEADER_NOTE_TEXT you can use the regular expression function regex_replace(<input_string/column_name>, '[^a-z0-9]', '', 'CASE_INSENSITIVE').
To add the transformation, choose the Query transformation and select ‘header_note_text’ in the Schema Out section. Under the Mapping tab, add the regular expression function below:
regex_replace(ZSEPM_SALESORDER_S4HANATOREDSHIFT.HEADER_NOTE_TEXT, '[^a-z0-9]', '', 'CASE_INSENSITIVE')
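To see the effect of this pattern, the same cleanup can be reproduced with an equivalent regular expression in Python (the sample string is made up):

```python
import re

# Same pattern as above: strip everything that is not a letter or digit,
# case-insensitively (so A-Z survive as well as a-z).
def clean_note(text):
    return re.sub(r"[^a-z0-9]", "", text, flags=re.IGNORECASE)

cleaned = clean_note("Urgent! Ship *ASAP* #42")
```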
➡️ Change Data Capture (CDC) with SAP Data Services
SAP Data Services supports both source-based and target-based CDC for databases. See the SAP documentation to learn about the CDC options.
Note: SAP Data Services also supports database-level extraction from SAP as a source to access the underlying SAP tables. See the SAP documentation to configure native ODBC data sources.
Amazon Connect setup to stream data to Amazon Redshift
Amazon Connect can be integrated with Amazon Kinesis to stream Contact Trace Records (CTRs) into Amazon Redshift. See Amazon Connect QuickStart documentation on setting up and integrating Amazon Connect data.
Also, see Amazon Connect administration guide to review the Contact Trace Records (CTR) data model. You can ingest the contact trace records into Amazon Redshift, and join this data with the Sales Order details that you replicated from SAP source systems.
You can add any key-value pairs to a CTR using contact attributes in the contact flow. For example, you can add the SalesOrderId as a key-value pair to a CTR using the contact attribute “SAPOrder”. See the Amazon Connect administration guide for information on how to use contact attributes to personalize the customer experience. For this example, we use the SalesOrderID to link the data in Amazon Connect to the sales orders from SAP.
CTR JSON Data
- For this post, we use JSON data that was already created with a custom attribute and is available in the S3 location here.
- This is a JSON dataset from an Amazon Connect workflow with the custom attribute “SAPOrder”, added so the agent can enter the order ID related to the enquiry.
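To sketch how the custom attribute travels inside a CTR, the record carries it under Attributes. The abbreviated, hypothetical payload below shows where “SAPOrder” sits and how it can be read:

```python
import json

# Abbreviated CTR payload with the custom "SAPOrder" contact attribute
# (all values hypothetical; a real CTR carries many more fields).
ctr_json = '''
{
  "AWSAccountId": "123456789012",
  "Channel": "VOICE",
  "Attributes": {"SAPOrder": "0500000001"}
}
'''

ctr = json.loads(ctr_json)
sap_order = ctr["Attributes"]["SAPOrder"]  # key used later to join with SAP sales orders
```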
Creating Glue catalog by crawling the CTR data
- Upload the sample CTR JSON file to Amazon S3 bucket in your Amazon account.
- Open AWS Glue in the AWS console and create a new crawler by choosing “Crawlers” and then “Add crawler”.
Adding data store in AWS Glue
- You can use a desired name for your crawler; for this example we use amazonconnectsaplink as the crawler name. Choose Next in the next two screens.
- In Add a data store screen, enter or choose the Amazon S3 bucket name where you uploaded the Amazon Connect CTR file and choose Next.
Executing AWS Crawler
- Select No for Add another data store and choose Next.
- Select Create an IAM role, enter a name for the role (for example, AWSGlueServiceRole-ConnectSAP), and choose Next.
- Select Run on demand for Frequency and choose Next.
- Choose the Add database option and enter a desired database name under which you want to create the Glue catalog tables. For this example, we enter amazonconnect.
- Click Finish.
- You should see the new crawler you just created. Select the crawler and click the Run crawler option.
- After the crawler finishes running, you should see the table amznconnectsapordlink in your Glue database.
Data modeling in Amazon Redshift
Amazon Redshift is a columnar store, massively parallel processing (MPP) data warehouse service, which provides integration to your data lake to enable you to build highly scalable analytics solutions.
Similar to calculation views in SAP HANA, you can create Amazon Redshift database views to build business logic combining tables. Redshift extends this capability to materialized views, where you can store the pre-computed results of queries and efficiently maintain them by incrementally processing the latest changes made to the source tables. Queries executed on the materialized views use the pre-computed results to run much faster.
- Create a new Amazon Redshift cluster. See Amazon Redshift Lab for steps to create Amazon Redshift Cluster.
- Ensure that the AWS IAM role attached to your Redshift cluster has the necessary read and write IAM policies to access Amazon S3 and AWS Glue.
➡️ Note: The Amazon S3 policy should allow export of data from Amazon Redshift to Amazon S3. AWSGlueConsoleFullAccess is required to create a Glue catalog and enable querying of the CTR data using Amazon Redshift Spectrum.
- Log in to Amazon Redshift using the query editor. You can also use a SQL client tool such as DBeaver. See the Amazon Redshift documentation to configure connections between SQL client tools and Amazon Redshift.
- Execute the external schema statement from the code snippet below. Alter the snippet to use the appropriate AWS Glue Data Catalog database and the IAM role ARN you created in the previous steps.
- Validate the new external schema and the external table in your Amazon Redshift database, which should point to the Amazon Connect CTR data in S3.
drop schema if exists amazonconnect_ext;
create external schema amazonconnect_ext
from data catalog database 'amazonconnect'
iam_role 'arn:aws:iam::<AWSAccount>:role/mySpectrumRole';
Consolidating data for unified view
- Execute the query from the code snippet below to combine the SAP sales order data loaded from Data Services with the Amazon Connect CTR data in S3, add a transformation for the call initiation timestamp, and add week, month, year, and day-of-month dimensions to enable ad-hoc analysis.
create table sapsales.amznconnectsaporder
as select amzcsap.awsaccountid, amzcsap.saporder , amzcsap.agent.AgentInteractionDuration as aidur,
amzcsap.awscontacttracerecordformatversion, amzcsap.channel, amzcsap.queue.arn as arn,
TO_TIMESTAMP(amzcsap.connectedtosystemtimestamp,'YYYY-MM-DD HH24:MI:SS') callconnectedtosystemtimestamp,
TO_TIMESTAMP(amzcsap.initiationtimestamp,'YYYY-MM-DD HH24:MI:SS') callinitiationtimestamp,
date_part(w, to_date(amzcsap.initiationtimestamp,'YYYY-MM-DD')) as callinitweek,
date_part(mon, to_date(amzcsap.initiationtimestamp,'YYYY-MM-DD')) as callinitmonth,
date_part(dow, to_date(amzcsap.initiationtimestamp,'YYYY-MM-DD')) as callinitdow,
date_part(yr, to_date(amzcsap.initiationtimestamp,'YYYY-MM-DD')) as callinityear,
date_part(d, to_date(amzcsap.initiationtimestamp,'YYYY-MM-DD')) as callinitdom
from amazonconnect_ext.amznconnectsapordlink amzcsap;
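The date_part derivations above can be cross-checked in Python. The sketch assumes Redshift's w datepart follows the ISO week and dow runs 0 (Sunday) through 6 (Saturday):

```python
from datetime import datetime

ts = datetime.strptime("2021-03-15", "%Y-%m-%d")  # sample initiation date (a Monday)

parts = {
    "week": ts.isocalendar()[1],     # date_part(w, ...), approximated with the ISO week
    "month": ts.month,               # date_part(mon, ...)
    "dow": (ts.weekday() + 1) % 7,   # date_part(dow, ...): 0 = Sunday ... 6 = Saturday
    "year": ts.year,                 # date_part(yr, ...)
    "day_of_month": ts.day,          # date_part(d, ...)
}
```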
Note: You can execute the query from the code snippet below to create a view with the desired data for visualization. The query can also be created as a materialized view to improve performance.
create view sapsales.v_orderenquiryanalytics
as select amzcsap.awsaccountid, amzcsap.saporder, amzcsap.aidur,
amzcsap.awscontacttracerecordformatversion, callconnectedtosystemtimestamp, callinitiationtimestamp,
callinitweek, callinitmonth, callinitdow, callinitdom,
amzcsap.channel, amzcsap.arn, sapord.sales_order_key, sapord.buyer_id, sapord.buyer_name,
sapord.item_gross_amount, sapord.item_product_id, sapord.item_delivery_date, sapord.created_at
from sapsales.amznconnectsaporder amzcsap, sapsales.salesorder sapord
where amzcsap.saporder = sapord.sales_order_id;
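The view's join condition links each CTR to its sales order on the SAPOrder attribute. A minimal in-memory sketch of that join (the sample rows are made up):

```python
# Hypothetical CTR rows (from the Spectrum external table) and SAP sales orders.
ctrs = [{"saporder": "0500000001", "aidur": 120}]
orders = [
    {"sales_order_id": "0500000001", "buyer_name": "TECUM"},
    {"sales_order_id": "0500000002", "buyer_name": "Asia High tech"},
]

# Equivalent of: where amzcsap.saporder = sapord.sales_order_id
unified = [
    {**c, **o}
    for c in ctrs
    for o in orders
    if c["saporder"] == o["sales_order_id"]
]
```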
Validate the data output
Execute the queries from the code snippets below to validate the data output for your analytics.
a. Top 10 buyers by frequency and duration of calls
select buyer_id, buyer_name, count(*) as callcount, sum(aidur) as callduration
from sapsales.v_orderenquiryanalytics
group by buyer_id, buyer_name
order by callduration desc
limit 10;
b. Products causing the most Connect calls
select item_product_id, sum(aidur) as callduration
from sapsales.v_orderenquiryanalytics
group by item_product_id
order by callduration desc;
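The ranking logic in query (a), summing call duration per buyer, sorting descending, and keeping the top rows, is equivalent to this small sketch (made-up rows):

```python
from collections import defaultdict

# Made-up unified rows: (buyer_name, agent interaction duration in seconds)
rows = [("Alpha Corp", 120), ("Beta Ltd", 300), ("Alpha Corp", 90), ("Gamma Inc", 45)]

totals = defaultdict(int)
for buyer, aidur in rows:
    totals[buyer] += aidur  # sum(aidur) ... group by buyer_name

# order by callduration desc, limit 10
top_buyers = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:10]
```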
Visualization using Amazon QuickSight
- Login to Amazon QuickSight from the AWS Management Console.
- Choose Data Sets, and create a new data set.
- Enter a desired name for your data source.
- Choose the Amazon Redshift instance ID, and enter the database name, username, and password.
- Choose Create Data Source.
- Choose the schema (example: sapsales) created in Amazon Redshift earlier.
- Choose the Redshift view (example: v_orderenquiryanalytics) you created in the previous step.
Setting up the Visualization
- Choose Directly query your data and click Visualize.
- On the visualize screen, choose +Add and click Add visual at the top left of the screen multiple times to create four visuals.
- For the first visual, choose Vertical bar chart as the visual type, then select aidur for the y-axis and buyer_name for the x-axis from the fields list.
- For the second visual, choose Pie chart as the visual type, then select aidur and callinitmonth.
- For the third visual, choose Horizontal bar chart as the visual type, then select aidur for the x-axis and item_product_id for the y-axis.
- For the fourth visual, choose Scatter plot as the visual type, then select callinitdow for the x-axis and aidur for the y-axis.
- Finally, you should see the visualizations as shown below.
Optional - Modeling using SAP HANA Studio
If you want to use SAP HANA Studio or SAP Analytics Cloud as your modeling and front-end tool of choice, you can connect to Amazon Redshift using Smart Data Access (SDA). Amazon Redshift views or materialized views can be accessed from SAP HANA Studio using the Simba connector.
➡️ For more details on SDA, see SAP documentation on SAP HANA Smart Data Access.
Optional- Visualize unified data using SAP Analytics Cloud
You can use the UNLOAD command to store the results of your Redshift query as CSV files in an Amazon S3 bucket for SAP Analytics Cloud (SAC) visualization. See the [Amazon Redshift documentation](https://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html) for UNLOAD examples.