In this lab, you will learn how to create an on-demand Azure Databricks cluster and run jobs using Azure Data Factory.
1. In the Azure portal tab in your browser, click Create a resource.
- In the Storage category, click Storage account.
- Create a new storage account with the following settings (a scripted alternative is sketched after this list):
  - Name: Specify a unique name (and make a note of it)
  - Deployment model: Resource Manager
  - Account kind: Storage (general purpose v1)
  - Location: Choose the same location as your Databricks workspace
  - Replication: Locally-redundant storage (LRS)
  - Performance: Standard
  - Secure transfer required: Disabled
  - Subscription: Choose your Azure subscription
  - Resource group: Choose the existing resource group for your Databricks workspace
  - Virtual networks: Disabled
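If you prefer to script this step, the same storage account can be created with the azure-mgmt-storage Python SDK. This is a minimal sketch, assuming azure-mgmt-storage v16+ and azure-identity are installed, and that the placeholder names are replaced with your own values:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

subscription_id = "<your-subscription-id>"            # placeholder
resource_group = "<your-databricks-resource-group>"   # the existing lab resource group
account_name = "<your-unique-storage-account-name>"   # placeholder

client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

# Mirrors the lab settings: general purpose v1, Standard performance,
# locally-redundant storage, secure transfer disabled.
poller = client.storage_accounts.begin_create(
    resource_group,
    account_name,
    {
        "location": "westeurope",             # match your workspace region
        "kind": "Storage",                    # general purpose v1
        "sku": {"name": "Standard_LRS"},      # Standard performance, LRS
        "enable_https_traffic_only": False,   # secure transfer required: Disabled
    },
)
account = poller.result()  # block until the deployment completes
print(account.name, account.provisioning_state)
```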
- Wait for the resource to be deployed, then view the newly deployed storage account.
- In the blade for your storage account, click Blobs.
- In the Browse blobs blade, click Container, and create a new container with the following settings:
  - Name: spark
  - Public access level: Private
- In the Settings section of the blade for your blob store, click Access keys and note the Storage account name and key1 values on this blade – you will need these in the next procedure.
- Go to Storage Explorer (preview) and create a folder named data inside the spark container.
- Upload the file IISlog.txt to the data folder.
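The container, folder, and upload steps can also be done programmatically. Here is a minimal sketch using the azure-storage-blob (v12) SDK, assuming you substitute your storage account's connection string from the Access keys blade:

```python
from azure.storage.blob import BlobServiceClient

conn_str = "<your-storage-account-connection-string>"  # from the Access keys blade
service = BlobServiceClient.from_connection_string(conn_str)

# Create the private "spark" container (public access is off by default);
# skip creation if you already made it in the portal.
container = service.get_container_client("spark")
if not container.exists():
    container.create_container()

# Blob storage folders are virtual: uploading to "data/IISlog.txt"
# creates the data folder implicitly.
with open("IISlog.txt", "rb") as f:
    container.upload_blob(name="data/IISlog.txt", data=f, overwrite=True)
```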
- Go to the Azure Databricks workspace.
- Click Import and import ProcessLog.py.
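The contents of ProcessLog.py are supplied with the lab and are not reproduced here, but a notebook of this kind typically follows a pattern like the hypothetical sketch below: read IISlog.txt from the spark container over wasbs://, transform it, and write part- files back. All paths and logic here are assumptions, not the actual lab notebook:

```python
# Hypothetical ProcessLog-style notebook ("spark" is predefined in a
# Databricks notebook session). The account name "databrickshacks" matches
# the linked-service example later in this lab; replace it with your own.
# The storage key is expected to come from the cluster's Spark config,
# which the Data Factory linked service sets in a later step.
from pyspark.sql.functions import col

input_path = "wasbs://spark@databrickshacks.blob.core.windows.net/data/IISlog.txt"
output_path = "wasbs://spark@databrickshacks.blob.core.windows.net/data/output"

# Read the raw log lines as a single-column DataFrame.
log_lines = spark.read.text(input_path)

# Example transformation: drop IIS comment/header lines (starting with "#").
requests = log_lines.filter(~col("value").startswith("#"))

# Write the result back to the container; Spark emits part- files like the
# one you will look for at the end of the lab.
requests.write.mode("overwrite").text(output_path)
```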
- Go to the account menu and open User Settings.
- Click Generate New Token.
- Note down the token – you will need it when you create the Data Factory linked service.
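The token acts as a bearer credential for the Databricks REST API (the same credential Data Factory will use). As a quick sanity check that the token works, you can call a read-only endpoint such as clusters/list; the workspace URL below is a placeholder for your workspace's regional URL:

```python
import requests

workspace_url = "https://<region>.azuredatabricks.net"  # e.g. your workspace region
token = "<your-databricks-token>"                       # the token you just noted

resp = requests.get(
    f"{workspace_url}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
print(resp.json())  # lists clusters visible to the token's user
```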
• Go to Create a Resource | Analytics | Data Factory
• Provide the following details:
  a. A unique name
  b. The resource group already used in this lab
  c. Location: West Europe
• Use Microsoft Edge or Google Chrome for the following steps.
• Go to the newly created resource and click Author & Monitor
• A new window will open in the browser; wait a few minutes for it to load
• Click Author
• Click Connections and, under Linked services, click New
• Select the Compute tab, choose Azure Databricks, and then select Continue
• Provide the following details:
  a. A unique name
  b. The Azure Databricks access token you noted earlier
  c. Configure the rest as follows
• Go to Advanced and, under Spark conf settings, add two name-value pairs – replace databrickshacks with your storage account name (a notebook-side equivalent is sketched below):
  i. Name: fs.azure.account.key.databrickshacks.blob.core.windows.net
     Value: <your storage account key>
  ii. Name: spark.hadoop.fs.azure.account.key.databrickshacks.blob.core.windows.net
     Value: <your storage account key>
• Click Finish
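For context, these settings make the storage key available to the cluster that Data Factory spins up. The first one is equivalent to configuring the key inside a notebook session, which is handy when testing ProcessLog.py interactively before wiring it into Data Factory. A sketch, assuming the example account name databrickshacks:

```python
# Run inside a Databricks notebook ("spark" is predefined there).
# Replace the account name and key with your own values.
spark.conf.set(
    "fs.azure.account.key.databrickshacks.blob.core.windows.net",
    "<your storage account key>",
)

# After this, wasbs:// paths against the account resolve without extra setup:
df = spark.read.text("wasbs://spark@databrickshacks.blob.core.windows.net/data/IISlog.txt")
df.show(5, truncate=False)
```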
- Go to the authoring page, click Pipeline, and select Add Pipeline.
- Under Databricks, drag the Notebook activity onto the pipeline canvas.
- Give the Notebook activity a unique name.
- Under Azure Databricks, select the newly created Azure Databricks linked service.
- Go to Settings and click Browse.
- To add the notebook, browse to and select ProcessLog.
- Click Publish All.
- Once published successfully, click Add Trigger and select Trigger Now.
- Click Finish.
- Go to Monitor.
- Monitor the pipeline and observe the pipeline run.
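The Trigger Now and Monitor steps can also be scripted. Below is a sketch using the azure-mgmt-datafactory Python SDK; the resource names are placeholders for the Data Factory, resource group, and pipeline you created above:

```python
import time
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<your-subscription-id>"   # placeholder
resource_group = "<your-resource-group>"     # placeholder
factory_name = "<your-data-factory-name>"    # placeholder
pipeline_name = "<your-pipeline-name>"       # placeholder

adf = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Equivalent of Add Trigger > Trigger Now.
run = adf.pipelines.create_run(resource_group, factory_name, pipeline_name)

# Poll the run until it finishes, mirroring the Monitor page.
while True:
    status = adf.pipeline_runs.get(resource_group, factory_name, run.run_id).status
    print("Pipeline run status:", status)
    if status not in ("Queued", "InProgress"):
        break
    time.sleep(30)
```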
- In blob storage, go to spark/data and find the part- file that was created.
- Download the file and view the data.
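Equivalently, the output can be located and previewed with the azure-storage-blob SDK rather than the portal; a sketch, reusing the connection string noted earlier in the lab:

```python
from azure.storage.blob import BlobServiceClient

conn_str = "<your-storage-account-connection-string>"  # from the Access keys blade
container = BlobServiceClient.from_connection_string(conn_str).get_container_client("spark")

# Spark writes its output as one or more part- files under the output folder.
for blob in container.list_blobs(name_starts_with="data/"):
    if "part-" in blob.name:
        text = container.download_blob(blob.name).readall().decode("utf-8")
        print(f"--- {blob.name} ---")
        print(text[:500])  # preview the first 500 characters
```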