In this lab, you will learn how to create an on-demand Azure Databricks cluster and run jobs using Azure Data Factory.
1. In the Azure portal tab in your browser, click Create a resource.
- In the Storage category, click Storage account.
- Create a new storage account with the following settings (a scripted alternative is sketched after this list):
  - Name: Specify a unique name (and make a note of it)
  - Deployment model: Resource Manager
  - Account kind: Storage (general purpose v1)
  - Location: Choose the same location as your Databricks workspace
  - Replication: Locally-redundant storage (LRS)
  - Performance: Standard
  - Secure transfer required: Disabled
  - Subscription: Choose your Azure subscription
  - Resource group: Choose the existing resource group for your Databricks workspace
  - Virtual networks: Disabled
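If you prefer to script this step, the same storage account can be created with the azure-mgmt-storage Python SDK. This is a minimal sketch, assuming azure-mgmt-storage v16+ and azure-identity are installed, and that the placeholder names are replaced with your own values:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

subscription_id = "<your-subscription-id>"            # placeholder
resource_group = "<your-databricks-resource-group>"   # the existing lab resource group
account_name = "<your-unique-storage-account-name>"   # placeholder

client = StorageManagementClient(DefaultAzureCredential(), subscription_id)

# Mirrors the lab settings: general purpose v1, Standard performance,
# locally-redundant storage, secure transfer disabled.
poller = client.storage_accounts.begin_create(
    resource_group,
    account_name,
    {
        "location": "westeurope",             # match your workspace region
        "kind": "Storage",                    # general purpose v1
        "sku": {"name": "Standard_LRS"},      # Standard performance, LRS
        "enable_https_traffic_only": False,   # secure transfer required: Disabled
    },
)
account = poller.result()  # block until the deployment completes
print(account.name, account.provisioning_state)
```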
- Wait for the resource to be deployed, then view the newly deployed storage account.
- In the blade for your storage account, click Blobs.
- In the Browse blobs blade, click Container, and create a new container with the following settings:
  - Name: spark
  - Public access level: Private
- In the Settings section of the blade for your blob store, click Access keys and note the Storage account name and key1 values on this blade – you will need these in the next procedure.
- Go to Storage Explorer (preview) and create a folder named data inside the spark container.
- Upload the file IISlog.txt to the data folder.
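The container, folder, and upload steps can also be done programmatically. Here is a minimal sketch using the azure-storage-blob (v12) SDK, assuming you substitute your storage account's connection string from the Access keys blade:

```python
from azure.storage.blob import BlobServiceClient

conn_str = "<your-storage-account-connection-string>"  # from the Access keys blade
service = BlobServiceClient.from_connection_string(conn_str)

# Create the private "spark" container (public access is off by default);
# skip creation if you already made it in the portal.
container = service.get_container_client("spark")
if not container.exists():
    container.create_container()

# Blob storage folders are virtual: uploading to "data/IISlog.txt"
# creates the data folder implicitly.
with open("IISlog.txt", "rb") as f:
    container.upload_blob(name="data/IISlog.txt", data=f, overwrite=True)
```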
- Go to the Azure Databricks workspace.
- Click Import and import ProcessLog.py.
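The contents of ProcessLog.py are supplied with the lab and are not reproduced here, but a notebook of this kind typically follows a pattern like the hypothetical sketch below: read IISlog.txt from the spark container over wasbs://, transform it, and write part- files back. All paths and logic here are assumptions, not the actual lab notebook:

```python
# Hypothetical ProcessLog-style notebook ("spark" is predefined in a
# Databricks notebook session). The account name "databrickshacks" matches
# the linked-service example later in this lab; replace it with your own.
# The storage key is expected to come from the cluster's Spark config,
# which the Data Factory linked service sets in a later step.
from pyspark.sql.functions import col

input_path = "wasbs://spark@databrickshacks.blob.core.windows.net/data/IISlog.txt"
output_path = "wasbs://spark@databrickshacks.blob.core.windows.net/data/output"

# Read the raw log lines as a single-column DataFrame.
log_lines = spark.read.text(input_path)

# Example transformation: drop IIS comment/header lines (starting with "#").
requests = log_lines.filter(~col("value").startswith("#"))

# Write the result back to the container; Spark emits part- files like the
# one you will look for at the end of the lab.
requests.write.mode("overwrite").text(output_path)
```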
- Go to the account menu and open User Settings.
- Click Generate New Token.
- Note down the token – you will need it when you create the Data Factory linked service.
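The token acts as a bearer credential for the Databricks REST API (the same credential Data Factory will use). As a quick sanity check that the token works, you can call a read-only endpoint such as clusters/list; the workspace URL below is a placeholder for your workspace's regional URL:

```python
import requests

workspace_url = "https://<region>.azuredatabricks.net"  # e.g. your workspace region
token = "<your-databricks-token>"                       # the token you just noted

resp = requests.get(
    f"{workspace_url}/api/2.0/clusters/list",
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
print(resp.json())  # lists clusters visible to the token's user
```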
• Go to Create a Resource | Analytics | Data Factory
• Provide the following details:
  a. A unique name
  b. The resource group already used in this lab
  c. Location: West Europe
• Use Microsoft Edge or Google Chrome for the following steps.
• Go to the newly created resource and click Author & Monitor
• A new window will open in the browser; wait a few minutes for it to load
• Click Author
• Click Connections and, under Linked services, click New
• Select the Compute tab, choose Azure Databricks, and then select Continue
• Provide the following details:
  a. A unique name
  b. The Azure Databricks access token you noted earlier
  c. Configure the rest as follows
• Go to Advanced and, under Spark conf settings, add two name-value pairs – replace databrickshacks with your storage account name (a notebook-side equivalent is sketched below):
  i. Name: fs.azure.account.key.databrickshacks.blob.core.windows.net
     Value: <your storage account key>
  ii. Name: spark.hadoop.fs.azure.account.key.databrickshacks.blob.core.windows.net
     Value: <your storage account key>
• Click Finish
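For context, these settings make the storage key available to the cluster that Data Factory spins up. The first one is equivalent to configuring the key inside a notebook session, which is handy when testing ProcessLog.py interactively before wiring it into Data Factory. A sketch, assuming the example account name databrickshacks:

```python
# Run inside a Databricks notebook ("spark" is predefined there).
# Replace the account name and key with your own values.
spark.conf.set(
    "fs.azure.account.key.databrickshacks.blob.core.windows.net",
    "<your storage account key>",
)

# After this, wasbs:// paths against the account resolve without extra setup:
df = spark.read.text("wasbs://spark@databrickshacks.blob.core.windows.net/data/IISlog.txt")
df.show(5, truncate=False)
```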
- Go to the authoring page, click Pipeline, and select Add Pipeline.
- Under Databricks, drag the Notebook activity onto the pipeline canvas.
- Give the Notebook activity a unique name.
- Under Azure Databricks, select the newly created Azure Databricks linked service.
- Go to Settings and click Browse.
- To add the notebook, browse to and select ProcessLog.
- Click Publish All.
- Once published successfully, click Add Trigger and select Trigger Now.
- Click Finish.
- Go to Monitor.
- Monitor the pipeline and observe the pipeline run.
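The Trigger Now and Monitor steps can also be scripted. Below is a sketch using the azure-mgmt-datafactory Python SDK; the resource names are placeholders for the Data Factory, resource group, and pipeline you created above:

```python
import time
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<your-subscription-id>"   # placeholder
resource_group = "<your-resource-group>"     # placeholder
factory_name = "<your-data-factory-name>"    # placeholder
pipeline_name = "<your-pipeline-name>"       # placeholder

adf = DataFactoryManagementClient(DefaultAzureCredential(), subscription_id)

# Equivalent of Add Trigger > Trigger Now.
run = adf.pipelines.create_run(resource_group, factory_name, pipeline_name)

# Poll the run until it finishes, mirroring the Monitor page.
while True:
    status = adf.pipeline_runs.get(resource_group, factory_name, run.run_id).status
    print("Pipeline run status:", status)
    if status not in ("Queued", "InProgress"):
        break
    time.sleep(30)
```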
- In blob storage, go to spark/data and find the part- file that was created.
- Download the file and view the data.
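Equivalently, the output can be located and previewed with the azure-storage-blob SDK rather than the portal; a sketch, reusing the connection string noted earlier in the lab:

```python
from azure.storage.blob import BlobServiceClient

conn_str = "<your-storage-account-connection-string>"  # from the Access keys blade
container = BlobServiceClient.from_connection_string(conn_str).get_container_client("spark")

# Spark writes its output as one or more part- files under the output folder.
for blob in container.list_blobs(name_starts_with="data/"):
    if "part-" in blob.name:
        text = container.download_blob(blob.name).readall().decode("utf-8")
        print(f"--- {blob.name} ---")
        print(text[:500])  # preview the first 500 characters
```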