Azure Databricks Questions And Answers

Welcome to the Azure Databricks Questions and Answers quiz, which helps you check your knowledge and review the Microsoft Learning Path: Data engineering with Azure Databricks.

1.

When creating a new cluster in the Azure Databricks workspace, what happens behind the scenes?

When an Azure Databricks workspace is deployed, you are allocated a pool of VMs. Creating a cluster draws from this pool.

Azure Databricks provisions a dedicated VM that processes all jobs, based on your VM type and size selection.

Azure Databricks creates a cluster of driver and worker nodes, based on your VM type and size selections.

None

2.

In Azure Databricks, High Concurrency clusters terminate automatically after 120 minutes by default.

True

False

None

3.

Use ................. to return records from a DataFrame to the driver of the cluster

select()

list()

collect()

None

4.

What happens to Databricks activities (notebook, JAR, Python) in Azure Data Factory if the target cluster in Azure Databricks isn't running when the cluster is called by Data Factory?

The Databricks activity will fail in Azure Data Factory – you must always have the cluster running

If the target cluster is stopped, Databricks will start the cluster before attempting to execute

Simply add a Databricks cluster start activity before the notebook, JAR, or Python Databricks activity

None

5.

Azure Databricks is a ................... platform.

Compute & Storage

Compute

Storage

None

6.

You are designing an Azure Databricks table. The table will ingest an average of 20 million streaming events per day. You need to persist the events in the table for use in incremental load pipeline jobs in Azure Databricks. The solution must minimize storage costs and incremental load times. What should you include in the solution?

Sink to Azure Queue storage.

Use a JSON format for physical data storage.

Include a watermark column.

Partition by DateTime fields.

None

7.

How do you list files in DBFS within a notebook?

%fs ls /my-file-path

%fs dir /my-file-path

ls /my-file-path

None

8.

You are a data engineer implementing a lambda architecture on Microsoft Azure. You use an open-source big data solution to collect, process, and maintain data. The analytical data store performs poorly. You must implement a solution that provides data warehousing, reduces ongoing management activities, and delivers SQL query responses in less than one second. Which type of cluster should you create?

Apache HBase

Interactive Query

Apache Spark

Apache Hadoop

None

9.

In the Azure Databricks platform architecture, the web app and cluster manager are part of the ...................

Control Plane

Control Panel

Data Panel

Data Plane

None

10.

You can create a cluster without creating a Databricks workspace

True

False

None

11.

In Azure Databricks platform architecture, the ................hosts Databricks jobs, notebooks with query results, the cluster manager, web application, Hive metastore, and security access control lists (ACLs) and user sessions.

Control Panel

Data Panel

Control Plane

Data Plane

None

12.

In Azure Databricks, the default cluster mode is ...............

Single Node

High Concurrency

Standard

None

13.

..................... allows Azure to terminate the cluster after a specified number of minutes of inactivity.

Auto-stop

Auto-termination

Auto-scales

None

14.

What happens if the command option("checkpointLocation", pointer-to-checkpoint directory) is not specified?

The streaming job will function as expected since the checkpointLocation option does not exist

When the streaming job stops, all state around the streaming job is lost, and upon restart, the job must start from scratch

It will not be possible to create more than one streaming query that uses the same streaming source since they will conflict

None
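
The role of the checkpoint location asked about in question 14 can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions: `run_stream` and the JSON checkpoint file are hypothetical stand-ins, and a real Structured Streaming checkpoint stores offsets and state in its own format. The point it shows is that a saved offset lets a restarted job resume, while a job without one starts from scratch.

```python
import json
import os
import tempfile


def run_stream(events, checkpoint_path=None):
    """Process the events after the last saved offset and return them.

    Hypothetical helper: it models a streaming job, not the Spark API.
    """
    offset = 0
    if checkpoint_path and os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            offset = json.load(fh)["offset"]
    processed = events[offset:]
    if checkpoint_path:
        # Persist progress so that a restart can resume from here.
        with open(checkpoint_path, "w") as fh:
            json.dump({"offset": len(events)}, fh)
    return processed


source = ["e1", "e2", "e3"]
with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "checkpoint.json")
    first_run = run_stream(source, ckpt)
    source_after = source + ["e4"]
    # Restart with the checkpoint: only the new event is processed.
    resumed = run_stream(source_after, ckpt)
    # Restart without a checkpoint: all progress is lost.
    restarted_without_checkpoint = run_stream(source_after)
```

Here `resumed` holds only the new event, while `restarted_without_checkpoint` holds every event again.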

15.

Which of the following statements describes a wide transformation?

A wide transformation applies data transformation over a large number of columns

A wide transformation requires sharing data across workers. It does so by shuffling data.

A wide transformation can be applied per partition/worker with no need to share or shuffle data to other workers

None
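
The distinction between per-partition work and work that needs a shuffle can be modeled without Spark. The following is a minimal sketch in plain Python, assuming two in-memory "partitions" and hypothetical helpers (`narrow_map`, `shuffle_by_key`, `sum_by_key`); the character-code sum is only a deterministic stand-in for a real hash partitioner.

```python
from collections import defaultdict

# Two "workers", each holding one partition of (storefront, amount) rows.
partitions = [
    [("north", 10), ("south", 5)],
    [("north", 7), ("east", 3)],
]


def narrow_map(parts, fn):
    """Narrow: every partition is transformed on its own, no rows move."""
    return [[fn(row) for row in part] for part in parts]


def shuffle_by_key(parts, num_partitions):
    """Wide: rows are redistributed so equal keys land in the same partition."""
    shuffled = [[] for _ in range(num_partitions)]
    for part in parts:
        for key, value in part:
            target = sum(map(ord, key)) % num_partitions
            shuffled[target].append((key, value))
    return shuffled


def sum_by_key(parts):
    """Aggregate per partition; correct only after equal keys are co-located."""
    totals = []
    for part in parts:
        acc = defaultdict(int)
        for key, value in part:
            acc[key] += value
        totals.append(dict(acc))
    return totals


doubled = narrow_map(partitions, lambda row: (row[0], row[1] * 2))
totals = sum_by_key(shuffle_by_key(doubled, num_partitions=2))
```

Doubling each amount never needs data from another partition. Summing by key does, because the two "north" rows start on different workers.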

16.

Which DataFrame method do you use to create a temporary view?

createOrReplaceTempView()

createTempViewDF()

createTempView()

None

17.

In Azure Databricks, a .................. is recommended for a single user. It can run workloads developed in any language: Python, SQL, R, and Scala.

High Concurrency cluster

Single Node cluster

Standard cluster

None

18.

A Serverless Pool is a self-managed pool of cloud resources that is auto-configured for interactive Spark workloads.

False

True

None

19.

In the Delta Lake architecture, which tables contain raw data ingested from various sources (JSON files, RDBMS data, IoT data, etc.)?

Silver tables

Bronze tables

Gold tables

None

20.

You are creating a new notebook in Azure Databricks that will support R as the primary language but will also support Scala and SQL. Which switch should you use to switch between languages?

%(language)

%[]

None

21.

You have an Azure Databricks workspace named workspace1 in the Standard pricing tier. You need to configure workspace1 to support autoscaling all-purpose clusters. The solution must automatically scale down workers when the cluster is underutilized for three minutes, minimize the time it takes to scale to the maximum number of workers, and minimize costs. What should you do first?

Set Cluster Mode to High Concurrency.

Enable container services for workspace1.

Create a cluster policy in workspace1.

Upgrade workspace1 to the Premium pricing tier.

None

22.

What is the Python syntax for defining a DataFrame in Spark from an existing Parquet file in DBFS?

IPGeocodeDF = spark.parquet.read("dbfs:/mnt/training/ip-geocode.parquet")

IPGeocodeDF = parquet.read("dbfs:/mnt/training/ip-geocode.parquet")

IPGeocodeDF = spark.read.parquet("dbfs:/mnt/training/ip-geocode.parquet")

None

23.

Scaling vertically means .................

we can simply add new "nodes" to the cluster almost endlessly.

It is limited to a finite amount of RAM, Threads and CPU speeds.

None

24.

In Azure Databricks, High Concurrency clusters can run workloads developed in Scala.

False

True

None

25.

.................. is a unified processing engine that can analyze big data using SQL, machine learning, graph processing, or real-time stream analysis

Apache Storm

Google BigQuery

Apache Flink

Spark

None

26.

A pipeline is a logical grouping of activities that together perform a task

True

False

None

27.

...................... is a transactional storage layer designed specifically to work with Apache Spark and Databricks File System (DBFS).

Azure File storage

Azure Blob storage

Delta Lake

None

28.

In Azure Databricks, only ............. supports table access control.

Single Node cluster

Standard cluster

High Concurrency cluster

None

29.

The driver and the executors are

Java processes

Python processes

C# processes

None

30.

You can run only up to ................. concurrent jobs in a Databricks workspace

150

250

100

None

31.

Use ........... and .............. to remove duplicate data in a DataFrame

deleteDuplicates

dropDuplicates

removeDuplicates

distinct()
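
The difference between removing fully identical rows and removing rows that repeat only on chosen columns can be modeled in plain Python. This is a minimal sketch, not Spark code: the helper names `distinct_rows` and `drop_duplicates_by` are hypothetical, and the sketch always keeps the first row it sees, whereas in a distributed engine the surviving duplicate may depend on partition order.

```python
def distinct_rows(rows):
    """Keep one copy of each fully identical row, in first-seen order."""
    seen = set()
    result = []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            result.append(row)
    return result


def drop_duplicates_by(rows, subset):
    """Keep the first row for each combination of values in `subset`."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[col] for col in subset)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


rows = [
    {"id": 1, "city": "Oslo"},
    {"id": 1, "city": "Oslo"},    # exact duplicate of the first row
    {"id": 1, "city": "Bergen"},  # same id, different city
]

unique_rows = distinct_rows(rows)
unique_ids = drop_duplicates_by(rows, ["id"])
```

Whole-row deduplication keeps both cities, while deduplicating on `id` alone keeps a single row.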

32.

When doing a write stream command, what does the outputMode("append") option do?

The append outputMode allows records to be added to the output sink

The append mode allows records to be updated and changed in place

The append mode replaces existing records and updates aggregates

None

33.

................. is a scalable real-time data ingestion service that processes millions of events in a matter of seconds.

Azure Event Hubs

Azure Real-Time

Azure Streaming

None

34.

VNet peering is only required if using the standard deployment without VNet injection.

True

False

None

35.

What is an Azure Key Vault-backed secret scope?

A Databricks secret scope that is backed by Azure Key Vault instead of Databricks

It is a method by which you create a secure connection to Azure Key Vault from a notebook and directly access its secrets within the Spark session

It is the Key Vault Access Key used to securely connect to the vault and retrieve secrets

None

36.

The ............... is a big data processing architecture that combines both batch and real-time processing methods.

MapReduce

Lambda architecture

Delta Lake

None

37.

You can deploy Azure Databricks using ....................

Azure Resource Manager templates

Azure portal

CLI

38.

In which modes does Azure Databricks provide data encryption?

At-rest and in-transit

At-rest only

In-transit only

None

39.

In the cluster, the second level of parallelization is the .....................

Slot

Driver

Task

Executor

None

40.

You need to find the average of sales transactions by storefront. Which of the following aggregates would you use?

df.groupBy(col("storefront")).avg("completedTransactions")

df.select(col("storefront")).avg("completedTransactions")

df.groupBy(col("storefront")).avg(col("completedTransactions"))

None
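
What a group-by average computes can be shown with a small plain-Python sketch. It does not use the Spark API: `average_by` is a hypothetical helper and the sample rows are made up for illustration. Only the column names come from the question.

```python
from collections import defaultdict


def average_by(rows, group_col, value_col):
    """Return {group value: mean of value_col} over all rows."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[group_col]] += row[value_col]
        counts[row[group_col]] += 1
    return {group: sums[group] / counts[group] for group in sums}


# Illustrative data only.
sales = [
    {"storefront": "north", "completedTransactions": 10},
    {"storefront": "north", "completedTransactions": 20},
    {"storefront": "south", "completedTransactions": 6},
]

averages = average_by(sales, "storefront", "completedTransactions")
```

Rows are first grouped by the key column and the aggregate is then computed per group, which is why the grouping step has to come before the average.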

41.

What's the primary supported language in Apache Spark?

Scala

Python

Java

None

42.

Databricks was founded by the creators of ..........................

Microsoft

Apache Spark, Delta Lake, and MLflow

Google

Amazon

None

43.

How do you create a DataFrame object?

Use the createDataFrame() function

Introduce a variable name and equate it to something like myDataFrameDF =

Use the DF.create() syntax

None

44.

To parallelize work, the unit of distribution is a Spark Cluster. Every Cluster has a Driver and one or more executors. Work submitted to the Cluster is split into what type of object?

Jobs

Arrays

Stages

Slot

None

45.

Which statement about the Azure Databricks Data Plane is true?

The Data Plane contains the Cluster Manager and coordinates data processing jobs

The Data Plane is hosted within the client subscription and is where all data is processed and stored

The Data Plane is hosted within a Microsoft-managed subscription

None

46.

You plan to build a structured streaming solution in Azure Databricks. The solution will count new events in five-minute intervals and report only events that arrive during the interval. The output will be sent to a Delta Lake table. Which output mode should you use?

Update

Merge

Complete

Append

None

47.

The maximum number of jobs that a Databricks workspace can create in an hour is ..............

150

100

1000

None

48.

Delta Lake integrates tightly with Apache Spark, and uses an open format that is based on ....................

Parquet

XML

YAML

JSON

None

49.

In Spark Structured Streaming, what method should be used to read streaming data into a DataFrame?

spark.readStream

spark.read

spark.stream.read

None

50.

What steps are required to authorize Azure DevOps to connect to and deploy notebooks to a staging or production Azure Databricks workspace?

Create a new Access Token within the user settings in the production Azure Databricks workspace, then use the token as the Databricks bearer token in the Databricks Notebooks Deployment step of the Release pipeline

In the production or staging Azure Databricks workspace, enable Git integration to Azure DevOps, then link to the Azure DevOps source code repo

Create an Azure Active Directory application, copy the application ID, then use that as the Databricks bearer token in the Databricks Notebooks Deployment step of the Release pipeline

None

51.

Apache Spark Structured Streaming is a fast, scalable, and fault-tolerant stream processing API

False

True

None

52.

In Azure Databricks, a .................. provides fine-grained sharing for maximum resource utilization and minimum query latencies. It can run workloads developed in SQL, Python, and R.

High Concurrency cluster

Standard cluster

Single Node cluster

None

53.

In a cluster, each parallelized action is referred to as a ................

Job

Slot

Task

None

54.

In Azure Databricks, you can change the cluster mode after a cluster is created.

False

True

None

55.

Each Driver has a number of Slots to which parallelized Tasks can be assigned by the Executor.

True

False

None

56.

You can read and write data that's stored in Delta Lake by using ............

T-SQL

PowerShell

Streaming APIs

Apache Spark SQL batch

57.

A ............ is a collection of cells to execute code, to render formatted text, or to display graphical visualizations.

Notebook

Power BI Dashboard

Notepad

Power BI Report

None

58.

How do you perform UPSERT in a Delta dataset?

Use UPSERT INTO my-table

Use UPSERT INTO my-table /MERGE

Use MERGE INTO my-table USING data-to-upsert

None
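
The semantics of an upsert (update the rows that match, insert the rows that do not) can be sketched in plain Python. This is a model of the behavior only, not Delta Lake code: `merge_upsert` and the sample rows are hypothetical.

```python
def merge_upsert(target, updates, key):
    """Return a new table: matching keys are updated, new keys are inserted."""
    merged = {row[key]: dict(row) for row in target}
    for row in updates:
        if row[key] in merged:
            merged[row[key]].update(row)   # matched: update in place
        else:
            merged[row[key]] = dict(row)   # not matched: insert
    return sorted(merged.values(), key=lambda r: r[key])


target = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
updates = [{"id": 2, "name": "Robert"}, {"id": 3, "name": "Cy"}]

result = merge_upsert(target, updates, key="id")
```

The row with `id` 2 is updated, the row with `id` 3 is inserted, and the row with `id` 1 is left untouched.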

59.

In Databricks, the maximum number of Azure Databricks API calls per hour is ........

1500

1000

150

None

60.

............... is a fully-managed version of the open-source Apache Spark analytics and data processing engine.

Azure Databricks

Azure Data Factory

Azure Spark Pool

Azure Synapse Service

None

61.

Which method for renaming a DataFrame's column is incorrect?

df.select(col("timestamp").alias("dateCaptured"))

df.toDF("dateCaptured")

df.alias("timestamp", "dateCaptured")

None

62.

In Databricks, the first user to login and initialize the workspace is the .............

Workspace global admin

Workspace admin

Workspace owner

None

63.

What command should be issued to view the list of active streams?

Invoke spark.streams.show

Invoke spark.streams.active

Invoke spark.view.active

None

64.

............... is a data ingestion and transformation service that allows you to load raw data from over 70 different on-premises or cloud sources.

Azure Data Studio

Azure Data Factory

Azure Data Lake

None

65.

You create an Azure Databricks cluster and specify an additional library to install. When you attempt to load the library to a notebook, the library is not found. You need to identify the cause of the issue. What should you review?

Notebook logs

Workspace logs

Global init scripts logs

Cluster event logs

None

66.

You plan to perform batch processing in Azure Databricks once daily. Which type of Databricks cluster should you use?

Standard

Interactive

Automated

None

67.

You can only attach a notebook to a running cluster.

False

True

None

68.

Use the ............... function to display a small set of rows from a larger DataFrame

TopN

limit()

Where clause

None

69.

Use the .................... function to display a DataFrame in the Notebook

display()

show()

select()

None

70.

........................ is the authorization system you use to manage access to Azure resources.

Azure role-based access control (RBAC)

MFA

Access control (IAM)

None

71.

How can parameters be passed into an Azure Databricks notebook from Azure Data Factory?

Deploy the notebook as a web service in Databricks, defining parameter names and types

Use notebook widgets to define parameters that can be passed into the notebook

Use the new API endpoint option on a notebook in Databricks and provide the parameter name

None

72.

The Delta Lake Architecture is a vast improvement upon the traditional Lambda architecture.

False

True

None

73.

Every Cluster has one executor and one or more drivers.

True

False

None

74.

The first level of parallelization is the ........................ that is a JVM running on a node, typically, one executor instance per node.

Driver

Task

Slot

Executor

None

75.

As an Azure limit, the number of virtual machines (VMs) per subscription per region is ..............

20000

30000

25000

None

76.

................ is an open-source format, supported by many data platforms, including Azure Synapse Analytics.

XML

Data Lake

Delta Lake

JSON

None

77.

Which feature of Spark determines how your code is executed?

Java Garbage Collection

Tungsten Record Format

Catalyst Optimizer

None

78.

You have an Azure Databricks resource. You need to log actions that relate to compute changes triggered by the Databricks resources. Which Databricks services should you log?

Jobs

Clusters

Workspaces

DBFS

None

79.

What is required to specify the location of a checkpoint directory when defining a Delta Lake streaming query?

.writeStream.format("delta").option("checkpointLocation", checkpointPath)

.writeStream.format("parquet").option("checkpointLocation", checkpointPath)

.writeStream.format("delta").checkpoint("location", checkpointPath)

None

80.

In a cluster, each Job is broken down into ..............

Driver

Stages

Task

Slot

None

81.

What are the characteristics of the "3 Vs of Big Data"?

High volume, High speed, and Variety

High volume, High velocity, and Variety

Big size, High speed, and Variety

None

82.

The ................ count is determined by the number of cores and CPUs of each node.

Executor

Slot

Driver

Task

None
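
How core counts bound parallelism can be worked through with simple arithmetic. The sketch below assumes one slot per core and tasks of equal duration. Both are simplifying assumptions, and `total_slots` and `scheduling_waves` are hypothetical helper names.

```python
import math


def total_slots(num_executors, cores_per_executor):
    """Slots available across the cluster, assuming one slot per core."""
    return num_executors * cores_per_executor


def scheduling_waves(num_tasks, slots):
    """Rounds needed to run all tasks when each slot runs one task at a time."""
    if slots <= 0:
        raise ValueError("slots must be positive")
    return math.ceil(num_tasks / slots)


slots = total_slots(num_executors=4, cores_per_executor=8)
waves = scheduling_waves(num_tasks=100, slots=slots)
```

With 4 executors of 8 cores each there are 32 slots, so 100 equal tasks need 4 waves, the last of which is only partly filled.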

83.

Scaling horizontally means .................

we can simply add new "nodes" to the cluster almost endlessly.

It is limited to a finite amount of RAM, Threads and CPU speeds.

None

84.

In Azure Databricks, ............................. requires at least one Spark worker node in addition to the driver node to execute Spark jobs.

Standard cluster

Single Node cluster

High Concurrency cluster

None

85.

Which cluster modes are supported in Azure Databricks?

Single Node

Multiple Node

Standard

High Concurrency

Enterprise

86.

It is possible to put an Azure Databricks notebook under version control in an Azure DevOps repo

True

False

None

87.

Which command orders by a column in descending order?

df.orderBy("requests desc")

df.orderBy(col("requests").desc())

df.orderBy("requests").desc()

None

88.

The supported Databricks notebook format is the ................ file type.

DBC

.notebook

.spark

None

89.

Which is the correct syntax for overwriting data in Azure Synapse Analytics from a Databricks notebook?

df.write.mode("overwrite").option("...").option("...").save()

df.write.format("com.databricks.spark.sqldw").overwrite().option("...").option("...").save()

df.write.format("com.databricks.spark.sqldw").mode("overwrite").option("...").option("...").save()

None

90.

How many drivers does a Cluster have?

None

91.

................ automates your release process up to the point where human intervention is needed, by clicking a button.

Continuous Integration

Continuous Deployment

Continuous Delivery

None

92.

In a Spark application, the ................ is the JVM in which our application runs.

Task

Slot

Driver

Executor

None

93.

In Databricks, each workspace is identified by a globally unique 53-bit number, called .................

Workspace ID or Tenant ID.

Workspace ID or Subscriber ID.

Workspace ID or Organization ID.

None

94.

What is a lambda architecture and what does it try to solve?

An architecture that employs the latest Scala runtimes in one or more Databricks clusters to provide the most efficient data processing platform available today

An architecture that defines a data processing pipeline whereby microservices act as compute resources for efficient large-scale data processing

An architecture that splits incoming data into two paths - a batch path and a streaming path. This architecture helps address the need to provide real-time processing in addition to slower batch computations.

None
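
The two-path idea can be sketched in a few lines of plain Python. This is a toy model under stated assumptions: page-view counting is an invented workload, and `count_by_page` and `serving_view` are hypothetical names. The batch path recomputes totals over historical data, the speed path covers events that arrived since the last batch run, and queries read the combination.

```python
from collections import Counter


def count_by_page(events):
    """Count events per page; used here by both the batch and speed paths."""
    return Counter(event["page"] for event in events)


def serving_view(batch_counts, speed_counts):
    """Merge the slow, complete view with the fast, recent one."""
    return batch_counts + speed_counts


# Illustrative data only.
historical = [{"page": "home"}, {"page": "home"}, {"page": "docs"}]
recent = [{"page": "home"}, {"page": "pricing"}]

batch_counts = count_by_page(historical)   # batch path
speed_counts = count_by_page(recent)       # streaming path
combined = serving_view(batch_counts, speed_counts)
```

Keeping two code paths in agreement is the main maintenance cost of this design. The toy avoids it only because both paths happen to share one function.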

95.

In Azure Databricks, ............................. has no workers and runs Spark jobs on the driver node.

Standard cluster

Single Node cluster

High Concurrency cluster

None

96.

In the Delta Lake architecture, which tables provide a more refined view of our data?

Gold tables

Bronze tables

Silver tables

None

97.

As an Azure limit, the number of resource groups per subscription is ..............

900

950

980

None

98.

What's the purpose of linked services in Azure Data Factory?

To link data stores or compute resources together for the movement of data between resources

To represent a processing step in a pipeline

To represent a data store or a compute resource that can host execution of an activity

None

99.

In Azure Databricks platform architecture, the ................ contains all the Databricks runtime clusters hosted within the workspace.

Data Panel

Data Plane

Control Panel

Control Plane

None

100.

The first step to using Azure Databricks is to create and deploy a .........................

Databricks Workspace

Databricks Cluster

Databricks Spark

None

101.

Use ............... to remove columns from a DataFrame

delete()

drop()

remove()

None

102.

..................... enables you to capture the audit logs and make them centrally available and fully searchable.

Azure Monitor integration

Azure Search integration

Azure Log integration

None

103.

Driver programs access Apache Spark through a .................... object regardless of deployment location.

SparkNode

SparkCluster

SparkSession

None

104.

You can use your notebook to run code without attaching it to a cluster

False

True

None

105.

As an Azure limit, the number of storage accounts per region per subscription is .............

250

350

150

None

106.

What are the two prerequisites for connecting Azure Databricks with Azure Synapse Analytics that apply to the Azure Synapse Analytics instance?

Add the client IP address to the firewall's allowed IP addresses list and use the correctly formatted ConnectionString

Create a database master key and configure the firewall to enable Azure services to connect

Use a correctly formatted ConnectionString and create a database master key

None

107.

When using the Column Class, which command filters based on the end of a column value? For example, a column named verb and filtered by words ending with "ing".

df.filter(col("verb").endswith("ing"))

df.filter().col("verb").like("%ing")

df.filter("verb like '%ing'")

None

108.

Spark has optimized the operation of sharing data from one worker to another by using a format called ............

SafeFormat

RowData

Tungsten

None

109.

................. is a fundamental isolation unit in Databricks.

Executor

Cluster

Workspace

None

110.

To use your notebook with another cluster, what should you do?

You can attach one notebook with multiple clusters

Detach your notebook from a cluster and attach it to another

Create a notebook for each cluster

None

111.

What is the Databricks Delta command to display metadata?

SHOW SCHEMA tablename

MSCK DETAIL tablename

DESCRIBE DETAIL tableName

None

112.

In Azure Databricks, you can use both Python 2 and Python 3 notebooks on the same cluster.

True

False

None

113.

In Databricks, the notebook interface is the .............. program

Executor

Worker

Driver

Slot

None

114.

How do you infer the data types and column names when you read a JSON file?

spark.read.inferSchema("true").json(jsonFile)

spark.read.option("inferSchema", "true").json(jsonFile)

spark.read.option("inferData", "true").json(jsonFile)

None

115.

A Cluster can have drivers running in parallel.

False

True

None

116.

.................. is the blade that you use to assign roles to grant access to Azure resources.

Azure role-based access control (RBAC)

MFA

Access control (IAM)

None

117.

................... are groups of computers that are treated as a single computer and handle the execution of commands issued from notebooks.

Clusters

Drivers

Slot

Worker

None

118.

Which languages are supported in Apache Spark?

Scala

Java

Node.JS

Python

119.

...................... takes a step further by removing the human intervention and relying on automated tests to automatically determine whether the build should be deployed into production.

Continuous Delivery

Continuous Deployment

Continuous Integration

None

120.

....................... is the in-memory storage format for Spark SQL, DataFrames & Datasets.

SafeRow

SafeColumn

UnsafeRow

UnsafeColumn

None

121.

............. is currently the most secure way to access Azure data services from Azure Databricks.

Azure Private Link

Azure Public Link

Azure Protected Link

None

122.

In a cluster, the results of each Job (parallelized/distributed action) are returned to the ...............

Driver

Slot

Task

Executor

None

123.

How do you cache data into the memory of the local executor for instant access?

.save().inMemory()

.cache()

.inMemory().save()

None

124.

What sort of pipeline is required in Azure DevOps for creating artifacts used in releases?

An Artifact pipeline

A Release pipeline

A Build pipeline

None

125.

To set a Spark configuration property called password to the value of the secret stored in secrets/apps/acme-app/password, which syntax is correct?

spark.password {{secrets/apps/acme-app/password}}

spark.password {(secrets/apps/acme-app/password)}

spark.password ({secrets/apps/acme-app/password})

None

126.

Which command specifies a column value in a DataFrame's filter? Specifically, filter by a productType column where the value is equal to book?

df.col("productType").filter("book")

df.filter("productType = 'book'")

df.filter(col("productType") == "book")

None

127.

In Databricks, the maximum number of notebooks or execution contexts attached to a cluster is ............

100

150

None

128.

What does Azure Data Lake Storage (ADLS) Passthrough enable?

Automatically mounting ADLS accounts to the workspace that are added to the managed resource group

Commands running on a configured cluster can read and write data in ADLS without configuring service principal credentials

User security groups that are added to ADLS are automatically created in the workspace as Databricks groups

None

129.

Use ............ to select a subset of columns from a DataFrame

Where clause

limit()

select()

None

130.

In the Delta Lake architecture, which tables provide business-level aggregates often used for reporting and dashboarding?

Gold tables

Bronze tables

Silver tables

None

131.

Suppose you create a DataFrame that reads some data from Azure Blob Storage, and then create another DataFrame by filtering the initial DataFrame. What feature of Spark causes these transformations to be analyzed?

Tungsten Record Format

Lazy Execution

Java Garbage Collection

None
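
The idea that transformations are only recorded, and run when an action asks for results, can be modeled in plain Python. This is a minimal sketch and not how Spark is implemented: the `LazyRows` class and `read_source` function are hypothetical, and the `reads` list exists only to show when the source is touched.

```python
class LazyRows:
    """Records transformations and runs them only when an action is called."""

    def __init__(self, source, plan=()):
        self._source = source   # callable that produces the raw rows
        self._plan = plan       # tuple of ("filter" | "map", function) steps

    def filter(self, predicate):
        return LazyRows(self._source, self._plan + (("filter", predicate),))

    def map(self, fn):
        return LazyRows(self._source, self._plan + (("map", fn),))

    def collect(self):
        """Action: read the source and apply every recorded step in order."""
        rows = list(self._source())
        for kind, fn in self._plan:
            if kind == "filter":
                rows = [r for r in rows if fn(r)]
            else:
                rows = [fn(r) for r in rows]
        return rows


reads = []  # grows by one entry each time the source is actually read


def read_source():
    reads.append("read")
    return [1, 2, 3, 4, 5, 6]


initial = LazyRows(read_source)
evens = initial.filter(lambda n: n % 2 == 0)  # transformation: only recorded
reads_before_action = len(reads)
result = evens.collect()                      # action: the plan executes now
reads_after_action = len(reads)
```

Because the whole plan is known before anything runs, an engine built this way has the opportunity to analyze and reorder the steps before executing them.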

132.

In a cluster, the first level of parallelization is the .....................

Task

Driver

Executor

Slot

None
