Azure Databricks Questions And Answers

Welcome to the Azure Databricks Questions and Answers quiz, which will help you check your knowledge and review the Microsoft Learning Path: Data engineering with Azure Databricks.

1.

In Azure Databricks platform architecture, the ................ contains all the Databricks runtime clusters hosted within the workspace.

Data Plane

Control Plane

Data Panel

Control Panel

None

2.

When using the Column class, which command filters on the end of a column value? For example, filter a column named verb for words ending with "ing".

df.filter(col("verb").endswith("ing"))

df.filter("verb like '%ing'")

df.filter().col("verb").like("%ing")

None

3.

..................... allows Azure to terminate the cluster after a specified number of minutes of inactivity.

Auto-termination

Auto-stop

Auto-scales

None

4.

You can only attach a notebook to a running cluster.

False

True

None

5.

You can run up to ................. concurrent jobs in a Databricks workspace.

150

250

100

None

6.

How many drivers does a Cluster have?

None

7.

To parallelize work, the unit of distribution is a Spark Cluster. Every Cluster has a Driver and one or more executors. Work submitted to the Cluster is split into what type of object?

Stages

Arrays

Jobs

Slot

None

8.

In Azure Databricks, a .................. is recommended for a single user and can run workloads developed in any language: Python, SQL, R, and Scala.

Standard cluster

High Concurrency cluster

Single Node cluster

None

9.

Scaling horizontally means .................

It is limited to a finite amount of RAM, threads, and CPU speeds.

We can simply add new "nodes" to the cluster almost endlessly.

None

10.

What is the Python syntax for defining a DataFrame in Spark from an existing Parquet file in DBFS?

IPGeocodeDF = spark.parquet.read("dbfs:/mnt/training/ip-geocode.parquet")

IPGeocodeDF = spark.read.parquet("dbfs:/mnt/training/ip-geocode.parquet")

IPGeocodeDF = parquet.read("dbfs:/mnt/training/ip-geocode.parquet")

None

11.

Delta Lake integrates tightly with Apache Spark and uses an open format that is based on ....................

YAML

XML

Parquet

JSON

None

12.

In which modes does Azure Databricks provide data encryption?

At-rest only

At-rest and in-transit

In-transit only

None

13.

How do you cache data into the memory of the local executor for instant access?

.inMemory().save()

.save().inMemory()

.cache()

None

14.

You can read and write data that's stored in Delta Lake by using ............

Streaming APIs

Apache Spark SQL batch

PowerShell

T-SQL

15.

.................. is a unified processing engine that can analyze big data using SQL, machine learning, graph processing, or real-time stream analysis

Spark

Google BigQuery

Apache Flink

Apache Storm

None

16.

In Delta Lake architecture, which tables provide a more refined view of our data?

Bronze tables

Gold tables

Silver tables

None

17.

You plan to perform batch processing in Azure Databricks once daily. Which type of Databricks cluster should you use?

Interactive

Standard

Automated

None

18.

Which command specifies a column value in a DataFrame's filter? Specifically, filter a productType column where the value is equal to "book".

df.filter("productType = 'book'")

df.filter(col("productType") == "book")

df.col("productType").filter("book")

None

19.

When running a writeStream command, what does the outputMode("append") option do?

The append outputMode allows records to be added to the output sink

The append mode replaces existing records and updates aggregates

The append mode allows records to be updated and changed in place

None

20.

In Azure Databricks, you can use both Python 2 and Python 3 notebooks on the same cluster.

True

False

None

21.

Which cluster modes are supported in Azure Databricks?

Enterprise

High Concurrency

Single Node

Standard

Multiple Node

22.

What's the purpose of linked services in Azure Data Factory?

To represent a data store or a compute resource that can host execution of an activity

To link data stores or compute resources together for the movement of data between resources

To represent a processing step in a pipeline

None

23.

In a cluster, the results of each Job (parallelized/distributed action) are returned to the ...............

Slot

Task

Executor

Driver

None

24.

Which method for renaming a DataFrame's column is incorrect?

df.alias("timestamp", "dateCaptured")

df.toDF("dateCaptured")

df.select(col("timestamp").alias("dateCaptured"))

None

25.

What sort of pipeline is required in Azure DevOps for creating artifacts used in releases?

An Artifact pipeline

A Build pipeline

A Release pipeline

None

26.

In Databricks, the maximum number of Azure Databricks API calls per hour is ........

1500

1000

150

None

27.

You can deploy Azure Databricks using ....................

CLI

Azure portal

Azure Resource Manager templates

28.

Use ............... to remove columns from a DataFrame

delete()

remove()

drop()

None

29.

Spark optimizes sharing data from one worker to another (a shuffle) by using a format called ............

SafeFormat

Tungsten

RowData

None

30.

You are creating a new notebook in Azure Databricks that will support R as the primary language but will also support Scala and SQL. Which switch should you use to switch between languages?

%[]

%(language)

None

31.

You can use your notebook to run code without attaching it to a cluster.

False

True

None

32.

....................... is the in-memory storage format for Spark SQL, DataFrames & Datasets.

SafeRow

UnsafeRow

SafeColumn

UnsafeColumn

None

33.

To use your notebook with another cluster, what should you do?

Create a notebook for each cluster

You can attach one notebook to multiple clusters

Detach your notebook from a cluster and attach it to another

None

34.

................. is a fundamental isolation unit in Databricks.

Executor

Cluster

Workspace

None

35.

Which is the correct syntax for overwriting data in Azure Synapse Analytics from a Databricks notebook?

df.write.format("com.databricks.spark.sqldw").mode("overwrite").option("...").option("...").save()

df.write.format("com.databricks.spark.sqldw").overwrite().option("...").option("...").save()

df.write.mode("overwrite").option("...").option("...").save()

None

36.

The first step to using Azure Databricks is to create and deploy a .........................

Databricks Spark

Databricks Workspace

Databricks Cluster

None

37.

What are the two prerequisites for connecting Azure Databricks with Azure Synapse Analytics that apply to the Azure Synapse Analytics instance?

Add the client IP address to the firewall's allowed IP addresses list and use the correctly formatted ConnectionString

Create a database master key and configure the firewall to enable Azure services to connect

Use a correctly formatted ConnectionString and create a database master key

None

38.

Apache Spark Structured Streaming is a fast, scalable, and fault-tolerant stream processing API

True

False

None

39.

Driver programs access Apache Spark through a .................... object regardless of deployment location.

SparkCluster

SparkNode

SparkSession

None

40.

............... is a data ingestion and transformation service that allows you to load raw data from over 70 different on-premises or cloud sources.

Azure Data Lake

Azure Data Factory

Azure Data Studio

None

41.

Which DataFrame method do you use to create a temporary view?

createTempViewDF()

createOrReplaceTempView()

createTempView()

None

42.

The Delta Lake Architecture is a vast improvement upon the traditional Lambda architecture.

True

False

None

43.

................ is an open-source format, supported by many data platforms, including Azure Synapse Analytics.

Delta Lake

XML

Data Lake

JSON

None

44.

Use the .................... function to display a DataFrame in the Notebook

select()

show()

display()

None

45.

Azure Databricks is a ................... platform.

Compute & Storage

Storage

Compute

None

46.

You create a DataFrame that reads some data from Azure Blob Storage, and then you create another DataFrame by filtering the initial DataFrame. What feature of Spark causes these transformations to be analyzed?

Tungsten Record Format

Lazy Execution

Java Garbage Collection

None
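The plain-Python sketch below models the idea behind lazy execution. Transformations such as a filter only record a plan, and nothing is read until an action asks for results. `LazyFrame` and `read_blob` are hypothetical stand-ins, not Spark APIs. In PySpark the same pattern is `spark.read` followed by `.filter(...)`. The work is deferred until an action such as `collect()` or `count()`, and at that point Spark can analyze and optimize the whole plan.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class LazyFrame:
    """Hypothetical toy that mimics Spark's lazy evaluation (not a Spark API)."""

    def __init__(self, source: Callable[[], List[Row]], plan=None):
        self.source = source          # called only when an action runs
        self.plan = list(plan or [])  # recorded transformations, not yet executed

    def filter(self, predicate: Callable[[Row], bool]) -> "LazyFrame":
        # Transformation: record the step and return a new frame; no data is read.
        return LazyFrame(self.source, self.plan + [predicate])

    def collect(self) -> List[Row]:
        # Action: read the source, then apply every recorded step in order.
        rows = self.source()
        for predicate in self.plan:
            rows = [row for row in rows if predicate(row)]
        return rows


read_log: List[str] = []


def read_blob() -> List[Row]:
    # Stand-in for reading from Azure Blob Storage; logs each physical read.
    read_log.append("read")
    return [{"city": "Paris", "temp": 21}, {"city": "Oslo", "temp": 9}]


raw = LazyFrame(read_blob)
warm = raw.filter(lambda row: row["temp"] > 15)
reads_before_action = len(read_log)  # still 0: only a plan exists
result = warm.collect()              # the action triggers the read and the filter
print(reads_before_action, result)
```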

47.

Which languages are supported in Apache Spark?

Scala

Node.js

Java

Python

48.

Databricks was founded by the creators of ..........................

Amazon

Google

Apache Spark, Delta Lake, and MLflow

Microsoft

None

49.

The ............... is a big data processing architecture that combines both batch and real-time processing methods.

Lambda architecture

MapReduce

Delta Lake

None

50.

You are a data engineer implementing a lambda architecture on Microsoft Azure. You use an open-source big data solution to collect, process, and maintain data. The analytical data store performs poorly. You must implement a solution that meets the following requirements: provide data warehousing, reduce ongoing management activities, and deliver SQL query responses in less than one second. Which type of cluster should you create?

Apache Hadoop

Apache HBase

Interactive Query

Apache Spark

None

51.

.................. is the blade that you use to assign roles to grant access to Azure resources.

Azure role-based access control (RBAC)

MFA

Access control (IAM)

None

52.

How do you create a DataFrame object?

Introduce a variable name and equate it to something like myDataFrameDF =

Use the DF.create() syntax

Use the createDataFrame() function

None

53.

What's the primary supported language in Apache Spark?

Scala

Python

Java

None

54.

What does Azure Data Lake Storage (ADLS) Passthrough enable?

Commands running on a configured cluster can read and write data in ADLS without configuring service principal credentials

User security groups that are added to ADLS are automatically created in the workspace as Databricks groups

Automatically mounting ADLS accounts to the workspace that are added to the managed resource group

None

55.

You have an Azure Databricks workspace named workspace1 in the Standard pricing tier. You need to configure workspace1 to support autoscaling all-purpose clusters. The solution must automatically scale down workers when the cluster is underutilized for three minutes, minimize the time it takes to scale to the maximum number of workers, and minimize costs. What should you do first?

Create a cluster policy in workspace1.

Enable container services for workspace1.

Set Cluster Mode to High Concurrency.

Upgrade workspace1 to the Premium pricing tier.

None

56.

The first level of parallelization is the ........................, a JVM running on a node; typically, there is one executor instance per node.

Executor

Slot

Driver

Task

None

57.

The maximum number of jobs that a Databricks workspace can create in an hour is ..............

100

1000

150

None

58.

The supported Databricks notebook format is the ................ file type.

.notebook

.spark

DBC

None

59.

............... is a fully-managed version of the open-source Apache Spark analytics and data processing engine.

Azure Data Factory

Azure Synapse Service

Azure Databricks

Azure Spark Pool

None

60.

What are the characteristics of the "3 Vs of Big Data"?

High volume, High speed, and Variety

High volume, High velocity, and Variety

Big size, High speed, and Variety

None

61.

How do you infer the data types and column names when you read a JSON file?

spark.read.option("inferSchema", "true").json(jsonFile)

spark.read.option("inferData", "true").json(jsonFile)

spark.read.inferSchema("true").json(jsonFile)

None

62.

In Spark Structured Streaming, what method should be used to read streaming data into a DataFrame?

spark.read

spark.stream.read

spark.readStream

None

63.

In Databricks, the first user to log in and initialize the workspace is the .............

Workspace admin

Workspace global admin

Workspace owner

None

64.

Every Cluster has one executor and one or more drivers.

False

True

None

65.

The number of ................ is determined by the number of cores and CPUs of each node.

Task

Executor

Driver

Slot

None
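The relationship between cores, slots, and tasks can be worked through with simple arithmetic. This sketch assumes one executor per worker node and one slot per core, as described in these questions. Under those assumptions it computes the total number of slots and how many "waves" of tasks a stage needs. The cluster size and task count are made-up illustration values.

```python
import math


def total_slots(worker_nodes: int, cores_per_node: int) -> int:
    """Slots available for tasks, assuming one executor per node and one slot per core."""
    return worker_nodes * cores_per_node


def task_waves(num_tasks: int, slots: int) -> int:
    """Rounds of scheduling needed when each slot runs one task at a time."""
    return math.ceil(num_tasks / slots)


# Hypothetical cluster: 4 worker nodes with 8 cores each.
slots = total_slots(worker_nodes=4, cores_per_node=8)
# A hypothetical stage with 200 tasks needs several waves on 32 slots.
waves = task_waves(num_tasks=200, slots=slots)
print(f"{slots} slots, {waves} waves")  # 32 slots, 7 waves
```

Adding nodes or cores raises the slot count. That reduces the number of waves as long as the stage has enough tasks to fill the extra slots.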

66.

What is a lambda architecture and what does it try to solve?

An architecture that defines a data processing pipeline whereby microservices act as compute resources for efficient large-scale data processing

An architecture that splits incoming data into two paths - a batch path and a streaming path. This architecture helps address the need to provide real-time processing in addition to slower batch computations.

An architecture that employs the latest Scala runtimes in one or more Databricks clusters to provide the most efficient data processing platform available today

None
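The two paths of a lambda architecture can be sketched in plain Python:

* A batch layer recomputes a complete view from all historical data.
* A speed layer keeps incremental counts for events that arrived since the last batch run.
* A serving step merges the two views.

All function names below are hypothetical, chosen for illustration.

```python
from collections import Counter
from typing import Iterable, List


def batch_view(history: Iterable[str]) -> Counter:
    # Batch path: slow but complete recomputation over all stored events.
    return Counter(history)


def speed_view(recent: Iterable[str]) -> Counter:
    # Streaming path: fast incremental counts for events not yet in a batch run.
    view: Counter = Counter()
    for event in recent:
        view[event] += 1
    return view


def serve(batch: Counter, speed: Counter) -> Counter:
    # Serving layer: combine both views to answer queries in near real time.
    return batch + speed


history: List[str] = ["click", "view", "click"]
recent: List[str] = ["click", "purchase"]
answer = serve(batch_view(history), speed_view(recent))
print(dict(answer))
```

The cost of this design is that the same logic is maintained in two separate code paths.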

67.

It is possible to put an Azure Databricks Notebook under Version Control in an Azure DevOps repo

False

True

None

68.

Which feature of Spark determines how your code is executed?

Tungsten Record Format

Catalyst Optimizer

Java Garbage Collection

None

69.

What is the Databricks Delta command to display metadata?

DESCRIBE DETAIL tableName

SHOW SCHEMA tablename

MSCK DETAIL tablename

None

70.

You have an Azure Databricks resource. You need to log actions that relate to compute changes triggered by the Databricks resources. Which Databricks services should you log?

Clusters

Jobs

DBFS

Workspaces

None

71.

............. is currently the most secure way to access Azure data services from Azure Databricks.

Azure Private Link

Azure Public Link

Azure Protected Link

None

72.

In Azure Databricks, a .................. provides fine-grained sharing for maximum resource utilization and minimum query latencies, and can run workloads developed in SQL, Python, and R.

Standard cluster

Single Node cluster

High Concurrency cluster

None

73.

In Delta Lake architecture, which tables contain raw data ingested from various sources (JSON files, RDBMS data, IoT data, etc.)?

Silver tables

Bronze tables

Gold tables

None

74.

In Azure Databricks, only ............. supports table access control.

Single Node cluster

Standard cluster

High Concurrency cluster

None

75.

................ automates your release process up to the point where human intervention is needed: a person clicks a button to deploy.

Continuous Deployment

Continuous Delivery

Continuous Integration

None

76.

In a Cluster, the first level of parallelization is the .....................

Driver

Slot

Task

Executor

None

77.

In Databricks, the notebook interface is the .............. program

Driver

Worker

Slot

Executor

None

78.

...................... is a transactional storage layer designed specifically to work with Apache Spark and Databricks File System (DBFS).

Data Lake

Azure File storage

Azure Blob storage

None

79.

You are designing an Azure Databricks table. The table will ingest an average of 20 million streaming events per day. You need to persist the events in the table for use in incremental load pipeline jobs in Azure Databricks. The solution must minimize storage costs and incremental load times. What should you include in the solution?

Use a JSON format for physical data storage.

Sink to Azure Queue storage.

Include a watermark column.

Partition by DateTime fields.

None

80.

A pipeline is a logical grouping of activities that together perform a task

True

False

None

81.

In the cluster, the second level of parallelization is the .....................

Driver

Task

Slot

Executor

None

82.

What happens if the command option("checkpointLocation", pointer-to-checkpoint directory) is not specified?

It will not be possible to create more than one streaming query that uses the same streaming source since they will conflict

The streaming job will function as expected since the checkpointLocation option does not exist

When the streaming job stops, all state around the streaming job is lost, and upon restart, the job must start from scratch

None
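Why the checkpoint matters can be sketched with a toy stream reader that saves the last processed offset to a file. This is a plain-Python illustration, and the names `process_new` and `offsets.json` are hypothetical. It does not reproduce how Spark lays out its checkpoint directory. It only shows the effect: with a checkpoint, a restarted query resumes where it stopped; without one, it starts from scratch.

```python
import json
import os
import tempfile
from typing import List, Optional


def process_new(source: List[str], checkpoint_dir: Optional[str]) -> List[str]:
    """Process events after the last committed offset (hypothetical helper)."""
    start = 0
    path = os.path.join(checkpoint_dir, "offsets.json") if checkpoint_dir else None
    if path and os.path.exists(path):
        with open(path) as f:
            start = json.load(f)["next_offset"]  # resume from saved progress
    processed = source[start:]
    if path:
        with open(path, "w") as f:
            json.dump({"next_offset": len(source)}, f)  # commit progress
    return processed


events = ["e1", "e2", "e3"]
with tempfile.TemporaryDirectory() as ckpt:
    first = process_new(events, ckpt)
    events.append("e4")
    resumed = process_new(events, ckpt)  # only the new event is processed
no_ckpt = process_new(events, None)      # no checkpoint: everything again
print(first, resumed, no_ckpt)
```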

83.

In Azure Databricks, High Concurrency clusters terminate automatically after 120 minutes, by default.

True

False

None

84.

Per Azure limits, the maximum number of storage accounts per region per subscription is .............

350

250

150

None

85.

Use ................. to return records from a Dataframe to the driver of the cluster

select()

list()

collect()

None

86.

A ............ is a collection of cells to execute code, to render formatted text, or to display graphical visualizations.

Power BI Report

Notepad

Power BI Dashboard

Notebook

None

87.

In Delta Lake architecture, which tables provide business-level aggregates often used for reporting and dashboarding?

Bronze tables

Gold tables

Silver tables

None
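The bronze, silver, and gold tables described in these questions can be sketched as three plain-Python steps. The IoT-style record layout and field names are hypothetical. In Delta Lake, each stage would be a separate table, typically written with `df.write.format("delta")`.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Bronze: raw records kept exactly as ingested (here, JSON strings from a device feed).
bronze: List[str] = [
    '{"device": "d1", "temp": "21.5"}',
    '{"device": "d2", "temp": "bad"}',
    '{"device": "d1", "temp": "22.5"}',
]


def to_silver(raw: List[str]) -> List[Dict]:
    # Silver: parsed, typed, and cleaned records; malformed rows are dropped.
    rows = []
    for line in raw:
        record = json.loads(line)
        try:
            rows.append({"device": record["device"], "temp": float(record["temp"])})
        except ValueError:
            continue
    return rows


def to_gold(rows: List[Dict]) -> Dict[str, float]:
    # Gold: business-level aggregate (average temperature per device) for reporting.
    readings: Dict[str, List[float]] = defaultdict(list)
    for row in rows:
        readings[row["device"]].append(row["temp"])
    return {device: sum(t) / len(t) for device, t in readings.items()}


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```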

88.

How do you list files in DBFS within a notebook?

%fs dir /my-file-path

ls /my-file-path

%fs ls /my-file-path

None

89.

In Azure Databricks, ............................. has no workers and runs Spark jobs on the driver node.

Single Node cluster

High Concurrency cluster

Standard cluster

None

90.

You create an Azure Databricks cluster and specify an additional library to install. When you attempt to load the library to a notebook, the library is not found. You need to identify the cause of the issue. What should you review?

Global init scripts logs

Notebook logs

Cluster event logs

Workspace logs

None

91.

................. is a scalable real-time data ingestion service that can process millions of events in a matter of seconds.

Azure Event Hubs

Azure Streaming

Azure Real-Time

None

92.

........................ is the authorization system you use to manage access to Azure resources.

Azure role-based access control (RBAC)

Access control (IAM)

MFA

None

93.

In Azure Databricks platform architecture, the ................ hosts Databricks jobs, notebooks with query results, the cluster manager, web application, Hive metastore, security access control lists (ACLs), and user sessions.

Control Panel

Data Panel

Data Plane

Control Plane

None

94.

Use the ............... function to display a small set of rows from a larger DataFrame

limit()

TopN

Where clause

None

95.

What is an Azure Key Vault-backed secret scope?

It is a method by which you create a secure connection to Azure Key Vault from a notebook and directly access its secrets within the Spark session

A Databricks secret scope that is backed by Azure Key Vault instead of Databricks

It is the Key Vault Access Key used to securely connect to the vault and retrieve secrets

None

96.

In Azure Databricks, ............................. requires at least one Spark worker node in addition to the driver node to execute Spark jobs.

Single Node cluster

High Concurrency cluster

Standard cluster

None

97.

To set a Spark configuration property called password to the value of the secret stored in secrets/apps/acme-app/password, which syntax is correct?

spark.password {{secrets/apps/acme-app/password}}

spark.password ({secrets/apps/acme-app/password})

spark.password {(secrets/apps/acme-app/password)}

None

98.

................... are groups of computers that are treated as a single computer and handle the execution of commands issued from notebooks.

Drivers

Clusters

Slot

Worker

None

99.

In a Spark application, the ................ is the JVM in which our application runs.

Driver

Slot

Executor

Task

None

100.

Per Azure limits, the maximum number of resource groups per subscription is ..............

900

950

980

None

101.

What steps are required to authorize Azure DevOps to connect to and deploy notebooks to a staging or production Azure Databricks workspace?

Create an Azure Active Directory application, copy the application ID, then use that as the Databricks bearer token in the Databricks Notebooks Deployment step of the Release pipeline

In the production or staging Azure Databricks workspace, enable Git integration to Azure DevOps, then link to the Azure DevOps source code repo

Create a new Access Token within the user settings in the production Azure Databricks workspace, then use the token as the Databricks bearer token in the Databricks Notebooks Deployment step of the Release pipeline

None

102.

In Azure Databricks, High Concurrency clusters can run workloads developed in Scala.

True

False

None

103.

Use ............ to select a subset of columns from a DataFrame

limit()

select()

Where clause

None

104.

How can parameters be passed into an Azure Databricks notebook from Azure Data Factory?

Use notebook widgets to define parameters that can be passed into the notebook

Deploy the notebook as a web service in Databricks, defining parameter names and types

Use the new API endpoint option on a notebook in Databricks and provide the parameter name

None

105.

When creating a new cluster in the Azure Databricks workspace, what happens behind the scenes?

When an Azure Databricks workspace is deployed, you are allocated a pool of VMs. Creating a cluster draws from this pool.

Azure Databricks creates a cluster of driver and worker nodes, based on your VM type and size selections.

Azure Databricks provisions a dedicated VM that processes all jobs, based on your VM type and size selection.

None

106.

Each Driver has a number of Slots to which parallelized Tasks can be assigned by the Executor.

True

False

None

107.

...................... takes this a step further by removing human intervention and relying on automated tests to determine whether the build should be deployed into production.

Continuous Integration

Continuous Deployment

Continuous Delivery

None
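The difference between continuous delivery and continuous deployment comes down to the final gate before production. The `release` function below is a hypothetical plain-Python model of that gate, not an Azure DevOps API. With delivery, a passing build waits for a manual approval. With deployment, passing automated tests is enough.

```python
from typing import Callable, List, Optional


def release(build_ok: bool, tests: List[Callable[[], bool]],
            continuous_deployment: bool, approved: Optional[bool] = None) -> str:
    """Hypothetical release flow contrasting continuous delivery and deployment."""
    if not build_ok:
        return "build failed"
    if not all(test() for test in tests):
        return "tests failed"
    if continuous_deployment:
        # Deployment: passing automated tests is the only gate to production.
        return "deployed"
    # Delivery: the release is ready, but a person must approve (click a button).
    return "deployed" if approved else "awaiting approval"


for automatic in (False, True):
    print(automatic, release(True, [lambda: True], continuous_deployment=automatic))
```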

108.

In a cluster, each Job is broken down into ..............

Task

Slot

Stages

Driver

None

109.

You plan to build a structured streaming solution in Azure Databricks. The solution will count new events in five-minute intervals and report only events that arrive during the interval. The output will be sent to a Delta Lake table. Which output mode should you use?

Update

Merge

Append

Complete

None
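The output modes can be compared by simulating what a sink receives after each micro-batch for a count of events per five-minute window. This is a plain-Python model of the semantics, not Spark code, and `run` is a hypothetical helper. As a simplifying stand-in for a watermark, a window counts as final once an event from a later window appears. In real Spark, a streaming aggregation in append mode needs a watermark so that windows can be finalized.

```python
from collections import Counter
from typing import Dict, List, Set


def run(batches: List[List[int]], mode: str) -> List[Dict[int, int]]:
    """Simulate sink output per micro-batch for counts per window.

    Each event is represented by its window number. Simplifying assumption:
    a window is final once any event from a later window has been seen.
    """
    counts: Counter = Counter()
    emitted_final: Set[int] = set()
    max_window = -1
    outputs = []
    for batch in batches:
        changed = set(batch)
        counts.update(batch)
        max_window = max([max_window, *batch])
        if mode == "complete":
            # The entire result table is written on every trigger.
            out = dict(counts)
        elif mode == "update":
            # Only rows whose value changed in this trigger.
            out = {w: counts[w] for w in changed}
        elif mode == "append":
            # Only finalized rows, each emitted exactly once.
            final = {w for w in counts if w < max_window} - emitted_final
            emitted_final |= final
            out = {w: counts[w] for w in final}
        else:
            raise ValueError(f"unknown mode: {mode}")
        outputs.append(out)
    return outputs


batches = [[0, 0], [0, 1], [1, 2]]
for mode in ("complete", "update", "append"):
    print(mode, run(batches, mode))
```

With these batches, the three modes behave differently:

* Complete re-emits every window on each trigger.
* Update emits only the windows that changed.
* Append emits each window once, after it is final.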

110.

A Cluster can have drivers running in parallel.

True

False

None

111.

Which statement about the Azure Databricks Data Plane is true?

The Data Plane contains the Cluster Manager and coordinates data processing jobs

The Data Plane is hosted within the client subscription and is where all data is processed and stored

The Data Plane is hosted within a Microsoft-managed subscription

None

112.

In Databricks, the maximum number of notebooks or execution contexts attached to a cluster is ............

150

100

None

113.

You can create a cluster without creating a Databricks workspace

False

True

None

114.

Which command orders by a column in descending order?

df.orderBy("requests desc")

df.orderBy("requests").desc()

df.orderBy(col("requests").desc())

None

115.

VNet peering is only required if using the standard deployment without VNet injection.

True

False

None

116.

In Azure Databricks platform architecture, the web app and cluster manager are part of the ...................

Control Panel

Data Panel

Control Plane

Data Plane

None

117.

What command should be issued to view the list of active streams?

Invoke spark.view.active

Invoke spark.streams.show

Invoke spark.streams.active

None

118.

The driver and the executors are

Java processes

Python processes

C# processes

None

119.

Per Azure limits, the maximum number of Virtual Machines (VMs) per subscription per region is ..............

20000

25000

30000

None

120.

You need to find the average of sales transactions by storefront. Which of the following aggregates would you use?

df.groupBy(col("storefront")).avg("completedTransactions")

df.select(col("storefront")).avg("completedTransactions")

df.groupBy(col("storefront")).avg(col("completedTransactions"))

None

121.

In Databricks, each workspace is identified by a globally unique 53-bit number, called .................

Workspace ID or Subscriber ID.

Workspace ID or Tenant ID.

Workspace ID or Organization ID.

None

122.

In Azure Databricks, the default cluster mode is ...............

Standard

High Concurrency

Single Node

None

123.

Use ........... and .............. to remove duplicate data in a DataFrame

distinct()

removeDuplicates()

deleteDuplicates()

dropDuplicates()
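The difference between `distinct()` and `dropDuplicates()` can be modeled in plain Python:

* `distinct()` compares entire rows.
* `dropDuplicates()` can also take a subset of columns, as in PySpark's `df.dropDuplicates(["id"])`. It keeps one row per distinct combination of those columns.

The helper below is a hypothetical model. It keeps the first duplicate it sees, whereas Spark does not guarantee which of the duplicate rows survives.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, object]


def drop_duplicates(rows: List[Row], subset: Optional[Sequence[str]] = None) -> List[Row]:
    """Keep the first row for each distinct key; the key is the whole row if subset is None."""
    seen = set()
    result = []
    for row in rows:
        cols = subset if subset is not None else sorted(row)
        key: Tuple = tuple(row[c] for c in cols)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


rows = [
    {"id": 1, "name": "Ann"},
    {"id": 1, "name": "Ann"},
    {"id": 1, "name": "Anne"},
    {"id": 2, "name": "Bob"},
]
distinct_rows = drop_duplicates(rows)              # models df.distinct()
unique_ids = drop_duplicates(rows, subset=["id"])  # models df.dropDuplicates(["id"])
print(distinct_rows)
print(unique_ids)
```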

124.

A Serverless Pool is a self-managed pool of cloud resources that is auto-configured for interactive Spark workloads.

True

False

None

125.

How do you perform UPSERT in a Delta dataset?

Use UPSERT INTO my-table

Use MERGE INTO my-table USING data-to-upsert

Use UPSERT INTO my-table /MERGE

None
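An upsert with `MERGE INTO` updates rows whose key matches and inserts the rest. Those semantics can be modeled with a dictionary keyed by the merge key. This plain-Python sketch assumes a single hypothetical key column named `id`. The comment shows the corresponding Delta Lake SQL pattern, with illustrative table names.

```python
from typing import Dict, List

Row = Dict[str, object]

# Delta Lake SQL equivalent (table names are illustrative):
#   MERGE INTO target USING updates
#   ON target.id = updates.id
#   WHEN MATCHED THEN UPDATE SET *
#   WHEN NOT MATCHED THEN INSERT *


def merge_upsert(target: List[Row], updates: List[Row], key: str = "id") -> List[Row]:
    """Return the target rows after applying an upsert keyed on `key`."""
    merged: Dict[object, Row] = {row[key]: dict(row) for row in target}
    for row in updates:
        # Matched keys are overwritten (update); new keys are added (insert).
        merged[row[key]] = dict(row)
    return list(merged.values())


target = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]
updates = [{"id": 2, "qty": 9}, {"id": 3, "qty": 1}]
result = merge_upsert(target, updates)
print(result)
```

Duplicate keys in the source are handled differently. This sketch silently keeps the last source row for a key. Delta Lake, by contrast, reports an error when several source rows match the same target row.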

126.

In a cluster, each parallelized action is referred to as ................

Job

Slot

Task

None

127.

In Azure Databricks, you can change the cluster mode after a cluster is created.

False

True

None

128.

What is required to specify the location of a checkpoint directory when defining a Delta Lake streaming query?

.writeStream.format("delta").checkpoint("location", checkpointPath)

.writeStream.format("parquet").option("checkpointLocation", checkpointPath)

.writeStream.format("delta").option("checkpointLocation", checkpointPath)

None

129.

What happens to Databricks activities (notebook, JAR, Python) in Azure Data Factory if the target cluster in Azure Databricks isn't running when the cluster is called by Data Factory?

The Databricks activity will fail in Azure Data Factory – you must always have the cluster running

If the target cluster is stopped, Databricks will start the cluster before attempting to execute

Simply add a Databricks cluster start activity before the notebook, JAR, or Python Databricks activity

None

130.

..................... enables you to capture the audit logs and make them centrally available and fully searchable.

Azure Log integration

Azure Monitor integration

Azure Search integration

None

131.

Which of the following statements describes a wide transformation?

A wide transformation can be applied per partition/worker with no need to share or shuffle data to other workers

A wide transformation requires sharing data across workers. It does so by shuffling data.

A wide transformation applies data transformation over a large number of columns

None
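The narrow/wide distinction can be illustrated by modeling partitions as Python lists:

* A narrow transformation, such as a filter, works on each partition independently.
* A wide transformation, such as counting per key, must first shuffle records so that all values for a key land in the same partition.

The character-sum partitioning below is a deterministic stand-in for hash partitioning, chosen only for illustration.

```python
from collections import Counter
from typing import List

Partitions = List[List[str]]


def narrow_filter(parts: Partitions, prefix: str) -> Partitions:
    # Narrow: each partition is processed on its own; no data moves between workers.
    return [[word for word in part if word.startswith(prefix)] for part in parts]


def shuffle(parts: Partitions, num_out: int) -> Partitions:
    # Wide: route every record to an output partition chosen from its key.
    out: Partitions = [[] for _ in range(num_out)]
    for part in parts:
        for word in part:
            out[sum(map(ord, word)) % num_out].append(word)
    return out


def wide_count(parts: Partitions, num_out: int = 2) -> List[Counter]:
    # After the shuffle, each key lives in exactly one partition, so counts are complete.
    return [Counter(part) for part in shuffle(parts, num_out)]


parts = [["apple", "avocado", "banana"], ["apple", "cherry", "banana"]]
filtered = narrow_filter(parts, "a")
counts = wide_count(parts)
print(filtered)
print(counts)
```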

132.

Scaling vertically means .................

We can simply add new "nodes" to the cluster almost endlessly.

It is limited to a finite amount of RAM, threads, and CPU speeds.

None
