1.
What's the purpose of linked services in Azure Data Factory?
2.
What is a lambda architecture and what does it try to solve?
3.
How do you infer the data types and column names when you read a JSON file?
4.
In Azure Databricks, High Concurrency clusters can run workloads developed in Scala.
5.
Which is the correct syntax for overwriting data in Azure Synapse Analytics from a Databricks notebook?
6.
What does Azure Data Lake Storage (ADLS) Passthrough enable?
7.
Per Azure limits, the number of storage accounts per region per subscription is .............
8.
You are a data engineer implementing a lambda architecture on Microsoft Azure. You use an open-source big data solution to collect, process, and maintain data. The analytical data store performs poorly. You must implement a solution that provides data warehousing, reduces ongoing management activities, and delivers SQL query responses in less than one second. Which type of cluster should you create?
9.
Which command orders by a column in descending order?
10.
The driver and the executors are ................
11.
The ............... is a big data processing architecture that combines both batch and real-time processing methods.
12.
Every Cluster has one executor and one or more drivers.
13.
Driver programs access Apache Spark through a .................... object regardless of deployment location.
14.
What happens if the command option("checkpointLocation", pointer-to-checkpoint directory) is not specified?
15.
Which cluster modes are supported in Azure Databricks?
16.
In Azure Databricks, you can use both Python 2 and Python 3 notebooks on the same cluster.
17.
To parallelize work, the unit of distribution is a Spark Cluster. Every Cluster has a Driver and one or more executors. Work submitted to the Cluster is split into what type of object?
18.
How do you create a DataFrame object?
19.
Which languages are supported in Apache Spark?
20.
In a cluster, each Job is broken down into ..............
21.
It is possible to put an Azure Databricks notebook under version control in an Azure DevOps repo.
22.
................. is a fundamental isolation unit in Databricks.
23.
The first step to using Azure Databricks is to create and deploy a .........................
24.
Use ........... and .............. to remove duplicate data in a DataFrame
25.
In Azure Databricks, a .................. is recommended for a single user; it can run workloads developed in any language: Python, SQL, R, and Scala.
26.
Use the ............... function to display a small set of rows from a larger DataFrame
27.
Which DataFrame method do you use to create a temporary view?
28.
How many drivers does a Cluster have?
29.
The number of ................ is determined by the number of cores and CPUs of each node.
30.
Each Driver has a number of Slots to which parallelized Tasks can be assigned by the Executor.
31.
............... is a data ingestion and transformation service that allows you to load raw data from over 70 different on-premises or cloud sources.
32.
Spark has optimized the operation of sharing data from one worker to another by using a format called ............
33.
What are the two prerequisites for connecting Azure Databricks with Azure Synapse Analytics that apply to the Azure Synapse Analytics instance?
34.
Per Azure limits, the number of resource groups per subscription is ..............
35.
In Azure Databricks, High Concurrency clusters terminate automatically after 120 minutes, by default.
36.
....................... is the in-memory storage format for Spark SQL, DataFrames & Datasets.
37.
............... is a fully-managed version of the open-source Apache Spark analytics and data processing engine.
38.
Use the .................... function to display a DataFrame in the Notebook
39.
..................... enables you to capture the audit logs and make them centrally available and fully searchable.
40.
A ............ is a collection of cells to execute code, to render formatted text, or to display graphical visualizations.
41.
Which statement about the Azure Databricks Data Plane is true?
42.
A pipeline is a logical grouping of activities that together perform a task
43.
You can read and write data that's stored in Delta Lake by using ............
44.
In the Azure Databricks platform architecture, the ................ hosts Databricks jobs, notebooks with query results, the cluster manager, web application, Hive metastore, security access control lists (ACLs), and user sessions.
45.
What command should be issued to view the list of active streams?
46.
Databricks was founded by the creators of ..........................
47.
In the Delta Lake architecture, which tables contain raw data ingested from various sources (JSON files, RDBMS data, IoT data, etc.)?
48.
In the cluster, the second level of parallelization is the .....................
49.
.................. is a unified processing engine that can analyze big data using SQL, machine learning, graph processing, or real-time stream analysis
50.
Scaling vertically means .................
51.
VNet peering is only required if using the standard deployment without VNet injection.
52.
The first level of parallelization is the ........................ that is a JVM running on a node, typically, one executor instance per node.
53.
In Databricks, the maximum number of Azure Databricks API calls per hour is ........
54.
In Azure Databricks, you can change the cluster mode after a cluster is created.
55.
You can create a cluster without creating a Databricks workspace
56.
What is an Azure Key Vault-backed secret scope?
57.
The supported Databricks notebook format is the ................ file type.
58.
In Azure Databricks, ............................. requires at least one Spark worker node in addition to the driver node to execute Spark jobs.
59.
Which command specifies a column value in a DataFrame's filter? Specifically, filter by a productType column where the value is equal to book?
60.
What happens to Databricks activities (notebook, JAR, Python) in Azure Data Factory if the target cluster in Azure Databricks isn't running when the cluster is called by Data Factory?
61.
Which of the following statements describes a wide transformation?
62.
You need to find the average of sales transactions by storefront. Which of the following aggregates would you use?
63.
...................... goes a step further by removing human intervention and relying on automated tests to determine whether the build should be deployed into production.
64.
What is the Python syntax for defining a DataFrame in Spark from an existing Parquet file in DBFS?
65.
In Spark Structured Streaming, what method should be used to read streaming data into a DataFrame?
66.
What steps are required to authorize Azure DevOps to connect to and deploy notebooks to a staging or production Azure Databricks workspace?
67.
You have an Azure Databricks workspace named workspace1 in the Standard pricing tier. You need to configure workspace1 to support autoscaling all-purpose clusters. The solution must automatically scale down workers when the cluster is underutilized for three minutes, minimize the time it takes to scale to the maximum number of workers, and minimize costs. What should you do first?
68.
Scaling horizontally means .................
69.
What sort of pipeline is required in Azure DevOps for creating artifacts used in releases?
70.
In the Azure Databricks platform architecture, the web app and cluster manager are part of the ...................
71.
Use ............ to select a subset of columns from a DataFrame
72.
When creating a new cluster in the Azure Databricks workspace, what happens behind the scenes?
73.
In Azure Databricks, ............................. has no workers and runs Spark jobs on the driver node.
74.
In Databricks, the notebook interface is the .............. program
75.
In Azure Databricks platform architecture, the ................ contains all the Databricks runtime clusters hosted within the workspace.
76.
When doing a write stream command, what does the outputMode("append") option do?
77.
You are designing an Azure Databricks table. The table will ingest an average of 20 million streaming events per day. You need to persist the events in the table for use in incremental load pipeline jobs in Azure Databricks. The solution must minimize storage costs and incremental load times. What should you include in the solution?
78.
You are creating a new notebook in Azure Databricks that will support R as the primary language but will also support Scala and SQL. Which switch should you use to switch between languages?
79.
What is required to specify the location of a checkpoint directory when defining a Delta Lake streaming query?
80.
In a Spark application, the ................ is the JVM in which our application runs.
81.
.................. is the blade that you use to assign roles to grant access to Azure resources.
82.
You plan to perform batch processing in Azure Databricks once daily. Which type of Databricks cluster should you use?
83.
You can only attach a notebook to a running cluster.
84.
How do you list files in DBFS within a notebook?
85.
You can deploy Azure Databricks using ....................
86.
To use your notebook with another cluster, what should you do?
87.
In which modes does Azure Databricks provide data encryption?
88.
What's the primary supported language in Apache Spark?
89.
Which feature of Spark determines how your code is executed?
90.
Suppose you create a DataFrame that reads data from Azure Blob Storage, and then you create another DataFrame by filtering the initial DataFrame. What feature of Spark causes these transformations to be analyzed?
91.
You create an Azure Databricks cluster and specify an additional library to install. When you attempt to load the library into a notebook, the library is not found. You need to identify the cause of the issue. What should you review?
92.
................ automates your release process up to the point where human intervention is needed, by clicking a button.
93.
The Delta Lake Architecture is a vast improvement upon the traditional Lambda architecture.
94.
A Serverless Pool is a self-managed pool of cloud resources that is auto-configured for interactive Spark workloads.
95.
When using the Column Class, which command filters based on the end of a column value? For example, a column named verb and filtered by words ending with "ing".
96.
You have an Azure Databricks resource. You need to log actions that relate to compute changes triggered by the Databricks resources. Which Databricks services should you log?
97.
In the Delta Lake architecture, which tables provide business-level aggregates often used for reporting and dashboarding?
98.
Azure Databricks is a ................... platform.
99.
Per Azure limits, the number of virtual machines (VMs) per subscription per region is ..............
100.
..................... allows Azure to terminate the cluster after a specified number of minutes of inactivity.
101.
To set a Spark configuration property called password to the value of the secret stored in secrets/apps/acme-app/password, which syntax is correct?
102.
In Azure Databricks, the default cluster mode is ...............
103.
............. is currently the most secure way to access Azure data services from Azure Databricks.
104.
You can only run up to ................. concurrent jobs in a Databricks workspace.
105.
In Databricks, the first user to log in and initialize the workspace is the .............
106.
........................ is the authorization system you use to manage access to Azure resources.
107.
How do you cache data into the memory of the local executor for instant access?
108.
A Cluster can have drivers running in parallel.
109.
The maximum number of jobs that a Databricks workspace can create in an hour is ..............
110.
................... are groups of computers that are treated as a single computer and handle the execution of commands issued from notebooks.
111.
In Azure Databricks, only ............. supports table access control.
112.
How can parameters be passed into an Azure Databricks notebook from Azure Data Factory?
113.
In Azure Databricks, a .................. provides fine-grained sharing for maximum resource utilization and minimum query latencies; it can run workloads developed in SQL, Python, and R.
114.
In the Delta Lake architecture, which tables provide a more refined view of our data?
115.
You can use your notebook to run code without attaching it to a cluster.
116.
What are the characteristics of the "3 Vs of Big Data"?
117.
In Databricks, the maximum number of notebooks or execution contexts attached to a cluster is ............
118.
In a cluster, each parallelized action is referred to as ................
119.
...................... is a transactional storage layer designed specifically to work with Apache Spark and Databricks File System (DBFS).
120.
................ is an open-source format, supported by many data platforms, including Azure Synapse Analytics.
121.
How do you perform UPSERT in a Delta dataset?
122.
Use ............... to remove columns from a DataFrame
123.
Use ................. to return records from a DataFrame to the driver of the cluster
124.
In a cluster, the results of each Job (parallelized/distributed action) are returned to the ...............
125.
In Databricks, each workspace is identified by a globally unique 53-bit number, called .................
126.
................. is a scalable real-time data ingestion service that processes millions of events in a matter of seconds.
127.
You plan to build a structured streaming solution in Azure Databricks. The solution will count new events in five-minute intervals and report only events that arrive during the interval. The output will be sent to a Delta Lake table. Which output mode should you use?
128.
In a cluster, the first level of parallelization is the .....................
129.
Delta Lake integrates tightly with Apache Spark, and uses an open format that is based on ....................
130.
Apache Spark Structured Streaming is a fast, scalable, and fault-tolerant stream processing API
131.
Which method for renaming a DataFrame's column is incorrect?
132.
What is the Databricks Delta command to display metadata?