<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[DataEngineerExpert.com Newsletter]]></title><description><![CDATA[A newsletter dedicated to data engineering and technology trends. Interesting for Engineers and Leaders.]]></description><link>https://blog.dataengineerexpert.com</link><image><url>https://substackcdn.com/image/fetch/$s_!1_QY!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba7b3f1b-4996-42e0-bb5a-0c846d6300c4_421x421.png</url><title>DataEngineerExpert.com Newsletter</title><link>https://blog.dataengineerexpert.com</link></image><generator>Substack</generator><lastBuildDate>Wed, 06 May 2026 11:41:52 GMT</lastBuildDate><atom:link href="https://blog.dataengineerexpert.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Vedran Markulj]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[dataengineerexpert@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[dataengineerexpert@substack.com]]></itunes:email><itunes:name><![CDATA[Vedran Markulj]]></itunes:name></itunes:owner><itunes:author><![CDATA[Vedran Markulj]]></itunes:author><googleplay:owner><![CDATA[dataengineerexpert@substack.com]]></googleplay:owner><googleplay:email><![CDATA[dataengineerexpert@substack.com]]></googleplay:email><googleplay:author><![CDATA[Vedran Markulj]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[How to deploy and configure a routing table, vnet, nsg (including nsg rules), and subnets using IaC for a production-ready data platform]]></title><description><![CDATA[#02 - This article is a deep dive on how to configure
a network layer for your data platform in Azure and how to deploy it using Azure Bicep (IaC).]]></description><link>https://blog.dataengineerexpert.com/p/how-to-deploy-and-configure-network-layer</link><guid isPermaLink="false">https://blog.dataengineerexpert.com/p/how-to-deploy-and-configure-network-layer</guid><dc:creator><![CDATA[Vedran Markulj]]></dc:creator><pubDate>Mon, 05 Aug 2024 05:45:15 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/98a9b947-8ab0-449f-b959-9853b0f25b72_853x607.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Before we dive into the details, here is why creating a network layer is important: don&#8217;t let anyone tell you that it is sufficient to use the managed networking built into many Azure services. By creating a network layer that we fully control, we are preparing for a larger enterprise setup such as a Hub and Spoke network architecture. In that context, the network layer we are going to create, which houses the platform we want to build, is a single spoke in a larger enterprise network landscape.</p><blockquote><p>DISCLAIMER: I am committed to writing quality articles for my subscribers. This means there will be a lot of detail, and I cannot promise that any of it will be easy to set up. It also means these articles are very different from what you will find elsewhere: many guides merely introduce topics and give examples that cannot be used in any sort of production environment. The articles you will find here are different.</p></blockquote><p></p><h3><strong>Infrastructure-as-code (IaC)</strong></h3><p>The IaC consists of Bicep files applied to the Azure infrastructure by submitting them to Azure Resource Manager (ARM) using DevOps pipelines. While this is the ideal solution, and one we will cover, we will skip it for now and run the Bicep scripts from our local machine.
I want to skip the DevOps automation pipelines for now because they would take focus away from the main objective of this article: understanding how to create a network layer infrastructure that can be used in development, testing, and production environments, and how to do that using Azure Bicep.</p><p></p><h3><strong>Create the vnet, subnet, and nsg</strong></h3><blockquote><p>DISCLAIMER: The following commands are for local, manual deployment and should only be used during development. We will go into the DevOps pipeline automation of the IaC in a later article. But if you are up for it: these manually runnable commands are practically the recipe for the commands a DevOps pipeline should run to automate the deployment process.</p></blockquote><p></p><h3><strong>Prerequisites</strong></h3><ul><li><p>You need to have access to an Azure Subscription.</p></li><li><p>You need to have the Azure CLI installed on your local machine.</p></li><li><p>You need to have created a Service Principal with a Secret.</p></li><li><p>The Service Principal needs Contributor rights on the Subscription.</p></li></ul><p></p><h3><strong>Access the subscription</strong></h3><p>Here is how you can access your subscription from the Azure CLI. Because we want to be able to automate the deployment in the future, we will not authenticate with the Azure CLI using a personal user; instead, let&#8217;s authenticate using a Service Principal.</p>
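<p>To make the target concrete before the detailed walkthrough, here is a rough, hypothetical Bicep sketch of the kind of resources this article builds up: a vnet with one subnet guarded by an nsg. The resource names, API versions, and address ranges are illustrative placeholders, not the article&#8217;s actual template:</p>

```bicep
// Illustrative sketch only: a vnet with one subnet guarded by an NSG.
// Names, API versions, and address ranges are placeholders.
resource nsg 'Microsoft.Network/networkSecurityGroups@2023-04-01' = {
  name: 'nsg-dataplatform'
  location: resourceGroup().location
  properties: {
    securityRules: [] // real rules are covered in detail in the article
  }
}

resource vnet 'Microsoft.Network/virtualNetworks@2023-04-01' = {
  name: 'vnet-dataplatform'
  location: resourceGroup().location
  properties: {
    addressSpace: { addressPrefixes: ['10.0.0.0/16'] }
    subnets: [
      {
        name: 'snet-databricks'
        properties: {
          addressPrefix: '10.0.1.0/24'
          networkSecurityGroup: { id: nsg.id } // attach the NSG to the subnet
        }
      }
    ]
  }
}
```

<p>A template like this can be deployed from a local machine with <code>az deployment group create --resource-group &lt;rg&gt; --template-file network.bicep</code>, which is also the command a DevOps pipeline would eventually run for you.</p>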
      <p>
          <a href="https://blog.dataengineerexpert.com/p/how-to-deploy-and-configure-network-layer">
              Read more
          </a>
      </p>
]]></content:encoded></item><item><title><![CDATA[How to deploy a production-ready data platform centered around Databricks and Azure services.]]></title><description><![CDATA[This post is an introduction and overview of what we want to set up: a production-ready data platform, consisting of infrastructure and a framework, using Azure services and Databricks]]></description><link>https://blog.dataengineerexpert.com/p/deploy-a-production-ready-data-platform</link><guid isPermaLink="false">https://blog.dataengineerexpert.com/p/deploy-a-production-ready-data-platform</guid><dc:creator><![CDATA[Vedran Markulj]]></dc:creator><pubDate>Sun, 21 Jul 2024 14:47:56 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/4b4f9e0a-cc38-4228-9371-1d0e99faf2d5_853x607.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The goal is to deploy and configure a production-ready data platform with an Azure Databricks Workspace at its heart, an Azure Storage account as the data lake for the medallion architecture (bronze, silver, and gold data layers), and Azure Key Vault for secrets, securing everything with a Vnet, Subnets, and a Network Security Group in Azure.
Beyond the infrastructure, I will also cover the deployment of a framework, so that Databricks is fully configured and ready for a Data Engineer or Data Scientist to work with.</p>
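<p>As a taste of what the series will cover, the medallion storage layout can be declared in a few lines of Bicep. This is a minimal, hypothetical sketch under assumed names and API versions, not the final template from the series:</p>

```bicep
// Illustrative sketch: an ADLS Gen2 storage account with one container
// per medallion layer (bronze, silver, gold). All names are placeholders.
resource lake 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: 'stdataplatformlake'
  location: resourceGroup().location
  sku: { name: 'Standard_LRS' }
  kind: 'StorageV2'
  properties: {
    isHnsEnabled: true // hierarchical namespace, required for a data lake
  }
}

resource blobService 'Microsoft.Storage/storageAccounts/blobServices@2023-01-01' = {
  parent: lake
  name: 'default'
}

// One container per medallion layer, created with a Bicep loop.
resource layers 'Microsoft.Storage/storageAccounts/blobServices/containers@2023-01-01' = [for layer in ['bronze', 'silver', 'gold']: {
  parent: blobService
  name: layer
}]
```

<p>The series itself also wires this account into the vnet and subnets, which is what turns a plain storage account into a production-ready data lake.</p>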
      <p>
          <a href="https://blog.dataengineerexpert.com/p/deploy-a-production-ready-data-platform">
              Read more
          </a>
      </p>
]]></content:encoded></item><item><title><![CDATA[The first newsletter publication and what to expect from Data Engineer Expert.]]></title><description><![CDATA[What is going to happen for the remainder of 2024]]></description><link>https://blog.dataengineerexpert.com/p/dataengineer-expert-first-newsletter-publication</link><guid isPermaLink="false">https://blog.dataengineerexpert.com/p/dataengineer-expert-first-newsletter-publication</guid><dc:creator><![CDATA[Vedran Markulj]]></dc:creator><pubDate>Thu, 11 Jul 2024 19:20:00 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba7b3f1b-4996-42e0-bb5a-0c846d6300c4_421x421.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Welcome everybody! I have finally decided to use Substack to publish my newsletter! Some of you might know it as &#8220;Stay Ahead Of The Machine Learning Curve&#8221;. Thank you for your patience, as some of you &#8220;subscribed&#8221; to my newsletter over a year ago!</p><p>I&#8217;ll be committing to writing these biweekly or monthly going forward, because I believe the short-form content on LinkedIn can&#8217;t cover the same level of depth that I can cover here.</p><p>The articles I will publish here are aimed at experienced Data Engineers and Architects who work with data analytics platforms. The Data Engineer Expert newsletter is also aimed at AI leaders looking to get a better understanding of trends in the rapidly changing landscape.</p><p></p><h3>My plan for the rest of 2024</h3><p>My plan is to share some of my experience with Azure Databricks. I aim to release articles about setting up a near production-ready Databricks environment in Azure.
I have previously implemented such platforms and frameworks with great success and value to the companies involved, and I would like to share the recipe for establishing production-ready data analytics platforms in the cloud.</p><p></p><p>Here is a brief content draft (it is not complete, many of the topics will extend into 2025, and the topics might change over time).</p><p>In the end, a setup depends deeply on the needs of a company and the team that will work on it. Therefore there is almost no right or wrong; however, there are some common aspects to each setup. I will try to cover these. I will also cover what you can do as a leader (enabler) to help such an initiative succeed.</p><p>The following list is not complete, and you can expect other articles to be published in between the topics below.</p><p>Content (draft):</p><ol><li><p>Create and configure a vnet, subnet, and nsg using Bicep.</p></li><li><p>Create and configure a storage account and connect it to the vnet and subnet using Bicep.</p></li><li><p>Create a Databricks workspace in Azure and connect it to the vnet and subnet using Bicep.</p></li><li><p>Configure a Databricks Pool using Python and the Databricks REST API.</p></li><li><p>Configure a Databricks Cluster using Python and the Databricks REST API.</p></li><li><p>Add environment variables to a Databricks Cluster using Python and the Databricks REST API.</p></li><li><p>Add Python dependencies to a Databricks cluster using Python and the Databricks REST API.</p></li><li><p>Add an Azure Service Principal to Databricks using Python and the Databricks REST API.</p></li><li><p>Create an Azure Key Vault backed secret scope in Databricks using Python and the Databricks REST API.</p></li><li><p>Set secrets in Azure Key Vault using the REST API and a service principal.</p></li><li><p>Retrieve secrets from a Databricks secret scope connected to Azure Key Vault using a Databricks Notebook.</p></li><li><p>Mount an Azure data lake storage account on a Databricks cluster from a Databricks Notebook.</p></li><li><p>Create a database (schema) in the Metastore in Databricks from a Databricks Notebook.</p></li><li><p>Register an External Table in the Metastore in Databricks from a Databricks Notebook.</p></li><li><p>Load data from an External Table into a PySpark dataframe using a Databricks Notebook.</p></li><li><p>Create a Delta Table from a PySpark dataframe using a Databricks Notebook.</p></li><li><p>Append a PySpark dataframe to an existing Delta Table using a Databricks Notebook.</p></li><li><p>Overwrite an existing Delta Table from a PySpark dataframe using a Databricks Notebook.</p></li><li><p>Optimize a Delta Table using a Databricks Notebook.</p></li><li><p>Enable PySpark dataframe caching using a Databricks Notebook.</p></li><li><p>Create a Databricks Workflow from the Databricks Workspace.</p></li><li><p>Create a Databricks Workflow using Python and the Databricks REST API.</p></li><li><p>Change the Databricks Workflow owner and run-as using Python and the Databricks REST API.</p></li><li><p>Trigger a Databricks Workflow from Azure Data Factory using the Databricks REST API.</p></li><li><p>Send parameter values from Azure Data Factory to a Databricks Workflow using the Databricks REST API.</p></li><li><p>Mix SQL and PySpark when working with Spark dataframes in a Databricks Notebook.</p></li><li><p>Trigger a Databricks Workflow from a Databricks Notebook using the Databricks REST API.</p></li><li><p>Send parameter values to a Databricks Workflow when triggering it from a Databricks Notebook using the Databricks REST API.</p></li><li><p>Deduplicate data and insert the deduplicated data from a PySpark dataframe into an existing Delta Table.</p></li><li><p>Update an existing Delta Table and create type-two history using a PySpark merge from a Databricks Notebook.</p></li><li><p>Create controlled Databricks Notebook and workflow deployments to a Databricks Workspace using Python and the Databricks REST API.</p></li><li><p>Configure a Databricks Cluster to support distributed geospatial computation using SQL in a Databricks notebook.</p></li><li><p>Work with geospatial data using Spark SQL in a Databricks notebook.</p></li><li><p>Work with geospatial data using User Defined Functions and PySpark in a Databricks notebook.</p></li></ol><p>&#8230; more to come &#8230;</p><p></p><p>Thanks so much for reading my very first newsletter! I&#8217;m excited to be sharing more of my learnings with you as I continue this &#8220;entrepreneurship&#8221; journey! Follow me on LinkedIn (<a href="https://www.linkedin.com/in/vedranmarkulj/">https://www.linkedin.com/in/vedranmarkulj/</a>).</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://blog.dataengineerexpert.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://blog.dataengineerexpert.com/subscribe?"><span>Subscribe now</span></a></p>]]></content:encoded></item></channel></rss>