Create a Spark enabled Jupyter Notebook on Azure

Arun Jith A
4 min read · Nov 26, 2017

I hope you have read my piece on doing the same thing on Google Cloud Platform. It's the same stuff, with just a few nuances in the platform. If you haven’t already, this might be the time to do it. Here’s the link.

Apart from setting up the virtual machine and a few firewall rules, everything else remains the same once you are inside the virtual machine. I don’t want to Ctrl+C, Ctrl+V the whole thing, so I will just walk you through setting up a new machine and setting the firewall rules, both of which are fairly simple. It's a piece of cake if you are already familiar with GCP.

Set up a new machine

Azure gives you a lot of options for the type of instance you can choose, ranging from a Cloudera Hadoop instance to Windows Server machines. I will, as usual, focus on a Linux machine (Ubuntu), since my last one was also on an Ubuntu instance.

It's pretty simple right from the start. You just need to find the right places to click, and I will make your life easier. On your dashboard there is a list of tabs in the pane on the left. Look for Virtual Machines. There will be a button to Add an instance. Select the platform you want (in this case, Ubuntu), select the version of Ubuntu, and click Create. You don’t have to change anything here.

On the next screen, choose a name for your instance and the type of disk (I prefer SSD, simply because it is faster). There is an option to use a Password instead of an SSH public key to connect to the instance. Pick whichever you are more comfortable with. The rest are pretty much self-explanatory: your user name, the location of the machine, the resource group, and so on. Choose whatever you like and click OK.

The next page asks you to choose how much firepower you need behind your computations. Choose a size depending on your usage. If you are wondering how much you need, think about the size of the data you will be handling: the RAM should be larger than that.
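As a rough illustration of that rule of thumb, here is a minimal Python sketch. The function names, the list of RAM sizes, and the `headroom` multiplier are all assumptions made up for this example; the only thing taken from the text above is that RAM should exceed the size of your data. The sizes are not actual Azure offerings.

```python
def fits_in_ram(data_size_gb, ram_gb, headroom=1.5):
    """Return True if ram_gb covers the data with some room to spare.

    headroom is an assumed safety multiplier (not an Azure or Spark rule);
    set it to whatever margin you are comfortable with.
    """
    if data_size_gb < 0 or ram_gb <= 0 or headroom < 1:
        raise ValueError("sizes must be positive and headroom >= 1")
    return ram_gb >= data_size_gb * headroom


def smallest_fitting_size(data_size_gb, available_ram_gb, headroom=1.5):
    """Pick the smallest RAM size from a (hypothetical) list that fits the data."""
    for ram_gb in sorted(available_ram_gb):
        if fits_in_ram(data_size_gb, ram_gb, headroom):
            return ram_gb
    return None  # nothing in the list is large enough
```

For example, with a 4 GB dataset and made-up sizes of 2, 4, 8 and 16 GB, `smallest_fitting_size(4, [2, 4, 8, 16])` picks the 8 GB option, because 4 GB times the assumed 1.5 margin is 6 GB.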

The next tab is optional, but I would urge you to check one option here: Auto Shutdown. If you are not going to work on something continuously for over a day, enabling this is a good idea, because we can all be lazy and forgetful and leave an instance running for days without realizing we never turned it off (like I did so many times). Keep in mind that you are charged for every hour the instance is running, whether or not you are actually using it.
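To put a number on a forgotten instance, here is a small Python sketch of the arithmetic. It only assumes what is stated above, that every running hour is billed; the function name and the hourly rate in the example are hypothetical and are not real Azure prices.

```python
def wasted_cost(hourly_rate, days_forgotten, hours_used_per_day):
    """Cost of the hours an instance ran without anyone using it.

    hourly_rate is a hypothetical price per running hour; look up the
    real rate for your instance size before relying on the result.
    """
    if not 0 <= hours_used_per_day <= 24:
        raise ValueError("hours_used_per_day must be between 0 and 24")
    idle_hours = days_forgotten * (24 - hours_used_per_day)
    return hourly_rate * idle_hours
```

With an assumed rate of 0.50 per hour, an instance that is used 8 hours a day but left on for 3 days burns 48 idle hours, so `wasted_cost(0.5, 3, 8)` comes to 24.0.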

When you create a machine for the first time, you will need to create a Storage Account. You will be prompted to do this on the same page. Select the options that suit you best. When in doubt, hover over or click the question mark next to an option; it will help you make a decision.

Once you create the instance, it is going to take a while to start. Once it's done, jot down its IP address and use PuTTY to connect to the instance via SSH. Read my previous article if you have any questions on this. At this point, go ahead and install everything I mentioned in the previous one.

Only one final step remains!

Creating firewall rules

Here we are. We now have a running instance with Jupyter and everything else required set up. Next we need to open up the port we used to host the Jupyter Notebook. Go to your virtual machines page and click on the running instance. You will see an option to configure Networking.

Configure Networking options for a virtual machine

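Click on Add inbound rule. Select DNS (TCP) in the Service drop-down and enter the port number where the connection should be allowed, e.g. 8888. If the preset does not let you change the port, choose Custom with TCP as the protocol instead. Choose a name for the rule and we are all good to go.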

Now you can type http://<IP Address>:<Port> into your web browser’s address bar to access the notebook.
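If the page does not load, it helps to check whether the port is reachable at all before blaming Jupyter. The following is a minimal Python sketch using only the standard library; the function names are made up for this example, and the IP address in the usage note is a placeholder from the documentation range, not a real machine.

```python
import socket


def notebook_url(ip_address, port=8888):
    """Build the address to type into the browser for the notebook."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return f"http://{ip_address}:{port}"


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

Run from your own computer, `port_is_open("203.0.113.10", 8888)` returning False would suggest that the inbound rule is missing or that Jupyter is not listening on that port, while `notebook_url("203.0.113.10", 8888)` gives the string to paste into the address bar. Replace the placeholder IP with the one you jotted down earlier.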

Hope you have fun with your notebook. Cheers! Till the next one.
