Databricks CentOS 9 Stream containers
Thursday, October 20. 2022
Earlier this year I was tinkering with Databricks and got fed up with Ubuntu 18 and 20 with pretty old Python in it. Easy fix! I just made containers with CentOS so I could have more recent versions of stuff in my nodes.
Natural next move was to bump CentOS version from 8 to 9. While at it I discarded the previous hierarchy. Here is the original pic:
CentOS 8 containers explained:
- Minimal: CentOS 8 Stream + stuff to make CentOS work in Apache Spark / Databricks
- SSH: Minimal + OpenSSH server for those running Databricks on top of their own VNets. If you aren't this won't do you any good. TCP/22 won't be open from The World.
- Python: This here is the beef in Databricks! Running a Jupyter Notebook / IPython will absolutely definitely need this.
- DBFS FUSE: Linux user file system to attach the container into DatabricksFS / HadoopFS
- Standard: DBFS FUSE + OpenSSH, see disclaimer from SSH container about connectivity
The hierachy originates from https://github.com/databricks/containers/tree/master/experimental. Initially I just went with the flow, but as always, gaining more information and experience on Databrics, it became apparent to me this separation wasn't working.
CentOS 8 containers explained:
- Base: CentOS 9 Stream + stuff to make CentOS work in Apache Spark / Databricks + FUSE
- Rationale: You will want to have DBFS mounted to your container anyway. It won't be a security risk and FUSE is a very light piece of software in a Linux.
- Note: This will not work as an Apache Spark / Databricks node.
- Python: Running a Jupyter Notebook will absolutely definitely need this.
- Rationale: A Spark node without proper Python won't boot. This should be in minimal/base to begin with, but I just wanted to separate all the required Linux components and settings from a Python.
- Note: This is the minimal. This will boot and work.
- Python-SSH: Python + OpenSSH
- Note: You will need your own VNet to get SSH-access into your Spark Driver -node.
- Note 2: If you don't specify your own, a managed VNet will be used. You just won't have any access into it.
- R: For statistical computing needs, quite a few users prefer R programming language. This here container-type will enable you to do that from a Jupyter Notebook in Databricks. Will contain Python.
- Rationale: R is a huge chunk of container. If you won't be needing this, stick with Python which is so much lighter to load and operate.
- R-SSH: R + OpenSSH
- See disclaimer from above
Python components version table
To verify what Databicks puts into their nodes, I gathered versions following Python components.
Python component | CentOS 9 | 11.2 | 11.1 | 11.0 | 10.4 LTS | 9.1 LTS |
---|---|---|---|---|---|---|
ipykernel | 6.16.0 | 6.12.1 | 6.12.1 | 6.12.1 | 5.3.4 | 5.3.4 |
ipython | 7.32.0 | 7.32.0 | 7.32.0 | 7.32.0 | 7.22.0 | 7.22.0 |
Jinja2 | 2.11.3 | 2.11.3 | 2.11.3 | 2.11.3 | 2.11.3 | 2.11.3 |
jupyter-core | 4.11.1 | 4.8.1 | 4.8.1 | 4.8.1 | 4.7.1 | 4.7.1 |
matplotlib | 3.4.3 | 3.4.3 | 3.4.3 | 3.4.3 | 3.4.2 | 3.4.2 |
numpy | 1.20.3 | 1.20.3 | 1.20.3 | 1.20.3 | 1.20.1 | 1.19.2 |
pandas | 1.3.4 | 1.3.4 | 1.3.4 | 1.3.4 | 1.2.4 | 1.2.4 |
pyarrow | 4.0.0 | 7.0.0 | 7.0.0 | 7.0.0 | 4.0.0 | 4.0.0 |
six | 1.16.0 | 1.16.0 | 1.16.0 | 1.16.0 | 1.15.0 | 1.15.0 |
virtualenv | 20.16.5 | 20.8.0 | 20.8.0 | 20.8.0 | 20.4.1 | 20.4.1 |
Comparison of other differences:
Component | CentOS 9 | 11.2 | 11.1 | 11.0 | 10.4 LTS | 9.1 LTS |
---|---|---|---|---|---|---|
Scala | 2.12.14 | 2.12.14 | 2.12.14 | 2.12 | 2.12 | 2.12 |
Spark | 3.3.0 | 3.3.0 | 3.3.0 | 3.3.0 | 3.2.1 | 3.2.1 |
Python | 3.9.14 | 3.9.5 | 3.9.5 | 3.9.5 | 3.8.10 | 3.8.10 |
R | 4.2.1 | 4.1.3 | 4.1.3 | 4.1.3 | 4.1.2 | 4.1.2 |
Linux | CentOS 9 Stream |
Ubuntu 20.04.5 LTS |
Ubuntu 20.04.5 LTS |
Ubuntu 20.04.5 LTS |
Ubuntu 20.04.5 LTS |
Ubuntu 20.04.5 LTS |
These two tables explain very well my motivation of doing all this. Getting a full control of what goes into those containers. Second motivation is to publish the recipe for anybody to tailor their own custom made containers containing the versions of software they'll be needing.
Testing my container
Here is a sample notebook I used while develping this:
Modern Databricks notebook supports SQL, R and Scala with ease. I absolutely wanted to see what'll be needed to get all of those running.
To repeat: Python will be needed to get the entire Databricks node booting. On top of that, Scala will be included by Apache Spark. SQL will be handled by Spark. R will be provided by Rserve and interface to that is via notebook's R Python-client.
Final words
Databricks: Please, publish your own work also. For unknown reason you aren't doing that.
ChromeOS Flex test drive
Monday, October 10. 2022
Would you like to run an operating system which ships as-is, no changes allowed after installation? Can you imagine your mobile phone without apps of your own choosing? Your Windows10 PC? Your macOS Monterey? Most of us cannot.
As a computer enthusiast, of course I had to try such a feat!
Prerequisites
How to get your ball rolling, check out Chrome OS Flex installation guide @ Google. What you'll need is a supported hardware. In the installation guide, there is a certified models list and it will contain a LOT of supported PC and Mac models. My own victim/subject/target was 12 year old Lenovo and even that is in the certified list! WARNING: The hard drive of the victim computer will be erased beyond data recovery.
The second thing you'll need is an USB-stick to boot your destination ChromeOS. Any capacity will do. I had a 32 GiB stick and it used 128 MiB of it. That's less than 1% of the capacity. So, any booting stick will do the trick for you. Also, you won't be needing the stick after install, requirement is to just briefly slip an installer into it, boot and be done.
Third and final thing you'll be needing is a Google Chrome browser and ChromeOS recovery media extension into it:
To clarify, let's repeat that:
Your initial installation into your USB-stick will be done from Google Chrome browser using a specific extension in it.
Yes. It sounds bit unorthodox or different than other OS does. Given Google's reach on web browser users, that seemed like the best idea. This extension will work in any OS.
To log into your ChromeOS, you will need a Google Account. Most people on this planet do have one, so most likely you're covered. On the other hand, if your religious beliefs are strongly anti-Google, the likelihood of you running an opearting system made by Google is low. You rare person won't be able to log in, but everybody else will.
Creating installation media
That's it. As there won't be much data on the stick, the creation flys by fast!
Installing ChromeOS Flex
If media creation was fast, this will be even faster.
Just boot the newly crated stick and that's pretty much it. The installer won't store much information to the drive, so you will be done in a jiffy.
Running ChromeOS Flex
Log into the machine with your Google Account. Remenber: This OS won't work without a network connection. You really really will need active network connection to use this one.
All you have is set of Google's own apps: Chrome, Gmail, YouTube and such. By looking at the list in Find apps for your Chromebook, you'd initially think all is good and you can have our favorite apps in it. To make your (mis)belief stronger, even Google Play is there for you to run and search apps. Harsh reality sets in quite fast: you can not install anything via Google Play. All the apps in Google Play are for Android or real ChromeOS, not for Flex. Reason is obvious: your platform is running AMD-64 CPU and all the apps are for ARM. This may change in the future, but at the time of writing it is what it is.
You lose a lot, but there is something good in this trade-off. As you literally can not install anything, not even malware can be installed. ChromeOS Flex has to be the safest OS ever made! Most systems in the world are manufactured from ground up to be generic and be able to run anything. This puppy isn't.
SSH
After initial investigation, without apps, without password manager, without anything, I was about to throw the laptop back to its original dust gathering duty. What good is a PC which runs a Chrome browser and nothing else? Then I found the terminal. It won't let you to actually enter the shell of our ChromeOS laptop. It will let you SSH to somewhere else.
On my own boxes, I'll always deactivate plaintext passwords, so I bumped into a problem. From where do I get the private key for SSH-session? Obvious answer is either via Google Drive (<shivers>) or via USB-stick. You can import a key to the laptop and not worry about that anymore.
Word of caution: Think very carefully if you want to store your private keys in a system managed for you by Google.
Biggest drawbacks
For this system to be actually usable, I'd need:
- Proper Wi-Fi. This 12 year old laptop had only Wi-Fi 4 (802.11n)
- This I managed to solve by using an Asus USB-AC51 -dongle to get Wi-Fi 5.
- lsusb:
ID 0b05:17d1 ASUSTek Computer, Inc. AC51 802.11a/b/g/n/ac Wireless Adapter [Mediatek MT7610U]
- This won't solve my home network's need for Wi-Fi 6, but gets me to The Net.
- There is no list of supported USB-devices. I have bunch of 802.11ac USB-sticks and this is the only one to work in ChromeOS Flex.
- My password manager and passwords in it
- No apps means: no apps, not even password manager
- What good is a browser when you cannot log into anything. All my passwords are random and ridiculously complex. They were not designed to be remembered nor typed.
- In The world's BEST password advice, Mr. Horowitz said: "The most secure Operating System most people have access to is a Chromebook running in Guest Mode."
Nuisances
Installer won't let you change keyboard layout. If you have US keyboard, fine. If you don't, it sucks for you.
Partitions
As this is a PC, the partition table has EFI boot. Is running EXT-4 and EXT-2 partitions. Contains encrypted home drive. It's basically a hybrid between an Android phone and a Linux laptop.
My 240 GiB SSD installed as follows:
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
11 32.8kB 33.3kB 512B RWFW
6 33.3kB 33.8kB 512B KERN-C chromeos_kernel
7 33.8kB 34.3kB 512B ROOT-C
9 34.3kB 34.8kB 512B reserved
10 34.8kB 35.3kB 512B reserved
2 35.3kB 16.8MB 16.8MB KERN-A chromeos_kernel
4 16.8MB 33.6MB 16.8MB KERN-B chromeos_kernel
8 35.7MB 52.4MB 16.8MB ext4 OEM
12 52.4MB 120MB 67.1MB fat16 EFI-SYSTEM boot, legacy_boot, esp
5 120MB 4415MB 4295MB ext2 ROOT-B
3 4415MB 8709MB 4295MB ext2 ROOT-A
1 8709MB 240GB 231GB ext4 STATE
Finally
This is either for explorers who want to try stuff out or alternatively for people whose needs are extremely limited. If all you do is surf the web or YouTube then this might be for you. Anything special --> forget about it.
The best part with this is the price. I had the old laptop already, so cost was $0.