Reduce Repetitive Tasks and Development Time by Writing your own Tool in Python

Over the last few weeks, I’ve been working on a CLI tool that helps developers of the qbeast-spark open-source project test their code changes. I’ll show you how I built it using setuptools and click.

Motivation

A few weeks ago, we at Qbeast were running tests manually. This involved several repetitive steps that cost our developers a considerable amount of time. The steps are necessary for testing, but they are needlessly time-consuming. In short, they consist of “simple things” such as creating clusters in Amazon EMR with the required dependencies, running Spark applications on these clusters, checking the available datasets in Amazon S3, and a few other tasks. Each one seems easy, but together they become complex when you have to remember and run several commands and fix problems by hand. In other words, this is work that could be automated.

As a solution, we decided to develop a tool that automates all these steps: a Command Line Interface (CLI) that lets us run simple commands that carry out the whole process automatically. We decided to call it qauto (‘q’ for ‘Qbeast’ and ‘auto’ for… well, our CEO has a gift for naming things…). Of course, this will not be the name of your application, but you can take some inspiration from it.

The CLI

This tool lets us run something like qauto cluster create or qauto benchmark run: easy commands that wrap and simplify complex ones.
Complex, you ask? Yes. If you have never checked the number of options available when creating a cluster in Amazon EMR with the AWS CLI, take a look: you will feel overwhelmed. There are more than 30 different options at the time of writing! And most of them stay the same across runs (except, perhaps, the cluster name and the number of machines).

So, why not create a simple command that lets you specify only the necessary options for your day-to-day commands?
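
To make the idea concrete, here is a minimal sketch of such a wrapper. It only builds the argument list for aws emr create-cluster and does not run it. The function name, the default values, and the choice of which flags stay fixed are all assumptions for illustration, not the actual qauto implementation.

```python
# Hypothetical defaults: placeholder values, not Qbeast's real configuration.
DEFAULT_OPTIONS = {
    "--release-label": "emr-6.5.0",
    "--applications": "Name=Spark",
    "--instance-type": "m5.xlarge",
}


def build_create_cluster_command(cluster_name, number_of_nodes=2):
    """Return the argument list for `aws emr create-cluster`.

    Only the cluster name and the number of nodes change between runs;
    everything else comes from DEFAULT_OPTIONS.
    """
    command = [
        "aws", "emr", "create-cluster",
        "--name", cluster_name,
        "--instance-count", str(number_of_nodes),
        "--use-default-roles",
    ]
    for flag, value in DEFAULT_OPTIONS.items():
        command += [flag, value]
    return command
```

The resulting list can be passed to subprocess.run once you actually want to launch the cluster, so the day-to-day call shrinks to a name and a node count.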

Setuptools – Package your Python projects

With Python, you can create a simple tool to wrap these commands. Let’s see how to do it:

  1. Create a project folder that contains a directory named qauto and a setup.py file. Inside the qauto directory, create an __init__.py file and a main.py file.
  2. The qauto directory holds the .py files that make up the code of your application. You can have as many of these files as you need to structure your code properly.
  3. The setup.py file lets the system install the application.
  4. The __init__.py file marks the qauto directory as a Python package and can import the modules it contains. For example, if you have a main.py and a utils.py file, your init file would import both of them.
  5. The main.py file contains the code for your application. In our case, we will write a simple example with a few options, but you can extend it to your liking.
    1. Application entry point. This is the part where your code begins its execution. We will create a main group to wrap everything into the main application.
      import click
      
      @click.group()
      def main():
          pass
    2. This main group can contain other sub-groups. In our example, we add an aws group to the main group, which in turn contains another sub-group, cluster:
      @main.group()
      def aws():
          """AWS Cloud Provider commands"""
      
      @aws.group()
      def cluster():
          """Cluster-related commands."""
      
    3. Groups can contain commands, which are the real “executable things”. These commands may take arguments, which are required by default, and options, which are optional by default. In the following example, we implement the qauto aws cluster create <cluster-name> --number-of-nodes <number-of-nodes> command:
      @cluster.command("create")
      @click.argument("cluster-name")
      @click.option("--number-of-nodes", help="Number of nodes for the cluster", default=2, show_default=True)
      def aws_create_cluster(cluster_name, number_of_nodes):
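          # your program logic goes here; this line is only a placeholder
          click.echo(f"Creating cluster {cluster_name} with {number_of_nodes} nodes")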

Following this basic structure, the final result for the main.py file could be something like:
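
A minimal sketch that puts the pieces above together, assuming the command simply prints what it would do. The echo line and the docstrings are placeholders for illustration; your real logic (for example, the call to Amazon EMR) would go in their place.

```python
import click


@click.group()
def main():
    """qauto: shortcuts for repetitive testing tasks."""


@main.group()
def aws():
    """AWS Cloud Provider commands"""


@aws.group()
def cluster():
    """Cluster-related commands."""


@cluster.command("create")
@click.argument("cluster-name")
@click.option("--number-of-nodes", help="Number of nodes for the cluster",
              default=2, show_default=True)
def aws_create_cluster(cluster_name, number_of_nodes):
    """Create a cluster named CLUSTER_NAME."""
    # Placeholder for the real logic.
    click.echo(f"Creating cluster {cluster_name} with {number_of_nodes} nodes")
```

Click turns the cluster-name argument into the cluster_name parameter and infers an integer type for --number-of-nodes from its default value.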

With this in place, we have a create command nested inside a few groups. Following the group structure down from the main group, the full command is qauto aws cluster create. But… wait a second! We defined an alias for “aws” in our setup.py file, so an alternative is qw cluster create (providing the required arguments, of course!).
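
For reference, a setup.py that produces both commands could look like the sketch below. The version number and the exact entry-point targets are assumptions: qauto points at the main group and qw points at the aws group, which is one way to get the alias described above.

```python
from setuptools import setup, find_packages

setup(
    name="qauto",
    version="0.1.0",  # placeholder version
    packages=find_packages(),
    install_requires=["click"],
    entry_points={
        "console_scripts": [
            # `qauto` starts at the main group
            "qauto = qauto.main:main",
            # `qw` is the alias that jumps straight to the aws group
            "qw = qauto.main:aws",
        ],
    },
)
```

Each console_scripts entry maps a command name to a callable in the form package.module:function, so pip creates one executable per line.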

Easy, isn’t it?

Installing your new custom-CLI

Once you have finished building your application (or whenever you want to test it), you can install it with pip, Python’s package installer. From the root directory of your project, run pip install -e . to install your new application. From then on, you can run qauto <command> (or whatever name you gave your application).

  • The -e option installs your program in editable mode: you won’t need to reinstall it after changing the code (changes to setup.py itself, such as new entry points, still require a reinstall).
  • To uninstall it, just run pip uninstall qauto, or the name you specified for your application.

Conclusion

Setuptools is a powerful Python library that lets you package your Python projects. It is useful whenever you want to build something that is easy to install and run. We used it, together with click, to create a CLI that simplifies a command with many redundant options. In the same way, you can add other commands and structure your code to suit your needs.

About Qbeast
Qbeast is here to simplify the lives of the Data Engineers and make Data Scientists more agile with fast queries and interactive visualizations. For more information, visit qbeast.io
© 2020 Qbeast. All rights reserved.