Publish your SBT project to the Central Repository

Paola Pardo & Eric Ávila

You and your team have been working hard on the very first release of your beloved project. Now it’s time to make it public so that others can use it easily 🚀

In this post, we explain how to publish an sbt project to the Central Repository through the Sonatype OSSRH Nexus Repository Manager.

First-time preparation for releasing artifacts

Sonatype credentials and GPG keys must be set up before publishing an artifact. These steps have to be done only once, and some of them require a manual review.

Sonatype Setup

Sonatype OSSRH (OSS Repository Hosting) provides a repository hosting service for open source project binaries. It uses the Maven repository format and allows you to:

  • deploy snapshots
  • stage releases
  • promote releases and sync to the Central Repository

Can I change, delete or modify the published artifacts? → Quick and short answer: No.

Be careful and use the -SNAPSHOT suffix on your version to test binaries before moving to a definitive release.

For more information, click here 👈

1. Register to JIRA

There are some configurations that require human interaction (see here why). Sonatype uses JIRA to manage requests, so if you don’t have an account, it’s time to create one.

2. Open a JIRA ticket to request the namespace

Now you have to create a new ticket requesting the namespace for your packages.

It’s very simple, but here is the request we made in case you want some inspiration:

3. Set your domain accordingly

Right after you publish the ticket, you will receive an automatic notification to configure the domain properly.

After adding the requested TXT record to your domain and updating the status of the ticket, you have to wait for the confirmation message.

Let’s move on to the next step!

GPG Setup

In order to sign the artifacts that you want to publish, you will need to create a private/public key pair. Using your tool of choice, create it and upload the public key to a key server when asked, or upload it manually.

I’ll show how to do so on the Linux command line:

1. Generate a GPG key

$> gpg --gen-key 

2. List the keys to check that the new one is present on your machine

Once the key pair is generated, we can list them along with any other keys installed:

$> gpg --list-keys 
/Users/xxx/.gnupg/pubring.gpg
----------------------------------
pub   rsa2048 2012-02-14 [SCEA] [expires: 2028-02-09]
      <public-key>
uid           [ultimate] Eugene Yokota <eed3si9n@gmail.com>
sub   rsa2048 2012-02-14 [SEA] [expires: 2028-02-09]

3. Upload the public key to a key server, so that others can verify the packages you sign

Since other people need your public key to verify your files, you have to distribute your public key to a key server:

$> gpg --keyserver keyserver.ubuntu.com --send-keys <public-key>

This first key will be set as default for your system, so now the sbt-pgp plugin will be able to use it.

Releasing an artifact

Now that you are registered in the OSS Sonatype Repository and have configured the GPG keys to sign your library, it’s time to prepare your project to compile and produce the corresponding artifacts.

1. Prepare build.sbt

Add the ‘sbt-sonatype’ and ‘sbt-pgp’ plugins to your project/plugins.sbt file.

addSbtPlugin("org.xerial.sbt" % "sbt-sonatype" % "3.9.9")
addSbtPlugin("com.github.sbt" % "sbt-pgp" % "2.1.2")

In your build.sbt you have to add the reference to the remote Sonatype repository and some settings to meet the Maven Central repository requirements.

// Repository for releases on Maven Central using Sonatype
publishTo := sonatypePublishToBundle.value
sonatypeCredentialHost := "s01.oss.sonatype.org"

publishMavenStyle := true
sonatypeProfileName := "io.qbeast" // Your sonatype groupID

// Reference the project OSS repository
import xerial.sbt.Sonatype._
sonatypeProjectHosting := Some(
  GitHubHosting(user = "Qbeast-io", repository = "qbeast-spark", email = "info@qbeast.io"))

// Metadata referring to licenses, website, and SCM (source code management)
licenses := Seq(
  "APL2" -> url("https://www.apache.org/licenses/LICENSE-2.0.txt"))
homepage := Some(url("https://qbeast.io/"))
scmInfo := Some(
  ScmInfo(
    url("https://github.com/Qbeast-io/qbeast-spark"),
    "scm:git@github.com:Qbeast-io/qbeast-spark.git"))
// Optional: if you want to publish snapshots 
// (which cannot be released to the Central Repository)
// You must set the sonatypeRepository in which to upload the artifacts
sonatypeRepository := {
  val nexus = "https://s01.oss.sonatype.org/"
  if (isSnapshot.value) nexus + "content/repositories/snapshots"
  else nexus + "service/local"
}
// (Optional) pomExtra field, where you can reference developers,
// among other things. This configuration must be in XML format, like
// in the example below, and it will be included in your .pom file.
pomExtra :=
  <developers>
    <developer>
      <id>osopardo1</id>
      <name>Paola Pardo</name>
      <url>https://github.com/osopardo1</url>
    </developer>
  </developers>
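
To make the snapshot/release branch of the sonatypeRepository setting easier to follow, here is a minimal Python sketch of the same decision. It is only an illustration, not part of the build: the function names are hypothetical, and it assumes that sbt's default isSnapshot simply checks whether the version ends with -SNAPSHOT.

```python
# Illustration only: mirrors the if/else of the sonatypeRepository setting.
# Function names are hypothetical; the URLs are the ones used in build.sbt.
NEXUS = "https://s01.oss.sonatype.org/"


def is_snapshot(version: str) -> bool:
    """Assumed equivalent of sbt's default isSnapshot check."""
    return version.endswith("-SNAPSHOT")


def sonatype_repository(version: str) -> str:
    """Return the repository URL that a given version would be uploaded to."""
    if is_snapshot(version):
        return NEXUS + "content/repositories/snapshots"
    return NEXUS + "service/local"
```

With this sketch, a version such as 0.1.0-SNAPSHOT resolves to the snapshots repository, while 0.1.0 resolves to the release endpoint.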

2. Sonatype credentials ~/.sbt/1.0/sonatype.sbt

Apart from the key, you need to set up a credentials file for the Sonatype server. Create the file $HOME/.sbt/1.0/sonatype.sbt with your Sonatype credentials:

credentials += Credentials(
  "Sonatype Nexus Repository Manager",
  "s01.oss.sonatype.org", // host for all domains registered since February 2021
  "(username)",
  "(password)")

3. Publish, stage and close

The easiest way is to run the following commands.

sbt clean
sbt publishSigned
sbt sonatypeBundleRelease

Please note that executing the third command is a definitive step and there’s no way back.

The full guide, with an explanation of what these commands do, can be found here: https://github.com/xerial/sbt-sonatype#publishing-your-artifact. I recommend reading it if this is your first release, to better understand the process.

Let’s explain the commands step by step.

  1. The first command cleans the target/ directory inside the project.

project-root$> sbt clean

  2. The second command creates all the required artifacts to publish to Maven Central. These files include different JARs (jar, jar+javadoc, jar+sources), a POM file with the metadata required for publishing in Maven (qbeast-spark_x.xx-y.y.y.pom), and all the required checksum/CRC files.

project-root$> sbt publishSigned

  3. The third command prepares the Sonatype repository, uploads the JARs, and releases them to the public, syncing with the Maven Central repository.

CAUTION. Executing this command is a definitive step and there’s no way back.

project-root$> sbt sonatypeBundleRelease

If you want to control each of the steps of sonatypeBundleRelease, you can run:

project-root$> sbt sonatypePrepare
project-root$> sbt sonatypeBundleUpload
project-root$> sbt sonatypeRelease

And finally…

In about 10 minutes, your package should be ready for others to download and use 🔥

Don’t worry if it does not appear on the Maven Central Repository for a few hours, since the sync takes a little longer. But you can play with it right away!

We hope that you find this post useful 🙌 And don’t hesitate to share your projects with us 😃

About Qbeast
Qbeast is here to simplify the lives of the Data Engineers and make Data Scientists more agile with fast queries and interactive visualizations. For more information, visit qbeast.io
© 2020 Qbeast. All rights reserved.

Reduce Repetitive Tasks and Development Time by Writing your own Tool in Python

For the last few weeks, I’ve been working on a CLI tool to help developers of the qbeast-spark open-source project test their changes to the code. I’ll show you how I did it using setuptools.

Motivation

A few weeks ago, we at Qbeast were running tests manually, which involved several repetitive steps that cost our developers a considerable amount of time. These steps are necessary for testing, but they are unnecessarily time-consuming. In short, they consist of “simple things”: creating clusters in Amazon EMR with the required dependencies, running Spark applications on these clusters, checking the available datasets in Amazon S3, and a few other things. Each one seems easy, but together they become complex when you have to remember and run several commands and fix problems manually. In other words, something that could be automated.

We needed a tool to automate all these steps… Something like a Command Line Interface (CLI) that lets us run simple commands that do the whole process automatically. We decided to call it qauto (‘q’ for ‘Qbeast’ and ‘auto’ for… well, our CEO has a gift for naming things…). Of course, this will not be the name of your application, but you can get some inspiration from it.

The CLI

This tool would let us run something like qauto cluster create or qauto benchmark run: easy commands that wrap and simplify complex ones.
Complex, you’d say? Yes. If you have never checked the number of options available when creating a cluster in Amazon EMR with the AWS CLI, prepare to feel overwhelmed. There are more than 30 different options at the time of writing! And most of these options stay the same across runs (except, perhaps, the cluster name and the number of machines).

So, why not create a simple command that lets you specify only the necessary options for your day-to-day commands?
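
To make the idea concrete, here is a minimal sketch of such a wrapper: a function that takes only the two values that change between runs and fills in everything else. The function names and the default values (release label, instance type) are assumptions chosen for illustration, not the settings used by qauto; the flags themselves are regular aws emr create-cluster options.

```python
import shlex

# Values that stay the same across runs. These defaults are assumptions
# picked for the example, not the ones used by the real tool.
DEFAULTS = {
    "release_label": "emr-6.3.0",
    "instance_type": "m5.xlarge",
}


def emr_create_cluster_command(cluster_name, number_of_nodes=2):
    """Build the argument list for `aws emr create-cluster`."""
    return [
        "aws", "emr", "create-cluster",
        "--name", cluster_name,
        "--release-label", DEFAULTS["release_label"],
        "--applications", "Name=Spark",
        "--instance-type", DEFAULTS["instance_type"],
        "--instance-count", str(number_of_nodes),
        "--use-default-roles",
    ]


def as_shell(command):
    """Render an argument list as a single, safely quoted shell line."""
    return " ".join(shlex.quote(part) for part in command)
```

The returned list can be passed to subprocess.run, so the user only types the cluster name and, optionally, the number of nodes.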

Setuptools – Package your Python projects

With Python, you can create a simple tool to wrap these commands. Let’s see how to do it:

  1. Create the following file structure in your directory:

  1. As you can see, the project folder contains a directory named qauto and a setup.py file. The qauto directory will contain the different .py files that make up the code of your application. You can have as many of these files as you need to structure your code correctly.
  2. The setup.py file will be used to let the system install the application.
  3. The __init__.py file will list the different modules that you have in your application. Imagine you have main.py and utils.py files; then your init file must include:
  4. The main.py file will contain the code for your application. In our case, we will write a simple example with a few options, but you can extend it to your liking.
    1. Application entry point. This is the part where your code begins its execution. We will create a main group to wrap everything into the main application.
      import click
      
      @click.group()
      def main():
          pass
    2. This main group can contain other sub-groups. In our example, we’re going to add an aws group to the main, which will indeed contain another sub-group:
      @main.group()
      def aws():
          """AWS Cloud Provider commands"""
      
      @aws.group()
      def cluster():
          """Cluster-related commands."""
      
    3. Groups can contain commands, which are the real “executable things”. These commands may have arguments and options (mandatory or optional). In the following example, we implement the qauto aws cluster create <cluster-name> [--number-of-nodes <n>] command:
      @cluster.command("create")
      @click.argument("cluster-name")
      @click.option("--number-of-nodes", help="Number of nodes for the cluster", default=2, show_default=True)
      def aws_create_cluster(cluster_name, number_of_nodes):
          # your program logic
          pass

Following this basic structure, the final result for the main.py file could be something like:
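
The original listing is not available here, so the following is a minimal sketch that simply puts the three snippets above together. The body of the create command is a placeholder of our own (it only prints a message), and the argument is declared as cluster_name with an underscore so that it maps directly to the function parameter.

```python
import click


@click.group()
def main():
    """qauto: entry point that wraps every other group."""


@main.group()
def aws():
    """AWS Cloud Provider commands."""


@aws.group()
def cluster():
    """Cluster-related commands."""


@cluster.command("create")
@click.argument("cluster_name")
@click.option(
    "--number-of-nodes",
    help="Number of nodes for the cluster",
    default=2,
    show_default=True,
)
def aws_create_cluster(cluster_name, number_of_nodes):
    """Create a cluster called CLUSTER_NAME."""
    # Placeholder logic: a real tool would call Amazon EMR at this point.
    click.echo(f"Creating cluster {cluster_name} with {number_of_nodes} nodes")
```

Once the application is installed, running qauto aws cluster create demo --number-of-nodes 3 prints "Creating cluster demo with 3 nodes".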

With this in place, we have a “create” command nested inside two groups. Following the group structure from the main group, the full command is qauto aws cluster create. But… wait a second! We defined an alias for “aws” in our setup.py file, so an alternative is qw cluster create (providing the required arguments, of course!).
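
The setup.py file and the directory layout are not reproduced in this post, so here is a rough sketch of what they could look like, written as a small scaffold script that creates the skeleton for you. Everything except the command names qauto and qw is an assumption: the package metadata, the module path qauto.main, and the scaffold helper itself are hypothetical.

```python
from pathlib import Path

# Hypothetical setup.py: only the command names (qauto, qw) come from the
# text; the metadata and the module paths are assumptions.
SETUP_PY = '''\
from setuptools import find_packages, setup

setup(
    name="qauto",
    version="0.1.0",
    packages=find_packages(),
    install_requires=["click"],
    entry_points={
        "console_scripts": [
            # `qauto ...` starts at the main group
            "qauto = qauto.main:main",
            # `qw ...` is an alias that starts directly at the aws group
            "qw = qauto.main:aws",
        ],
    },
)
'''


def scaffold(root):
    """Create the project skeleton under root; return the relative paths written."""
    files = {
        "setup.py": SETUP_PY,
        "qauto/__init__.py": '"""qauto package."""\n',
        "qauto/main.py": "# click groups and commands go here\n",
    }
    for rel_path, content in files.items():
        target = Path(root) / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
    return sorted(files)
```

Calling scaffold("my-project") writes setup.py next to a qauto package directory. The console_scripts entries are what make both qauto and qw available as commands after installation.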

Easy, isn’t it?

Installing your new custom-CLI

Once you have finished building your application (or whenever you want to test it), you can install it easily using pip, Python’s package installer. From the root directory of your project, run pip install -e . to install your new application. From now on, you can run qauto <command> (or the name you specified for your application).

  • The -e option installs your program in editable mode: You won’t need to re-install it if you make some changes.
  • To uninstall it, just run pip uninstall qauto, or the name you specified for your application.

Conclusion

Setuptools is a powerful Python library that lets you package your Python projects. It is useful whenever you want to build something that is easy to install and run. We used it to create a CLI that simplifies a command containing many redundant options. In the same way, you can add other commands and structure your code to your needs.


Scala Test Dive-in: Public, Private and Protected methods

We all know that code can be tested in different ways. This pill does not try to explain which is the best way to check that your Scala project works as it should. Instead, it provides some tips and tricks for testing public, private, and protected methods.

Public Methods

Public methods are the functions inside a class that can be called from outside through an instance of the class. Testing public methods is no rocket science. In ScalaTest, Matchers and clues help you understand what went wrong when a test fails.

Imagine we want to test a MathUtils class that has simple methods min and max:

class MathUtils {
  def min(x: Int, y: Int): Int = if (x <= y) x else y

  def max(x: Int, y: Int): Int = if (x >= y) x else y

}

This is what your test could look like:

import org.scalatest.AppendedClues.convertToClueful
import org.scalatest.matchers.should.Matchers
import org.scalatest.flatspec.AnyFlatSpec


class MathUtilsTest extends AnyFlatSpec with Matchers {

  "MathUtils" should "compute min correctly" in {
    val min = 10
    val max = 20
    val mathUtils = new MathUtils()
    mathUtils.min(min, max) shouldBe min withClue s"Min is not $min"
  }

  it should "compute max correctly" in {
    val min = 10
    val max = 20
    val mathUtils = new MathUtils()
    mathUtils.max(min, max) shouldBe max withClue s"Max is not $max"
  }
}

Private Methods

Private methods are the methods that cannot be accessed in any other class than the one in which they are declared.

Testing these functions is trickier. You have different ways of proceeding: copy and paste the implementation into a test class (which is off the table), use Mockito, or try PrivateMethodTester.

Let’s write a private method on the class MathUtils:

class MathUtils {

  def min(x: Int, y: Int): Int = if (x <= y) x else y

  def max(x: Int, y: Int): Int = if (x >= y) x else y

  private def sum(x: Int, y: Int): Int = {
    x + y
  }

  def sum(x: Int, y: Int, z: Int): Int = {
    val aux = sum(x, y)
    sum(aux, z)
  }

}

PrivateMethodTester is a trait that facilitates the testing of private methods. You have to mix it into your test class in order to take advantage of it.


import org.scalatest.AppendedClues.convertToClueful
import org.scalatest.matchers.should.Matchers
import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.PrivateMethodTester

class MathUtilsPrivateTest extends AnyFlatSpec with Matchers with PrivateMethodTester {

  "MathUtils" should "compute sum correctly" in {
  
    val x = 1
    val y = 2

    val mathUtils = new MathUtils()
    val sumPrivateMethod = PrivateMethod[Int](Symbol("sum"))
    val privateSum = mathUtils invokePrivate sumPrivateMethod(x, y)
    privateSum shouldBe (x + y) withClue s"Sum is not ${x + y}"
  }
}

In val sumPrivateMethod = PrivateMethod[Int](Symbol("sum")) we have different parts:

  • [Int] is the return type of the method
  • Symbol("sum") is the name of the method to call (the older 'sum symbol literal is deprecated since Scala 2.13)

In mathUtils invokePrivate sumPrivateMethod(x, y) you can collect the result in a val to compare it and check that it’s working properly. You need to use an instance of the class/object to invoke the method; otherwise, it will not be found.

Protected Methods

A protected method is like a private method in that it can only be invoked from within the implementation of a class or its subclasses.

For example, suppose we make the sum method protected instead of private. The MathUtils class would look like this:

class MathUtils {
  def min(x: Int, y: Int): Int = if (x <= y) x else y

  def max(x: Int, y: Int): Int = if (x >= y) x else y

  protected def sum(x: Int, y: Int): Int = x + y

}

If we create a new object from MathUtils and try to call the sum method, the code will not compile: the compiler reports that sum is not accessible from this place.

But don’t worry, we have a solution for that as well.

We can write a subclass specifically for this test and override the method with public visibility, since a protected method can be invoked from the implementation of its subclasses.


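import org.scalatest.AppendedClues.convertToClueful
import org.scalatest.matchers.should.Matchers
import org.scalatest.flatspec.AnyFlatSpec

// Test-only subclass that widens the visibility of sum to public
class MathUtilsTestClass extends MathUtils {
  override def sum(x: Int, y: Int): Int = super.sum(x, y)
}

class MathUtilsProtectedTest extends AnyFlatSpec with Matchers {
  "MathUtils" should "compute sum correctly" in {
    val x = 1
    val y = 2
    val mathUtilsProtected = new MathUtilsTestClass()
    mathUtilsProtected.sum(x, y) shouldBe (x + y) withClue s"Sum is not ${x + y}"
  }
}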

Summary

Now you can test the different types of methods in your Scala project: public, private, and protected. For more information about Scala, functional programming, and style, feel free to ask us or check out our other pills!


Read from public S3 bucket with Spark

S3 Hadoop Compatibility

Trying to read from a public Amazon S3 bucket with Spark can cause many errors related to Hadoop versions.

Here are some tips to configure your Spark application.

Spark Configuration

To read from a public S3 bucket, you need to start a spark-shell with Spark 3.1.1 or later and Hadoop 3.2 dependencies.

If you have to update the binaries to a compatible version to use this feature, follow these steps:

  • Download spark tar from the repository
$ > wget https://archive.apache.org/dist/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz
  • Decompress the files
$ > tar xzvf spark-3.1.1-bin-hadoop3.2.tgz
  • Update the SPARK_HOME environment variable
$ > export SPARK_HOME=$PWD/spark-3.1.1-bin-hadoop3.2

Once Spark is ready to run, start the shell with the following configuration:

$ > $SPARK_HOME/bin/spark-shell \
--conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider \
--packages com.amazonaws:aws-java-sdk:1.12.20,org.apache.hadoop:hadoop-common:3.2.0,org.apache.hadoop:hadoop-client:3.2.0,org.apache.hadoop:hadoop-aws:3.2.0

The org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider provides anonymous credentials in order to access the public S3 bucket. Note that the --packages value must be a single comma-separated list without spaces, and that nothing may follow a line-continuation backslash.

And to read the file:

val df = spark
  .read
  .format("parquet")
  .load("s3a://qbeast-public-datasets/store_sales")

Summary

We are not aware of a working setup for AWS S3 with Hadoop 2.7. However, you can try to use it. If you do so, remember to include the following option:

--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem

Code Formatting with Scalafmt

Whether you are starting a Scala project or collaborating on one, this guide introduces two of the most widely used tools for improving code style.

Scalastyle and Scalafmt

Scalastyle is a handy tool for coding style in Scala, similar to what Checkstyle does in Java. Scalafmt formats code to look consistent between people on your team, and it is perfectly integrated into your toolchain.

Installation

For the installation, you need to add the following to the plugins.sbt file under the project folder.

addSbtPlugin("org.scalameta" % "sbt-scalafmt" % "2.4.2") 
addSbtPlugin("org.scalastyle" %% "scalastyle-sbt-plugin" % "1.0.0")

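This makes both tools available in sbt. Scalastyle reads its rules from a scalastyle-config.xml file, which sbt scalastyleGenerateConfig can generate for you, and Scalafmt reads a .scalafmt.conf file where you can write rules to maintain consistency across the project.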

For example:

# This style is copied from
# https://github.com/apache/spark/blob/master/dev/.scalafmt.conf
version = "2.7.5"
align = none
align.openParenDefnSite = false
align.openParenCallSite = false
align.tokens = []
optIn = {
  configStyleArguments = false
}
danglingParentheses = false
docstrings = JavaDoc
maxColumn = 98
newlines.topLevelStatements = [before,after]

Quickstart

When opening a project that contains a .scalafmt.conf file, you will be prompted to use it:

Choose the scalafmt formatter, and it will be used at compile-time for formatting files.

However, you can check it manually with:

sbt scalastyle

Another exciting feature is that you can configure your IDE to reformat on save:

Alternatively, force code formatting:

sbt scalafmt # Format main sources 

sbt test:scalafmt # Format test sources 

sbt scalafmtCheck # Check if the scala sources under the project have been formatted 

sbt scalafmtSbt # Format *.sbt and project/*.scala files 

sbt scalafmtSbtCheck # Check if the files have been formatted by scalafmtSbt

More tricks

Scaladocs

Sbt also checks the format of the Scala docs when publishing the artifacts. The following command will check and generate the Scaladocs:

sbt doc

Header Creation

Sometimes a header must be present in all files. You can do so by using this plugin: https://github.com/sbt/sbt-header

First, add it in the plugins.sbt:

addSbtPlugin("de.heikoseeberger" % "sbt-header" % "5.6.0")

Include the header you want to show in your build.sbt

headerLicense := Some(HeaderLicense.Custom("Copyright 2021 Qbeast Pills"))

And check it at compile time with:

Compile / compile := (Compile / compile).dependsOn(Compile / headerCheck).value

To automate the creation of headers in all files, execute:

sbt headerCreate

Using println

Scalastyle is often configured to flag println calls. And we all debug like this now and then.

The quick solution is to wrap your code:

// scalastyle:off println
<your beautiful piece of code>
// scalastyle:on println

But make sure you delete these comments before pushing any commits 😉


Create awesome GIFs from a terminal: Nice-looking animations with Terminalizer

Have you ever wanted to generate cool GIFs from a terminal output? Do you want to have fancy animations to show some code snippets?

Using terminalizer, you will be able to create fantastic animations by following this simple guide!

The solution

1. First, you need to install NodeJS v12.21.0 (LTS) from https://nodejs.org/download/release/v12.21.0/. Other versions may not be compatible with terminalizer, and you might have problems when using it.

2. After that, install terminalizer globally by using the command:

npm install -g terminalizer

Now you can use the following commands to record, play and share a GIF:

# Start recording a demo in a file called my_demo.yml
terminalizer record my_demo

# Now run the commands you want to appear in the GIF.
# When you have finished, press Ctrl+D (⌘+D) to stop recording.

# You can play the demo you just recorded by using the play option.
terminalizer play my_demo.yml

# At this point you can customize several things,
# check the "🌟 Pro tips" section below.

# If you're happy with the result, render the GIF from your YML
# file. This will create a file in your current directory.
terminalizer render my_demo.yml

🌟 Pro tips: You can edit and customize your GIF before rendering it by modifying the content of the .yml file.

  • For example, you can change the colours or the style by changing the theme and the frameBox objects in the YML:
theme:
    background: "#28225C"
(...)
frameBox:
    type: floating
    title: Terminalizer Rocks!
  • You can also edit the content and the timing of the output by modifying the records object:
# Records, feel free to edit them
records:
  - delay: 50
    content: "\e[35mErics-MacBook-Air\e[0m:~ eric$ "

So with a few modifications, we can get something like this:

You can find more customization options and tips at the original repo on GitHub: https://github.com/faressoft/terminalizer

Summary

Terminalizer is a beautiful and easy-to-use tool to create GIFs from your console/terminal output. It lets you record a session and customize everything as desired in a simple YML format, which makes it approachable for newcomers.

