Welcome to my blog, where I share hands-on guides and war stories. Browse the latest posts below, or filter by tags and series.

Spark and Data Sketches: Taming count-distinct

Distinct counting is a common metric in trend analysis for measuring popularity or performance. It may seem like a simple problem, but the difficulty grows quickly with the volume of data. Counting the exact number of distinct values can consume significant resources and take a long time, even on a parallelized processing engine. To address this challenge, you can use probabilistic algorithms [Read More]

Common Pitfalls to Avoid When Publishing Artifacts on Maven Central

Are you planning to publish your first artifact to Maven Central and make it available to a wider audience? Congratulations, you’re taking an important step in contributing to the open-source community! However, the process may not be as straightforward as you expect. In this post, we’ll go over some common pitfalls to avoid so that your publishing experience goes more smoothly. [Read More]