Get ahead of the curve with the latest insights, trends, and analysis in the tech world.
I am sure the quantum hype has reached every person in tech (and outside it, most probably), along with some over-the-top claims like “some company has proved quantum supremacy,” “the quantum revolution is here,” or my favorite, “quantum computers are here, and it will make classical computers obsolete.” I am going to be honest with you; […] The post Should Data Scientists Care About Quantum Computing? appeared first on...
Let’s say you are in a customer care center, and you would like to know the probability distribution of the number of calls per minute, or in other words, you want to answer the question: what is the probability of receiving zero, one, two, … etc., calls per minute? You need this distribution in order […] The post Method of Moments Estimation with Python Code appeared first on Towards Data Science.
The basic principle of Large Language Models (LLMs) is very simple: to predict the next word (or token) in a sequence of words based on statistical patterns in their training data. However, this seemingly simple capability turns out to be incredibly sophisticated, since it enables a number of amazing tasks such as text summarization, […] The post How to Measure the Reliability of a Large Language Model’s Response appeared...
Introduction Developers work on applications that are supposed to be deployed on some server so that anyone can use them. Typically, on the machine where these apps live, developers set up environment variables that allow the app to run. These variables can be API keys of external services, the URL of your database and […] The post Manage Environment Variables with Pydantic appeared first on Towards Data Science.
Python has grown to dominate data science, and its package Pandas has become the go-to tool for data analysis. It is great for tabular data and supports data files of up to 1GB if you have a large RAM. Within these size limits, it is also good with time-series data because it comes with some […] The post Pandas Can’t Handle This: How ArcticDB Powers Massive Datasets appeared first on Towards Data Science.
It’s been more than 15 years since I finished my master’s degree, but I’m still haunted by the hair-pulling frustration of managing my R scripts. As a (recovering) perfectionist, I named each script very systematically by date (think: ancova_DDMMYYYY.r). A system I just *knew* was better than _v1, _v2, _final and its frenemies. Right? Trouble was, every time I wanted to […] The post Branching Out: 4 Git Workflows for...
Want to make the most out of large language models? Check out these prompting techniques you can start using today.
Discover freelancing platforms that care about you, not just your money, offering low commission rates, better policies, and higher earning potential.
Large language models (LLMs) have evolved and permeated our lives so much and so quickly that many of us have become dependent on them in all sorts of scenarios.
Decision tree algorithms have always fascinated me. They are easy to implement and achieve good results on various classification and regression tasks. Combined with boosting, decision trees are still state-of-the-art in many applications. Frameworks such as sklearn, lightgbm, xgboost and catboost have done a very good job until today. However, in the past few months, […] The post Build a Decision Tree in Polars from...
Virtualization makes it possible to run multiple virtual machines (VMs) on a single piece of physical hardware. These VMs behave like independent computers, but share the same physical computing power. A computer within a computer, so to speak. Many cloud services rely on virtualization. But other technologies, such as containerization and serverless computing, have become […] The post Virtualization & Containers...
Bubble charts elegantly compress large amounts of information into a single visualization, with bubble size adding a third dimension. However, comparing “before” and “after” states is often crucial. To address this, we propose adding a transition between these states, creating an intuitive user experience. Since we couldn’t find a ready-made solution, we developed our own. […] The post 4-Dimensional Data Visualization:...
How Reliable Are Your Predictions? About To be considered reliable, a model must be calibrated so that its confidence in each decision closely reflects its true outcome. In this blog post we’ll take a look at the most commonly used definition for calibration and then dive into a frequently used evaluation measure for model calibration. […] The post Understanding Model Calibration: A Gentle Introduction & Visual...
There seems to be a consensus that leveraging data, analytics, and AI to create a data-driven organization requires a clear strategic approach. However, there is less clarity and agreement on exactly what this strategic approach should look like in practice. This article provides a short overview of what strategy work I believe is required to […] The post Data vs. Business Strategy appeared first on Towards Data...
Overview Introduction — Purpose and Reasons Speed is important when dealing with large amounts of data. If you are handling data in a cloud data warehouse or similar, then the speed of execution for your data ingestion and processing affects the following: As you’ve probably understood from the title, I am going to provide a […] The post Polars vs. Pandas — An Independent Speed Comparison appeared first on Towards Data...
Before we begin, let's make sure you're in the right place.
This article will explore initiating the RAG system and making it fully voice-activated.
In this article, I will introduce you to 10 little-known Python libraries every data scientist should know.
Machine learning (ML) is considered the largest subarea of artificial intelligence (AI), studying the development of software systems that learn from data by themselves to perform a task, without being explicitly programmed with the instructions to address it.
Stable Diffusion 1.5/2.0/2.1/XL 1.0, DALL-E, Imagen… In the past years, diffusion models have showcased stunning quality in image generation. However, while producing great quality on generic concepts, these models struggle to generate high-quality results for more specialised queries, for example images in a specific style that was not frequently seen in the training dataset. We […] The post Six Ways to Control Style and...
Subqueries are popular tools for more complex data manipulation in SQL. If you’re a beginner on a quest to understand subqueries, this is the article for you.
An analysis and discussion of the data science tools expected to gain prominence throughout the present year, and why.
Learn the easiest way to use a state-of-the-art Google experimental model locally.
LangChain is a robust framework conceived to simplify the development of LLM-powered applications, with LLM, of course, standing for large language model.
Which Outcome Matters? Here is a common scenario: an A/B test was conducted, where a random sample of units (e.g. customers) was selected for a campaign and received Treatment A. Another sample was selected to receive Treatment B. “A” could be a communication or offer and “B” could be no communication or no […] The post The Gamma Hurdle Distribution appeared first on Towards Data Science.
Accurate impact estimations can make or break your business case. Yet, despite its importance, most teams use oversimplified calculations that can lead to inflated projections. These shot-in-the-dark numbers not only destroy credibility with stakeholders but can also result in misallocation of resources and failed initiatives. But there’s a better way to forecast effects of gradual […] The post Triangle Forecasting:...
Recently, DeepSeek announced their latest model, R1, and article after article came out praising its performance relative to cost, and how the release of such open-source models could genuinely change the course of LLMs forever. That is really exciting! And also, too big of a scope to write about… but when a model like DeepSeek […] The post I Tried Making my Own (Bad) LLM Benchmark to Cheat in Escape Rooms appeared...
The surge of AI in general — and large language models (LLMs) in particular — is thanks to numerous research groups and companies racing to develop their most advanced models and demonstrate their potential use cases across broad domains.
Popularity of RAG Over the past two years while working with financial firms, I’ve observed firsthand how they identify and prioritize Generative AI use cases, balancing complexity with potential value. Retrieval-Augmented Generation (RAG) often stands out as a foundational capability across many LLM-driven solutions, striking a balance between ease of implementation and real-world impact. By combining […] The post...
Time series forecasting helps predict future data using past information, useful in areas like finance, weather, and inventory.
Audio processing is one of the most important application domains of digital signal processing (DSP) and machine learning. Modeling acoustic environments is an essential step in developing digital audio processing systems such as speech recognition, speech enhancement, and acoustic echo cancellation. Acoustic environments are filled with background noise that can have multiple sources. For example, […] The post The...
While building my own LLM-based application, I found many prompt engineering guides, but few equivalent guides for determining the temperature setting. Of course, temperature is a simple numerical value while prompts can get mind-blowingly complex, so it may feel trivial as a product decision. Still, choosing the right temperature can dramatically change the nature of […] The post A Comprehensive Guide to LLM...
Tools I use for coding, writing, grammar improvement, research, machine learning experiments, and organizing projects.
As tech layoffs increase, data scientists must adapt. Here's how to safeguard your data science job in 2025.
Check out this practical guide to building multilingual applications with Hugging Face.