Hands-on approach from chunked processing to parallel execution. Continue reading on Towards Data Science.
Curious about what LLMs are and want to learn more about them? Explore the full guide.
As AI becomes further enmeshed in every product we use, what rules should exist to protect humans? What rules should AI profiles play by? (Screenshot by James Barney, 3 January 2025.) This post explores and analyzes AI profiles on Meta's various platforms. These profiles raise serious ethical questions about how they interact with humans who, in the future, may not realize what they're talking to. By...
Speed Up PyTorch with Custom Kernels. We'll begin with torch.compile, move on to writing a custom Triton kernel, and finally dive into designing a CUDA kernel. (Read for free at alexdremov.me.) PyTorch offers remarkable flexibility, allowing you to code complex GPU-accelerated operations in a matter of seconds. However, this convenience comes at a cost. PyTorch executes your code sequentially, resulting in suboptimal...
A hands-on alternative to Google's CausalImpact (Photo by Vedrana Filipović on Unsplash). What is the impact of my last advertising campaign? What are the long-term costs of Brexit? How much have I gained from my new pricing strategy? All these questions are commonly asked of data scientists and other data practitioners (maybe not the one on Brexit, but it is interesting nonetheless). It makes sense because stakeholders are...
Using Clustering Algorithms to Handle Missing Time-Series Data
Everything you need to know to get started with text mining
Why Bayesian A/B testing can lead to misunderstandings and inflated false positive rates, introduce bias, and complicate results (Image generated by the author using Midjourney). Over the past decade, I've engaged in countless discussions about Bayesian A/B testing versus Frequentist A/B testing. In nearly every conversation, I've maintained the same viewpoint: there's a significant disconnect between the industry's...
With the help of an intricate geometric construction, we can prove that instance-wise cost functions quickly drive SVC to infinity. In the previous article in this series, we examined the concept of strategic VC dimension (SVC) and its connection to the Fundamental Theorem of Strategic Learning. We will make use of both of those in this article, alongside the ideas of achievable labelings and strategic shattering...
In this article, I’m going to share data science project ideas that will actually help you stand out. These are creative projects that solve problems with data, and I’ve included source code and tutorials to help you replicate them.
Training large language models (LLMs) is an involved process that requires planning, computational resources, and domain expertise.
Part 5: Increasing LP flexibility to handle tricky logic
How can numerical user metrics, such as "3 visits in the past week," be transformed into a personalized assessment of whether this behavior is typical or unusual for the user? (Cover image by the author.) In almost any digital product, analysts face the challenge of building a digital customer profile: a set of parameters that describe the customer's state and behavior in one way or another. What are the potential...
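One simple way to frame the question above, sketched here as an assumption rather than the article's own method, is to compare the current value with the same user's history using a per-user z-score. The function `assess_metric` and the 2.0 threshold are hypothetical choices for illustration.

```python
from statistics import mean, stdev


def assess_metric(history: list[float], current: float, z_threshold: float = 2.0) -> dict:
    """Label a user's current metric value relative to that user's own history.

    Hypothetical helper: a per-user z-score with an illustrative threshold,
    not a recommended production rule.
    """
    if len(history) < 2:
        # A standard deviation needs at least two observations.
        return {"z": None, "label": "insufficient history"}
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly constant history: any deviation at all counts as unusual.
        return {"z": None, "label": "typical" if current == mu else "unusual"}
    z = (current - mu) / sigma
    if z >= z_threshold:
        label = "unusually high"
    elif z <= -z_threshold:
        label = "unusually low"
    else:
        label = "typical"
    return {"z": z, "label": label}


# The same raw value, 3 visits in a week, judged against two different users.
frequent_user = assess_metric([9, 11, 10, 12, 8], 3)
occasional_user = assess_metric([2, 3, 4, 3, 3], 3)
```

Here 3 visits is labeled unusually low for the user who typically visits about 10 times a week, and typical for the user who typically visits about 3 times. A z-score assumes roughly symmetric data; for skewed count metrics, a per-user percentile may fit better.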
This article shows how to use Great Expectations to check data quality in data science projects.
Examples of custom callbacks and custom fine-tuning code from different libraries
The interplay between ownership, outsourcing, and remote work. As we enter 2025, artificial intelligence (AI) is taking center stage at companies across industries. Faced with the twin challenges of acting decisively in the short run (or at least appearing to do so to reassure various stakeholders) and securing a prosperous future for the company in the long run, executives may be compelled to launch strategic AI...
Element-wise operations are a crucial part of data preprocessing in Pandas. Learn how to perform them with practical examples using the DataFrame.map() function.
You can't afford to remain an AI-ignoramus, even if your product isn't using an LLM. If you're a Software Architect, or a Tech Lead, or really anyone senior in tech whose role includes making technical and strategic decisions, and you're not a Data Scientist or Machine Learning expert, then the likelihood is that Generative AI and Large Language Models (LLMs) were new to you back in 2023. AI was certainly new to me. We all...
Even with zero math background (Photo by Antoine Dautry on Unsplash). Do you want to become a Data Scientist or machine learning engineer, but you feel intimidated by all the math involved? I get it. I've been there. I dropped out of high school after 10th grade, so I never learned any math beyond trigonometry in school. When I started my journey into Machine Learning, I didn't even know what a derivative was. Fast forward to...
A deep dive into the world of computational modeling and its applications
Tracing the roots of ChatGPT: GPT-1, the foundation of OpenAI's LLMs (Image from Unsplash). The GPT (Generative Pre-trained Transformer) model family, first introduced by OpenAI in 2018, is another important application of the Transformer architecture. It has since evolved through versions like GPT-2, GPT-3, and InstructGPT, eventually leading to the development of OpenAI's powerful LLMs. In other words: understanding GPT models is...
Integration architecture focusing on security and access control (Connecting Compute: image by Alexandre Debiève on Unsplash). Microsoft Fabric and Azure Databricks are both powerhouses in the data analytics field. These platforms can be used end-to-end in a medallion architecture, from data ingestion to creating data products for end users. Azure Databricks excels in the initial stages due to its strength in...
Create a comprehensive AI agent from the ground up utilizing LangChain and DuckDB
Solving the problem of missing data in the variables used for sampling design
How CDC tools use MySQL Binlog and PostgreSQL WAL with logical decoding for real-time data streaming (Photo by Matoo.Studio on Unsplash). CDC (Change Data Capture) is a term that has been gaining significant attention over the past few years. You might already be familiar with it (if not, don't worry, there's a quick introduction below). One question that puzzled me, though, was how tools like the Debezium CDC connectors can...
Put a real-world object into fully AI-generated 4D scenes with minimal effort, so that it can star in your videos. (Figure: the three steps of consistent video creation using GenAI.) Progress in generative AI (GenAI) is astonishingly fast. It's becoming more mature in various text-driven tasks, going from typical natural language processing (NLP) to independent AI agents, capable of performing high-level tasks by themselves....
Building idempotent and replayable data pipelines
What you need to know, best practices, and where you can practice your skills
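A minimal sketch of one common way to make a batch load idempotent and replayable: delete the target partition and re-insert it inside a single transaction, so running the same load twice leaves the same result. This is an assumed pattern, not necessarily the one the article uses (upserts via MERGE are another option); the `sales` table, its columns, and `load_daily_partition` are hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import sqlite3


def load_daily_partition(conn: sqlite3.Connection, run_date: str,
                         rows: list[tuple[str, float]]) -> None:
    """Idempotently (re)load one day's rows: delete the partition, then insert.

    Hypothetical helper. Both statements run in one transaction, so a replay
    either fully replaces the partition or leaves it unchanged on failure.
    """
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM sales WHERE run_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO sales (run_date, product, amount) VALUES (?, ?, ?)",
            [(run_date, product, amount) for product, amount in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (run_date TEXT, product TEXT, amount REAL)")

batch = [("apple", 1.5), ("pear", 2.0)]
load_daily_partition(conn, "2025-01-03", batch)
load_daily_partition(conn, "2025-01-03", batch)  # replay of the same run

count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]  # 2, not 4
```

Because each run is keyed by its partition (here, `run_date`), a failed or corrected run can simply be re-executed without producing duplicates.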
Do you want to learn data wrangling with Python on a budget? No worries, there are (at least) five free courses that’ll provide you with solid knowledge.
Understanding the latest project to build speech-to-speech systems with open-source technologies.
Simple concepts that differentiate professionals from amateurs
We have to draw the line somewhere (Photo by Siora Photography on Unsplash). It's become something of a meme that statistical significance is a bad standard. Several recent blog posts have made the rounds, making the case that statistical significance is a "cult" or "arbitrary." If you'd like a classic polemic (and who wouldn't?), check out: https://www.deirdremccloskey.com/docs/jsm.pdf. This little essay is a defense of the...
I'll set the record straight: AI agents are not new, but they have advanced. Learn how they've evolved and where to get started.
Using Seattle's local retail store data to study consumer lottery patterns (SQL, Python)
With large language model (LLM) products such as ChatGPT and Gemini taking over the world, we need to adjust our skills to follow the trend.