    How the Medallion Model Unifies Batch and Streaming Data Processing

    October 28, 2025 · 10 min read

    Handling both batch and streaming data together can create real problems. Many companies struggle with excess complexity, mismatched data, and slow updates. Here are some common problems:

    | Challenge | Example |
    | --- | --- |
    | Complexity | A logistics company runs many batch jobs to sync inventory each day. |
    | Data Inconsistencies | A manufacturer sees different production numbers in different analytics systems. |
    | High Latency | A rideshare platform uses data that is a day old to calculate incentives. |

    The Medallion Model addresses these problems with layers. You keep raw data in the Bronze layer, clean and enrich it in the Silver layer, and build trusted data products in the Gold layer. This structure works for both batch and streaming data, and it gives your data team better scalability, stronger consistency, and more flexibility.

    Key Takeaways

    • The Medallion Model organizes data into three layers: Bronze for raw data, Silver for cleansing and enrichment, and Gold for business-ready data.

    • The layered approach reduces confusion and improves data quality, which makes your results easier to trust.

    • The same model handles both batch and streaming data, giving you more flexibility in how you work with it.

    • Add strong data checks at each layer. This keeps data accurate and consistent and stops mistakes from reaching your results.

    • Pick the best tools for each layer of the Medallion Model so you can process data efficiently and meet your business goals.

    Medallion Model Layers


    Bronze, Silver, Gold Overview

    The Medallion Model has three layers. Each layer has its own job. These layers help you organize your data in steps. Here is a simple look at what each layer does:

    | Layer | Purpose | Key Characteristics | Use Cases |
    | --- | --- | --- | --- |
    | Bronze | System of record, storing raw data | Keeps original data safe, only adds new data, works with many types | Checking data, fixing mistakes, exploring |
    | Silver | Standardizes and enriches data | Cleans up data, removes copies, uses rules for data | Main source for answers, helps with analysis |
    | Gold | Delivers business-ready data products | Puts data together, uses models, makes things run faster | Reports, machine learning, full views |
    First, you put raw data in Bronze. Next, you clean and change it in Silver. Last, you use Gold for trusted reports and answers.
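    The Bronze → Silver → Gold flow can be sketched in plain Python. This is an illustrative toy, not a real lakehouse pipeline; the event fields, duplicate, and cleansing rules are invented for the example:

    ```python
    # Minimal sketch of the medallion flow with plain dictionaries.
    raw_events = [
        {"order_id": "A1", "amount": "19.50", "region": "us"},
        {"order_id": "A1", "amount": "19.50", "region": "us"},   # duplicate
        {"order_id": "B2", "amount": "bad",   "region": "eu"},   # invalid amount
        {"order_id": "C3", "amount": "5.25",  "region": "us"},
    ]

    # Bronze: store raw data exactly as it arrived (append-only).
    bronze = list(raw_events)

    # Silver: deduplicate and validate, applying simple business rules.
    seen, silver = set(), []
    for row in bronze:
        if row["order_id"] in seen:
            continue                       # drop duplicates
        try:
            amount = float(row["amount"])  # enforce the expected type
        except ValueError:
            continue                       # drop rows that fail validation
        seen.add(row["order_id"])
        silver.append({**row, "amount": amount})

    # Gold: aggregate into a business-ready data product.
    gold = {}
    for row in silver:
        gold[row["region"]] = gold.get(row["region"], 0.0) + row["amount"]

    print(gold)  # {'us': 24.75}
    ```

    Note how each layer only reads from the one before it: the raw events in `bronze` survive untouched, so you can always replay or debug from the original data.
    
    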

    Layered Data Quality and Organization

    Every layer in the Medallion Model makes your data better. Bronze keeps data safe and does not change it. Silver looks for mistakes and fixes them using rules. Gold gives you data that is ready for business. This table shows how each layer helps with quality:

    | Layer | Purpose | Data Quality Improvement |
    | --- | --- | --- |
    | Bronze | Captures raw data | Saves data just as it is |
    | Silver | Cleans and transforms data | Finds mistakes and fixes them |
    | Gold | Aggregates and enriches data for business use | Gives you high-quality data that is ready to use |

    Your data gets better as it moves from Bronze to Gold. This setup helps you trust your results.

    Supporting Batch and Streaming

    The Medallion Model works with batch and streaming data. You can put real-time streams or big batches in Bronze. Silver works on data as soon as it comes in, so you always have fresh data. Gold lets you run fast reports and searches on new data.

    Tip: Pick the way that fits your needs best. The Medallion Model gives you choices and keeps your data neat.

    Unified Data Flow


    Ingesting Batch and Streaming Data

    You can use the Medallion Model for batch and streaming data. This helps you work with many kinds of data sources. Some data comes in big groups. Other data arrives all the time. The Medallion Model lets you pick how to bring in data. You can choose what fits your needs.

    Here is a table that shows ways to bring in data:

    | Ingestion Method | Cost | Latency | Examples |
    | --- | --- | --- | --- |
    | Continuous incremental ingestion | Higher | Lower | Streaming table using spark.readStream to ingest continuously from cloud storage or a message bus. |
    | Triggered incremental ingestion | Lower | Higher | Streaming table using spark.readStream, run on a schedule, ingesting from cloud storage or a message bus. |
    | Batch ingestion with manual incremental handling | Lower | Highest | Table loaded from cloud storage using spark.read. |
    Continuous ingestion suits real-time needs: it delivers new data fast but costs more. Triggered or manual ingestion is slower and saves money. The Medallion Model lets you use both approaches, and you can mix them to balance speed and cost.

    Tip: Pick the way to bring in data that helps your business. You can change it later if you need to.
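    The incremental idea behind these ingestion methods can be sketched without Spark. In a real pipeline, spark.readStream tracks its position in a checkpoint; in this toy, a plain dictionary plays that role, and the source list and function names are hypothetical:

    ```python
    # Sketch of incremental ingestion with a checkpointed offset.
    source = ["evt1", "evt2", "evt3", "evt4", "evt5"]

    def ingest_increment(checkpoint):
        """Read only records that arrived after the last checkpointed offset."""
        start = checkpoint.get("offset", 0)
        new_records = source[start:]
        checkpoint["offset"] = len(source)  # persist progress, like a stream checkpoint
        return new_records

    # Triggered ingestion: run on a schedule; each run resumes where the last stopped.
    checkpoint = {}
    first_run = ingest_increment(checkpoint)    # picks up all five events
    source.extend(["evt6", "evt7"])             # new data arrives between runs
    second_run = ingest_increment(checkpoint)   # picks up only the two new events

    print(first_run, second_run)
    ```

    Continuous ingestion runs the same loop all the time (low latency, higher cost); triggered ingestion runs it on a schedule; batch ingestion rereads or manually tracks what changed.
    
    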

    Data Processing Across Layers

    After you bring in data, you move it through layers. Each layer has a job. First, you keep raw data. Next, you clean it. Last, you get it ready for business. This works for batch and streaming data.

    The table below shows how data moves through layers:

    | Layer | Purpose | Process Description | Key Benefits |
    | --- | --- | --- | --- |
    | Bronze | Ingestion and raw data storage | Data is stored in its raw form in Google Cloud Storage, including unprocessed logs. | Retains original data for traceability, enabling historical analysis and debugging. |
    | Silver | Data cleansing and validation | Data is cleansed and validated in BigQuery or Delta Lake, applying business rules. | Ensures data consistency and usability through schema enforcement and deduplication. |
    | Gold | Business intelligence and aggregation | High-quality data is prepared for reporting and machine learning in Databricks. | Provides structured datasets for dashboards and business decision-making. |

    You can move data from Bronze to Silver right away. This keeps your data fresh for analysis. You can also process big batches when needed. The Medallion Model uses the same steps for batch and streaming data. This makes it easy to manage all your data.
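    The point that batch and streaming share the same steps can be shown with a tiny sketch: one transformation function serves both paths. The field names and validation rule here are hypothetical:

    ```python
    # One cleansing function, reused by the batch path and the streaming path.
    def to_silver(record):
        """Cleanse one record; return None if it fails validation."""
        if not record.get("id"):
            return None
        return {"id": record["id"], "value": float(record["value"])}

    # Batch path: apply the logic to a full dataset at once.
    batch = [{"id": "a", "value": "1.5"}, {"id": "", "value": "2"}]
    silver_batch = [r for r in (to_silver(x) for x in batch) if r]

    # Streaming path: apply the identical logic to records as they arrive.
    def stream():
        yield {"id": "b", "value": "3.0"}
        yield {"id": "c", "value": "4.5"}

    silver_stream = [r for r in (to_silver(x) for x in stream()) if r]

    print(silver_batch + silver_stream)
    ```

    Because the transformation logic lives in one place, a fix or rule change applies to both modes at once, which is what keeps batch and streaming results consistent.
    
    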

    ACID Transactions and Metadata

    You need strong rules to keep your data safe. The Medallion Model uses ACID transactions for this. ACID stands for Atomicity, Consistency, Isolation, and Durability. These rules help your data stay correct, even when many people use it.

    Here is what each part of ACID means:

    | Aspect | Description |
    | --- | --- |
    | Atomicity | All parts of a transaction finish together or not at all. |
    | Consistency | Each transaction keeps the database in a valid state. |
    | Isolation | Transactions run independently and do not interfere with each other. |
    | Durability | Once a transaction completes, the data stays safe even if something breaks. |

    You also get strong metadata support. Metadata helps you know where your data comes from and how it changes. This helps you trust your results and fix problems.

    Note: ACID transactions and good metadata help you trust your data, no matter how fast or slow it comes in.
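    Atomicity, the "all or nothing" rule, can be illustrated with a small file-based sketch. Real lakehouse engines such as Delta Lake implement this with transaction logs; the write-then-rename trick below is only a simple analogy, and the file name is made up:

    ```python
    # Sketch of atomicity: write to a temporary file, then atomically swap it in.
    import json
    import os
    import tempfile

    def atomic_write(path, data):
        """Readers never see a half-written file: os.replace is atomic."""
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, path)  # the swap either fully happens or not at all

    path = os.path.join(tempfile.gettempdir(), "table_state.json")
    atomic_write(path, {"rows": 3, "version": 7})
    with open(path) as f:
        print(json.load(f))
    ```

    If the process crashes before `os.replace`, the old file is untouched; readers see either the old version or the new one, never a mix.
    
    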

    Implementation and Best Practices

    Designing Unified Pipelines

    You can make a strong data pipeline by following easy steps. The Medallion Model helps you organize your work and keeps your data safe. Here is a simple way to set up your pipeline:

    1. Land your data in the bronze bucket. Create bronze and silver areas, then bring data into bronze by creating an ingestion branch, uploading your data, and merging it into the main branch.

    2. Read from the bronze bucket and transform the data into the silver bucket. Track where your data comes from by saving the commit ID.

    3. (Optional) Publish your final dataset to a gold bucket. This step lets you share trusted data with the business.

    4. (Optional) Create separate environments for production, testing, and quality checks. This keeps your work safe and organized.

    Tip: You can switch from old data systems to the Medallion Model step by step. This helps your team trust and use your data more easily.
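    Saving the commit ID for traceability, as in step 2, might look like the sketch below. The `commit_id` helper is a stand-in for a real versioning system's commit hash (for example, a lakeFS-style branch-and-merge workflow); the data and cleansing rule are invented:

    ```python
    # Sketch: link each silver dataset to the bronze commit it was built from.
    import hashlib
    import json

    def commit_id(data):
        """Stand-in for a versioning system's commit hash."""
        payload = json.dumps(data, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()[:12]

    bronze_data = [{"sku": "X1", "qty": 5}, {"sku": "X2", "qty": 0}]
    bronze_commit = commit_id(bronze_data)

    silver_dataset = {
        "rows": [r for r in bronze_data if r["qty"] > 0],  # a simple cleansing rule
        "source_commit": bronze_commit,                     # provenance link
    }
    print(silver_dataset["source_commit"] == bronze_commit)  # True
    ```

    With the commit ID stored alongside the silver data, you can always reproduce a dataset from the exact raw inputs that produced it.
    
    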

    Tools and Technologies

    You have many choices for tools when you build your pipeline. The right tools help you work with batch and streaming data. Here is a table that shows what works best at each layer:

    | Medallion Layer | Workload Type | Best Processing Method |
    | --- | --- | --- |
    | Bronze | Ingestion, large data sizes | Streaming processing |
    | Silver | Transformation, mixed loads | Batch or streaming |
    | Gold | Aggregation, small data | Batch processing |

    You can use these tools to build your pipeline:

    • Apache Spark and Apache Kafka for ingestion and stream processing

    • Delta Lake and BigQuery for storage, cleansing, and transformation

    • Databricks and Snowflake for analytics, reporting, and machine learning

    Note: Not all tools work the same way with every layer. Pick the ones that fit your needs and budget.

    Monitoring and Optimization

    You need to watch your pipeline to keep it working well. Here are some best practices:

    • Add checks for data quality. Use rules to make sure your data is right.

    • Watch your pipeline’s speed. Use tools like Spark UI or Snowflake Query History to find slow spots.

    • Set up error alerts. Use retry steps and alerts to fix problems fast.
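    The retry-and-alert practice in the list above can be sketched as a small wrapper. The alert function, retry counts, and failing task are all hypothetical stand-ins:

    ```python
    # Sketch of a retry wrapper that alerts when all attempts fail.
    import time

    def alert(message):
        print(f"ALERT: {message}")  # stand-in for paging, Slack, or email

    def run_with_retries(task, max_attempts=3, delay=0.0):
        for attempt in range(1, max_attempts + 1):
            try:
                return task()
            except Exception as exc:
                if attempt == max_attempts:
                    alert(f"task failed after {attempt} attempts: {exc}")
                    raise
                time.sleep(delay)  # back off before retrying

    calls = {"n": 0}
    def flaky_task():
        calls["n"] += 1
        if calls["n"] < 3:
            raise RuntimeError("transient error")
        return "ok"

    result = run_with_retries(flaky_task)  # succeeds on the third attempt
    print(result)
    ```

    Transient failures get retried quietly; only a task that exhausts all attempts raises an alert, so your team is paged for real problems rather than noise.
    
    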

    If you follow these steps, you can keep your data fresh and trusted. The Medallion Model gives you a clear path to better data for your business.

    Challenges and Solutions

    Common Pitfalls

    When you use the Medallion Model, you can run into problems. Each layer changes your data. This can hide mistakes or make them harder to spot. Here are some issues you might see:

    • Data checks sometimes miss business needs because earlier layers do not have enough information.

    • If upstream data changes, the bronze layer can break. This can make reports show wrong results.

    • Moving data between layers too much can slow things down and cost more money.

    • The step-by-step flow can cause slowdowns when your business needs change fast.

    • Data made for analytics is not always ready for operations. This means you cannot reuse it easily.

    • The strict setup can have trouble with real-time data. This can cause delays or extra work.

    Tip: You can stop many problems by planning your data flow well and making sure each layer has a clear job.

    Data Quality and Schema Evolution

    You need strong checks to keep your data clean and useful. Many teams add extra layers to check data quality and schema rules. These layers act like checkpoints. They make sure your data follows rules before moving on. Schema checks see if new data fits the right format and help handle changes over time.

    Some teams forget to check schemas. This can let bad data into your lake. For streaming data, new columns get added by themselves. Deleted columns stay but get set to null. If you change a data type, streaming systems save the data in a special column. For batch data, you must update the schema by hand. If you do not, your pipeline might break.

    | Challenge Type | Streaming Data Handling | Batch Data Handling |
    | --- | --- | --- |
    | New Columns | Added automatically | Needs manual update |
    | Deleted Columns | Set to null, not removed | Needs manual update |
    | Data Type Changes | Rescued in a special column | May cause failure if not handled |
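    The streaming rules in this table can be sketched as a small schema-merge function. The `_rescued_data` column name mirrors a common convention for rescued values, but the schema, record, and logic here are illustrative only:

    ```python
    # Sketch of schema evolution for streaming records:
    # new columns are added, missing columns become null,
    # and type mismatches are rescued into a special column.
    EXPECTED_SCHEMA = {"id": str, "qty": int}

    def evolve(record, schema):
        out, rescued = {}, {}
        for col, typ in schema.items():
            value = record.get(col)        # deleted columns arrive as None
            if value is None:
                out[col] = None
            elif isinstance(value, typ):
                out[col] = value
            else:
                out[col] = None
                rescued[col] = value       # type mismatch: rescue the raw value
        for col in record.keys() - schema.keys():
            schema[col] = type(record[col])  # new columns: add automatically
            out[col] = record[col]
        if rescued:
            out["_rescued_data"] = rescued
        return out

    row = evolve({"id": "A1", "qty": "ten", "color": "red"}, EXPECTED_SCHEMA)
    print(row)
    ```

    Here `qty` arrives as a string instead of an integer, so its raw value is preserved in `_rescued_data` rather than breaking the pipeline, and the new `color` column is folded into the schema. Batch pipelines, by contrast, would need the schema updated by hand.
    
    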

    Adoption Tips

    You can help your team do well with the Medallion Model by using these best practices:

    1. Treat your transformation logic like a product. Keep it versioned and write good notes.

    2. Use metadata and automation. This makes your pipelines easier to run.

    3. Build pipelines that can handle changes and run in small steps.

    4. Match your team’s jobs to the data layers so everyone knows their role.

    Regular meetings help teams find problems early. Training together builds both tech and business skills. Good communication keeps everyone working together. For example, when a retail company let teams own product data, they made fewer mistakes and worked better together. Working with experts also helps data quality and team spirit.

    Note: Good planning, teamwork, and training help everyone use the Medallion Model better.

    The Medallion Model solves the problem of combining batch and streaming data. Its layered design helps your data scale and stay consistent, lets you use cloud tools, and supports many workloads at once. You get better data quality, faster queries, and clear rules for your data.

    | Benefit | Impact |
    | --- | --- |
    | Data Quality | You get good answers from each layer |
    | Scalability | It works with lots of data and keeps growing |
    | Query Efficiency | You get answers faster with ready-to-use data |

    To begin, build strong Silver and Gold layers, then test and optimize your pipelines. This helps your team deliver faster and get better results for your business.

    FAQ

    What is the main benefit of the Medallion Model?

    You get a simple way to organize your data. The model helps you keep raw, clean, and business-ready data separate. This makes your work easier and your results more reliable.

    Can you use the Medallion Model for real-time data?

    Yes, you can. The Medallion Model lets you handle both streaming and batch data. You can process new data as it arrives or work with large groups of data.

    How do you keep data quality high in each layer?

    You add checks and rules at every step. For example:

    • Bronze: Save raw data.

    • Silver: Clean and fix errors.

    • Gold: Make sure data is ready for business.

    Tip: Always test your data before moving it to the next layer.

    What tools work best with the Medallion Model?

    You can use many tools. Popular choices include:

    | Layer | Tool Example |
    | --- | --- |
    | Bronze | Apache Spark |
    | Silver | Delta Lake |
    | Gold | BigQuery |

    Pick tools that fit your needs and budget.

    See Also

    Streamline Data Processing With Apache Kafka's Efficiency

    An Introductory Guide to Building Data Pipelines

    Linking Live Data to Superset for Instant Analysis

    Strategies for Effective Big Data Analysis Techniques

    Leveraging Apache Superset and Kafka for Instant Insights
