
Handling both batch and streaming data is hard. Many companies struggle with pipeline complexity, inconsistent data, and stale results. Here are some common problems:
| Challenge | Example |
|---|---|
| Complexity | A logistics company runs many batch jobs each day just to sync inventory. |
| Data inconsistencies | A manufacturer sees different production numbers in different analytics systems. |
| High latency | A rideshare platform calculates driver incentives from day-old data. |
The Medallion Model addresses these problems with a layered design: raw data lands in the Bronze layer, is cleaned and enriched in the Silver layer, and becomes trusted data products in the Gold layer. The same structure serves both batch and streaming workloads, giving your data team better scalability, stronger consistency, and more flexibility.
The Medallion Model organizes data into three layers: Bronze for raw data, Silver for cleansing and enrichment, and Gold for business-ready data.
The layered design reduces complexity and improves data quality, which makes your results easier to trust.
The same architecture supports both batch and streaming data, so you have more ways to work with your data.
Add strong data-quality checks at each layer. This keeps data consistent and stops errors from propagating into your results.
Choose the right tools for each layer of the Medallion Model. This helps you process data efficiently and meet your business goals.

The Medallion Model has three layers, each with a distinct job. Together they organize your data in stages. Here is a summary of what each layer does:
| Layer | Purpose | Key Characteristics | Use Cases |
|---|---|---|---|
| Bronze | System of record, storing raw data | Preserves original data, append-only, supports many formats | Auditing, debugging, exploration |
| Silver | Standardizes and enriches data | Cleanses data, removes duplicates, enforces business rules | Primary source for queries and analysis |
| Gold | Delivers business-ready data products | Aggregates data, applies models, optimized for performance | Reports, machine learning, comprehensive views |
First, raw data lands in Bronze. Next, it is cleaned and transformed in Silver. Finally, Gold serves trusted reports and analytics.
Each layer of the Medallion Model improves your data. Bronze preserves data unchanged, Silver detects and fixes errors using business rules, and Gold delivers data that is ready for business use. This table shows how each layer contributes to quality:
| Layer | Purpose | Data Quality Improvement |
|---|---|---|
| Bronze | Captures raw data | Preserves data exactly as it arrived |
| Silver | Cleanses and transforms data | Detects and fixes errors |
| Gold | Aggregates and enriches data for business use | Delivers high-quality, ready-to-use data |
Data quality improves as data moves from Bronze to Gold, which makes your results easier to trust.
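The progression from Bronze to Gold can be sketched in a few lines of plain Python. This is a minimal illustration, not a real pipeline: the record shape (`order_id`, `amount`) and the cleansing rules are assumptions made up for the example.

```python
def to_bronze(raw_events):
    """Bronze: append raw records unchanged (system of record)."""
    return list(raw_events)

def to_silver(bronze):
    """Silver: drop duplicates and records that fail basic rules."""
    seen, silver = set(), []
    for rec in bronze:
        key = rec.get("order_id")
        if key is None or key in seen:
            continue  # reject invalid or duplicate records
        seen.add(key)
        silver.append({**rec, "amount": float(rec["amount"])})
    return silver

def to_gold(silver):
    """Gold: aggregate into a business-ready metric."""
    return {"total_revenue": sum(r["amount"] for r in silver)}

raw = [
    {"order_id": 1, "amount": "10.5"},
    {"order_id": 1, "amount": "10.5"},    # duplicate
    {"order_id": None, "amount": "3.0"},  # invalid
    {"order_id": 2, "amount": "4.5"},
]
gold = to_gold(to_silver(to_bronze(raw)))
print(gold)  # {'total_revenue': 15.0}
```

Notice that Bronze keeps the bad and duplicate rows; only the Silver step filters them, so you can always go back and audit the original input.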
The Medallion Model handles batch and streaming data alike. Real-time streams and large batch loads both land in Bronze. Silver can process records as they arrive, so your data stays fresh, and Gold supports fast reports and queries on the latest data.
You can follow a Lambda architecture (parallel batch and streaming paths) or a Kappa architecture (a single streaming path).
Every layer supports both approaches, so you do not need two separate systems.
Tip: Pick the approach that fits your needs. The Medallion Model gives you options while keeping your data organized.

The Medallion Model supports both batch and streaming ingestion, which lets you work with many kinds of data sources: some data arrives in large batches, other data arrives continuously. You can choose the ingestion method that fits each source.
The table below compares common ingestion methods:
| Ingestion Method | Cost | Latency | Example |
|---|---|---|---|
| Continuous incremental ingestion | Higher | Lower | A streaming table using spark.readStream to ingest continuously from cloud storage or a message bus. |
| Triggered incremental ingestion | Lower | Higher | A streaming table using spark.readStream with a scheduled trigger to ingest from cloud storage or a message bus. |
| Batch ingestion with manual incremental logic | Lower | Highest | A table ingesting from cloud storage using spark.read, with incremental logic handled by hand. |
Continuous ingestion suits real-time needs: it delivers fresh data quickly but costs more to run. Triggered or manual ingestion is slower but cheaper. The Medallion Model lets you mix both approaches to balance speed and cost.
Tip: Pick the ingestion method that serves your business, and change it later if your needs change.
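The difference between continuous and triggered incremental ingestion comes down to when new records are picked up. Here is a minimal pure-Python sketch of the triggered case, assuming a list-based source and a simple offset checkpoint; real systems persist the checkpoint and delegate this bookkeeping to an engine such as Spark Structured Streaming.

```python
checkpoint = {"offset": 0}  # persisted between runs in a real system

def ingest_increment(source, checkpoint):
    """Process only the records that arrived since the last run."""
    start = checkpoint["offset"]
    new_records = source[start:]
    checkpoint["offset"] = len(source)  # commit the new offset
    return new_records

source = ["evt-1", "evt-2"]
first_run = ingest_increment(source, checkpoint)   # ['evt-1', 'evt-2']

source += ["evt-3"]
second_run = ingest_increment(source, checkpoint)  # ['evt-3']
```

Running the function continuously in a tight loop gives you the low-latency variant; running it on a schedule gives you the cheaper triggered variant. The code is identical, which is why the Medallion Model can offer both.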
After ingestion, data moves through the layers, each with its own job: raw data is preserved first, then cleansed, then prepared for business use. The same flow works for batch and streaming data.
The table below shows how data moves through the layers:
| Layer | Purpose | Process Description | Key Benefits |
|---|---|---|---|
| Bronze | Ingestion and raw data storage | Data is stored in its raw form in Google Cloud Storage, including unprocessed logs. | Retains original data for traceability, enabling historical analysis and debugging. |
| Silver | Data cleansing and validation | Data is cleansed and validated in BigQuery or Delta Lake, applying business rules. | Ensures data consistency and usability through schema enforcement and deduplication. |
| Gold | Business intelligence and aggregation | High-quality data is prepared for reporting and machine learning in Databricks. | Provides structured datasets for dashboards and business decision-making. |
You can promote data from Bronze to Silver as soon as it arrives, keeping it fresh for analysis, or process it in large batches when that is enough. Because the Medallion Model uses the same steps for batch and streaming data, all your data is managed the same way.
Strong guarantees keep your data safe. The Medallion Model relies on ACID transactions: Atomicity, Consistency, Isolation, and Durability. These properties keep your data correct even when many readers and writers work at once.
Here is what each part of ACID means:
| Property | Description |
|---|---|
| Atomicity | All parts of a transaction complete together, or none do. |
| Consistency | Each transaction leaves the database in a valid state. |
| Isolation | Concurrent transactions do not interfere with each other. |
| Durability | Once a transaction commits, the data survives failures. |
ACID transactions protect both batch and streaming writes.
They let you evolve your data's structure safely as it grows and changes.
Multiple query engines can read and write the same tables without corrupting them.
You also get strong metadata support. Metadata records where your data came from and how it changed, which helps you trust results and debug problems.
Note: ACID transactions and good metadata let you trust your data, no matter how fast or slow it arrives.
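The write-then-rename pattern shows atomicity and durability in miniature for a single file: readers never see a half-written file because the final rename is all-or-nothing. Table formats such as Delta Lake apply the same idea to their transaction logs. The file path and record shape below are illustrative assumptions.

```python
import json, os, tempfile

def atomic_write(path, records):
    """Write records to path atomically (all-or-nothing)."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(records, f)
            f.flush()
            os.fsync(f.fileno())  # durability: force bytes to disk
        os.replace(tmp, path)     # atomicity: rename is all-or-nothing
    except BaseException:
        os.remove(tmp)            # a failed write leaves no partial file
        raise

target = os.path.join(tempfile.gettempdir(), "orders.json")
atomic_write(target, [{"id": 1}, {"id": 2}])
with open(target) as f:
    data = json.load(f)
print(data)  # [{'id': 1}, {'id': 2}]
```

The temporary file is created in the same directory as the target so the rename stays on one filesystem, which is what makes `os.replace` atomic.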
You can build a robust data pipeline by following a few steps. The Medallion Model organizes the work and keeps your data safe. Here is a simple way to set up your pipeline:
1. Create bronze and silver areas for your data, then land raw data in the bronze bucket: create an ingestion branch, upload the data, and merge the branch back into main.
2. Read from the bronze bucket and transform the data into the silver bucket. Record the source commit ID so you can trace where each dataset came from.
3. (Optional) Publish the final dataset to a gold bucket. This lets you share trusted data with the business.
4. (Optional) Create separate environments for production, testing, and quality checks. This keeps your work safe and organized.
Tip: You can migrate from legacy data systems to the Medallion Model step by step. This helps your team build trust in the data gradually.
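The lineage-tracking part of step 2 can be sketched in plain Python. Branch and commit mechanics (as in lakeFS or Delta table versioning) are simulated here with a content hash, and the record fields are made up for the example.

```python
import hashlib, json

def commit(records):
    """Return a content-addressed commit ID for a batch of records."""
    payload = json.dumps(records, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]

bronze = [{"sku": "A-1", "qty": "5"}, {"sku": "B-2", "qty": "3"}]
bronze_commit = commit(bronze)

# The silver dataset keeps a pointer back to its bronze source,
# so any downstream result can be traced to the exact input batch.
silver = {
    "source_commit": bronze_commit,
    "rows": [{"sku": r["sku"], "qty": int(r["qty"])} for r in bronze],
}
```

If a silver number ever looks wrong, the `source_commit` field tells you exactly which bronze batch to inspect.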
You have many tool choices when building your pipeline, and the right tools handle both batch and streaming data. The table below shows what typically works best at each layer:
| Medallion Layer | Workload Type | Best Processing Method |
|---|---|---|
| Bronze | Ingestion, large data volumes | Streaming processing |
| Silver | Transformation, mixed loads | Batch or streaming |
| Gold | Aggregation, smaller data | Batch processing |
You can use these tools to build your pipeline:
Processing Engine: Apache Spark with PySpark
Workflow Orchestration: Apache Airflow
Data Lake Format: Delta Lake
Data Warehouse: BigQuery or Amazon Redshift
Note: Not all tools work the same way with every layer. Pick the ones that fit your needs and budget.
You need to monitor your pipeline to keep it working well. Some best practices:
Add data quality checks, with explicit rules that verify your data is correct.
Watch pipeline performance. Tools like the Spark UI or Snowflake Query History help you find slow spots.
Set up error alerts and retries so problems get fixed fast.
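The retry-and-alert practice can be sketched as a small wrapper around any pipeline step. The flaky step and the alert list below are stand-ins for a real task and a real alerting channel (pager, webhook, etc.).

```python
import time

def run_with_retries(step, retries=3, delay=0.0, alerts=None):
    """Run step(), retrying on failure; record an alert if every attempt fails."""
    if alerts is None:
        alerts = []
    for attempt in range(1, retries + 1):
        try:
            return step()
        except Exception as exc:
            if attempt == retries:
                alerts.append(f"failed after {retries} attempts: {exc}")
                raise
            time.sleep(delay)  # back off before the next attempt

calls = {"n": 0}
def flaky_step():
    """A stand-in task that fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"

result = run_with_retries(flaky_step, retries=3)
print(result)  # ok
```

In practice you would use exponential backoff for `delay` and route the alert to an on-call channel instead of a list.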
If you follow these steps, your data stays fresh and trusted. The Medallion Model gives you a clear path to better data for your business.
The Medallion Model also brings challenges. Each layer transforms your data, which can hide mistakes or make them harder to spot. Here are some issues you might see:
Data quality checks can miss business requirements because earlier layers lack the needed context.
Upstream schema changes can break the Bronze layer, which can make reports show wrong results.
Moving data between layers too often adds latency and cost.
The staged flow can become a bottleneck when business needs change quickly.
Data modeled for analytics is not always ready for operational use, so it is hard to reuse.
The strict layering can struggle with real-time data, causing delays or extra work.
Tip: You can avoid many of these problems by planning your data flow carefully and giving each layer a clear responsibility.
Strong validation keeps your data clean and useful. Many teams add explicit quality and schema checks between layers. These act as checkpoints: data must pass the rules before moving on. Schema checks verify that new data fits the expected format and help you handle schema changes over time.
Some teams skip schema checks, which lets bad data into the lake. Streaming and batch pipelines handle schema changes differently. In streaming pipelines, new columns are often added automatically, deleted columns remain but are set to null, and values whose type changed are rescued into a special column. In batch pipelines you must update the schema by hand, or the pipeline may break.
| Change Type | Streaming Data Handling | Batch Data Handling |
|---|---|---|
| New columns | Added automatically | Needs manual schema update |
| Deleted columns | Kept, set to null | Needs manual schema update |
| Data type changes | Values rescued into a special column | May cause failures if not handled |
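The streaming-side behaviors in the table can be simulated in a few lines of Python. The rescue-column behavior is modeled loosely on features like Databricks Auto Loader's `_rescued_data`; the schema and records below are assumptions made up for the example.

```python
def evolve(record, schema):
    """Conform a record to schema, evolving it streaming-style."""
    # New columns: add them to the schema automatically.
    for col, value in record.items():
        schema.setdefault(col, type(value))
    out, rescued = {}, {}
    for col, expected in schema.items():
        value = record.get(col)  # deleted columns come back as None
        if value is not None and not isinstance(value, expected):
            rescued[col] = value  # type change: rescue the value, don't fail
            value = None
        out[col] = value
    if rescued:
        out["_rescued"] = rescued
    return out

schema = {"id": int, "qty": int}
row = evolve({"id": 1, "qty": "7", "color": "red"}, schema)
# 'color' was added to the schema, and the string "7" was rescued
# instead of crashing the pipeline.
```

A batch pipeline without this logic would simply fail on the `"7"` value, which is exactly the manual-update burden the table describes.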
You can set your team up for success with the Medallion Model by following these practices:
Treat transformation logic like a product: version it and document it well.
Use metadata and automation to make pipelines easier to operate.
Build pipelines that tolerate change and run incrementally.
Map team responsibilities to the data layers so everyone knows their role.
Regular reviews help teams catch problems early, and shared training builds both technical and business skills. Good communication keeps everyone aligned. For example, when a retail company gave teams ownership of product data, they made fewer mistakes and collaborated better. Working with domain experts also improves data quality and team morale.
Note: Good planning, teamwork, and training help everyone use the Medallion Model better.
The Medallion Model solves the problem of combining batch and streaming data. Its layered design helps your data scale and stay consistent, works well with cloud tools, and supports many workloads at once. You get better data quality, faster queries, and clear governance rules.
| Benefit | Impact |
|---|---|
| Data quality | Each layer improves the reliability of results |
| Scalability | Handles large and growing data volumes |
| Query efficiency | Ready-to-use data means faster answers |
To get started, build strong Silver and Gold layers, then test and tune your pipelines. This helps your team deliver faster and get better results for the business.
You get a simple way to organize your data. The model keeps raw, cleansed, and business-ready data separate, which makes your work easier and your results more reliable.
Yes. The Medallion Model handles both streaming and batch data: you can process records as they arrive or work with large batches.
Add checks and rules at every step. For example:
Bronze: save raw data unchanged.
Silver: clean the data and fix errors.
Gold: verify the data is ready for business use.
Tip: Always test your data before moving it to the next layer.
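The per-layer checks can be sketched as a promotion gate that only passes records meeting every rule. The specific checks and record fields here are illustrative assumptions.

```python
def promote(records, checks):
    """Return (passed, failed) so bad rows never reach the next layer."""
    passed, failed = [], []
    for rec in records:
        if all(check(rec) for check in checks):
            passed.append(rec)
        else:
            failed.append(rec)
    return passed, failed

checks = [
    lambda r: r.get("id") is not None,  # required key must be present
    lambda r: r.get("amount", 0) >= 0,  # no negative amounts
]
good, bad = promote(
    [{"id": 1, "amount": 9.5}, {"id": None, "amount": 2.0}], checks
)
# good: [{'id': 1, 'amount': 9.5}]; bad: [{'id': None, 'amount': 2.0}]
```

Keeping the failed rows (rather than dropping them) lets you quarantine and inspect them, which is the usual practice between Medallion layers.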
You can use many tools. Popular choices include:
| Layer | Tool Example |
|---|---|
| Bronze | Apache Spark |
| Silver | Delta Lake |
| Gold | BigQuery |
Pick tools that fit your needs and budget.