Understanding the difference between UNION and UNION ALL in SQL

UNION and UNION ALL are SQL tools that combine results from several queries. UNION removes duplicates, so equal rows appear only once. UNION ALL keeps every row, even if it repeats. Choose UNION to eliminate duplicates; choose UNION ALL for faster results when duplicates are okay. This helps data work.

Outline:

  • Hook and scope: SQL set operations that sound tiny but matter in real apps
  • What UNION and UNION ALL actually do

  • The big difference: duplicates in the final result

  • A quick example that shows the behavior

  • When to choose each one (with practical reasons)

  • Performance notes and engines you’ll hear about

  • Common gotchas to avoid

  • Quick tips to remember

  • A friendly wrap-up with a nod to real-world data work

UNION vs UNION ALL: a simple choice with real consequences

Let me ask you this: in data work, do you want every copy you’ve got, or just one copy of each thing? It sounds like a small decision, but it changes how your queries behave in a big way. Two lines of SQL, and suddenly you’re choosing between a neat, deduplicated list or a full stack of all rows, including duplicates. The two little keywords that do the job are UNION and UNION ALL. Here’s the thing: they’re both about stacking results from multiple queries, but they handle duplicates very differently.

What they do in plain terms

  • UNION: imagine you’re mixing two guest lists for a party, then cleaning up any double entries so every guest shows up once. In SQL, UNION removes duplicate rows from the combined result.

  • UNION ALL: same idea of merging two lists, but you keep every entry, even if it’s the same row showing up twice. It’s like having two copies of the exact same invitation—both end up on the guest list.

That distinction—that simple rule about duplicates—drives how you choose between them. If you care about never showing the same row twice, go with UNION. If you’re more concerned with speed or you know duplicates are meaningful in your data, UNION ALL is often the better fit.

A concrete example makes it click

Suppose you have two small tables:

  • Table1: (id, name) = (1, 'Ana'), (2, 'Kai')

  • Table2: (id, name) = (2, 'Kai'), (3, 'Lee')

If you want to combine these rows and you don’t care about duplicates, you might write:

  • Using UNION ALL:

  • Result: (1, 'Ana'), (2, 'Kai'), (2, 'Kai'), (3, 'Lee')

  • Using UNION:

  • Result: (1, 'Ana'), (2, 'Kai'), (3, 'Lee')

Notice how the only difference is the extra (2, 'Kai') that UNION ALL keeps but UNION drops. This is exactly the core idea: UNION removes duplicates, UNION ALL preserves them.

When to pick UNION or UNION ALL (with real-world reasoning)

  • Use UNION when you need a clean, unique set of rows

  • You’re aggregating data from multiple sources that might share the same records

  • You want to present a single, non-redundant list to users or downstream queries

  • It’s common when you’re collecting tag-like results or cross-dataset identifiers and you don’t want duplicates to pollute counts or joins later

  • Use UNION ALL when duplicates matter or performance matters

  • You’re stacking datasets that are already known to be distinct, or you simply need to preserve every occurrence

  • You’re performing further processing, like grouping or counting, where keeping duplicates is intentional

  • Performance savings can be meaningful: UNION ALL avoids the deduplication step, which can be non-trivial on large results

A quick performance sanity check

Deduping costs time. The database has to look at every row in both inputs and then determine whether a row is a duplicate, which often means sorting or hashing the combined set. In practice, UNION ALL tends to be faster because it skips that dedup step. Your mileage may vary depending on the engine (PostgreSQL, MySQL, SQL Server, Oracle, etc.) and the size of the inputs, but the general rule holds: if duplicates aren’t a concern, UNION ALL is usually the quicker route.

Common gotchas to watch for

  • Column alignment matters

  • Both SELECT statements in the UNION or UNION ALL must return the same number of columns, and corresponding columns must be compatible in type and order. If you select id, name in one query, make sure the second one lines up the same way.

  • Data type compatibility

  • The data types of each corresponding column should be compatible. If one query returns integers and the other returns strings for the same column, you’ll run into type conversion issues or errors.

  • Column names in the result

  • The final result uses the column names from the first SELECT. If you want specific headings, you can alias them in the first SELECT.

  • ORDER BY placement

  • If you want a sorted, unified list, place ORDER BY at the very end of the entire UNION/UNION ALL chain. Remember, you can’t sprinkle ORDER BY in the middle of the chain and expect a global sort.

  • When to count duplicates

  • If your downstream logic relies on a count of distinct items, UNION helps by reducing duplicates early. If you’re counting occurrences, UNION ALL might be the better fit, followed by a GROUP BY and COUNT.

Tips you can actually use

  • Keep it readable

  • If your queries become long, consider breaking them into views or using CTEs (common table expressions). This makes it easier to reason about what’s being merged and why.

  • Combine with other set operations carefully

  • You can stack UNION or UNION ALL with other set operations like INTERSECT or EXCEPT, but the ordering and logic can get tricky. If you’re new to this, start with a simple pair and test step by step.

  • Don’t over-apply deduplication

  • It’s tempting to throw in DISTINCT on a SELECT to mimic UNION behavior, but DISTINCT is applied after individual SELECTs and then combined, which can produce the same duplicates removal effect but is less clear and sometimes less efficient than using UNION.

  • Test with real data samples

  • A couple of rows aren’t always representative. Try with larger, messier datasets to see how the numbers shift when duplicates matter.

Relatable takeaways: thinking in layers

Think of UNION like cleaning a pantry. You pull in jars from two shelves, and then you wipe out duplicates so you end up with one jar of each item. UNION ALL is more like stacking those jars side by side, no matter what—duplicates stay visible, maybe for an inventory check or a sanity pass. The choice isn’t about “which is better” so much as “which do I need right now?” The right answer depends on what you’re trying to learn from the data and how you want to present it.

A few words on the broader landscape

If you’re exploring SQL on real projects, you’ll encounter set operations repeatedly. They’re the bread-and-butter tools that let you combine results from multiple queries cleanly. You’ll also see them used in reporting dashboards, data integration tasks, and even in ad-hoc analysis when you’re chasing trends across datasets that aren’t perfectly aligned. Getting comfortable with the behavior of UNION and UNION ALL gives you a reliable foundation for more advanced data wrangling.

A quick mental model to keep handy

  • If you want a single, deduplicated list: use UNION

  • If you want to preserve every entry, including duplicates: use UNION ALL

  • If you’re unsure, run a tiny test with a small sample of your real data. See which result set matches your intent. It’s faster than guessing, and you’ll avoid surprises when you scale up.

Wrapping it up—one helpful recap

UNION and UNION ALL are deceptively simple. They’re both about merging results, but they treat duplicates in opposite ways. The choice you make affects not just the result but also performance and downstream analyses. With a clear sense of whether duplicates matter, you’ll pick the right tool for the job, and your data storytelling will be sharper as a result.

If you’re exploring SQL concepts in a broader learning path, keep this distinction in mind as you move through tables, joins, and aggregates. The more you practice with real datasets—seeing how the outputs shift when you switch between deduplicated and non-deduplicated results—the more natural these decisions will feel. And who knows—your next query might reveal a cleaner insight because you chose the right way to merge your data in the first place.

Subscribe

Get the latest from Examzify

You can unsubscribe at any time. Read our privacy policy