Part 4. Merging and appending branches
Step by Step Guide: How to Apply Merge & Append
Both Steps are used for combine datasets: With the Append Step, you will combine vertically both datasets. On the other hand, Merge allows you to combine datasets side by side by matching entity values on both datasets.
01. Clipping Datasets
Go to the repository Sales Dataset Example. Then, clipped the three datasets to use them later:
02. Create a Pipeline
As we have seen, go to the right corner of your screen Create New then click on Pipeline
We will begin by loading Jane's sales data. Let's select Fetch Alphacast Dataset and drag and drop our clipped dataset SalesData from the sidebar.
03. Append Step
We have a number of options here. First, we need to choose which data we are going to append Jhon Doe information to our data. We have two options:
3.1 Load a dataset
3.2 Fetch a dataset
We have the option of transforming it before appending it. If you like, you can add more steps to be executed between the fetching of the data and the append step.
04. Append Editor
We need to define how will the "duplicates" be handled. What will happen if there are rows in both datasets that share the same entities? You have to decide whether you will keep the data from the original dataset, from the new one, or drop both.
Notice that we now have John and Jane on a single dataset.
05. Merge Step
Now we are going to combine this data with the third dataset, with information on the number of items sold by both of them.Again, we need to choose the dataset. This time we will drag and drop "Items Sold".
06. Merge Editor
6.1 Criteria to combine the rows
How can we know which rows from the first dataset correspond to each row from the second one.
Year and Sales Person are the two entities of our data. In this example, we just need to define that year match with year, and that sales persons match with sales person.
6.2 Find a match between both datasets
We have to decide what will happen when not all rows and columns in both datasets find a match in the other one. There are four types of criteria for joining:
- Inner join: The new dataset will have only those rows that can be matched.
- Left join: All the rows of dataset A will be present and the unmatched rows of dataset B are discarded.
- Right join: Reverse to the previous one. All those from dataset B and discarded the unmatched ones from A.
- Outer join: The data from both datasets will remain even if they do not match.
In this example, all the rows match, so the result will look alike with any option. For more information on Merging Datasets click here (opens in a new tab)