Merging & Joins
BoxFrame provides two main ways to combine DataFrames: concatenation (stacking) and merging (SQL-style joins).
Concatenation
The concat() method stacks DataFrames vertically or horizontally. This is useful for combining datasets with the same structure or adding columns side-by-side.
Vertical Concatenation (axis=0)
Stacks DataFrames on top of each other, adding rows:
const df1 = new DataFrame({
id: [1, 2],
name: ["Alice", "Bob"],
age: [25, 30]
});
const df2 = new DataFrame({
id: [3, 4],
name: ["Charlie", "Diana"],
age: [35, 28]
});
const result = BoxFrame.concat([df1, df2], 0);
// Result: 4 rows with id, name, age columns
When columns don't match:
const df1 = new DataFrame({ id: [1, 2], name: ["Alice", "Bob"] });
const df2 = new DataFrame({ id: [3, 4], age: [25, 30] });
const result = BoxFrame.concat([df1, df2], 0);
// Result: 4 rows with id, name, age columns
// Missing values are filled with null
Horizontal Concatenation (axis=1)
Stacks DataFrames side-by-side, adding columns:
const df1 = new DataFrame({ id: [1, 2], name: ["Alice", "Bob"] });
const df2 = new DataFrame({ age: [25, 30], city: ["NY", "LA"] });
const result = BoxFrame.concat([df1, df2], 1);
// Result: 2 rows with id, name, age, city columns
With ignore_index option:
const result = BoxFrame.concat([df1, df2], 0, { ignore_index: true });
// Creates new sequential index: [0, 1, 2, 3, ...]
SQL-Style Joins (merge)
The merge() method performs SQL-style joins based on matching key values. This is essential for combining datasets with related information.
Join Types
BoxFrame supports four join types:
- inner: Only rows with matching keys in both DataFrames
- left: All rows from left DataFrame, matched with right where available
- right: All rows from right DataFrame, matched with left where available
- outer: All rows from both DataFrames
Basic Merge Examples
Inner Join
const customers = new DataFrame({
id: [1, 2, 3],
name: ["Alice", "Bob", "Charlie"]
});
const orders = new DataFrame({
customer_id: [2, 3, 4],
order_total: [100, 200, 150]
});
// Inner join - only matching rows
const result = BoxFrame.merge(customers, orders, {
leftOn: "id",
rightOn: "customer_id",
how: "inner"
});
// Result: 2 rows (id 2 and 3 matched)
Left Join
// Left join - all customers, with orders where available
const result = BoxFrame.merge(customers, orders, {
leftOn: "id",
rightOn: "customer_id",
how: "left"
});
// Result: 3 rows (all customers, order_total is null for id=1)
Right Join
// Right join - all orders, with customer info where available
const result = BoxFrame.merge(customers, orders, {
leftOn: "id",
rightOn: "customer_id",
how: "right"
});
// Result: 3 rows (all orders, name is null for customer_id=4)
Outer Join
// Outer join - all rows from both DataFrames
const result = BoxFrame.merge(customers, orders, {
leftOn: "id",
rightOn: "customer_id",
how: "outer"
});
// Result: 4 rows (all customers and all orders)
Same Column Names
When both DataFrames have the same column name for the key, use on:
const df1 = new DataFrame({ id: [1, 2, 3], name: ["Alice", "Bob", "Charlie"] });
const df2 = new DataFrame({ id: [2, 3, 4], age: [25, 30, 35] });
const result = BoxFrame.merge(df1, df2, { on: "id", how: "inner" });
Multiple Join Keys
You can join on multiple columns:
const df1 = new DataFrame({
year: [2020, 2020, 2021],
month: [1, 2, 1],
sales: [1000, 1500, 1200]
});
const df2 = new DataFrame({
year: [2020, 2020, 2021],
month: [1, 2, 2],
expenses: [500, 600, 550]
});
const result = BoxFrame.merge(df1, df2, {
on: ["year", "month"],
how: "inner"
});
// Matches rows where both year AND month are equal
Column Name Conflicts
When both DataFrames have columns with the same name (other than join keys), the right DataFrame's columns get a suffix:
const df1 = new DataFrame({ id: [1, 2], value: [10, 20] });
const df2 = new DataFrame({ id: [1, 2], value: [100, 200] });
const result = BoxFrame.merge(df1, df2, { on: "id", how: "inner" });
// Result columns: id, value (from df1), value_y (from df2)
Custom suffixes:
const result = BoxFrame.merge(df1, df2, {
on: "id",
how: "inner",
suffixes: ["_left", "_right"]
});
// Result columns: id, value_left, value_right
Duplicate Keys (Cartesian Product)
When a key appears multiple times in either DataFrame, merge creates a cartesian product:
const df1 = new DataFrame({
id: [1, 1, 2],
name: ["Alice", "Alice", "Bob"]
});
const df2 = new DataFrame({
id: [1, 1, 3],
order: ["A", "B", "C"]
});
const result = BoxFrame.merge(df1, df2, { on: "id", how: "inner" });
// Result: 4 rows (2 left rows × 2 right rows for id=1)
When to Use Concat vs Merge
Use concat() when:
- Combining datasets with the same structure (same columns)
- Appending new rows to existing data
- Adding columns side-by-side (axis=1)
- Simple stacking operations
Use merge() when:
- Combining related data from different sources
- Joining on key values (like SQL JOIN)
- Need to match rows based on specific columns
- Working with relational data
Complete Example
// Employee data
const employees = new DataFrame({
emp_id: [1, 2, 3, 4],
name: ["Alice", "Bob", "Charlie", "Diana"],
dept_id: [10, 20, 10, 30]
});
// Department data
const departments = new DataFrame({
dept_id: [10, 20, 30],
dept_name: ["Engineering", "Sales", "HR"]
});
// Join employees with departments
const withDept = BoxFrame.merge(employees, departments, {
leftOn: "dept_id",
rightOn: "dept_id",
how: "left"
});
// Add salary data (same structure, use concat)
const newEmployees = new DataFrame({
emp_id: [5, 6],
name: ["Eve", "Frank"],
dept_id: [20, 10]
});
const allEmployees = BoxFrame.concat([employees, newEmployees], 0);
API Reference
BoxFrame.concat()
static concat(
dataFrames: DataFrame[],
axis?: 0 | 1,
options?: { ignore_index?: boolean }
): DataFrame
Parameters:
dataFrames: Array of DataFrames to concatenateaxis:0for vertical (rows),1for horizontal (columns). Default:0options.ignore_index: Iftrue, creates new sequential index. Default:false
BoxFrame.merge()
static merge(
left: DataFrame,
right: DataFrame,
options?: MergeOptions
): DataFrame
Parameters:
left: Left DataFrameright: Right DataFrameoptions: Merge configuration
MergeOptions:
interface MergeOptions {
how?: "inner" | "left" | "right" | "outer"; // Join type (default: "inner")
on?: string | string[]; // Column name(s) if same in both
leftOn?: string | string[]; // Left DataFrame key column(s)
rightOn?: string | string[]; // Right DataFrame key column(s)
suffixes?: [string, string]; // Suffixes for conflicts (default: ["_x", "_y"])
}
Notes:
- Either specify
on(same column names) or bothleftOnandrightOn(different names) - Multiple keys are supported by passing arrays
- Unmatched rows are filled with
nullvalues - Duplicate keys create cartesian products (all combinations)