# Deduplicate array by key

A generic TypeScript function that removes duplicate elements from an array based on a custom key-selector function, preserving the first occurrence of each unique key.

```typescript
type KeySelector<T, K> = (item: T) => K;

function uniqueBy<T, K>(array: T[], keySelector: KeySelector<T, K>): T[] {
  const seen = new Set<K>();
  const result: T[] = [];

  for (const item of array) {
    const key = keySelector(item);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(item);
    }
  }

  return result;
}
// Example usage with different data types
interface User {
  id: number;
  name: string;
  email: string;
}
const users: User[] = [
  { id: 1, name: "Alice", email: "alice@example.com" },
  { id: 2, name: "Bob", email: "bob@example.com" },
  { id: 1, name: "Alice Updated", email: "alice.new@example.com" },
  { id: 3, name: "Charlie", email: "charlie@example.com" },
  { id: 2, name: "Bob Clone", email: "bob2@example.com" }
];
// Deduplicate by user ID
const uniqueUsersById = uniqueBy(users, (user) => user.id);
console.log("Unique by ID:", uniqueUsersById);
// Output: [{ id: 1, name: "Alice", ... }, { id: 2, name: "Bob", ... }, { id: 3, name: "Charlie", ... }]
// Deduplicate by email domain
const uniqueUsersByDomain = uniqueBy(users, (user) => user.email.split("@")[1]);
console.log("Unique by domain:", uniqueUsersByDomain);
// Output: [{ id: 1, name: "Alice", ... }] (all share example.com)
// Works with primitives too
const numbers = [1, 2, 3, 2, 4, 1, 5, 3];
const uniqueNumbers = uniqueBy(numbers, (n) => n);
console.log("Unique numbers:", uniqueNumbers);
// Output: [1, 2, 3, 4, 5]
// Deduplicate by computed key (e.g., modulo)
const uniqueByModulo = uniqueBy(numbers, (n) => n % 3);
console.log("Unique by modulo 3:", uniqueByModulo);
// Output: [1, 2, 3] (first occurrences where n % 3 equals 1, 2, 0)
// Complex key using tuple-like string
interface Product {
  category: string;
  sku: string;
  price: number;
}
const products: Product[] = [
  { category: "electronics", sku: "A001", price: 99 },
  { category: "electronics", sku: "A001", price: 89 },
  { category: "clothing", sku: "A001", price: 29 },
  { category: "electronics", sku: "B002", price: 199 }
];
const uniqueProducts = uniqueBy(products, (p) => `${p.category}:${p.sku}`);
console.log("Unique products:", uniqueProducts);
// Output: electronics/A001 (price 99), clothing/A001, electronics/B002
```

## How It Works
The `uniqueBy` function uses TypeScript generics to create a flexible, type-safe deduplication utility. The type parameters `<T, K>` let `T` stand for any array element type and `K` for whatever type the key selector returns. This means you can deduplicate arrays of objects, primitives, or any complex type while keeping full type inference throughout your code.
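Both type parameters are inferred at the call site; a quick sketch, reusing the `uniqueBy` and `users` definitions above:

```typescript
// T and K are inferred from the arguments; no explicit type parameters needed.
const byId = uniqueBy(users, (user) => user.id);     // K inferred as number
const byName = uniqueBy(users, (user) => user.name); // K inferred as string

// Selector mistakes are caught at compile time:
// @ts-expect-error -- Property 'missing' does not exist on type 'User'
const invalid = uniqueBy(users, (user) => user.missing);
```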
The implementation uses a `Set` to track which keys have already been encountered, giving average O(1) lookup per element. We iterate through the array once, checking whether each item's computed key is already in the `Set`; if not, we add the key to the `Set` and push the original item onto the result array. This keeps the overall time complexity at O(n), efficient even for large arrays. Space complexity is O(n) in the worst case, when every element is unique.
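For contrast, here is a sketch of the common quadratic alternative this design avoids; it returns the same first-occurrence result but re-scans the array for every element:

```typescript
// O(n^2): findIndex walks the array from the start for each element, keeping
// an item only if the first index with a matching key is its own index.
function uniqueByNaive<T, K>(array: T[], keySelector: (item: T) => K): T[] {
  return array.filter(
    (item, index) =>
      array.findIndex((other) => keySelector(other) === keySelector(item)) === index
  );
}
```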
A key design decision is keeping the first occurrence of each duplicate rather than the last. This is typically the expected behavior and matches how SQL's DISTINCT or GROUP BY operations behave. If you need the last occurrence instead, you can either reverse the input before processing and reverse the result, or iterate backwards through the array, as sketched below.
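A minimal sketch of that "keep the last occurrence" variant, using the backwards-iteration approach:

```typescript
// Iterate backwards so later items win, then reverse to restore input order.
function uniqueByLast<T, K>(array: T[], keySelector: (item: T) => K): T[] {
  const seen = new Set<K>();
  const result: T[] = [];
  for (let i = array.length - 1; i >= 0; i--) {
    const key = keySelector(array[i]);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(array[i]);
    }
  }
  return result.reverse();
}

// uniqueByLast(users, (u) => u.id) keeps "Alice Updated" and "Bob Clone"
// instead of the original "Alice" and "Bob".
```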
The `KeySelector` type alias improves readability and can be reused if you're building multiple similar utility functions. The key can be any type that works with `Set` equality; primitives like strings and numbers work perfectly. For object keys, remember that `Set` compares objects by reference, so you'd need to serialize them to strings (as the product example does with a template literal) for meaningful comparison.
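A small illustration of the reference-equality pitfall, reusing the `products` array from the example above (the counts assume that data):

```typescript
// Each call produces a fresh key object, so the Set never sees a duplicate.
const objectKeyed = uniqueBy(products, (p) => ({ category: p.category, sku: p.sku }));
console.log(objectKeyed.length); // 4 -- nothing was deduplicated

// Serializing to a string restores value-based comparison. JSON.stringify of a
// tuple also works; note that for plain objects, property order would matter.
const stringKeyed = uniqueBy(products, (p) => JSON.stringify([p.category, p.sku]));
console.log(stringKeyed.length); // 3 -- the duplicate electronics/A001 is dropped
```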
This pattern is ideal for scenarios like removing duplicate API responses by ID, filtering user lists by email, or any situation where you need to reduce redundancy based on a specific property. Avoid it when you need to merge duplicate entries rather than discard them, or when the deduplication logic requires comparing multiple items at once rather than a simple key lookup; a sketch of the merge alternative follows.
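For the merge case, one hypothetical sketch (the `mergeBy` name and its merge-callback signature are illustrative, not part of the utility above):

```typescript
// Instead of discarding duplicates, combine them with a caller-supplied merge.
function mergeBy<T, K>(
  array: T[],
  keySelector: (item: T) => K,
  merge: (existing: T, incoming: T) => T
): T[] {
  const byKey = new Map<K, T>(); // Map preserves first-insertion order
  for (const item of array) {
    const key = keySelector(item);
    const existing = byKey.get(key);
    byKey.set(key, existing === undefined ? item : merge(existing, item));
  }
  return [...byKey.values()];
}

// e.g. keep the lowest price per category:sku pair
const cheapest = mergeBy(
  products,
  (p) => `${p.category}:${p.sku}`,
  (a, b) => (a.price <= b.price ? a : b)
);
```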