numpy-sorting
Sorting and searching algorithms including O(n) partitioning, binary search, and hierarchical multi-key sorting. Triggers: sort, argsort, partition, searchsorted, lexsort, nan sort order.
When & Why to Use This Skill
This Claude skill provides high-performance data sorting and searching capabilities powered by NumPy. It enables efficient O(n) partitioning for top-k selection, hierarchical multi-key sorting, and vectorized binary search, allowing for rapid data organization, binning, and retrieval in complex datasets.
Use Cases
- Top-K Selection: Efficiently extract the largest or smallest elements from massive datasets using O(n) partitioning without the overhead of a full sort.
- Hierarchical Data Organization: Order complex datasets based on multiple criteria, such as sorting financial records by date and then by transaction amount.
- Vectorized Range Mapping: Perform fast lookups to map numerical data into specific bins or ranges using optimized binary search algorithms.
- Robust Data Handling: Maintain data integrity by managing specific sort orders for datasets containing NaNs and other sensitive numerical types.
| name | numpy-sorting |
|---|---|
| description | Sorting and searching algorithms including O(n) partitioning, binary search, and hierarchical multi-key sorting. Triggers: sort, argsort, partition, searchsorted, lexsort, nan sort order. |
Overview
NumPy sorting provides efficient tools for ordering data. Beyond basic sorting, it includes partitioning for top-k selection and vectorized binary search for finding insertion points in sorted data.
When to Use
- Finding the top $k$ largest or smallest elements without a full sort ($O(N)$).
- Ordering data based on multiple criteria (e.g., sort by Date, then by Price).
- Mapping data into bins or ranges using binary search.
- Handling datasets containing NaNs where sorting order is sensitive.
Decision Tree
- Need the indices of the sorted order (not the values)?
- Use
np.argsort.
- Use
- Only need the $k$ smallest elements?
- Use
np.partition(arr, k). Elements to the left of index $k$ are smaller.
- Use
- Finding where to insert a value to keep order?
- Use
np.searchsorted(sorted_arr, value).
- Use
Workflows
Efficiently Finding the Smallest K Elements
- Identify an unsorted array.
- Call
np.partition(arr, kth=k). - Select the first k elements:
result[:k].
Vectorized Lookup in Sorted Ranges
- Ensure the target array 'A' is sorted.
- Pass a list of values 'V' to
np.searchsorted(A, V). - Use the returned indices to map values to specific bins or ranges.
Indirect Multi-Key Sort
- Define primary and secondary key arrays.
- Use
np.lexsort((secondary, primary))to get the index array. - Apply the indices to the data to achieve the desired hierarchical sort order.
Non-Obvious Insights
- NaN Position:
np.nanis treated as larger thannp.infand is always sorted to the end of the array. - Partition Performance: Partitioning along the last axis is significantly faster and uses less memory than partitioning along any other axis.
- Lexsort Order:
lexsorttakes keys in reverse order of importance; the last key in the sequence is the primary sort key.
Evidence
- "In the output array, all elements smaller than the k-th element are located to the left of this element and all equal or greater are located to its right." Source
- "Binary search is used to find the required insertion points." Source
Scripts
scripts/numpy-sorting_tool.py: Implements top-k selection and hierarchical lexsort.scripts/numpy-sorting_tool.js: Basic sort simulation.
Dependencies
numpy(Python)