performance
When & Why to Use This Skill
This Claude skill provides expert guidance on Ruby performance optimization, profiling, and memory management. It helps developers identify bottlenecks with industry-standard tools like stackprof and benchmark-ips, and it offers actionable strategies for reducing object allocations, tuning garbage collection, and applying efficient concurrency models to speed up Ruby applications.
Use Cases
- Profiling and Benchmarking: Identify performance bottlenecks in Ruby scripts and compare different implementation approaches using statistical analysis tools.
- Memory Footprint Reduction: Optimize memory usage by implementing frozen string literals, lazy enumerables, and reducing unnecessary object allocations.
- Algorithm Optimization: Enhance execution speed by selecting appropriate data structures (like Sets vs Arrays) and implementing memoization for expensive computations.
- Garbage Collection (GC) Tuning: Analyze GC statistics and adjust environment variables to reduce GC pressure and improve application throughput.
- Concurrency and Parallelism: Implement modern Ruby concurrency features like Ractors for CPU-bound tasks and Async or Threads for I/O-bound operations.
| name | performance |
|---|---|
| description | This skill should be used when the user asks about "Ruby performance", "optimization", "profiling", "benchmarking", "memory", "garbage collection", "GC", "benchmark-ips", "stackprof", "memory_profiler", "slow code", "speed up Ruby", or needs guidance on making Ruby code faster. |
| version | 1.0.0 |
Ruby Performance Optimization
Guide to profiling, benchmarking, and optimizing Ruby code.
Profiling First
Always measure before optimizing. Identify bottlenecks with profiling tools.
benchmark-ips
Compare implementations with statistical significance:
require "benchmark/ips"
Benchmark.ips do |x|
  x.report("map + flatten") do
    [[1, 2], [3, 4]].map { |a| a * 2 }.flatten
  end

  x.report("flat_map") do
    [[1, 2], [3, 4]].flat_map { |a| a * 2 }
  end

  x.compare!
end
# Output:
# flat_map: 1234567.8 i/s
# map + flatten: 987654.3 i/s - 1.25x slower
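If the default run is too noisy, the warmup and measurement windows can be lengthened. A minimal sketch using benchmark-ips's config call (the 2s/5s values are illustrative, not a recommendation):
require "benchmark/ips"

Benchmark.ips do |x|
  x.config(warmup: 2, time: 5) # seconds of warmup, then measurement
  x.report("flat_map") { [[1, 2], [3, 4]].flat_map { |a| a * 2 } }
end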
stackprof (CPU Profiling)
require "stackprof"
StackProf.run(mode: :cpu, out: "tmp/stackprof.dump") do
  # Code to profile
  1000.times { expensive_operation }
end
# View results
# $ stackprof tmp/stackprof.dump --text
# $ stackprof tmp/stackprof.dump --method 'YourClass#method'
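stackprof can also attribute object allocations to source lines. A sketch reusing the expensive_operation placeholder above; mode: :object with interval: 1 records every allocation:
StackProf.run(mode: :object, interval: 1, out: "tmp/stackprof-alloc.dump") do
  1000.times { expensive_operation }
end

# Inspect the same way:
# $ stackprof tmp/stackprof-alloc.dump --text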
memory_profiler
require "memory_profiler"
report = MemoryProfiler.report do
  # Code to analyze
  data = process_large_dataset
end
report.pretty_print
# Shows allocated objects, retained objects, memory by gem/file/location
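Reports from a large application can be dominated by gem internals. A sketch scoping the report to your own files via the allow_files option ("my_app" is an illustrative pattern) and writing the output to disk:
report = MemoryProfiler.report(allow_files: "my_app") do
  data = process_large_dataset
end
report.pretty_print(to_file: "tmp/memory_report.txt")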
Memory Optimization
Reduce Object Allocations
# Bad: Creates many intermediate objects
def bad_join(items)
  result = ""
  items.each do |item|
    result = result + item.to_s + ", " # Creates new strings each time
  end
  result
end

# Good: Modify in place
def good_join(items)
  result = +"" # Unfrozen empty string
  items.each do |item|
    result << item.to_s << ", "
  end
  result
end

# Best: Use the built-in
items.join(", ")
Frozen String Literals
# frozen_string_literal: true
# All string literals are now frozen (immutable)
# Reduces memory by reusing string objects
name = "Alice" # Frozen, shared across uses
Symbol vs String
# Symbols are interned (shared in memory)
# Good for hash keys, identifiers
hash = { name: "Alice", age: 30 } # Symbol keys
# Strings are mutable, not shared
# Good for user data, content
hash = { "user_input" => value }
Lazy Enumerables
# Bad: Loads entire file into memory
File.readlines("large.txt").select { |l| l.include?("ERROR") }.first(10)
# Good: Processes line by line, stops early
File.foreach("large.txt")
  .lazy
  .select { |l| l.include?("ERROR") }
  .first(10)
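Laziness also makes unbounded sequences practical; nothing runs until first(5) forces five values:
(1..Float::INFINITY).lazy
  .map { |n| n * n }
  .select(&:even?)
  .first(5)
# => [4, 16, 36, 64, 100]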
Object Pooling
class ConnectionPool
  def initialize(size:)
    @available = Array.new(size) { create_connection }
    @mutex = Mutex.new
  end

  def with_connection
    conn = checkout
    yield conn
  ensure
    checkin(conn)
  end

  private

  def checkout
    @mutex.synchronize { @available.pop }
  end

  def checkin(conn)
    @mutex.synchronize { @available.push(conn) }
  end

  def create_connection
    # Expensive connection creation
  end
end
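Usage follows the block pattern, so the ensure clause returns the connection even if the block raises. A sketch; conn.execute stands in for whatever API the pooled object exposes:
pool = ConnectionPool.new(size: 5)

pool.with_connection do |conn|
  conn.execute("SELECT 1") # hypothetical connection API
end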
Algorithm Optimization
Choose Right Data Structures
require "set"
# O(n) lookup
array = [1, 2, 3, 4, 5]
array.include?(3) # Slow for large arrays
# O(1) lookup
set = Set[1, 2, 3, 4, 5]
set.include?(3) # Fast
# O(1) lookup with value
hash = { 1 => true, 2 => true, 3 => true }
hash.key?(3) # Fast
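The gap is easy to verify with benchmark-ips. A sketch; the 10_000-element collection and lookup key are arbitrary:
require "set"
require "benchmark/ips"

array = (1..10_000).to_a
set = array.to_set

Benchmark.ips do |x|
  x.report("Array#include?") { array.include?(9_999) }
  x.report("Set#include?") { set.include?(9_999) }
  x.compare!
end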
Avoid N+1 in Ruby Code
# Bad: O(n*m) - nested iteration
users.each do |user|
  user.orders.each do |order|
    # O(n*m) iterations
  end
end

# Better: Pre-group data
orders_by_user = orders.group_by(&:user_id)
users.each do |user|
  user_orders = orders_by_user[user.id] || []
  # O(n) + O(m) iterations
end
Memoization
class ExpensiveCalculator
  def result
    @result ||= compute_expensive_result
  end

  # For methods with arguments
  def calculate(n)
    @cache ||= {}
    @cache[n] ||= expensive_computation(n)
  end

  # Clear cache when needed
  def clear_cache!
    @result = nil
    @cache = nil
  end
end
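Note that ||= recomputes whenever the cached value is nil or false. If those are legitimate results, check key presence instead (a sketch of a revised calculate):
def calculate(n)
  @cache ||= {}
  # key? distinguishes "cached nil" from "not computed yet"
  return @cache[n] if @cache.key?(n)
  @cache[n] = expensive_computation(n)
end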
Garbage Collection
Understanding GC
# Check GC stats
GC.stat
# => { count: 42, heap_allocated_pages: 100, ... }
# Manual GC (usually not needed)
GC.start
# Disable during benchmarks (not in production)
GC.disable
# ... run benchmark ...
GC.enable
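GC.stat also accepts a single key, which makes it easy to bracket a block and count its allocations (a sketch; total_allocated_objects is a standard stat key):
before = GC.stat(:total_allocated_objects)
1_000.times { "temp #{rand}" }
after = GC.stat(:total_allocated_objects)
puts "allocated #{after - before} objects"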
Reduce GC Pressure
# Bad: Many short-lived objects
def bad_process(items)
  items.map { |i| i.to_s }
       .map { |s| s.upcase }
       .map { |s| s.strip }
end

# Good: Chain operations, fewer intermediates
def good_process(items)
  items.map { |i| i.to_s.upcase.strip }
end

# Best: Modify in place when possible
def best_process(items)
  items.each do |i|
    # Modify i in place if possible
  end
end
GC Tuning Environment Variables
# Increase heap slots (reduce GC frequency)
# Note: These values are examples. Profile your application first
# and adjust based on actual memory usage patterns.
RUBY_GC_HEAP_INIT_SLOTS=600000
# Increase malloc limit before GC
RUBY_GC_MALLOC_LIMIT=64000000
# Growth factor for heap
RUBY_GC_HEAP_GROWTH_FACTOR=1.25
Warning: GC tuning values should be determined through profiling your specific application. Avoid cargo-cult optimization by copying values without understanding your application's memory patterns. Always measure before and after tuning.
Concurrency
Threads for I/O
require "concurrent"
# Thread pool for I/O-bound work
pool = Concurrent::ThreadPoolExecutor.new(
  min_threads: 5,
  max_threads: 10,
  max_queue: 100
)

urls.each do |url|
  pool.post do
    fetch_url(url)
  end
end
pool.shutdown
pool.wait_for_termination
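When return values are needed, concurrent-ruby's promises run on the same kind of pool and compose cleanly. A sketch reusing the urls and fetch_url placeholders from above:
require "concurrent"

futures = urls.map do |url|
  Concurrent::Promises.future { fetch_url(url) }
end

# zip combines the futures; value! raises if any of them failed
results = Concurrent::Promises.zip(*futures).value!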
Ractors for CPU
# True parallelism for CPU-bound work
ractors = data.each_slice(data.size / 4).map do |chunk|
  Ractor.new(chunk) do |items|
    items.map { |item| expensive_computation(item) }
  end
end
results = ractors.flat_map(&:take)
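Objects crossing a Ractor boundary must be shareable (deeply frozen); Ractor.make_shareable deep-freezes a value up front. A minimal sketch:
config = Ractor.make_shareable({ factor: 2, offset: 1 })

r = Ractor.new(config) do |cfg|
  (1..5).map { |n| n * cfg[:factor] + cfg[:offset] }
end
r.take # => [3, 5, 7, 9, 11]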
Async for I/O
require "async"
Async do
  results = urls.map do |url|
    Async do
      fetch_url(url)
    end
  end.map(&:wait)
end
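To avoid overwhelming a remote host, the async gem's semaphore caps how many tasks run at once. A sketch with an arbitrary limit of 10, reusing the urls and fetch_url placeholders:
require "async"
require "async/semaphore"

Async do
  semaphore = Async::Semaphore.new(10)

  tasks = urls.map do |url|
    semaphore.async { fetch_url(url) }
  end

  results = tasks.map(&:wait)
end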
Common Optimizations
String Building
# Bad
result = ""
items.each { |i| result += i.to_s }
# Good
result = items.map(&:to_s).join
# Also good for large strings
require "stringio"

io = StringIO.new
items.each { |i| io << i.to_s }
result = io.string
Array Operations
# Use appropriate methods
array.any? { |x| x > 5 } # Stops at first match
array.all? { |x| x > 5 } # Stops at first failure
array.find { |x| x > 5 } # Returns first match
# Avoid repeated operations
# Bad: on a generic Enumerable, count walks every element
array.count > 0
# Good
array.any? # Returns as soon as it sees a truthy element
# Bad
array.select { ... }.first
# Good
array.find { ... }
Hash Operations
# Use fetch with default
hash.fetch(:key, default_value)
hash.fetch(:key) { compute_default }
# Transform keys/values efficiently
hash.transform_keys(&:to_sym)
hash.transform_values(&:to_s)
# Merge in place
hash.merge!(other_hash) # Modifies hash
Additional Resources
Reference Files
references/profiling-guide.md - Detailed profiling workflows and tool usage