Chapter 19: Performance and Optimization#
19.1 Performance Characteristics#
Understanding Jac's performance characteristics is crucial for building efficient applications. The object-spatial paradigm introduces unique performance considerations that differ significantly from traditional programming models.
Traversal Optimization#
Walker traversal is at the heart of Jac's execution model. Understanding how to optimize traversal patterns can dramatically improve application performance.
Traversal Complexity Analysis#
# Analyzing traversal patterns
walker PerformanceAnalyzer {
has nodes_visited: int = 0;
has edges_traversed: int = 0;
has ability_executions: int = 0;
has start_time: float by postinit;
can postinit {
import:py time;
self.start_time = time.time();
}
can analyze with entry {
self.nodes_visited += 1;
self.ability_executions += 1;
# Track edge traversals
outgoing = [-->];
self.edges_traversed += len(outgoing);
# Continue traversal
visit outgoing;
}
can report_metrics with `root exit {
import:py time;
elapsed = time.time() - self.start_time;
report {
"nodes_visited": self.nodes_visited,
"edges_traversed": self.edges_traversed,
"ability_executions": self.ability_executions,
"elapsed_time": elapsed,
"nodes_per_second": self.nodes_visited / elapsed,
"traversal_efficiency": self.edges_traversed / self.nodes_visited
};
}
}
Breadth-First vs Depth-First Performance#
graph TD
subgraph "Breadth-First Traversal"
B1[Root] --> B2[Level 1A]
B1 --> B3[Level 1B]
B2 --> B4[Level 2A]
B2 --> B5[Level 2B]
B3 --> B6[Level 2C]
B3 --> B7[Level 2D]
style B1 fill:#ff9999
style B2 fill:#ffcc99
style B3 fill:#ffcc99
style B4 fill:#ffff99
style B5 fill:#ffff99
style B6 fill:#ffff99
style B7 fill:#ffff99
end
subgraph "Depth-First Traversal"
D1[Root] --> D2[Level 1A]
D1 --> D3[Level 1B]
D2 --> D4[Level 2A]
D2 --> D5[Level 2B]
D3 --> D6[Level 2C]
D3 --> D7[Level 2D]
style D1 fill:#ff9999
style D2 fill:#ffcc99
style D4 fill:#ccff99
style D5 fill:#99ff99
style D3 fill:#99ffcc
style D6 fill:#99ccff
style D7 fill:#cc99ff
end
# Breadth-first traversal (default)
walker BreadthFirstSearch {
has target_value: any;
has max_depth: int = 5;
has current_depth: int = 0;
has found: bool = false;
can search with entry {
if here.value == self.target_value {
self.found = true;
report here;
disengage;
}
if self.current_depth < self.max_depth {
# Natural breadth-first
visit [-->];
}
}
}
# Depth-first traversal (explicit implementation)
walker DepthFirstSearch {
has target_value: any;
has max_depth: int = 5;
has path: list = [];
can search with entry {
self.path.append(here);
if here.value == self.target_value {
report {"found": here, "path": self.path};
disengage;
}
if len(self.path) < self.max_depth {
# Process children immediately
for child in [-->] {
spawn DepthFirstSearch(
target_value=self.target_value,
max_depth=self.max_depth,
path=self.path.copy()
) on child;
}
}
}
}
Early Termination Strategies#
# Efficient search with early termination
walker OptimizedSearch {
has criteria: dict;
has max_results: int = 10;
has results: list = [];
    has visited: set = set();  # {} would create a dict, not a set
can search with entry {
# Skip already visited nodes
if here.id in self.visited {
return; # Don't visit children
}
self.visited.add(here.id);
# Check criteria
if matches_criteria(here, self.criteria) {
self.results.append(here);
# Early termination when enough results
if len(self.results) >= self.max_results {
report self.results;
disengage; # Stop entire traversal
}
}
# Smart traversal - prioritize promising branches
children = [-->];
promising = sort_by_promise(children, self.criteria);
# Visit only top branches
visit promising[:3];
}
}
# Helper functions
can matches_criteria(n: node, criteria: dict) -> bool {
    for key, value in criteria.items() {
        if not hasattr(n, key) or getattr(n, key) != value {
            return false;
        }
    }
    return true;
}
can sort_by_promise(nodes: list, criteria: dict) -> list {
    # Score nodes by how likely they lead to matches
    scored = [];
    for i, n in enumerate(nodes) {
        score = calculate_promise_score(n, criteria);
        # Include the index as a tiebreaker so equal scores never
        # fall through to comparing node objects directly
        scored.append((score, i, n));
    }
    scored.sort(reverse=true);
    return [n for _, _, n in scored];
}
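The scoring function is left abstract above. A minimal sketch (an illustrative assumption, not part of any standard API) counts how many criteria a node already satisfies, so branches with partial matches rank higher:

# Hypothetical scoring helper for sort_by_promise
can calculate_promise_score(n: node, criteria: dict) -> int {
    score = 0;
    for key, value in criteria.items() {
        if hasattr(n, key) and getattr(n, key) == value {
            score += 1;
        }
    }
    return score;
}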
Memory Management#
Jac's automatic memory management includes garbage collection, but understanding memory patterns helps optimize performance.
Node and Edge Memory Footprint#
# Memory-efficient node design
node LightweightNode {
# Use appropriate data types
has id: int; # 8 bytes vs string (variable)
has flags: int; # Bit flags vs multiple bools
has data_ref: str; # Reference vs embedded data
}
# Memory-heavy node (avoid for large graphs)
node HeavyNode {
has id: str; # UUID string = 36+ bytes
has metadata: dict; # Unbounded size
has history: list; # Grows over time
has cached_computations: dict; # Memory leak risk
}
# Monitoring memory usage
walker MemoryProfiler {
    has node_count: int = 0;
    has edge_count: int = 0;
    has total_data_size: int = 0;
    has seen: set = set();
    can profile with entry {
        import:py sys;
        # Avoid double-counting nodes reachable by multiple paths
        if here.id in self.seen {
            return;
        }
        self.seen.add(here.id);
        self.node_count += 1;
        # Estimate the node object itself
        self.total_data_size += sys.getsizeof(here);
        # Add data structure sizes
        for attr_name in dir(here) {
            if not attr_name.startswith('_') {
                attr_value = getattr(here, attr_name);
                self.total_data_size += sys.getsizeof(attr_value);
            }
        }
        # Count edges (each edge is seen from both ends, so halve in the report)
        self.edge_count += len([-->]) + len([<--]);
        visit [-->];
    }
can report with `root exit {
avg_node_size = self.total_data_size / self.node_count;
report {
"nodes": self.node_count,
"edges": self.edge_count,
"total_data_size": self.total_data_size,
"avg_node_size": avg_node_size,
"estimated_total_memory": (
self.node_count * avg_node_size +
                (self.edge_count // 2) * 64  # Rough edge overhead; halved because each edge was counted from both ends
)
};
}
}
Walker State Management#
# Memory-efficient walker with bounded state
walker EfficientCollector {
has max_items: int = 1000;
has buffer: list = [];
has result_handler: callable;
can collect with entry {
if should_collect(here) {
self.buffer.append(extract_data(here));
# Flush buffer periodically
if len(self.buffer) >= 100 {
self.flush_buffer();
}
}
visit [-->];
}
can flush_buffer {
# Process and clear buffer
self.result_handler(self.buffer);
self.buffer = []; # Free memory
}
can finalize with `root exit {
# Final flush
if self.buffer {
self.flush_buffer();
}
}
}
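The `should_collect` and `extract_data` helpers are placeholders. One hedged sketch (assuming nodes expose `id` and a numeric `value`) filters on an attribute and copies out only the fields you need, so the buffer holds small dicts rather than live node references:

# Illustrative helpers; the attribute names are assumptions
can should_collect(n: node) -> bool {
    return hasattr(n, "value") and n.value > 0;
}
can extract_data(n: node) -> dict {
    # Copy primitive fields instead of keeping node references alive
    return {"id": n.id, "value": n.value};
}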
# Memory-inefficient walker (avoid)
walker InefficientCollector {
# Unbounded collections
    has all_nodes: list = [];
    has all_paths: list = [];
    has node_data_cache: dict = {};
    has current_path: list = [];
    can collect with entry {
        # Keeps references to everything
        self.current_path.append(here);
        self.all_nodes.append(here);
        self.all_paths.append(self.current_path.copy());
self.node_data_cache[here.id] = here.to_dict();
# Memory grows with graph size
visit [-->];
}
}
Circular Reference Management#
# Potential circular reference
node Document {
has title: str;
has metadata: dict = {};
can add_reverse_reference(other: Document) {
# Creates circular reference through dict
self.metadata["related"] = other;
other.metadata["related"] = self;
}
}
# Better approach: model the relationship as an edge between Document nodes
edge RelatedTo {
has relationship_type: str;
}
# Clean reference management
can link_documents(doc1: Document, doc2: Document, rel_type: str) {
    doc1 +:RelatedTo(relationship_type=rel_type):+> doc2;
# Graph structure handles references properly
}
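A brief usage sketch (the titles and relationship type are illustrative):

with entry {
    spec = Document(title="Spec");
    design = Document(title="Design");
    root ++> spec;
    root ++> design;
    link_documents(spec, design, "informs");
}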
Distribution Overhead#
When Jac applications scale across machines, understanding distribution overhead becomes critical.
Cross-Machine Edge Traversal#
graph LR
subgraph "Machine A"
A1[Node A1]
A2[Node A2]
A1 -->|Local<br/>~1μs| A2
end
subgraph "Machine B"
B1[Node B1]
B2[Node B2]
B1 -->|Local<br/>~1μs| B2
end
A2 -.->|Network<br/>~1-10ms| B1
style A1 fill:#e3f2fd
style A2 fill:#e3f2fd
style B1 fill:#fff3e0
style B2 fill:#fff3e0
# Measuring distribution overhead
walker LatencyProfiler {
has local_traversals: int = 0;
has remote_traversals: int = 0;
has local_time: float = 0.0;
has remote_time: float = 0.0;
    can measure with entry {
        import:py time;
        # Note: visit only queues the hop, so these timings are a
        # scheduling-cost proxy rather than true network latency
        for neighbor in [-->] {
            start = time.time();
            # Check whether the neighbor lives on another machine
            if is_remote(neighbor) {
                self.remote_traversals += 1;
                visit neighbor;
                self.remote_time += time.time() - start;
            } else {
                self.local_traversals += 1;
                visit neighbor;
                self.local_time += time.time() - start;
            }
        }
    }
can report with `root exit {
avg_local = self.local_time / max(1, self.local_traversals);
avg_remote = self.remote_time / max(1, self.remote_traversals);
report {
"local_traversals": self.local_traversals,
"remote_traversals": self.remote_traversals,
"avg_local_latency": avg_local,
"avg_remote_latency": avg_remote,
"overhead_factor": avg_remote / max(0.000001, avg_local)
};
}
}
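The `is_remote` check above (and the `is_local` check used below) is a placeholder; Jac does not ship a standard locality API. One assumption-laden sketch tags each node with its home machine at creation time and compares tags:

# Hypothetical locality helpers; `machine` is an assumed node attribute
glob CURRENT_MACHINE: str = "machine-a";
can is_remote(n: node) -> bool {
    return getattr(n, "machine", CURRENT_MACHINE) != CURRENT_MACHINE;
}
can is_local(n: node) -> bool {
    return not is_remote(n);
}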
Optimizing Distributed Traversal#
# Batch operations to reduce network calls
walker BatchedProcessor {
has batch_size: int = 50;
has operation_batch: list = [];
can process with entry {
# Queue operation
self.operation_batch.append({
"node": here,
"operation": "process",
"data": extract_data(here)
});
# Process batch when full
if len(self.operation_batch) >= self.batch_size {
self.process_batch();
}
visit [-->];
}
can process_batch {
# Group by machine
by_machine = group_by_machine(self.operation_batch);
# Send batched operations
for machine, operations in by_machine.items() {
if machine == current_machine() {
# Process locally
for op in operations {
execute_local(op);
}
} else {
# Send batch to remote machine
send_remote_batch(machine, operations);
}
}
self.operation_batch = [];
}
}
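`group_by_machine`, `execute_local`, and `send_remote_batch` are deployment-specific stubs, and `current_machine` is the same stub used above. Assuming the per-node `machine` tag from the locality sketch, grouping might look like:

# Hypothetical batching helper built on the assumed `machine` tag
can group_by_machine(operations: list) -> dict {
    by_machine = {};
    for op in operations {
        machine = getattr(op["node"], "machine", current_machine());
        by_machine.setdefault(machine, []).append(op);
    }
    return by_machine;
}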
# Data locality optimization
walker LocalityAwareTraverser {
has prefer_local: bool = true;
has locality_threshold: float = 0.8;
    can traverse with entry {
        import:py random;
        neighbors = [-->];
        if self.prefer_local {
            # Partition neighbors by locality
            local_nodes = [];
            remote_nodes = [];
            for n in neighbors {
                if is_local(n) {
                    local_nodes.append(n);
                } else {
                    remote_nodes.append(n);
                }
            }
            # Visit local first
            visit local_nodes;
            # Occasionally follow remote links so remote regions stay reachable
            if random.random() > self.locality_threshold {
                visit remote_nodes;
            }
        } else {
            visit neighbors;
        }
    }
}
19.2 Optimization Techniques#
Graph Layout Optimization#
The physical layout of your graph structure significantly impacts performance. Optimizing this layout can reduce traversal time and memory usage.
Node Clustering Strategies#
# Cluster related nodes for better cache locality
walker ClusterAnalyzer {
has clusters: dict = {};
has cluster_threshold: float = 0.7;
can analyze with entry {
# Find node clusters based on connectivity
cluster_id = self.find_or_create_cluster(here);
# Analyze connections
connections = [-->] + [<--];
for conn in connections {
similarity = calculate_similarity(here, conn);
if similarity > self.cluster_threshold {
self.add_to_cluster(conn, cluster_id);
}
}
visit [-->];
}
    can find_or_create_cluster(n: node) -> str {
        # Check existing clusters
        for cluster_id, members in self.clusters.items() {
            if n in members {
                return cluster_id;
            }
        }
        # Create new cluster
        import:py uuid;
        cluster_id = str(uuid.uuid4());
        self.clusters[cluster_id] = {n};
        return cluster_id;
    }
}
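`calculate_similarity` is application-defined. A structural sketch (assuming node-anchored edge references like `[a -->]` and hashable nodes) is Jaccard similarity over neighbor sets:

# Hypothetical similarity metric: shared neighbors / all neighbors
can calculate_similarity(a: node, b: node) -> float {
    neighbors_a = set([a -->] + [a <--]);
    neighbors_b = set([b -->] + [b <--]);
    union = neighbors_a | neighbors_b;
    if not union {
        return 0.0;
    }
    return len(neighbors_a & neighbors_b) / len(union);
}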
# Supporting archetypes for the reorganized layout
node ClusterHub {
    has cluster_id: str;
    has member_count: int;
}
edge InCluster {}
edge ClusterConnection {
    has weight: int;
}
# Reorganize graph based on clustering
walker GraphReorganizer {
    has clusters: dict;
    has hub_nodes: dict = {};
can reorganize with entry {
# Create hub nodes for each cluster
for cluster_id, members in self.clusters.items() {
            hub = ClusterHub(
                cluster_id=cluster_id,
                member_count=len(members)
            );
            root ++> hub;
            self.hub_nodes[cluster_id] = hub;
            # Connect members to hub
            for member in members {
                hub +:InCluster:+> member;
}
}
# Optimize inter-cluster connections
self.optimize_cross_cluster_edges();
}
can optimize_cross_cluster_edges {
# Replace many-to-many with hub-to-hub connections
for cluster_id, hub in self.hub_nodes.items() {
            members = [hub -->:InCluster:];
            # Find external connections
            external_counts = {};
            for member in members {
                for external in [member -->] {
                    if external not in self.clusters[cluster_id] {
                        external_cluster = self.find_cluster(external);
                        external_counts[external_cluster] = external_counts.get(external_cluster, 0) + 1;
}
}
}
# Create hub-to-hub edges for frequent connections
for other_cluster, count in external_counts.items() {
if count > 5 { # Threshold
other_hub = self.hub_nodes[other_cluster];
                    hub +:ClusterConnection(weight=count):+> other_hub;
}
}
}
}
    can find_cluster(n: node) -> str {
        # Look up which existing cluster a node belongs to
        for cluster_id, members in self.clusters.items() {
            if n in members {
                return cluster_id;
            }
        }
        return "";
    }
}
Hot Path Optimization#
# Identify and optimize frequently traversed paths
walker HotPathAnalyzer {
has path_counts: dict = {};
has edge_counts: dict = {};
has sampling_rate: float = 0.1;
    can analyze with entry {
        import:py random;
        # Sample traversals
        if random.random() < self.sampling_rate {
            self.record_path();
        }
        # Record edge usage
        for neighbor in [-->] {
            edge_id = f"{here.id}->{neighbor.id}";
            self.edge_counts[edge_id] = self.edge_counts.get(edge_id, 0) + 1;
        }
        visit [-->];
    }
    can record_path {
        # Count how often each node shows up in sampled traversals
        self.path_counts[here.id] = self.path_counts.get(here.id, 0) + 1;
    }
can identify_hot_paths with `root exit {
# Sort by frequency
hot_edges = sorted(
self.edge_counts.items(),
key=lambda x: x[1],
reverse=true
);
# Top 10% are hot paths
threshold_index = int(len(hot_edges) * 0.1);
hot_paths = hot_edges[:threshold_index];
report {
"hot_paths": hot_paths,
"total_edges": len(self.edge_counts),
"optimization_targets": [edge_id for edge_id, _ in hot_paths]
};
}
}
# Create shortcuts for hot paths
edge Shortcut {
    has original_length: int;
    has usage_count: int;
}
walker PathOptimizer {
    has hot_paths: list;
    has shortcut_threshold: int = 3;
    can optimize with entry {
        # Entries are the (edge_id, count) tuples reported by HotPathAnalyzer
        for edge_id, count in self.hot_paths {
            src_id, dst_id = edge_id.split("->");
            src = find_node_by_id(src_id);
            dst = find_node_by_id(dst_id);
            # Find actual path length
            path_length = find_shortest_path_length(src, dst);
            # Create shortcut if path is long
            if path_length > self.shortcut_threshold {
                src +:Shortcut(
                    original_length=path_length,
                    usage_count=count
                ):+> dst;
            }
        }
    }
}
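`find_node_by_id` and `find_shortest_path_length` are also left abstract. A plain breadth-first sketch for the latter (a minimal, assumption-based helper using node-anchored references) returns the hop count:

# Hypothetical BFS helper; returns -1 when dst is unreachable
can find_shortest_path_length(src: node, dst: node) -> int {
    queue = [(src, 0)];
    visited = {src};
    while queue {
        item = queue.pop(0);
        current = item[0];
        depth = item[1];
        if current == dst {
            return depth;
        }
        for neighbor in [current -->] {
            if neighbor not in visited {
                visited.add(neighbor);
                queue.append((neighbor, depth + 1));
            }
        }
    }
    return -1;
}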
Walker Batching#
Processing multiple items in a single walker traversal can significantly improve performance.
# Inefficient: One walker per item
walker SingleItemProcessor {
has item_id: str;
can process with entry {
item = find_item(self.item_id);
result = expensive_computation(item);
here ++> ProcessingResult(
item_id=self.item_id,
result=result
);
}
}
# Efficient: Batch processing
walker BatchProcessor {
has item_ids: list;
has batch_size: int = 100;
has results: dict = {};
can process with entry {
# Process items in batches
for i in range(0, len(self.item_ids), self.batch_size) {
batch = self.item_ids[i:i + self.batch_size];
# Bulk operations are more efficient
items = bulk_find_items(batch);
# Vectorized computation
batch_results = vectorized_computation(items);
# Store results
for item_id, result in zip(batch, batch_results) {
self.results[item_id] = result;
}
}
# Create result nodes
for item_id, result in self.results.items() {
here ++> ProcessingResult(
item_id=item_id,
result=result,
batch_processed=true
);
}
}
}
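`bulk_find_items` and `vectorized_computation` stand in for storage- and domain-specific bulk operations. As a sketch, assuming items hang directly off `root`, a single scan beats one lookup per id:

# Hypothetical bulk lookup: one graph scan instead of N separate searches
can bulk_find_items(ids: list) -> list {
    wanted = set(ids);
    return [n for n in [root -->] if n.id in wanted];
}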
# Parallel batch processing
can run_chunk(w: any, anchor: node) -> any {
    # Runs one sub-walker; a named helper avoids lambda late-binding on
    # the loop variable below (concurrent spawns assume thread-safe graph access)
    return spawn w on anchor;
}
walker ParallelBatchProcessor {
    has items: list;
    has num_workers: int = 4;
    has results: list = [];
can process with entry {
import:py from concurrent.futures { ThreadPoolExecutor };
        # Split items into chunks (at least one item per chunk, even
        # when there are fewer items than workers)
        chunk_size = max(1, len(self.items) // self.num_workers);
        chunks = [
            self.items[i:i + chunk_size]
            for i in range(0, len(self.items), chunk_size)
        ];
# Process chunks in parallel
with ThreadPoolExecutor(max_workers=self.num_workers) as executor {
            futures = [];
            for chunk in chunks {
                # Spawn a sub-walker for each chunk
                sub_walker = ChunkProcessor(items=chunk);
                futures.append(executor.submit(run_chunk, sub_walker, here));
            }
            # Collect results from each finished sub-walker
            for future in futures {
                finished = future.result();
                self.results.extend(finished.results);
            }
}
report self.results;
}
}
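`ChunkProcessor` is assumed above; a minimal version just processes its slice and keeps the results on the walker instance for the parent to collect (`expensive_computation` remains application-specific):

# Minimal sub-walker sketch for the parallel batch example
walker ChunkProcessor {
    has items: list;
    has results: list = [];
    can process with entry {
        for item in self.items {
            self.results.append(expensive_computation(item));
        }
    }
}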
Caching Strategies#
Intelligent caching can dramatically improve performance for repeated operations.
Node-Level Caching#
node CachedDataNode {
has data: dict;
has cache: dict = {};
has cache_ttl: int = 300; # 5 minutes
has cache_timestamps: dict = {};
can get_computed_value(key: str) -> any {
import:py time;
# Check cache
if key in self.cache {
timestamp = self.cache_timestamps.get(key, 0);
if time.time() - timestamp < self.cache_ttl {
return self.cache[key];
}
}
# Compute value
value = expensive_computation(self.data, key);
# Update cache
self.cache[key] = value;
self.cache_timestamps[key] = time.time();
return value;
}
    can invalidate_cache(key: str = "") {
if key {
self.cache.pop(key, None);
self.cache_timestamps.pop(key, None);
} else {
self.cache = {};
self.cache_timestamps = {};
}
}
}
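A usage sketch (the `"summary"` key and data payload are illustrative, and `expensive_computation` is the application-defined helper assumed above):

with entry {
    data_node = CachedDataNode(data={"series": [1, 2, 3]});
    root ++> data_node;
    first = data_node.get_computed_value("summary");   # computed, then cached
    second = data_node.get_computed_value("summary");  # served from cache
    data_node.invalidate_cache("summary");             # force recompute next time
}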
Walker Result Caching#
# Cache walker results based on input parameters
glob WALKER_CACHE: dict = {};
walker CachedAnalyzer {
has params: dict;
has cache_key: str by postinit;
has use_cache: bool = true;
can postinit {
# Generate cache key from parameters
import:py json;
self.cache_key = json.dumps(self.params, sort_keys=true);
}
can analyze with entry {
# Check cache first
if self.use_cache and self.cache_key in WALKER_CACHE {
cache_entry = WALKER_CACHE[self.cache_key];
# Validate cache age
import:py time;
if time.time() - cache_entry["timestamp"] < 3600 { # 1 hour
report cache_entry["result"];
disengage;
}
}
# Perform actual analysis
result = self.perform_analysis();
# Cache result
if self.use_cache {
import:py time;
WALKER_CACHE[self.cache_key] = {
"result": result,
"timestamp": time.time()
};
}
report result;
}
can perform_analysis -> dict {
# Expensive analysis logic
visited_count = 0;
total_value = 0;
queue = [here];
visited = set();
while queue {
current = queue.pop(0);
if current.id in visited {
continue;
}
visited.add(current.id);
visited_count += 1;
total_value += current.value;
            # Add neighbors of the current queue node (not the walker's position)
            queue.extend([current -->]);
}
return {
"visited_count": visited_count,
"total_value": total_value,
"average_value": total_value / visited_count
};
}
}
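Two spawns with identical parameters illustrate the cache at work (the `params` content is illustrative):

with entry {
    params = {"metric": "total_value"};
    spawn CachedAnalyzer(params=params) on root;  # computes and caches
    spawn CachedAnalyzer(params=params) on root;  # reports from WALKER_CACHE
}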
Query Result Caching#
# Cache complex graph queries
walker QueryOptimizer {
has query_cache: dict = {};
has cache_hits: int = 0;
has cache_misses: int = 0;
can find_pattern(pattern: dict) -> list {
# Generate cache key
cache_key = self.pattern_to_key(pattern);
# Check cache
if cache_key in self.query_cache {
self.cache_hits += 1;
return self.query_cache[cache_key];
}
self.cache_misses += 1;
# Execute query
results = self.execute_pattern_search(pattern);
# Cache results
self.query_cache[cache_key] = results;
# Evict old entries if cache too large
if len(self.query_cache) > 1000 {
self.evict_oldest_entries(100);
}
return results;
}
can pattern_to_key(pattern: dict) -> str {
# Convert pattern to stable string key
import:py json;
return json.dumps(pattern, sort_keys=true);
}
can evict_oldest_entries(count: int) {
# Simple FIFO eviction
keys = list(self.query_cache.keys());
for key in keys[:count] {
del self.query_cache[key];
}
}
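    # `execute_pattern_search` is left open above. A hedged sketch (an
    # assumption, not a fixed API) reuses matches_criteria from 19.1 and
    # scans exhaustively from root; this only pays off when cache hits
    # dominate misses.
    can execute_pattern_search(pattern: dict) -> list {
        results = [];
        queue = [root];
        visited = set();
        while queue {
            current = queue.pop(0);
            if current in visited {
                continue;
            }
            visited.add(current);
            if matches_criteria(current, pattern) {
                results.append(current);
            }
            queue.extend([current -->]);
        }
        return results;
    }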
}
Performance Monitoring and Profiling#
Building performance monitoring directly into your Jac applications helps identify bottlenecks.
# Performance monitoring walker
walker PerformanceMonitor {
has metrics: dict = {
"ability_timings": {},
"node_visit_counts": {},
"edge_traversal_counts": {},
"memory_snapshots": []
};
has profile_enabled: bool = true;
can monitor with entry {
import:py time;
import:py psutil;
if self.profile_enabled {
# Record node visit
node_type = type(here).__name__;
self.metrics["node_visit_counts"][node_type] = \
self.metrics["node_visit_counts"].get(node_type, 0) + 1;
# Memory snapshot
process = psutil.Process();
self.metrics["memory_snapshots"].append({
"timestamp": time.time(),
"memory_mb": process.memory_info().rss / 1024 / 1024,
"node_id": here.id
});
}
# Continue monitoring
visit [-->];
}
can measure_ability(ability_name: str, ability_func: callable) -> any {
if not self.profile_enabled {
return ability_func();
}
import:py time;
start = time.time();
try {
result = ability_func();
elapsed = time.time() - start;
# Record timing
if ability_name not in self.metrics["ability_timings"] {
self.metrics["ability_timings"][ability_name] = {
"count": 0,
"total_time": 0,
"max_time": 0,
"min_time": float('inf')
};
}
timing = self.metrics["ability_timings"][ability_name];
timing["count"] += 1;
timing["total_time"] += elapsed;
timing["max_time"] = max(timing["max_time"], elapsed);
timing["min_time"] = min(timing["min_time"], elapsed);
return result;
} except Exception as e {
self.metrics["errors"] = self.metrics.get("errors", 0) + 1;
raise e;
}
}
can generate_report with `root exit {
# Calculate averages
for ability_name, timing in self.metrics["ability_timings"].items() {
timing["avg_time"] = timing["total_time"] / timing["count"];
}
# Memory analysis
if self.metrics["memory_snapshots"] {
memory_values = [s["memory_mb"] for s in self.metrics["memory_snapshots"]];
self.metrics["memory_summary"] = {
"start_mb": memory_values[0],
"end_mb": memory_values[-1],
"peak_mb": max(memory_values),
"growth_mb": memory_values[-1] - memory_values[0]
};
}
report self.metrics;
}
}
# Usage example
with entry {
    monitor = PerformanceMonitor();
    main_walker = YourApplicationWalker();
    # Run the application first, then profile the resulting graph
    spawn main_walker on root;
    spawn monitor on root;
    # generate_report fires on `root exit; the metrics also remain
    # accessible on the walker instance afterwards
    save_performance_report(monitor.metrics);
}
Best Practices Summary#
1. Optimize Traversal Patterns
    - Use early termination when possible
    - Batch operations to reduce overhead
    - Consider traversal order (BFS vs DFS)
2. Manage Memory Efficiently
    - Keep walker state bounded
    - Use references instead of copies
    - Clear large collections when done
3. Leverage Caching
    - Cache expensive computations
    - Implement TTL for cache entries
    - Monitor cache hit rates
4. Monitor Performance
    - Build monitoring into your application
    - Profile different graph sizes
    - Track memory usage over time
5. Optimize for Distribution
    - Minimize cross-machine traversals
    - Batch remote operations
    - Consider data locality in graph design
The key to Jac performance is understanding that computation moves through your data structure. By optimizing the structure and the movement patterns, you can achieve excellent performance at any scale.