Monday, March 31, 2014

Segment Trees and USACO March Contest #1 (Gold)

The problem can be found here:
http://usaco.org/index.php?page=viewproblem2&cpid=418

I was unable to solve this problem in the actual contest, but after spending some time learning segment trees, I was able to understand the solution. This problem in USACO illustrates the power of segment trees. If you are not already familiar with segment trees, there are many tutorials online.

Here is a detailed explanation of the solution.

The idea here is to use a line-sweep algorithm. However, there is a slight difficulty. We notice that the points reachable by Bessie within a maximum manhattan distance of K is actually a square rotated by 45°. We must transform the coordinates such that the sides of this square are parallel to the new x and y axes. This is achieved by the following transformation:
(x,y) ->  (x+y,y-x)
For example, (3,5) gets transformed to (8,2). 

It is important to note that now, the square is rotated 45° and also dilated by a factor of sqrt(2). Clearly, the sides of the square are now 2K units long.

Instead of thinking of this square as the points reachable by Bessie traveling a maximum manhattan distance of K, it is advantageous to think of this rather as a square (centered on a grass patch) that contains the points where Bessie can be such that she can still reach the grass patch. It is easy to see that this square has the same dimensions.

For example, say the ith grass patch is located at (3,5) (after transformation) and K = 3. Bessie can be at any location (x,y) where 0<=x<=6 and 2<=y<=8. This is a square region with side length 2K. Each grass patch will produce its own square region.

Now we are ready to line sweep. Just like in all line-sweep algorithms, we need to pick "event" points. Each grass patch essentially has 5 interesting pieces of information. If grass patch i is located at (x_i,y_i), then here are the 5 pieces of information that we care about:
1.  x-coordinate of the left boundary of the square formed (this was described earlier). The value is (x_i-K).
2.  x-coordinate of the right boundary. The value is (x_i+K).
3.  y-coordinate of the bottom boundary. The value is (y_i-K).
4.  y-coordinate of the top boundary. The value is (y_i+K).
5. The value of the grass patch. The value is g_i.

Using this information, we can create two "events" for each grass patch. One for entering the square region and one for exiting. We assign an event as a combination of 4 integers (x,y1,y2,g), where x is the x-coordinate of the event, y1 the lower boundary of the region, y2 the upper boundary, and g is the value of the grass patch (positive for entering, negative for exiting). The events are sorted by non-decreasing x-coordinate.

Essentially, one should imagine the line as Bessie herself. For example, if Bessie is located on the left boundary, and she is also in between y1 and y2, then she can reach the grass. Naturally, we should add this to the amount of grass she can reach if she is in the segment [y1,y2]. Thus, the value of g for this event is positive. Conversely, if she is on the right boundary, she is exiting the region, so we should subtract this amount of grass from the total reachable grass if she is located in the segment [y1,y2].

Here is where the segment tree comes in. We need to use a segment tree to find the maximum grass Bessie can be able to reach given that she is located at a certain x coordinate (but she can be at any y coordinate). Everytime Bessie visits a new event (in the course of our line-sweep), we update this segment tree, checking if the new maximum for this event exceeds the previous best.

How do we update the tree? Think of each node of the segment tree representing the maximum amount of grass reachable given that Bessie is located in the segment represented by that node. The x-coordinate, as we already know, is fixed when we update. Thus, for each event, we update by incrementing the segment [y1,y2] by g. Then the maximum for this event is given by the query for the root node, which gives the maximum over all y. 

Since we are always querying the root node, this suggests a heuristic - lazy propagation. Again, there are many tutorials on lazy propagation, so I will not go into much detail.


Hopefully, I have helped you understand better the solution to this problem as well as introduce an interesting application of the segment tree data structure.















Tuesday, December 24, 2013

SPOJ Problem: Prime Generator

I guess I'm taking a break from USACO training and I'm now starting SPOJ.

Problem is here: http://www.spoj.com/problems/PRIME1/

The problem statement is pretty straightforward and easy to understand.
The difficult part is getting the solution to run in time.

We have that m,n<=10^9, and n-m<=10^5. Thus, simply generating a sieve from 1 - 10^9, and then printing the primes between m and n will not work. Neither will the primality test method for the ~10^5 numbers between m and n.

What we need is a segmented sieve. First, let's see how the sieve of Eratosthenes works. If we wanted to find the primes between 1 and 100, all we need to do is initialize a boolean array of size 100 and take the primes 2,3,5,7 and mark all multiples of those primes to be composite. Why don't we need to consider the multiples of primes 11 and greater? Consider the following argument:

The smallest number that does not have a prime factor in the set {2,3,5,7} is 11*11= 121.

This is relatively easy to see, and 121>100. It follows that if we want to perform a sieve up to N, then we only need to consider the primes <= sqrt(N).

We can just shift our focus to a segment (m,n) now. First, we precompute the primes from 1 to sqrt(10^9) using a normal sieve (there are only about 3500 of them). Next we initialize a boolean array of size n-m+1, where the ith index represents whether m+i is prime or not. The "interesting" multiples of each prime are given by: (ceil(m/p)+k)*p where 0<=k<=n.

Using integer division and loops for each prime p:

for(int j =(m+p-1)/p; j*p<=n; j++)
     if(j!=1)
          sieve[j*p-m]=false;

This raises a puzzling question- What is the runtime complexity? We have our loop through the primes < sqrt(10^9) and then we have our loop to mark the multiples of these primes as composite.
We have something that looks like this: sum(10^5/p) for all primes p < sqrt(10^9). Taking out the 10^5 we get:
10^5(1/2+1/3+1/5+...+ 1/~3500th prime) = ~10^5*(ln(ln(3500))+.26) by the harmonic series for primes approximation. This gives ~250,000 iterations, and since there are 10 test cases 250000*10=2.5*10^6, well under the time limit.

Saturday, December 14, 2013

USACO Training Problem: Cow Tours

Here's an interesting graph theory problem:
http://cerberus.delos.com:790/usacoprob2?a=osacVNAncMv&S=cowtour

What makes this problem hard is the incorrect assumption that many people (including me) make.

The first part is relatively simple. The idea is to calculate the Euclidean distances between connected vertices and store these distances in a matrix. Even though the problem gives that vertex i is not connected to itself, we can assume it is with no harm. Thus, dist(i,i) = 0 for all i.

The next step is to use Floyd-Warshall on the matrix of distances such that for each i,j, dist(i,j) stores the shortest path from vertex i to vertex j. dist(i,j) is infinity if j is unreachable from i.

Make an array of size n that stores the farthest distance that can be traveled from a given vertex taking only shortest paths (not counting infinity). Thus farthest[i]=max_not_infinity(dist[i]).

It is advantageous but not necessary to find the diameter of each field by precalculation.

Keep track of a global variable that has initial value infinity.
Finally loop through the entire matrix, and every time dist(i,j)==infinity (which means that edge(i,j) does not exist and is a candidate to be filled in), find the maximum of the following values:
d1 - farthest[i]+distance(P_i,P_j)+farthest[j]
d2 - max(diameter[i],diameter[j])  //This is the part that most people forget
where distance(Point a, Point b) returns the Euclidean distance between two points and where P_i and P_j denote the two points that are connected by edge (i,j).
Now update the global variable if max(d1,d2) < current value of global variable.

*If you are getting an error on test case 7 or higher, it is probably because you overlooked the value d2, and to see why it is necessary consider the test case (from TopCoder):

4
0 0
10 0
20 0
10 1
0100
1010
0100
0000

The answer should be: 20.000000
Runtime: O(N^3)


Saturday, December 7, 2013

USACO Training Problem: Cow Pedigrees

This is an interesting problem that can be solved using dynamic programming.
The problem can be found here:
http://cerberus.delos.com:790/usacoprob2?a=9HnzpXyw1Xh&S=nocows

This problem is hard because it is difficult to come up with a recurrence relation.

      First note that there can not be an even number of nodes in the tree. This is easy to see since every node must have an even number of children, but there exists a single root node that makes the node count odd.
     Secondly, we can make the problem simpler by just treating a group of 3 nodes as 1 node:
For example:
        @                                    @      
        /  \             becomes         /  
     @   @                             @
     /  \
  @   @

Now the reduced tree does not have to follow the condition that each node must have 0 or 2 children.
We can create a dp state (i,j) = x, such x represents the number of trees that have at most i levels and exactly j nodes. Note that we are applying our dp state to the reduced trees.

Let's say we have a tree with at most i levels and exactly j nodes, if we consider the two branches of the root node, we have two subtrees and thus two subproblems.

Let n1 = the number of nodes in the left subtree and n2 = number of nodes in the right subtree.
We have that n1 + n2 + 1 = j.

Also, both subtrees have at most i-1 levels.

Thus, in order to find the number of ways to construct a tree with state(i,j) we must find all combinations of left subtree constructions and right subtree constructions. We can use a loop for the number of nodes in the left subtree (denoted below as m) and the number of nodes in the right subtree is clearly j-m-1.

This gives the recurrence: dp(i,j)=sum(dp(i-1,m) * dp(i-1,j-m-1)) as m goes from 0 to j-1.

It is trivial that dp(i,0) = 1 for all i. Since with 0 nodes you can construct 1 with #levels<=i for all i.

Lastly, in the problem we must find the number of tree constructions given N and K, where N is the exact number of nodes and K is the exact number of levels. However we make a reduction first, so we must set N := N/2 and K := K-1.
Now, we cannot simply retrieve dp(K,N), since in the state (i,j), i represents the upper bound of the number of levels.
Thus, the number of constructions with exactly K levels is dp(K,N)-dp(K-1,N)
Runtime: O(N^2*K)