2020-05-19 03:00:13 +05:30

8.6 KiB

Galloping

Standard merging procedure for merging two sorted arrays array_1: [10] and array_2: [1,2,3,4,6,9,14] goes as below:

CREATE GIF enter image description here

enter image description here

enter image description here

enter image description here

enter image description here

enter image description here

enter image description here

enter image description here

As you can see we are consistently taking elements from array_2 until we reach 14. But can we do better? Yes, use galloping.

The idea of galloping is to perform exponential search to find a correct position, rather than comparing elements one by one.

For example, if we find the correct position of 10 in array_2 using exponential search, then it will perform better than this traditional procedure.

Does Galloping work in every situation?

Certainly no, when we are not achieving a sufficient index jump by performing exponential search, then we might end up having more comparisons than classical merge procedure. But what is a sufficient jump?

It is decided to be a constant 7, it is called minimum gallop(MIN_GALLOP). If index-jump is at least 7, then it is guaranteed to perform better than the traditional procedure.

Other than that we have a new variable called current_min_gallop, which is assigned to constant MIN_GALLOP(7) at the start of the algorithm.

We use galloping mode during the merge. To avoid the drawbacks of galloping mode, we perform actions as below:

  1. If we find that we are taking elements from one array more than or equal to current_min_gallop times consistently, then we enter into galloping mode.
  2. After entering into galloping mode, we continue to remain into galloping mode if and only if we find that we have a jump of more than or equal to MIN_GALLOP(7). Otherwise, we exit gallop mode. If galloping mode is a success, then we decrease current_min_gallop by 1 per one success. It promotes galloping mode. Note: In galloping mode we are using MIN_GALLOP constant to check for the success of galloping mode and to enter into galloping mode we are using current_min_gallop variable.
  3. After we exit galloping mode, we increase current_min_gallop by one, to discourage a return to galloping mode again. This is to make sure that next time we recover the lost of more comparisons.

So, we are trying to balance the whole situation by taking advantage of galloping. Sometimes, the value of current_min_gallop becomes so large that we never enter into galloping mode.

The whole procedure discussed above will be helpful in merge procedure. Now, we come back to pure galloping.

Note that we can do galloping(exponential search) from any side of the array, either left or right because we are just intended to find a position and that can be approached by doing exponential search from any side.

The starting position for the search is called a "$hint$". Sometimes it is better to search from the left and sometimes from the right. For the given array below,

enter image description here

If we want to find position for 3, then starting from index 0(hint = 0) is efficient, but if we are looking for position of 13 then starting from index $len-1$(hint = len-1) is more efficient. It is particularly efficient for big arrays.

Procedure for galloping:

  1. If key < data[base+hint], then it indicates that we should do galloping towards the left side because element at the starting position is greater than the key.
  2. Otherwise, we do galloping towards the right side.
  3. To do galloping. we first find the range for the key using the procedure we used in exponential search.
  4. At last, we do binary search over the range to find the correct position.

We have two types of galloping function gallopRight and gallopLeft, the main difference between them is, gallopRight and gallopLeft returns rightmost index and leftmost index respectively in case if there are equal elements.

For example, for the array given below,

enter image description here

If you find a position of 13 by using galloping, then gallopLeft will return 6, but gallopRight will return 9.

In simple mode, it is trivial to maintain stability, but to maintain stability while merging in galloping mode, we sometimes use gallopRight and sometimes gallopLeft. Just to get the basic idea, see the below example, it will be more clear when we will the merge procedure.

enter image description here

Now, if we are finding a position of run2[0] in run1, then we will use gallopRight, but if we are finding a position for run1[3] in run2, then we will use gallopLeft.

We have a slightly modified binary search methods for both of them.

Below is the implementation of both functions. It is simply a slightly modified exponential search.

// Returns k, 0 <= k <= n such that a[b + k - 1] <= key < a[b + k]
// Rightmost index in case of equal elements
int gallopRight(vector<int>& data, int key, int base, int len, int hint)
{
    int ofs = 1;
    int lastofs = 0;
    
    if (key < data[base + hint]) {
        int maxofs = hint + 1;

        // Gallop towards Left side
        // Find range for key using Exponentiation
        while (ofs < maxofs && key < data[base + hint - ofs]) {
            lastofs = ofs;
            ofs = (ofs << 1) + 1;
        }
        if (ofs > maxofs)
            ofs = maxofs;

        int tmp = lastofs;
        lastofs = hint - ofs;
        ofs = hint - tmp;

    }
    else {
        int maxofs = len - hint;

        // Gallop towards Right side
        while (ofs < maxofs && key >= data[base + hint + ofs]) {
            lastofs = ofs;
            ofs = (ofs << 1) + 1;
        }
        if (ofs > maxofs)
            ofs = maxofs;
        lastofs += hint;
        ofs += hint;
    }
    lastofs++;

    // Binary search over the range to find a position
    while (lastofs < ofs) {
        int mid = (lastofs + ofs) / 2;
        if (key < data[base + mid])
            ofs = mid;
        else
            lastofs = mid + 1;
    }
    return ofs;
}

// Returns k, 0 <= k <= n such that a[b + k - 1] < key <= a[b + k]
// Leftmost index in case of equal elements
int gallopLeft(vector<int>& data, int key, int base, int len, int hint)
{
    int ofs = 1;
    int lastofs = 0;
    if (key <= data[base + hint]) {
        int maxofs = hint + 1;
        
        // Gallop towards Left side
        // Find range for key using Exponentiation
        while (ofs < maxofs && key <= data[base + hint - ofs]) {
            lastofs = ofs;
            ofs = (ofs << 1) + 1;
        }
        if (ofs > maxofs)
            ofs = maxofs;

        int tmp = lastofs;
        lastofs = hint - ofs;
        ofs = hint - tmp;

    }
    else {
        int maxofs = len - hint;

        // Gallop towards right side
        while (ofs < maxofs && key > data[base + hint + ofs]) {
            lastofs = ofs;
            ofs = (ofs << 1) + 1;
        }
        if (ofs > maxofs)
            ofs = maxofs;
        lastofs += hint;
        ofs += hint;
    }
    lastofs++;

    // Binary search over the range to find a position
    while (lastofs < ofs) {
        int mid = (lastofs + ofs) / 2;
        if (key <= data[base + mid])
            ofs = mid;
        else
            lastofs = mid + 1;
    }
    return ofs;
}

Time complexity: It is same as exponential search, O(log(i)). But due to hint mechanism, overall it works better than traditional exponential search.