## Tuesday, June 4, 2013

### Finding the first occurence of a number in a sorted array.

Let us consider an array of numbers sorted in ascending order which can possibly contain duplicates. How do we find the left most/first occurrence of a given number?
The moment we read searching in a sorted array binary search should come to our mind. But Binary search has a limitation that it just finds out if a particular element exists in the array. If array contains duplicate entries, binary search finds out any matching index. But we want to find out the left most occurrence.

For example let us consider an array [1,2,2,3,4] and we have to find the element 2. The binary search finds out 2 at the index 2(assuming the array index starts with 0) and returns. But we actually have to return index 1 because it is the first occurrence.

One brute-force method is to find the occurrence of the element and keep moving left until we reach the first occurrence. But this approach is not effective. Let us consider an example to understand this.

If the input array contains just one element repeated n times. [1,1,1,1,1]. We want to find out the first occurrence of 1. First we find 1 at the index 2, then we have to traverse n/2 places backwards in this worst case. So the time complexity is O(n) which is as good as linear search.

Then how can we modify the binary search to solve this problem with O(log n) time complexity?

Modify the binary search algorithm such that if a matching entry is found, instead of returning, compare the middle element with the previous element. If they do not match, we found the first occurrence. Otherwise search the left half for the first occurrence. This approach guarantees to take O(log n) time to find that first occurrence. Below is the implementation of this approach in C++.

#include<iostream>

using namespace std;
//modified binary search method to find the first occurence
//of an element in the sorted array
int findFirst(int *array, int d, int l, int h)
{
while( l <= h)
{
int m = (l+h)/2;
//if element is found
if( array[m] == d )
{
//have we reached the beginning of the array
//or middle and it's previous elements are equal?
//search the left half
if( m > 0 && array[m] == array[m-1])
{
h= m-1;
}
else
return m; //we found the first occurence return it.
}
//following two conditions are similar to the binary search
else if ( array[m] < d)
{
l = m+1;
}
else
{
h = m-1;
}
}
return -1;
}

int main()
{
int n;
cin>>n;
int *array = new int[n];
int i;
for( i = 0 ; i < n ; i++)
cin>>array[i];
int x;
cin>>x;
cout<<x<<" found at "<<findFirst(array,x,0,n-1);
delete[] array;
return 0;
}