wpe41.gif (23084 bytes)CIS3355: Business Data Structures
Fall, 2008
 

The Maximum/Average

Number of Binary Searches

A Binary search is a technique for searching an ordered list in which we first check the middle item and based on that comparison, we "discard" half the data. The same procedure is then applied to the remaining half until a match is found or there are no more items left.

In binary search, we first compare the key with the item in the middle position of the array. If there's a match, we can return immediately. If the key is less than the middle key, then the item sought must lie in the lower half of the array; if it's greater then the item sought must lie in the upper half of the array. So we repeat the procedure on the lower (or upper) half of the array.

Our FindInCollection function can now be implemented:

static void *bin_search( collection c, int low, int high, void *key ) {

        int mid;

        /* Termination check */

        if (low > high) return NULL;

        mid = (high+low)/2;

        switch (memcmp(ItemKey(c->items[mid]),key,c->size)) {

               /* Match, return item found */

               case 0: return c->items[mid];

               /* key is less than mid, search lower half */

               case -1: return bin_search( c, low, mid-1, key);

               /* key is greater than mid, search upper half */

               case 1: return bin_search( c, mid+1, high, key );

               default : return NULL;

               }

        }



void *FindInCollection( collection c, void *key ) {

/* Find an item in a collection

   Pre-condition: 

        c is a collection created by ConsCollection

        c is sorted in ascending order of the key

        key != NULL

   Post-condition: returns an item identified by key if

   one exists, otherwise returns NULL

*/

        int low, high;

        low = 0; high = c->item_cnt-1;

        return bin_search( c, low, high, key );

}

Points to note:

  1. bin_search is recursive: it determines whether the search key lies in the lower or upper half of the array, then calls itself on the appropriate half.
  2. There is a termination condition (two of them in fact!)
    1. If low > high then the partition to be searched has no elements in it and
    2. If there is a match with the element in the middle of the current partition, then we can return immediately.
  3. AddToCollection will need to be modified to ensure that each item added is placed in its correct place in the array. The procedure is simple:
    1. Search the array until the correct spot to insert the new item is found,
    2. Move all the following items up one position and
    3. Insert the new item into the empty position thus created.
  4. bin_search is declared static. It is a local function and is not used outside this class: if it were not declared static, it would be exported and be available to all parts of the program. The static declaration also allows other classes to use the same name internally.

static reduces the visibility of a function an should be used wherever possible to control access to functions!

Analysis

Each step of the algorithm divides the block of items being searched in half. We can divide a set of n items in half at most log2 n times.

Thus the running time of a binary search is proportional to log n and we say this is a O(log n) algorithm.

 

Binary search requires a more complex program than our original search and thus for small n it may run slower than the simple linear search. However, for large n,

Thus at large n, log n is much smaller than n, consequently an O(log n) algorithm is much faster than an O(n) one.

Plot of n and log n vs n .

http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/searching.htm#binary-search

In order to find the average number of comparisons for a binary search, consider the comparison tree. Take the average number of comparisons for a  successful search, find the total comparisons for successful searches and divide by the number of searches (= n). Count the number of branches leading from root to each leaf that terminates a successful search.

    1.    Height of tree = Maximum number of key comparisons possible (height = number of levels below root)

    2.    Height of tree is at most one more than average number of key comparisons because:

Levels of leaves can only differ by one, as size of lists when divided by algorithm can only differ by zero or one.

Also a tree has 2n leaves = n successful searches and n unsuccessful searches and the number of leaves in a tree expands by the power of two.

tree structure

Note the power of 2 = height of tree, therefore at 3rd level number of leaves = 23

Let height of tree = h, we know tree has 2n leaves, therefore if tree has leaves on same level 2h = 2n

If tree has leaves on two levels then 2h > 2n

(where h is smallest integer that matches the inequality)

Generally we can say 2h >= 2n

Taking logs of both sides (base 2) (i.e. given ay = x then logax = y)

if 2h >= 2n then h >= 1 + log2n

(log(2*n) becomes log(2) + log(n) and log22 = 1)

As n gets large the inequality

2h >= 2n tends to 2h = 2n

Therefore the average number of comparisons for a binary search is approximately

log2n + 1

www.onthenet.com.au/~grahamis/int2008/week05/lect05.html

www.csc.twu.ca/rsbook/Ch13/Ch13.1.html

www.tbray.org/ongoing/When/200x/2003/03/22/Binary