One thing that has always frustrated me about typical binary search routines
found in libraries is they always assume you can easily produce a copy of the
item you are searching for. What I mean is they typically follow this pattern,
int BinarySearch<T>(IList<T> list, T item);
which takes a list of some type, left open here, and searches for it in the
list. This is great if the very next thing you do next is insert item into
the list. You already have item in hand so this signature is
perfect. But if you
don't have a ready item, and item is difficult to
produce, you are stuck.
This happened to me just recently in my code. I had a
sorted list of a data structure representing a span of source, similar to a TextPointer from WPF, called a
TextRange. My version can quickly return the
offset of the beginning of the span but, since TextRange tracks source changes,
creating a new TextRange is expensive. When text in the editor is
change, I receive the location of
the change in an event. How do I figure out which one of my TextRanges was modified? Since
the list is sorted, the obvious choice is to use a binary search which yields O(Log N)
performance. Since my TextRanges are store in a List<TextRange>,
obviously I should
call List<T>.BinarySearch(). Unfortunately, TextRanges are expensive to produce;
too expensive for a per-keystroke operation (I am glossing over some details for
the sake of brevity; you need to trust me this is true). I have one of two choices, write my
own BinarySearch() or create a fake TextRange that is a special
TextRange that
is less expensive to create and only used for searching. Neither choice is
pleasant. Hand-rolled binary searches are not hard to right but, for some
strange reason, I always get them wrong the first time.
What I needed is prebuilt and tested version of BinarySearch() that looks more like,
int BinarySearch<T, K>(IList<T> list, K key, Converter<T, K> converter);
What this binary search routine does is find some key value in the list
given a function that can convert any list item into the key. In my example, I
would call it like,
int location = BinarySearch(textRangeList, location, TextRangeToLocation);
where TextRangeToLocation looks something like,
int TextRangeToLocation(TextRange range) { return range.Location; }
I don't need to produce a TextRange. I have the location I am looking for
handy, I just need to know what, if any, TextRange contains the
location. This method
would do just that. You can even use a lambda function in C# 3.0 to get this
all on one line,
int location = BinarySearch(textRangeList, location, (n) => n.Location);
This is starting to look like a good idea. Let's code it up. Here is the
method I ended up with,
publicstaticint BinarySearch<T, K>(IList<T> list, K value,
Converter<T, K> convert, Comparison<K> compare) {
int i = 0;
int j = list.Count - 1;
while (i <= j) {
int m = i + (j - i) / 2;
int c = compare(convert(list[m]), value);
if (c == 0)
return m;
if (c < 0)
i = m + 1;
else
j = m - 1;
}
return ~i;
}
I added an extra parameter, compare. This is required because I
didn't put any limits on what K can be and I need a way to compare
one K with another. Many types, such as int, already know how to
compare values of their own type. Typically types that are comparable, such as
int, string, etc., implement
IComparable<T>. We can accommodate such types by creating an
overloaded version of BinarySearch() that looks like,
publicstaticint BinarySearch<T, K>(IList<T> list, K value,
Converter<T, K> convert) where K : IComparable<K> {
return list.BinarySearch(value, convert, (n, m) => n.CompareTo(m));
}
which uses a lambda function to create a Comparison<T> for any
type that supports IComparable<T>. Beware that this doesn't
work well with reference types because it doesn't check for null.
You can make it safer by putting struct in the
where clause but that seemed overly restrictive to
me.
Now that I have a BinarySearch() that meets my needs better than the built-in
BinarySearch(), my next thought was I should evangelize the benefits my version
of BinarySearch() to the team responsible for maintaining the CLR
collections and, maybe, someday, I will have it as
a native part of the CLR; maybe, someday. Then I remember extensions methods. In
C# 3.0, with a slight modification to the above methods, they can appear as if
they were already part of the CLR.
publicstaticclass ListExtensionMethods {
publicstaticint BinarySearch<T, K>(this IList<T> list, K value,
Converter<T, K> convert, Comparison<K> compare) {
int i = 0;
int j = list.Count - 1;
while (i <= j) {
int m = i + (j - i) / 2;
int c = compare(convert(list[m]), value);
if (c == 0)
return m;
if (c < 0)
i = m + 1;
else
j = m - 1;
}
return ~i;
}
publicstaticint BinarySearch<T, K>(this IList<T> list, K value,
Converter<T, K> convert) where K : IComparable<K> {
return list.BinarySearch(value, convert, (n, m) => n.CompareTo(m));
}
}
My version of BinarySearch() now
appears as part of the methods of any type that implements IList<T>.
This allows us to write,
int location = textRangeList.BinarySearch(location, (n) => n.Location);
just like it was a native part of the CLR. No more hand-rolled
BinarySearch()s!
Now that we have a ListExtensionMethods class, what other missing methods
could we add? Here are a few that I thought would be useful,
publicstaticvoid Sort<T, K>(this List<T> list,
Converter<T, K> coverter, Comparison<K> compare) {
list.Sort((n, m) => compare(convert(n), convert(m)));
}
publicstaticvoid Sort<T, K>(this List<T> list,
Converter<T, K> converter) where K : IComparable<K> {
list.Sort((n, m) => convert(n).CompareTo(convert(m)));
}
publicstaticvoid Swap<T>(this IList<T> list, int index1,
int index2) {
var value = list[index1];
list[index1] = list[index2];
list[index2] = value;
}
publicstaticvoid Shuffle<T>(this IList<T> list) {
int count = list.Count;
Random r = new Random();
for(int i = count - 1; i > 1; i--)
list.Swap(i, r.Next(i - 1));
}
The Sort() methods above are allow you to extract a key for
sorting using the same methods as above. These are not strictly necessary since
constructing the lambda function I use is not that complicated but I believe they
make the code more readable if you are consistent in how you extract keys from a
list, so these are convenient wrappers that allow you to use the same methods to
sort and search. Swap() and
Shuffle() are throw-ins. They are really only useful for demo code
or, in my case, test code. I used them to test the Sort() method
which I used to test the BinarySearch() method.
This might be an unsupported scenario, not sure, but couldn't you just pass "null" as the item parameter to the binary search, and include any data in that's required in an IComparer<T> ?
As a little example: I was writing a river-of-news viewer, and had an sorted collection of items. I needed to find which item intersected at a given Y-coordinate, so I didn't have an item to locate with as such, but I found I could do:
int pos = _items.BinarySearch( null, new IntersectionComparer( ypos ) );
Where my comparer looks like the following (boring details omitted):
#region IComparer<Item> Members
public int Compare( Item x, Item y ) { if( x == null && y == null ) { throw new InvalidOperationException(); } else if( x != null && y != null ) { throw new InvalidOperationException(); } else if( x != null ) { int start = x.YPos, end = x.YPos + x.Height;
Very cleaver. I count this in the bucket of "fake" keys. Certainly null is a very cheap fake key. I find the compare method a bit complicated so I still probably wouldn't use this technique but it is certainly better than rolling your own binary search.
BTW, this wouldn't work, unfortunately, for struct keys (since null is not assignable to a struct) but they ususally don't suffer from the problem of being difficult to construct.
Keyed Binary Search
One thing that has always frustrated me about typical binary search routines found in libraries is they always assume you can easily produce a copy of the item you are searching for. What I mean is they typically follow this pattern,
which takes a list of some type, left open here, and searches for it in the list. This is great if the very next thing you do next is insert
iteminto the list. You already haveitemin hand so this signature is perfect. But if you don't have a readyitem, anditemis difficult to produce, you are stuck.This happened to me just recently in my code. I had a sorted list of a data structure representing a span of source, similar to a
TextPointerfrom WPF, called aTextRange. My version can quickly return the offset of the beginning of the span but, sinceTextRangetracks source changes, creating a newTextRangeis expensive. When text in the editor is change, I receive the location of the change in an event. How do I figure out which one of myTextRanges was modified? Since the list is sorted, the obvious choice is to use a binary search which yields O(Log N) performance. Since myTextRanges are store in aList<TextRange>, obviously I should callList<T>.BinarySearch(). Unfortunately,TextRanges are expensive to produce; too expensive for a per-keystroke operation (I am glossing over some details for the sake of brevity; you need to trust me this is true). I have one of two choices, write my ownBinarySearch()or create a fakeTextRangethat is a specialTextRangethat is less expensive to create and only used for searching. Neither choice is pleasant. Hand-rolled binary searches are not hard to right but, for some strange reason, I always get them wrong the first time.What I needed is prebuilt and tested version of
BinarySearch()that looks more like,What this binary search routine does is find some key value in the list given a function that can convert any list item into the key. In my example, I would call it like,
where
TextRangeToLocationlooks something like,I don't need to produce a
TextRange. I have the location I am looking for handy, I just need to know what, if any,TextRangecontains the location. This method would do just that. You can even use a lambda function in C# 3.0 to get this all on one line,This is starting to look like a good idea. Let's code it up. Here is the method I ended up with,
I added an extra parameter,
compare. This is required because I didn't put any limits on whatKcan be and I need a way to compare oneKwith another. Many types, such asint, already know how to compare values of their own type. Typically types that are comparable, such asint,string, etc., implementIComparable<T>. We can accommodate such types by creating an overloaded version ofBinarySearch()that looks like,which uses a lambda function to create a
Comparison<T>for any type that supportsIComparable<T>. Beware that this doesn't work well with reference types because it doesn't check fornull. You can make it safer by puttingstructin thewhereclause but that seemed overly restrictive to me.Now that I have a
BinarySearch()that meets my needs better than the built-inBinarySearch(), my next thought was I should evangelize the benefits my version ofBinarySearch()to the team responsible for maintaining the CLR collections and, maybe, someday, I will have it as a native part of the CLR; maybe, someday. Then I remember extensions methods. In C# 3.0, with a slight modification to the above methods, they can appear as if they were already part of the CLR.My version of
BinarySearch()now appears as part of the methods of any type that implementsIList<T>. This allows us to write,just like it was a native part of the CLR. No more hand-rolled
BinarySearch()s!Now that we have a
ListExtensionMethodsclass, what other missing methods could we add? Here are a few that I thought would be useful,The
Sort()methods above are allow you to extract a key for sorting using the same methods as above. These are not strictly necessary since constructing the lambda function I use is not that complicated but I believe they make the code more readable if you are consistent in how you extract keys from a list, so these are convenient wrappers that allow you to use the same methods to sort and search.Swap()andShuffle()are throw-ins. They are really only useful for demo code or, in my case, test code. I used them to test theSort()method which I used to test theBinarySearch()method.4:44 PM | Comments [2] | #Programming
05/25/2007 5:08 AM
This might be an unsupported scenario, not sure, but couldn't you just pass "null" as the item parameter to the binary search, and include any data in that's required in an IComparer<T> ?As a little example: I was writing a river-of-news viewer, and had an sorted collection of items. I needed to find which item intersected at a given Y-coordinate, so I didn't have an item to locate with as such, but I found I could do:
int pos = _items.BinarySearch( null, new IntersectionComparer( ypos ) );
Where my comparer looks like the following (boring details omitted):
#region IComparer<Item> Members
public int Compare( Item x, Item y )
{
if( x == null && y == null )
{
throw new InvalidOperationException();
}
else if( x != null && y != null )
{
throw new InvalidOperationException();
}
else if( x != null )
{
int start = x.YPos, end = x.YPos + x.Height;
if( _scrollPos < start )
{
return 1;
}
else if( _scrollPos > end )
{
return -1;
}
else
{
return 0;
}
}
else if( y != null )
{
int start = y.YPos, end = y.YPos + x.Height;
if( _scrollPos < start )
{
return -1;
}
else if( _scrollPos > end )
{
return 1;
}
else
{
return 0;
}
}
else
{
throw new InvalidOperationException();
}
}
#endregion
Stu Smith | http://www.feedghosst.com/ | stuAT NOSPAMfeedghost dot com
05/25/2007 12:15 PM
Very cleaver. I count this in the bucket of "fake" keys. Certainly null is a very cheap fake key. I find the compare method a bit complicated so I still probably wouldn't use this technique but it is certainly better than rolling your own binary search.BTW, this wouldn't work, unfortunately, for struct keys (since null is not assignable to a struct) but they ususally don't suffer from the problem of being difficult to construct.
Chuck Jazdzewski | www.removingalldoubt.com | chuckjAT NOSPAMmicrosoft dot com