Data Structures
A while ago, I wrote a couple of posts
(http://scienceblogs.com/goodmath/2009/05/finally_finger_trees.php) that claimed
to talk about finger trees. Unfortunately, I really
botched it. I'd read a bunch of data structure papers, and managed to get
myself thoroughly scrambled. What I wrote about was distantly related
to finger trees, and it was useful for understanding how finger trees work -
but it was not, in any way, shape, or form, actually a description of
finger trees. Since then, I've been meaning to write a proper post explaining
finger trees - but with the work on my book, and…
From the April issue of Communications of the ACM, the Kode Vicious column is on "The Data-Structure Canon."
The reader question is:
In most areas of science there are a few basic underlying laws that inform the rest of the study of a given subject. Physics, chemistry, and electrical engineering all have these basic equations. What are the basic equations in computer science? Or is computer science baseless?
In other words, what's the fundamental intellectual basis of computer science? Well, according to KV, it's data structures!
If there were any set of basics that I wanted to hammer into software…
In the Haskell series, I was planning on moving on to some monad-related
stuff. But a reader wrote in and asked me to write another
post on data structures, focusing on a structure called a
zipper.
A zipper is a remarkably clever idea. It's not really a single data
structure, but rather a way of building data structures in functional
languages. The first mention of the structure seems to be a 1997 paper
by Gerard Huet, but as he says in the paper, it was likely used in
functional code before that - it's just that no one had thought to formalize it
and write it up. (In the…
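Here's a quick sketch of the idea in Haskell, using the simplest possible case - a zipper over a list. This is just an illustration, not code from Huet's paper; the names (ListZipper, left, right, replace) are mine.

    -- A zipper over a list: a focused element, plus the elements before it
    -- (stored in reverse, so the nearest one is at the head) and after it.
    data ListZipper a = ListZipper [a] a [a]
      deriving Show

    fromList :: [a] -> Maybe (ListZipper a)
    fromList []     = Nothing
    fromList (x:xs) = Just (ListZipper [] x xs)

    -- Moving the focus one step in either direction is O(1).
    left, right :: ListZipper a -> Maybe (ListZipper a)
    left  (ListZipper (l:ls) f rs) = Just (ListZipper ls l (f:rs))
    left  _                        = Nothing
    right (ListZipper ls f (r:rs)) = Just (ListZipper (f:ls) r rs)
    right _                        = Nothing

    -- So is replacing the focused element - no copying of the whole list.
    replace :: a -> ListZipper a -> ListZipper a
    replace x (ListZipper ls _ rs) = ListZipper ls x rs

The same trick generalizes to trees and other recursive structures: you keep the piece you're focused on, plus just enough context to rebuild everything around it.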
As an alert commenter pointed out, I left out one really important thing in
my earlier post about finger trees. That's what I get for trying to write when
I'm sick :-). Seriously, this is a point that's implied by the post as it stands, but never explicitly stated - and since it's really important, it
should be stated explicitly.
The monoid-annotated tree structure doesn't replace the original
data structure: it's superimposed on it.
So, as I said: cons-cell style lists are ubiquitous and beloved by
functional and non-functional programmers. In finger trees, you're
not getting rid of them…
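As a toy illustration of what "superimposed" means - this is my own sketch, not the finger tree structure itself - you can take an ordinary cons list and hang a monoidal annotation on each cell without changing the list's shape at all. Here the monoid is just a count of elements:

    import Data.Monoid (Sum (..))

    -- The cons-cell structure is still there; each cell just carries a
    -- cached monoidal summary of itself and everything in its tail.
    data AnnList v a
      = Nil
      | Cons v a (AnnList v a)

    measure :: Monoid v => AnnList v a -> v
    measure Nil          = mempty
    measure (Cons v _ _) = v

    -- Consing keeps the original list shape, and maintains the
    -- annotation on the side.
    cons :: Monoid v => (a -> v) -> a -> AnnList v a -> AnnList v a
    cons f x xs = Cons (f x <> measure xs) x xs

    -- Annotating with counts gives an O(1) length at the head of the list.
    example :: AnnList (Sum Int) Char
    example = foldr (cons (const (Sum 1))) Nil "abc"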
For ages, I've been promising to write about finger trees. Finger trees are
an incredibly elegant and simple structure for implementing sequence-based
data structures. They're primarily used in functional languages, but there's nothing
stopping an imperative-language programmer from using them as well.
In functional programming languages, lists are ubiquitous. They show up everywhere, because they're just so easy to use for recursive iteration. I can't think of
any non-trivial program that I've written in Haskell, Lisp, or OCaml that doesn't use lists. (Heck, I've…
This post is very delayed, but things have been busy.
I'm working my way up to finger trees, which are a wonderful
functional data structure. They're based on trees - and like many
tree-based structures, their performance relies heavily on the balance
of the tree. The more balanced the tree is, the better they perform.
In general, my preference when I need a balanced tree is a red-black
tree, which I wrote about before. But there are other kinds of balanced trees,
and for finger trees, many people prefer a kind of tree called a 2-3 tree.
A two-three tree is basically a B-tree with a…
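To give a rough picture of the shape being described - this is just a sketch of the usual formulation, not code from the post - every internal node of a 2-3 tree has either two or three children, and every leaf sits at the same depth; that uniform depth is what keeps searches logarithmic:

    -- A 2-3 search tree skeleton: Node2 has one key and two children,
    -- Node3 has two keys and three children; all leaves are at one depth.
    data Tree23 a
      = Leaf
      | Node2 (Tree23 a) a (Tree23 a)
      | Node3 (Tree23 a) a (Tree23 a) a (Tree23 a)
      deriving Show

    -- Searching just follows the keys down; the uniform depth makes it
    -- O(log n). (Insertion, which has to split nodes to keep the depth
    -- uniform, is the interesting part and is omitted here.)
    member :: Ord a => a -> Tree23 a -> Bool
    member _ Leaf = False
    member x (Node2 l k r)
      | x < k     = member x l
      | x > k     = member x r
      | otherwise = True
    member x (Node3 l k1 m k2 r)
      | x < k1    = member x l
      | x == k1   = True
      | x < k2    = member x m
      | x == k2   = True
      | otherwise = member x r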
Last week, I promised to continue my discussion of ropes. I'm going to
break that promise. But it's in a good cause.
If you're a decent engineer, one of the basic principles you should
follow is keeping things as simple as possible. One of the essential skills
for keeping things simple is realizing when you're making something
overly complicated - even if it's cool, and fun, and interesting.
Working out the rope code for more complex operations, I found myself
thinking that it really seemed complex for what I was doing. What I'm
doing is writing myself a text editor. I'm not writing a string…
It's been a while since I've written about any data structures. But it just so happens that I'm now working on implementing a really interesting and broadly useful data structure called a Rope.
A bit of background, to lead in. I've got this love-hate relationship with
some of the development tools that Rob Pike has built. (Rob is one of the Unix
guys from Bell Labs, and was one of the principal people involved in both the
Plan 9 and Inferno operating systems.) Rob has implemented some amazing
development tools. The two that fascinate me most are called Sam and Acme. The
best and…
Another cool, but frequently overlooked, data structure in the tree family is called the B-tree. A B-tree is a search tree, very similar to a BST in concept, but optimized differently.
BSTs provide logarithmic time operations, where the performance
is fundamentally bounded by the number of comparisons. B-trees also
provide logarithmic performance with a logarithmic number of
comparisons - but the comparison count is worse by a constant factor. The
difference is that B-trees are designed around a different tradeoff. The B-tree is designed to minimize the number of tree nodes that need to be…
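To make the tradeoff concrete, here's a very rough sketch of the shape of a B-tree - my own illustration, ignoring the balancing rules entirely. Each node packs many sorted keys, so a single node access narrows the search by a large branching factor:

    -- A B-tree skeleton: a node holds a sorted list of keys and one more
    -- child than it has keys. Real implementations bound the number of
    -- keys per node and keep nodes at least half full.
    data BTree a
      = BLeaf [a]
      | BNode [a] [BTree a]
      deriving Show

    -- Search does several comparisons inside one node (cheap, since the
    -- node is a single contiguous block), but touches only O(log n) nodes.
    member :: Ord a => a -> BTree a -> Bool
    member x (BLeaf keys) = x `elem` keys
    member x (BNode keys children) =
      let i = length (takeWhile (< x) keys)
      in if i < length keys && keys !! i == x
           then True
           else member x (children !! i)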
Last post, I described the basic idea of the binary heap data
structure. But I didn't talk about how to implement it - and there's
a very cool way of implementing it - optimally space efficient,
very fast, and with respectable cache locality for moderately sized
datasets.
The idea is to use a compact, array-based implementation of a binary tree. If we put the nodes of the tree into an array in the order in which they would be visited by a breadth-first traversal, then we get a
representation of the tree in which the children of the node at position N will be…
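The index arithmetic that falls out of that layout is tiny. Here's a quick sketch in Haskell, assuming 0-based array indices (a 1-based layout just shifts everything by one):

    -- With the root at index 0, the breadth-first layout puts the children
    -- of the node at index i at 2i+1 and 2i+2, and its parent at (i-1) `div` 2.
    -- No pointers need to be stored at all.
    leftChild, rightChild, parent :: Int -> Int
    leftChild  i = 2 * i + 1
    rightChild i = 2 * i + 2
    parent     i = (i - 1) `div` 2

The tree's shape lives entirely in the indices, which is why the representation is so compact and so friendly to the cache.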
One of the most neglected data structures in the CS repertoire is
the heap. Unfortunately, the jargon is overloaded, so "heap" can mean
two different things - but there is a connection, which I'll explain
in a little while.
In many programming languages, when you can dynamically create
objects in memory, the space from which you allocate the memory to
create them is called the heap. That is not what I'm talking about.
What I am talking about is an extremely simple but powerful structure
that is used for maintaining a collection of values from which you want
to be able to quickly remove the…
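As a rough sketch of the kind of interface that structure ends up providing - assuming the usual min-heap convention, with operation names that are mine rather than anything standard:

    -- A priority-queue-style interface: cheap access to the smallest
    -- element, with logarithmic insertion and deletion.
    class MinHeap h where
      empty     :: Ord a => h a
      insert    :: Ord a => a -> h a -> h a
      findMin   :: Ord a => h a -> Maybe a      -- peek at the smallest element
      deleteMin :: Ord a => h a -> Maybe (h a)  -- drop the smallest element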