Fundamental Java Classes with Generics
Vlad Patryshev, 02/07/2005 (with an 2007 update for version 6)
Introduction
Java already has a rich library that deals with abstract types like Set, List
(aka tuple), Map; and it seems like there is not much need to add more.
Nevertheless, for years I've been adding classes and methods to my private library.
Then Java generics went out of the closet - and I have discovered that
with generics one can make miracles... I mean, develop and start using
really generic and mathematically sound classes. So I refactored my
library to generics, and here it is. Below I will use Java features
and classes from version 6.
Collections Revisited
Users, both beginners and those that are too busy to upgrade, often mix
interfaces and abstract classes from Collections framework with their concrete
implementations that combine abstract ideas with specific strategies: for
instance, List has two implementation strategies, LinkedList and ArrayList.
The fact that it is linked or array-ed, has nothing to do with its core
functionality; implementations can vary, and what's important from the
client point of view is that they are all Lists - and another thing is
important: performance; but we definitely need to separate performance
concerns from presentation.
So, having this in mind, here's
the current hierarchy of fundamental classes in Java:
Iterable
It is an abstraction of an enumerable entity; all it has is iterator() that
returns an Iterator to scan through components. It does not mean that elements
are in a certain order - there is a big conceptual difference between having
a certain order and an existence of an order in which elements could be listed,
with none of such orders being an attribute of the entity.
We cannot know the number of elements in an Iterable; in principle, we could
define boolean isEmpty() as
{ return iterator().hasNext(); }, but in many cases this definition
can be costly. For instance, the Iterable can be a resource on the internet,
each next() returning a byte - checking its emptiness would
involve opening a connection. It would be interesting, on the other hand,
to to have Iterable File and URL readers.
It would be interesting to find out whether there is an analog of Iterable
in mathematics.
Collection
It is an Iterable that has knows its size, can have methods for adding and
deleting elements, check whether an element is a member of the collection,
and which can be converted to an array - using the fact that its size is
known in advance. No exact match in mathematics; a bag, or a multiset,
can be considered as a model if we could take Collections out of the
hierarchy and treat them as a separate entity.
List
It is a Collection where elements have a linear order. List is known as
Sequence in IDL, and as tuple in mathematics. It has the first element,
the last element, and can have methods for inserting and deleting elements.
Two lists are declared equal if they have the same size and all its consecutive
elements are equal.
Set
It is a Collection where elements do not repeat. Sets are modeled after sets in
Set Theory: two sets are equal if they contain the same elements (Extensionality
Axiom). Sets do not have any additional operations; the only additional
method equals() makes them distinct from other possible types
of collections.
SortedSet
It is a special kind of Set that is linearly ordered. SortedSet
in Java extends Set - which basically means that order is used only for storing
data. While two SortedSets that have the same elements but a different comparator
are equal(), it is impossible to addAll() from one
SortedSet to another unless their comparators are equal().
Map
Map represents a function defined on a set and taking values of a
certain type. Technically speaking, a Map is fully determined by a Set of
its key-value pairs. A Map can return a value for a specified key, the whole
set of its keys, keySet() and a collection of its values; this collection is
neither a set (can have repeating values) nor a List (order does not matter).
A Map can have methods for adding and removing key-value pairs.
Foundation Classes
The classes reviewed above were enough for Java Before Tiger. Now,
with generics, more can be offered to effectively and efficiently write
Java code using collections and avoiding 'for' loops and plain arrays,
two legacy constructs that we inherited from Fortran and C. Below I'll
talk about two foundation classes and three utility classes with a bunch
of static methods for dealing with collections. The most fundamental notion
in mathematics is a...
Function
In Java, we can define interface Function<X,Y>: it has
just one method, Y apply(X x). An objectivised function can be
handled now as an entity: we can compose and invert functions, we can apply
them to collections, we can instantiate functions from other objects.
There is a big difference between a Function and a Map: a Function is defined
on a type, and a Map is defined on a Set. If we have a Map m,
we can define a Function function(m): it will return
m.get(x) for any X x, but to get a Map
from a Function, we will need a Set of keys. Also, equals()
is defined for Maps, but for Functions it is very impractical to define
equals(): we would need to scan through the whole range of
type values to check whether two Functions are equal.
Actually, in the package com.myjavatools.lib.foundation,
Function.java is not an interface, but an abstract class with one abstract
method, apply(). Other methods provide the functionality I
wrote above: function factory
static Function<X,Y> function(Map<X,Y> map);
Map<X,Y> toMap(Set<X> keys)
- a Map view of the function, with Set<X> keys as keys;
Map<X,Y> toMap(Collection<X> keys).
List<Y> apply(List<X>)
- this one returns a List view of function values on the provided list.
Notice, it is a view, and the user will want to use a copy constructor
to avoid recalculating function values every time an element is retrieved;
on the other hand, the user is free to deal with this List view, and avoid
creating new objects. Saving memory does not seem to be a big issue these
days, but memory management, creation and disposal of objects may take too
many resources.
Similar methods Iterable<Y> apply(Iterable<X>) and
Iterator<Y> apply(Iterator<X>) apply a Function to
an Iterable and an Iterator.
Function has two methods for composition,
one is static, to apply this function after the parameter function:
f.compose(g) is a function that returns
f.apply(g.apply(arg)); another one is a static method,
which looks closer to classical mathematics:
<X,Y,Z> Function <X,Z> compose(Function<X,Y> f, Function<Y,Z> g)
Filter
A Filter, a.k.a. predicate, is a Function<X,Boolean>, that is, a function
that returns logical values. One could argue regarding overusing "boolean",
but Java does not give much of a choice.
Filter has an alias for apply(): accept().
Two methods in this class do the actual data filtration, the same way grep
function does: Iterator<T> filter(Iterator<T>) and
Iterable<T> filter(Iterable<T>). The latter is, of
course, a view Iterable, not a new array or something; the user is free to
create a "hard copy" of the filtered Iterable, when necessary.
Pair
This is an implementation of Map.Entry<X,Y> interface,
only key and value are named left
and right components; with the addition of swap()
method. Theoretically, Pair<X,Y> could serve as an alternative to
List<X> implementation of the notion of tuple from mathematics.
Utility Foundation Classes
Iterators
This class contains several static
cat() methods that concatenate collections. Namely,
cat(Arrays.asList("abc", "def"), Arrays.asList("ghi", "jkl"))
will return a virtual Collection that contains the four strings. This method
concatenates Collections; similarly one can concatenate Iterables or Iterators.
Maps
This class contains two groups of methods in this class: creational and functional.
Creational methods build maps from materials that the user supplies:
Map<X,Y> toMap(X key1, Y value1, X key2, Y key2)
and the like will create a map that contains just these two keys;
you can pass an array with keys and values (odd elements are keys,
even elements are values), and toMap() will create a
Map based on this array. Another toMap() takes a vararg array
of Map.Entry instances and creates a virtual Map based on this array.
Neither of these methods cares to copy arrays; they are all just views.
There is a caveat regarding using these creational methods: it is your
responsibility to guarantee that all keys are distinct; it is a contract
of Map interface that keys in the map are all distinct, and can form a Set.
None of the creational methods checks whether keys are distinct. Generally
speaking, no existing implementation of Set or Map interface, except probably
UnmodifiableMap and UnmodifiableCollection, can guarantee this distinction.
When you add elements to a Set, they are all distinct - but what if you change
the elements later? There are no observers to react to such changes.
The purpose of functional methods is to implement standard or almost standard
operations on Maps. Map<X,Z> compose(Map<X,Y>, Map<Y,Z>)
returns a virtual Map which is the two Maps' composition. Four map()
methods apply a Map to a List, a Collection, an Iterable and an Iterator respectively.
Map<X,Y> restrict(Map<X,Y>, Collection<X>) restricts a
Map to a Collection.
The following functional methods do create
Set or Map instances. So far there is no factory method that would return
the right Map or Set instance; I just use LinkedHashMap and HashSet as an
ultimate container choice. This area definitely need some development.
Set<X> resolve(Map<X,Y> map, Y y) returns a Set of X
such that map.get(x).equals(y).
Map<Y,X> inverse(Map<X,Y>)
returns a new map that is inverse to the original map; if none exists,
an exception is thrown. Another method, revert(Map<X,Y>)
revert a Map even if it is not an one-to-one. It returns, for a
Map<X,Y> f it returns a
Map<Y,Set<X>> g such that for each Y y in
g.keySet() the value of g.get(y) is a
Set<Y> of all those X x that
f.get(x).equals(y).
Example of Usage
In the following sample program the files in the specified directory are
grouped by their types, as defined in the provided list.
package com.myjavatools.lib.foundation;
import java.io.*;
import java.util.*;
public class Sample1 {
public static void main(String[] args) {
String folderName = args.length < 1 ? "." : args[0];
Map<String,String> fileToType =
Maps.toMap("gif", "image",
"jpg", "image",
"jpeg", "image",
"png", "image",
"java", "source code",
"cpp", "source code",
"hpp", "source code",
"class","binary",
"obj", "binary",
"exe", "binary",
"dll", "library",
"lib", "library",
"so", "library",
"sl", "library");
Function<File,String> extension = new Function<File,String>() {
public String apply(File file) {
String name = file.getName();
return name.substring(name.lastIndexOf('.') + 1);
}
};
// the function returns file type for a file
Function<File,String> fileType = Function.function(fileToType).compose(extension);
// the list of files in the folder
List<File> contents = Arrays.asList(new File(folderName).listFiles());
// the same files grouped by their file types
// only during this operation a new container is created.
Map<String, Set<File>> filesGroupedByType =
Maps.revert(fileType.toMap(contents));
for (String type : filesGroupedByType.keySet()) {
System.out.println(type + ":");
for (File file : filesGroupedByType.get(type)) {
System.out.println(" " + file);
}
}
}
}
Conclusion
Although these simple classes are very versatile, and very non-intrusive
if used correctly, that is, in a natural and simple way, I have a feeling
that this is only the beginning of a long way to switching from XIX century
programming style - "matrices with indexes" to the more advanced "operator" style.
Here are the links:
documentation,
source code jar,
binary jar,
the full archive: source, docs, tests, project file for JBuilder.
Everything is free, as in free thought.
I am thankful to the great Borland's JBuilder team for creating the beautiful JBuilder 2005 that I've been using while developing this.
Yes, I was a Borland employee for 7 years - but this seems to be the first time I thank my colleagues.