Stated slightly more formally, any key should be equally likely to hash to any of the M locations.
In practice, we can't check that a hash function satisfies this condition, since the probability distribution on the keys is usually not known. For instance, if the hash table is implementing the symbol table in a compiler, the compiler writer (who also writes the symbol table) cannot know for sure what kind of variable names will appear in each program to be compiled.
So heuristics are used to approximate this condition: try something that seems reasonable, and run some experiments to see how it works out.
You might also want to use application-specific information. For the symbol table example, you might use information about the variable names that people often choose. For instance, it might be common for programs to have variables such as x1, x2, x3, etc. You would want the hash function not to collide on these names.
In fact, ideally a hash function should depend on all the information in the keys. As a simple example, suppose the keys are words from an English text. If you choose M = 26 (one location for each letter of the alphabet) and the hash function returns the alphabetic position of the word's first letter (minus 1, so that A maps to 0 and Z maps to 25), then all words beginning with S, of which there are MANY, would hash to the same location, while almost none would hash to the location for X.
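A tiny sketch (the word list is made up just for illustration) shows how a first-letter-only hash piles keys into a few of the 26 locations:

    # First-letter hash for English words, with M = 26 (one slot per letter).
    # The sample words below are made up purely for illustration.
    M = 26

    def first_letter_hash(word):
        # Uses only the first letter: 'a'/'A' -> 0, ..., 'z'/'Z' -> 25.
        return ord(word[0].lower()) - ord('a')

    words = ["some", "such", "several", "system", "state", "xylophone"]
    for w in words:
        print(w, "->", first_letter_hash(w))
    # Every s-word lands in location 18, while location 23 (for 'x') is
    # nearly empty: the hash ignores most of the information in each key.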
The division method often approximates the desired condition: h(k) = k mod M, where M is a prime.
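A minimal sketch of the division method, assuming string keys are first converted to integers (the conversion shown is just one simple illustrative choice that lets every character influence the result):

    M = 101  # table size: a prime

    def string_to_int(key):
        # Illustrative conversion: treat the characters as digits in base 256,
        # so all of the information in the key contributes to the integer.
        k = 0
        for ch in key:
            k = k * 256 + ord(ch)
        return k

    def h(key):
        # Division method: h(k) = k mod M, with M prime.
        return string_to_int(key) % M

    print(h("x1"), h("x2"), h("x3"))  # similar names, yet different locations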
The analogous condition for open addressing, namely that each key is equally likely to get each possible probe sequence, is even harder to achieve in practice than the earlier condition.
A good approximation is double hashing with one of these two schemes:
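Whatever the specific schemes, double hashing in general computes the i-th probe location for key k as (h1(k) + i*h2(k)) mod M, where h2(k) is never 0. A minimal sketch of that general form, with illustrative choices of h1 and h2 (these particular functions are assumptions, not necessarily the schemes meant above):

    M = 13  # table size: a prime

    def h1(k):
        # primary hash: division method
        return k % M

    def h2(k):
        # secondary hash; never 0, so with M prime every slot gets probed
        # (this particular formula is just one common illustrative choice)
        return 1 + (k % (M - 2))

    def probe_sequence(k):
        # locations probed for key k, in order, under double hashing
        return [(h1(k) + i * h2(k)) % M for i in range(M)]

    print(probe_sequence(27))
    print(probe_sequence(40))  # same first probe as 27, but a different sequence afterwards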
The load factor of a hash table with M entries and N keys in it is defined to be lambda = N/M.
Fact: The average length of each linked list is N/M = lambda.
Notice that for chaining, lambda can be smaller than, equal to, or larger than 1. We will consider the case when N might be larger than M, but not too much larger. Notice that as long as N is O(M), lambda is O(1). But if N gets larger, for instance if N = M^2, then lambda = N/M = M, which grows with N rather than staying constant.
Insert: Average time is O(1) (same as worst case), since you just compute h and then insert at the beginning of a linked list.
Unsuccessful Search: Average time is O(1 + lambda): O(1) time to compute h(k), then O(lambda) time (on the average) to scan the linked list at location h(k) until discovering that k is not in the hash table.
Successful Search: Average time is O(1 + lambda/2) (which is O(1 + lambda)): O(1) time to compute h(k). On the average, the key being sought will be in the middle of the linked list, so lambda/2 comparisons will be done until finding k.
Delete: This is essentially the same as successful search (assuming you never try to delete something that is not in the table).
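Putting the chaining operations above into code, a minimal sketch (using Python lists for the chains and the division method for h):

    class ChainedHashTable:
        # Minimal chaining sketch: each of the M slots holds a list of keys.
        def __init__(self, M):
            self.M = M
            self.table = [[] for _ in range(M)]

        def _h(self, k):
            return k % self.M  # division method

        def insert(self, k):
            # O(1): compute h(k), then put k at the front of that slot's list
            self.table[self._h(k)].insert(0, k)

        def search(self, k):
            # O(1 + lambda) on average: scan the list at slot h(k)
            return k in self.table[self._h(k)]

        def delete(self, k):
            # essentially the cost of a successful search
            chain = self.table[self._h(k)]
            if k in chain:
                chain.remove(k)

    t = ChainedHashTable(7)
    for key in [3, 10, 17, 24]:   # all of these hash to slot 3 when M = 7
        t.insert(key)
    print(t.search(17), t.search(99))  # True False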
Assume that the hash function ensures that each key is equally likely to have each permutation of {0, 1, ..., M-1} as its probe sequence.
Unsuccessful Search: O(1/(1-lambda))
Insert: Essentially same as unsuccessful search.
Successful Search: O((1/lambda)*ln(1/(1-lambda))), where ln is the natural log (base e = 2.7...).
Delete: Essentially same as successful search.
The reasoning behind these formulas requires more sophisticated probability than for chaining.
But we can do some simple sanity checks:
The time for searches should increase as the load factor increases.
For unsuccessful search: as N gets closer to M, lambda gets closer to 1, so 1-lambda gets closer to 0, so 1/(1-lambda) gets larger. At the extreme, when N = M-1, lambda = (M-1)/M, so 1/(1-lambda) = M, meaning that on average you will search essentially the entire table before discovering that the key is not there.
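Plugging a few load factors into both formulas makes the same point numerically (a quick sanity-check script):

    import math

    # Expected search cost for open addressing, under the assumption above
    # about probe sequences:
    #   unsuccessful search: 1 / (1 - lambda)
    #   successful search:   (1 / lambda) * ln(1 / (1 - lambda))
    for lam in [0.25, 0.5, 0.75, 0.9, 0.99]:
        unsucc = 1 / (1 - lam)
        succ = (1 / lam) * math.log(1 / (1 - lam))
        print(f"lambda = {lam:<5}  unsuccessful = {unsucc:6.1f}  successful = {succ:5.2f}")
    # Both costs grow as lambda approaches 1, and unsuccessful search
    # blows up much faster than successful search.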