Some nits with A, B, C.
Classes A, B and C are a bit more than you mention...
Every IP address belongs to some network. Part of the address identifies the network; the rest is the address of the host within that network.
The size of the network portion depends on the bit pattern at the start of the address:
>>>>>>>>>>>>>>>>>>>>>>>>>> (from [link|http://www.webopedia.com/DidYouKnow/2002/March/IPaddressing.html|Webopedia])
Class A Network -- binary addresses start with 0, therefore the decimal number can be anywhere from 1 to 126. The first 8 bits (the first octet) identify the network and the remaining 24 bits indicate the host within the network. An example of a Class A IP address is 102.168.212.226, where "102" identifies the network and "168.212.226" identifies the host on that network.
Class B Network -- binary addresses start with 10, therefore the decimal number can be anywhere from 128 to 191. (The number 127 is reserved for loopback and is used for internal testing on the local machine.) The first 16 bits (the first two octets) identify the network and the remaining 16 bits indicate the host within the network. An example of a Class B IP address is 168.212.226.204 where "168.212" identifies the network and "226.204" identifies the host on that network.
Class C Network -- binary addresses start with 110, therefore the decimal number can be anywhere from 192 to 223. The first 24 bits (the first three octets) identify the network and the remaining 8 bits indicate the host within the network. An example of a Class C IP address is 200.168.212.226 where "200.168.212" identifies the network and "226" identifies the host on that network.
Class D Network -- binary addresses start with 1110, therefore the decimal number can be anywhere from 224 to 239. Class D networks are used to support multicasting.
Class E Network -- binary addresses start with 1111, therefore the decimal number can be anywhere from 240 to 255. Class E networks are used for experimentation. They have never been documented or utilized in a standard way.
<<<<<<<<<<<<<<<<<<<<<<<<<<
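The leading-bit rules in the quote translate directly into bit tests on the first octet. Here is a minimal sketch, assuming well-formed dotted-quad input; `classify` and `split_address` are hypothetical helper names, not part of any library. Note that 127 starts with a 0 bit, so by these rules it falls in class A, not class B.

```python
from typing import Optional, Tuple

def classify(addr: str) -> Tuple[str, Optional[int]]:
    """Return (class letter, network bits) for a dotted-quad IPv4 address.

    Uses only the leading bits of the first octet, as in the quote.
    Classes D and E have no network/host split, so their bit count is None.
    """
    octets = [int(part) for part in addr.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {addr!r}")
    first = octets[0]
    if (first & 0b1000_0000) == 0:              # 0xxxxxxx
        return "A", 8
    if (first & 0b1100_0000) == 0b1000_0000:    # 10xxxxxx
        return "B", 16
    if (first & 0b1110_0000) == 0b1100_0000:    # 110xxxxx
        return "C", 24
    if (first & 0b1111_0000) == 0b1110_0000:    # 1110xxxx
        return "D", None
    return "E", None                            # 1111xxxx

def split_address(addr: str) -> Tuple[str, str]:
    """Split a class A, B or C address into (network part, host part)."""
    cls, bits = classify(addr)
    if bits is None:
        raise ValueError(f"class {cls} addresses have no network/host split")
    octets = addr.split(".")
    n = bits // 8                               # whole octets in the network part
    return ".".join(octets[:n]), ".".join(octets[n:])

print(split_address("102.168.212.226"))  # ('102', '168.212.226')
print(split_address("168.212.226.204"))  # ('168.212', '226.204')
print(split_address("200.168.212.226"))  # ('200.168.212', '226')
print(classify("127.0.0.1"))             # ('A', 8)
```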
So, as we see, every host address on the Internet belongs to an A, B or C network. Certain networks inside A space (as well as others inside B and C space) are declared unroutable, never to appear on the "real" Internet. 10.x.x.x is a class A network reserved for this purpose. Another A network, 127.x.x.x, is reserved for loopback addresses (so you can have 127.0.0.1 as well as 127.0.0.2 and so on).
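Python's standard `ipaddress` module can check these reservations by network membership. A minimal sketch: the 10.0.0.0/8 and 127.0.0.0/8 networks are the ones named above, and the B and C entries are the other two private blocks from RFC 1918 (172.16.0.0/12, i.e. sixteen class B networks, and 192.168.0.0/16, i.e. 256 class C networks). `describe` is a hypothetical helper.

```python
import ipaddress

# Private (never routed on the Internet) blocks from RFC 1918: one class A
# network, sixteen class B networks and 256 class C networks.
PRIVATE = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]
LOOPBACK = ipaddress.ip_network("127.0.0.0/8")

def describe(text: str) -> str:
    """Label an IPv4 address as 'loopback', 'private' or 'public'."""
    addr = ipaddress.ip_address(text)
    if addr in LOOPBACK:
        return "loopback"
    if any(addr in net for net in PRIVATE):
        return "private"
    return "public"

for a in ["10.1.2.3", "127.0.0.2", "192.168.1.1", "168.20.1.1"]:
    print(a, describe(a))
```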
However, as the Internet grew, it turned out that the smallest network, class C, was not small enough, and there were not enough of them. On the other hand, class A networks were too big, with far too many hosts in each.
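The size mismatch can be worked out from the bit counts in the quote: the class's fixed leading bits eat into the network part, and the rest of the address is host space. A small sketch (`class_sizes` is a hypothetical name); it subtracts the all-zeros and all-ones host numbers but does not subtract reserved networks such as 0 and 127.

```python
# (leading class bits, network bits) taken from the quoted class rules
CLASSES = {"A": (1, 8), "B": (2, 16), "C": (3, 24)}

def class_sizes() -> dict:
    """Raw count of networks per class and host addresses per network."""
    sizes = {}
    for name, (lead_bits, net_bits) in CLASSES.items():
        networks = 2 ** (net_bits - lead_bits)  # network bits left after the fixed prefix
        hosts = 2 ** (32 - net_bits) - 2        # minus the all-zeros and all-ones host numbers
        sizes[name] = (networks, hosts)
    return sizes

for name, (networks, hosts) in class_sizes().items():
    print(f"class {name}: {networks:>9,} networks, {hosts:>10,} hosts each")
```

This gives 128 class A networks of 16,777,214 hosts, 16,384 class B networks of 65,534 hosts, and 2,097,152 class C networks of 254 hosts, so there is nothing between 254 and 65,534 hosts.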
That's how the concept of subnet was born. We create a bit mask that tells us to interpret part of the host number in the IP address as a subnet number. Say we have the class B network 168.20.x.x. Normally, our hosts would assume that any host whose address follows this pattern is on the local network. But since a LAN with 64K devices on it is unwieldy, we split it into 256 subnets of 256 addresses each. Now our netmask is 255.255.255.0. Each device has to know its netmask to decide whether to send a packet to a gateway or put it directly on the wire.
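The gateway-or-wire decision is a bitwise AND: mask both our address and the destination, and compare. A minimal sketch using the standard `ipaddress` module only for parsing; `next_hop` is a hypothetical helper, and the host 168.20.5.10 with gateway 168.20.5.1 is an invented example inside the 168.20.x.x network above.

```python
import ipaddress

def next_hop(my_addr: str, netmask: str, dest: str, gateway: str) -> str:
    """Decide where to send a packet: straight to dest if it is on our subnet,
    otherwise to the gateway."""
    me = int(ipaddress.IPv4Address(my_addr))
    mask = int(ipaddress.IPv4Address(netmask))
    there = int(ipaddress.IPv4Address(dest))
    # Both addresses are on the same subnet iff their masked bits agree.
    if (me & mask) == (there & mask):
        return dest
    return gateway

# Hypothetical host 168.20.5.10 with gateway 168.20.5.1
print(next_hop("168.20.5.10", "255.255.255.0", "168.20.5.77", "168.20.5.1"))  # 168.20.5.77
print(next_hop("168.20.5.10", "255.255.255.0", "168.20.6.77", "168.20.5.1"))  # 168.20.5.1
print(next_hop("168.20.5.10", "255.255.0.0", "168.20.6.77", "168.20.5.1"))    # 168.20.6.77
```

With the subnet mask, 168.20.6.77 is now on a different subnet and goes via the gateway; with the plain class B mask it would be put directly on the wire.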
Note that subnet masks are not arbitrary. For example, with addresses like 168.20.x.x, you cannot have a mask of 255.0.0.0. Those addresses are class B, so the first 2 octets have to be the network part for the benefit of the "big" Internet routers (they keep track of routes to networks according to class A, B and C; they don't know anything about subnets).
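The rule can be stated as: the mask must be contiguous and at least as long as the class's own network part. A sketch that encodes only the classful rule as stated in the post; `classful_bits`, `prefix_len` and `mask_ok` are hypothetical names, and class D/E addresses are simply rejected.

```python
import ipaddress

def classful_bits(addr: str) -> int:
    """Network bits implied by the address class (leading bits of octet 1)."""
    first = int(addr.split(".")[0])
    if first < 128:
        return 8      # class A: 0xxxxxxx
    if first < 192:
        return 16     # class B: 10xxxxxx
    if first < 224:
        return 24     # class C: 110xxxxx
    raise ValueError("class D and E addresses are not subnetted")

def prefix_len(netmask: str) -> int:
    """Number of leading 1 bits in a contiguous netmask."""
    inverted = ~int(ipaddress.IPv4Address(netmask)) & 0xFFFFFFFF
    # A contiguous mask inverts to 0...01...1, i.e. 2**k - 1.
    if inverted & (inverted + 1):
        raise ValueError(f"non-contiguous netmask: {netmask}")
    return 32 - inverted.bit_length()

def mask_ok(addr: str, netmask: str) -> bool:
    """Classful rule: a mask may extend, but never shorten, the class network.
    Non-contiguous masks and class D/E addresses are rejected."""
    try:
        return prefix_len(netmask) >= classful_bits(addr)
    except ValueError:
        return False

print(mask_ok("168.20.1.1", "255.0.0.0"))      # False
print(mask_ok("168.20.1.1", "255.255.255.0"))  # True
```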