At least for the nanosleep function, there is this bug listed in the man pages:
BUGS
The current implementation of nanosleep is based on the normal kernel timer mechanism, which has a resolution of 1/HZ s (i.e., 10 ms on Linux/i386 and 1 ms on Linux/Alpha). Therefore, nanosleep pauses always for at least the specified time, however it can take up to 10 ms longer than specified until the process becomes runnable again. For the same reason, the value returned in case of a delivered signal in *rem is usually rounded to the next larger multiple of 1/HZ s.
As some applications require much more precise pauses (e.g., in order to control some time-critical hardware), nanosleep is also capable of short high-precision pauses. If the process is scheduled under a real-time policy like SCHED_FIFO or SCHED_RR, then pauses of up to 2 ms will be performed as busy waits with microsecond precision.
But this doesn't seem to account for the 20 ms limitation you're seeing, except potentially on low-end hardware. Could it be that the loop itself is pushing the delay up to 20 ms, but when the processor is given longer delays, branch prediction is able to speed up the loops? This is merely speculation on my part, but I thought I'd throw it out there.
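
If you want to check how far off nanosleep actually is on your machine, a rough sketch like the one below might help. It times a single nanosleep call against CLOCK_MONOTONIC and reports how much longer than requested the pause took. The function name sleep_overshoot_ns is my own invention for illustration, and I'm assuming a POSIX system where clock_gettime is available (older glibc versions may need -lrt when linking).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Hypothetical helper: sleeps for requested_ns nanoseconds (must be
 * less than one second) and returns how many nanoseconds longer than
 * requested the pause actually took. Returns -1 on error.
 */
long long sleep_overshoot_ns(long requested_ns)
{
    struct timespec req, rem, start, end;

    /* tv_nsec must be in the range 0..999999999 */
    assert(requested_ns >= 0 && requested_ns < 1000000000L);
    req.tv_sec = 0;
    req.tv_nsec = requested_ns;

    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0)
        return -1;

    /* If a signal interrupts the sleep, resume with the remaining time. */
    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;
        req = rem;
    }

    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0)
        return -1;

    long long elapsed_ns =
        (end.tv_sec - start.tv_sec) * 1000000000LL +
        (end.tv_nsec - start.tv_nsec);

    return elapsed_ns - requested_ns;
}
```

Calling this in a loop for, say, 1 ms, 5 ms, 10 ms and 20 ms requests and printing the results should show whether the overshoot stays near one timer tick or really jumps to 20 ms. The numbers will depend on your kernel's HZ setting and scheduling policy, so treat them as a measurement of your setup rather than of nanosleep in general.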