Post #121,610
10/17/03 5:35:05 AM
|
Dealing with a lookup table
I have a C program that needs to make big tables of sines and cosines of angles.
There are several ways to proceed.
1) Make one enormous static table that has global scope, big enough to handle the largest possible set of values. This has the advantage of speed, but it's possible that one would have overlapping simultaneous use of the table by different functions, so this is ruled out.
2) Make a static table inside each function that needs one. This bloats the code but seems like the best compromise between speed and data isolation.
3) Dynamically create the tables when they are needed. This eats running time - the overhead of mallocing the tables adds about 5% to the running time over 2) - but it's the safest method of all and results in smaller (= faster-loading) code. Speed, however, is essential - a calculation may happen hundreds of times every second, and there will be parallel calculations going on. Does the overhead of loading the code match the overhead of dynamically creating the tables? A typical table is about 200x31 doubles.
What's the "standard" way of dealing with an enormous table?
Note, the functions that need these tables will eventually live in a shared library.
-drl
|
Post #121,621
10/17/03 8:09:43 AM
|
Probably 1...but
how static is the info? If it's truly static, and all you're doing is lookups - data isolation isn't really going to be an issue. I'd either go with an external function and scope the data by file or simply go with shared memory. (It also sounds like performance is critical and I'd probably add mutexes and multithread the app.)
If the data is static with regard only to a particular function (and different functions can be used at different times) your trade-off is going to be memory use vs. access speed. Throw everything into a single large chunk of memory and access will be fast. Create the memory on the fly and memory use will be less, but you'll take a speed impact.
|
Post #121,622
10/17/03 8:09:58 AM
|
Why is simultaneous access a problem?
Is it enough to love / Is it enough to breathe / Somebody rip my heart out / And leave me here to bleed / Is it enough to die / Somebody save my life / I'd rather be Anything but Ordinary Please
-- "Anything but Ordinary" by Avril Lavigne
|
Post #121,645
10/17/03 10:51:16 AM
|
I echo the question of the other respondents:
Why would you rule out 1), if all you're doing is looking up data? (If you're also writing data, then I understand, but that wasn't what I inferred from your query.)
jb4 "There are two ways for you to have lower Prescription-drug costs. One is you could hire Rush Limbaugh's housekeeper ... or you can elect me President." John Kerry
|
Post #121,652
10/17/03 11:04:08 AM
|
Re: I echo the question of the other respondents:
Yes, I misstated it - there are 10 tables of coefficients for rather complex time series that get updated under certain conditions - not on every throw in general, but at unpredictable intervals. They are "semi-permanent", in that they live far longer than the characteristic time interval of the simulation, but actually get recalculated 10s of times per second.
-drl
|
Post #121,687
10/17/03 1:27:36 PM
|
I vote for option 4
Which is - estimate the number of tables you might have simultaneously, multiply by some safety factor, preallocate these chunks of memory (statically if you like) and then manage their use yourself. You save the mallocs while still maintaining isolation. IOW, create a pool of tables - make each member of the pool the size of the largest table.
For extra fun - build it so it dynamically allocates more tables if the pool runs out but never frees any of them - just marks them for reuse.
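A rough sketch of such a pool in C, under the assumption of a fixed pool size with a malloc fallback for overflow; the names `table_acquire`/`table_release` and the sizes are invented for illustration:

```c
#include <stdlib.h>

#define POOL_SIZE     16          /* expected simultaneous tables x safety factor */
#define TABLE_DOUBLES (200 * 31)  /* size of the largest table */

struct slot {
    int in_use;
    double data[TABLE_DOUBLES];
};

static struct slot pool[POOL_SIZE];   /* preallocated once, no malloc per use */

/* Hand out a free table from the pool; malloc only if the pool runs dry. */
double *table_acquire(void)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            return pool[i].data;
        }
    }
    /* Overflow: the "extra fun" version would keep these for reuse too;
     * this short sketch just allocates them on demand. */
    return malloc(TABLE_DOUBLES * sizeof(double));
}

/* Mark a pool member reusable; free only overflow allocations. */
void table_release(double *t)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (pool[i].data == t) {
            pool[i].in_use = 0;
            return;
        }
    }
    free(t);
}
```

The acquire/release pair never touches malloc in the common case, which is the whole point of the pool.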
In Java, you can't escape the creepy feeling.
--James Gosling
|
Post #121,758
10/17/03 9:15:28 PM
|
Re: I vote for option 4
That's an excellent idea.
This is why I am a shitty programmer - that solution is obvious in hindsight.
-drl
|
Post #121,913
10/19/03 8:53:31 PM
|
Split the two functions
One module that makes and maintains the tables, another that reads them and sends in updates. Call the first from the second when needed. The first can then control locking and memory. The non-programmer answer. thanx, bill
"You're just like me streak. You never left the free-fire zone.You think aspirins and meetings and cold showers are going to clean out your head. What you want is God's permission to paint the trees with the bad guys. That wont happen big mon." Clete questions, help? [link|mailto:pappas@catholic.org|email pappas at catholic.org]
|
Post #122,765
10/24/03 10:56:52 PM
|
Dammit, use relational tables
I remember having this kind of issue in a Fortran program in my early years. I embraced Xbase because with it this was no longer a worry as long as I had the disk space. App languages are usually shitty at large structure handling.
Simply make a table with a column for each "dimension". Example:
Table: TrigTable
-----------------
rowID (optional)
SinValue
CosineValue
theValue
Need more dimensions? No problem:
Table: 3Dspace
--------------
X
Y
Z
theValue
Simple Dimple! You do-everything-with-app-code people need a cluestick.
________________ oop.ismad.com
|
Post #122,776
10/25/03 12:20:43 AM
|
Again, read the problem description.
He's recalculating this stuff 10s of times per second. Put it in tables and he's going to pay both file and index maintenance overhead when all that's really needed is an array.
You do-everything-with-tables people need a cluestick.
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #122,823
10/25/03 5:08:07 PM
|
not clear on environment of this thing
"a calculation may happen hundreds of times every second"
I wasn't sure this meant it needed "real-time" response or not. Is it a large batch job, or something like real-time image recognition?
Most desktop DB engines cache a lot of stuff in RAM, I would note. It only starts to become disk-intensive when it runs out of available RAM and can't use indexes or the natural order of rows that fit the grain of requests. But, at least it does not crash at that point.
Does he need something that "degrades gracefully", or is it predictable-speed-or-bust?
________________ oop.ismad.com
|
Post #122,826
10/25/03 5:38:08 PM
|
Re: not clear on environment of this thing
They are planets - a lot of them might be moving around at once, and you might have several simulations going at once. Every planet needs a separate set of tables of sines and cosines as functions of time for each instance. You don't want to recalculate these tables constantly - rather, calculate them for two times that are "close" with respect to the approximation and then linearly interpolate, which is much faster than doing 31*19 sines and cosines for every planet and then slogging through a perturbation table with hundreds of coefficients.
So Todd's idea was the best one, you create a pool of tables and then leave them lying around until memory needs to be reclaimed for whatever reason. This is a middle ground between just hogging a buttload of memory with static tables vs. mallocing and freeing them over and over again.
The perturbation tables come from a spectral analysis of the output of JPL's DE404 numerical integration of the Solar System - so they are extremely accurate over a specific interval. 1/4 to 1/2 second of arc in 4500 years is realistic, even for Pluto. (It's an amazing thought that the exact details of the motion of the Earth, Moon and planets over a 4500 year interval can be modeled in a few hundred K of C code.)
-drl
|
Post #122,846
10/26/03 1:52:10 AM
|
You'll have to wait 4500 yrs to know if you did it right :-)
________________ oop.ismad.com
|