One line description of data model for Perl

11/22/04 10:37:48 AM

You're not helping

I already know what I don't know :)

Post #184,764

by ChrisR

11/22/04 10:45:04 AM

Perl: Everything is...

...just one possible way, out of many, to do something.

(VB: Everything is just one screwed up mess.)

Post #184,765

by broomberg

11/22/04 10:48:39 AM

Over simplifying

Perl often maps to human language. Created by a linguist, not a Comp-Sci guy. So I don't think you CAN have a single line like that.

You can say MOST things are scalars, which are individual data elements that are usually strings, but can also be numbers or references (not quite pointers, but very close).

You then have GLOBS and FILE HANDLES, which I decline to go into in detail.

You then have arrays and hashes.

Pretty much everything else is built on these.

And yes, since this is off the top of my head, I've probably missed some stuff.
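A minimal sketch of the scalars, arrays and hashes described above (globs and filehandles are left out, as in the post). All variable names are made up for illustration.

```perl
use strict;
use warnings;

# Scalars: one value each, whether a string, a number, or a reference.
my $name  = "Barry";
my $count = 42;
my $ref   = \$count;          # a reference to $count

# Arrays and hashes are built out of scalars.
my @colors = ("red", "green", "blue");
my %age_of = (alice => 30, bob => 25);

my $sum     = "40" + 2;       # the string "40" is used as a number: 42
my $via_ref = $$ref;          # follow the reference: 42

print "$name: $via_ref, $colors[1], $age_of{bob}, $sum\n";   # prints "Barry: 42, green, 25, 42"
```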

Post #184,838

by tablizer

11/22/04 8:57:08 PM

linguistic versus data personalities

It seems that some developers tend to think in terms of data structures and others in terms of linguistics. Perl is probably on the far end of the linguistic side. I lean toward the data structure personality, with tables being my fav "data structure". OO fans tend to be linguistic in my experience.

________________
oop.ismad.com

Post #184,834

by static

11/22/04 8:05:46 PM

To pick up from what Barry said.

Perl: everything is whatever you need it to be.

Wade.

Is it enough to love
Is it enough to breathe
Somebody rip my heart out
And leave me here to bleed

Is it enough to die
Somebody save my life
I'd rather be Anything but Ordinary
Please

-- "Anything but Ordinary" by Avril Lavigne.

Post #184,848

11/22/04 11:17:13 PM

No, everything is whatever Barry needs it to be

DWBM. Do what Barry (Ben?) wants.

Can I have a list of typeglobs? A hash of file handles? I can find answers to these questions on my own, thank you very much. The problem is, I don't get the logic. Next time I have a question, I need to go look things up again. I don't have this problem with almost any other language (Forth and APL would be exceptions, but I never tried to really program in either).

Post #186,002

12/4/04 5:16:08 AM

The logic

The elements of an array are scalars. The values of a hash are scalars. Anything that you can put into a scalar can be an element of an array or a hash value. Since you can store a typeglob or a file handle in a scalar, the answer to both of your questions is "yes".
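A sketch of that "yes", assuming Perl 5.8 or later: the two "files" are in-memory filehandles writing to scalars, so it runs without touching disk. The names %handle_for and @globs are hypothetical.

```perl
use strict;
use warnings;

# In-memory "files": opening a scalar reference needs Perl 5.8+.
my ($log_text, $err_text) = ('', '');
open(my $log, '>', \$log_text) or die "Cannot open: $!";
open(my $err, '>', \$err_text) or die "Cannot open: $!";

my %handle_for = (log => $log, err => $err);   # a hash of filehandles
my @globs      = (*STDOUT, *STDERR);           # an array of typeglobs

print { $handle_for{log} } "started\n";        # braces needed for an expression handle
print { $handle_for{err} } "oops\n";
close($_) for values %handle_for;

print { $globs[0] } "log=$log_text";           # a glob from the array works as a handle
```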

Note that I said the values of a hash. The keys of a hash are always strings. If you use a reference to a complex data structure as a hash key, it won't store the data structure in the hash, just a string identifying the data structure. Depending on how you use the hash this may or may not matter to you. Usually not. (It would matter if you're using the keys of the hash to remove duplicates: you'll remove the duplicates, but you'll also stringify things.)
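A short sketch of the stringification described above; the variable names are made up.

```perl
use strict;
use warnings;

my $data = [1, 2, 3];              # an array reference
my %seen;
$seen{$data} = 1;                  # the key becomes a string like "ARRAY(0x...)"

my ($key) = keys %seen;
print "key: $key\n";
print "key is a reference? ", (ref $key ? "yes" : "no"), "\n";   # no

# As a value, the real reference survives.
my %by_name = (numbers => $data);
my $first   = $by_name{numbers}[0];
print "value is a reference? ", (ref $by_name{numbers} ? "yes" : "no"), "\n";   # yes
print "first element: $first\n";   # 1
```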

Cheers,
Ben

I have come to believe that idealism without discipline is a quick road to disaster, while discipline without idealism is pointless. -- Aaron Ward (my brother)

Post #186,016

12/4/04 12:06:16 PM

Keys...

You can also use the Tie module to store other things as the key.

Post #186,021

12/4/04 4:54:08 PM

Tie is NOT a module (and it sucks)

It is a built-in to the language allowing an object with the right methods to masquerade as a native data structure. To get the object with the right methods you would write a module, but the tie itself is not implemented as a module. People get all worked up about this, but it is really a simple concept. The one that you're thinking of would be written like this (untested):

  package Tie::Original::Keys;
  use strict;
  use warnings;

  sub TIEHASH {
    my $class = shift;
    my $self = bless {
      hash         => {},   # stringified key => value
      original_key => {},   # stringified key => key as originally given
    }, $class;
    while (@_) {
      $self->STORE(splice @_, 0, 2);
    }
    return $self;
  }

  sub FETCH {
    my ($self, $key) = @_;
    return $self->{hash}{$key};
  }

  sub STORE {
    my ($self, $key, $value) = @_;
    $self->{hash}{$key} = $value;
    $self->{original_key}{$key} = $key;   # remember the unstringified key
  }

  sub DELETE {
    my ($self, $key) = @_;
    delete $self->{original_key}{$key};
    return delete $self->{hash}{$key};
  }

  sub CLEAR {
    my $self = shift;
    $self->{hash} = {};
    $self->{original_key} = {};
  }

  sub EXISTS {
    my ($self, $key) = @_;
    return exists $self->{hash}{$key};
  }

  sub FIRSTKEY {
    my $self = shift;
    keys %{$self->{hash}};   # keys in void context resets the each() iterator
    return $self->NEXTKEY;
  }

  sub NEXTKEY {
    my $self = shift;
    my $key = each %{$self->{hash}};
    # hand back the original key (e.g. a real reference), not the string
    return defined $key ? $self->{original_key}{$key} : undef;
  }

  sub SCALAR {
    my $self = shift;
    return scalar %{$self->{hash}};
  }

  1;

After you've done all of that you can write things like:

  tie my %is_seen, 'Tie::Original::Keys';
  $is_seen{$_} = 1 for @non_unique;
  my @unique = keys %is_seen;

and the stringification issue that I named is gone.
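For comparison, Perl ships a core module, Tie::RefHash, built on the same idea as the class above. A sketch of the same de-duplication using it, with made-up data:

```perl
use strict;
use warnings;
use Tie::RefHash;

tie my %is_seen, 'Tie::RefHash';

my $aref = [1, 2];                       # made-up data: one reference, one string
my @non_unique = ($aref, $aref, "x");

$is_seen{$_} = 1 for @non_unique;
my @unique = keys %is_seen;              # reference keys come back as references

my $refs = grep { ref } @unique;
print scalar(@unique), " unique, $refs of them references\n";   # prints "2 unique, 1 of them references"
```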

That's all fine and dandy. But there are a host of problems with it.

Using tie has a huge performance overhead. Here is a much faster solution to the above problem:
```
  my %original_key;
  $original_key{$_} = $_ for @non_unique;
  my @unique = values %original_key;
```
So you see that the technical note about keys is just that, a technical note. Sometimes you need to know it, but if it matters to you, it is easily worked around.
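A runnable version of the values-based workaround, with made-up data. References survive because they are stored as values; only the keys get stringified.

```perl
use strict;
use warnings;

my $x = [1];                              # made-up references
my $y = {a => 1};
my @non_unique = ($x, $y, $x, "plain", "plain");

my %original_key;
$original_key{$_} = $_ for @non_unique;   # key: stringified, value: the original
my @unique = values %original_key;        # note: original order is not preserved

my $refs = grep { ref } @unique;          # how many are still real references
print scalar(@unique), " unique items, $refs still references\n";   # prints "3 unique items, 2 still references"
```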

Using tie is horribly confusing to people. That is because the language sets up strong expectations that native data structures really are native data structures, and now you violate those expectations. Without those expectations it would be a very simple idea; with those expectations there is considerable surprise.

Historically tie has been a bit buggy. A fair fraction of the bugs that I know of in Perl involve tie in one way or another. That is because of the implementation, where every internal function in the API checks whether a data structure has "magic" associated with it, and if it does, does something based on that magic. So the implementation is scattered throughout procedural code. And there are bugs. (It should work pretty well now though.)

Now one personal note. Many people seem to think that tie is somehow "very cool" and is a sign of really interesting stuff being exposed from within Perl. It isn't.

Tie is a band-aid for a self-inflicted wound.

Perl goes through a lot of work to make a specific set of data structures available to you. It is a well-chosen set; it is surprisingly hard to find an algorithm in which the naive implementation in Perl using those data structures (particularly hashes where they make sense) is not the same as the sophisticated algorithm. (Different constants though.) However sometimes they are not what you need. When they are not, then you have to rewrite a lot of code to be able to get custom data structures (aka objects) that do exactly what you want. Or else you can use tie and avoid a lot of the rewriting.

For an example of how to solve the same problem by not creating it in the first place, see Ruby. Everything is an object. The objects for native data types are accessed in the same way that your user objects are. If you want something that is the same as a native data type only slightly different, you can just write your own object for it. If you want something that is the same as a native data type but has extra capabilities, that is also easy - just add a new method. (In Perl you wind up having to write crap like tied(%foo)->some_method_call(); Ugh.)

Now you can object that a native data type has a lot of behaviour. Implementing all of that is hard. Sure you could, but would you in practice?

Well it turns out that you only have to do the heavy lifting once. Ruby supports mixins, which make it possible to support a complex API from a simple one. So you can exactly parallel what Perl does. You implement several base methods, mix in the right module, and now you support the full API of a native datatype. Only it doesn't look like magic when you do it, because the idea fits into the language as a whole rather than being a hacked-on piece of "magic".

Cheers,
Ben

Post #184,846

11/22/04 10:40:00 PM

Well, Lisp - everything is a list or an atom...

Perl, everything is an atom, a list, or a hash. *

* Except for typeglobs, file-handles, etc.

That said, I tend to think that Perl's data structures are primarily implemented as pointers (or references, if you prefer). Read in a file, and the pointer to each line is stored off into an array. Want to reverse the file? Reverse the array of pointers. Want to split a line? Take a line and create pointers to each of the words in the array.
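A sketch of the two operations mentioned, with a hard-coded array standing in for a file's lines. It does not show whether Perl shares or copies the strings internally; the point is only that you never handle pointers yourself.

```perl
use strict;
use warnings;

# Hard-coded lines standing in for a file read with: my @lines = <$fh>;
my @lines = ("first line here", "second line", "third");

my @reversed = reverse @lines;          # last line first
my @words    = split ' ', $lines[0];    # split on whitespace

print "$_\n" for @reversed;
print scalar(@words), " words in the first line\n";   # prints "3 words in the first line"
```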

The kewl part is that you never had to work with the pointers directly.

Post #184,847

11/22/04 11:14:04 PM

Atom, list, hash...

And what the hell are references? And why can't I have a list of lists, only a list of references to lists? Or can I?

Post #184,849

by broomberg

11/22/04 11:30:35 PM

References are essentially pointers

but with a bit of built in smarts.

How about you describe the type of data structure you are trying to implement and we go from there.

Post #184,857

11/23/04 7:16:43 AM

I am not trying to build anything in particular just now

I am trying to grok Perl.

For example, in TCL I can write

set A {aaa aaa {bbb bbb} }

and that would be a list of lists

In Perl, can I write

my @A = ("aaa", "aaa", ("bbb", "bbb"));

and have a list of lists? Or will it be flattened? Or will I get (aaa, bbb, 3) because the inner list literal is evaluated in scalar context, yielding the list's size? I now have to look it up. I need some guiding principle that makes sense.

Post #184,932

by cforde

11/24/04 1:32:29 AM

some answers

my @A = ("aaa", "aaa", ("bbb", "bbb"));

is equivalent to
my @A = ("aaa", "aaa", "bbb", "bbb");
what you probably want is
my @A = ("aaa", "aaa", ["bbb", "bbb"]);
This will create an array with three elements: 'aaa', 'aaa' and an anonymous array containing the two elements 'bbb' and 'bbb'.
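A quick check of both forms; @flat and @nested are made-up names.

```perl
use strict;
use warnings;

my @flat   = ("aaa", "aaa", ("bbb", "bbb"));   # inner parens only group: flattened
my @nested = ("aaa", "aaa", ["bbb", "bbb"]);   # [] builds an anonymous array

my $inner = $nested[2][1];

print scalar(@flat), "\n";      # 4
print scalar(@nested), "\n";    # 3
print ref($nested[2]), "\n";    # ARRAY
print "$inner\n";               # bbb
```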

Perhaps [link|http://www.oreilly.com/catalog/advperl/excerpt/ch01.html|this chapter] will help.

Have fun,
Carl Forde

Post #184,942

11/24/04 10:01:21 AM

OK, another arbitrary distinction to remember

I guess Larry Wall is a linguist - must be an English language linguist. Just enough rules to make exceptions really hurt.

Here is another question: when we take a reference with \@xxx syntax, do we take a reference to the xxx variable or to the data structure it contains? In other words, say we have this piece of code:

@xxx = (1, 2, 3);
$rx = \@xxx;
@xxx = (9,8,7);
print($rx->[0] . "\n");

What gets printed? I'd like to guess that it's "1", but I am just confused enough to be unsure.

Post #184,946

11/24/04 11:11:23 AM

It's easy enough to test...

The question you're asking is: once you've created the reference, if you change the data that the reference was accessing, does the reference keep the original data?

A better question might be: how do you gain access to the data to change it?

Compare it to:
   @xxx = (1,2,3);
   $rx = \@xxx;
   my @xxx = (9,8,7);
   print ($rx->[0] . "\n");
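A sketch contrasting the two cases. To stay warning-free under use strict and use warnings, it declares the globals with our and puts the shadowing my in an inner block, which differs slightly from the snippet above; @yyy and the result variables are made-up names.

```perl
use strict;
use warnings;

our @xxx = (1, 2, 3);     # package (global) array
my $rx = \@xxx;           # reference to that array

@xxx = (9, 8, 7);         # assigns into the same array
my $after_assign = $rx->[0];

our @yyy = (1, 2, 3);
my $ry = \@yyy;
{
    my @yyy = (9, 8, 7);  # a brand-new lexical array that hides the package @yyy
}
my $after_shadow = $ry->[0];

print "$after_assign $after_shadow\n";   # prints "9 1"
```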

Post #184,949

11/24/04 12:31:51 PM

Yes it is easy to test.

I want a logical explanation, not "look up this", "test that". Don't you think it's abysmal when such a fundamental and simple question gets the answer along the lines of "test it"?

The answer is "9".

Sooooo. $rx refers to the _variable_name_ @xxx.

It's not a pointer, not really. It's more like a true C++ reference. In C++, there is a world of difference between

char *a = b;

and

char *&a = b;

On the other hand, when we do $rx = [1,2,3]; it behaves like true pointer.

There is a simple rule in there, just waiting to pop out. I can't quite grasp it. My stream of consciousness follows.

I guess the point would be that in Perl, variables are not names of (pointers to) memory areas, but rather "resizable" and type-safe memory areas themselves. It's like having a virtual machine where the word at a given address can contain arbitrary-size array, or either string or number - no matter. The address (variable name) stays the same.

On the other hand, the "kind" of variable cannot be changed (array, hash, scalar, typeglob(?))

In this context, the difference between [] and () becomes very interesting... I have a feeling that one of them is a literal, the other an operator. Or something.

OK, I am rambling. But I think that the learned company here is pushing me in the right direction. Thank you.

Post #184,969

11/24/04 4:56:19 PM

Testing has some disadvantages....

...it's possible that a particular piece of code may be ambiguous - meaning that the compiler writer can choose how to implement it.

So, simply "testing it" isn't always a solution.

However, in this case, the question of what is happening (and why) requires first understanding what did happen. (Thus, "test it".)

My rambling unofficial opinion:

scalars are probably implemented as pointers (they point to something) with logic to do necessary things, such as converting between integers and strings on the fly
arrays are a structure, with each element being a scalar.
hashes are a structure (same as array), except that key and value are scalars

So...an array and a hash cannot be a member of an array or hash. (Back to that whole reference thing).

@xxx = (1, 2, 3); --- creates an array xxx and gives it the values in the list (1,2,3). (Lists and arrays are different: a list is a collection of scalars, lacking the structure of an array, i.e. you can't get the size of a list, iirc.)

$rx = [1,2,3]; --- the [] creates an anonymous array (i.e. this IS an array) and lets $rx reference that array.

So @xxx = (1,2,3); $rx = \@xxx; is very different from @xxx = (1,2,3); $rx = [1,2,3]; (two arrays are created in the second example).
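A sketch of the one-array versus two-array point, with a third variant, [@xxx], which copies the elements into a new anonymous array. $alias, $fresh and $copy are made-up names.

```perl
use strict;
use warnings;

my @xxx = (1, 2, 3);
my $alias = \@xxx;        # points at @xxx itself: still one array
my $fresh = [1, 2, 3];    # a second, independent anonymous array
my $copy  = [@xxx];       # a third array, filled with copies of @xxx's elements

@xxx = (9, 8, 7);

print "$alias->[0] $fresh->[0] $copy->[0]\n";   # prints "9 1 1"
```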

now...to really play with your mind, 2 things....

You can create N number of references to a variable...

$reference4 = \\\\"hello!";

How do you dereference this reference to a reference to a reference to a reference?
print $$$$$reference4;

Perl also has symbolic references:
$variablex = 1;
$symbolic_x = "variablex";

print $$symbolic_x;
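A runnable version of both tricks. Under use strict, symbolic references have to be switched on explicitly with no strict 'refs', and they only reach package variables (declared here with our), not my variables.

```perl
use strict;
use warnings;

my $reference4 = \\\\"hello!";     # four levels of reference
my $str = $$$$$reference4;           # four dereferences

our $variablex = 1;                  # symbolic references see package variables only
my $symbolic_x = "variablex";
my $value;
{
    no strict 'refs';                # strict forbids symbolic references
    $value = $$symbolic_x;
}

print "$str $value\n";               # prints "hello! 1"
```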

Post #186,004

12/4/04 5:37:15 AM

It isn't arbitrary

You're just not paying attention to the grammar of the language, so it looks arbitrary. If you ignore the grammar of English you'll constantly get tripped up as well. That's life. If you approach Perl saying, "Here's how it should work", you'll be disappointed and shortly after that very frustrated. If you approach it asking, "How does it work?" you'll find that there are rules and it really works according to them.

In Perl there are two basic contexts. List context. And scalar context. In list context any data structure will try to produce a flat list. In scalar context you'll get one thing. The distinction is entirely grammatical. Generally assign stuff somewhere that takes a list, and you get list context. Pass data into a function and you'll get list context. Assign it to a scalar and you'll get scalar context.

So when you see my @A = ("aaa", "aaa", ("bbb", "bbb")); what happens is that there is a list context imposed on the RHS (you're assigning to an array, which takes a list of things) so the RHS is flattened out into a list. Namely ("aaa", "aaa", "bbb", "bbb"). Nothing strange going on with data structures. It is all a question of what the grammar says.
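A sketch of how the assignment target picks the context, using the array from the example:

```perl
use strict;
use warnings;

my @A = ("aaa", "aaa", ("bbb", "bbb"));   # list context: flattened to 4 elements

my $count   = @A;          # array in scalar context: number of elements
my ($first) = @A;          # list assignment: takes the first element
my @copy    = @A;          # list context: all elements

print "$count $first ", scalar(@copy), "\n";   # prints "4 aaa 4"
```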

Now to answer the question you had here, the \ operator takes a reference to something. Think of it as like taking a pointer to a variable in C or C++. (Except that the memory management is taken care of for you.) Therefore $rx points to whatever @xxx currently has. Updating @xxx is the same as updating @$rx, and vice versa.

If this bothers you, please describe what you'd expect from similar code with pointers in C or C++. Perl is just acting the same way.

If you want a private array you can use the anonymous array constructor, []. In the example that you gave, you'd get the result you were hoping for from:

  @xxx = (1, 2, 3);
  $rx = [@xxx];
  @xxx = (9,8,7);
  print($rx->[0] . "\n");

Cheers,
Ben

Post #184,852

by FuManChu

11/23/04 12:44:27 AM

References: pointers in languages that don't have pointers

Post #184,864

11/23/04 11:01:34 AM

okay...look at it this way....

a scalar is a LISP atom...the basic unit in Perl, and you get to it via a $.

An array is a collection of scalars. This is why you have to have a reference to have a list of lists.

Likewise a hash maps keys, which have to be scalars (see the Tie module if you want them to be something else), to scalars.

A Perl reference is merely a scalar that points to something else (such as a scalar, array or hash).
    $scalarRef = \$scalar
    $listRef = \@list
    $hashRef = \%hash

To access -
    $scalar = $$scalarRef
    @list = @$listRef
    %hash = %$hashRef
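The same reference syntax as a runnable sketch, plus the arrow operator for reaching single elements; the variable names follow the post.

```perl
use strict;
use warnings;

my $scalar = 10;
my @list   = (1, 2, 3);
my %hash   = (a => 1, b => 2);

# Taking references with \
my $scalarRef = \$scalar;
my $listRef   = \@list;
my $hashRef   = \%hash;

# Dereferencing a whole structure
my $s = $$scalarRef;
my @l = @$listRef;
my %h = %$hashRef;

# Reaching a single element through the arrow operator
my $second = $listRef->[1];     # 2
my $b_val  = $hashRef->{b};     # 2

print "$s @l $second $b_val\n";   # prints "10 1 2 3 2 2"
```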

Post #186,003

12/4/04 5:23:10 AM

PERL DOES NOT STORE LISTS!!!

You'll save yourself a lot of confusion if you memorize the above sentence.

A list in Perl is a temporary data structure that (in list context) is passed somewhere. Once the operation is done, the list is gone. Vanished. Kaput. You might store the list in an array. But an array is not a list. An array is a bag that contains a list of things.

So you can't create a list of lists in Perl. Doesn't exist. If you try you'll get one long flat list. You can create an array of references to arrays which accomplishes the same goal. But it isn't a list of lists.
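A sketch of the usual "array of arrays", i.e. an array of references to anonymous arrays; @matrix is a made-up name.

```perl
use strict;
use warnings;

# An array whose elements are references to anonymous arrays.
my @matrix = (
    [1, 2, 3],
    [4, 5, 6],
);

my $cell = $matrix[1][2];                 # 6 (the arrow between subscripts is optional)
push @{ $matrix[0] }, 99;                 # grow an inner array in place
my $row0_len = scalar @{ $matrix[0] };    # 4

print scalar(@matrix), " rows, cell $cell, row 0 has $row0_len\n";   # prints "2 rows, cell 6, row 0 has 4"
```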

Cheers,
Ben

Post #186,056

12/5/04 4:04:07 PM

OK, in that case, what is (a,b,c)?

[a,b,c] is an anonymous array constructor. That I understand. What is the (...) thing? What does it produce?

Post #186,058

12/5/04 4:21:12 PM

In which context?

In list context it produces a list that something is hopefully done with. In scalar context it returns the last element in the list (if the list is empty, then undef).
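A sketch of the scalar-context rules. The last-element behaviour applies to a literal list (the comma operator); an array in scalar context gives its size instead.

```perl
use strict;
use warnings;

my @array = (4, 5, 6);
my $from_array = @array;        # an array in scalar context: its size, 3

my ($first) = (4, 5, 6);        # list assignment: takes the first value, 4

my $from_list;
{
    no warnings 'void';         # quiet "Useless use of a constant" for 4 and 5
    $from_list = (4, 5, 6);     # comma operator in scalar context: last value, 6
}

my $from_empty = ();            # empty list in scalar context: undef

print "$from_array $first $from_list ",
      (defined $from_empty ? "defined" : "undef"), "\n";   # prints "3 4 6 undef"
```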

But my point is that the list is a transient thing. It is not a data structure that will exist to the next statement: it is produced and consumed within one statement. It is how Perl passes data around, not how Perl stores data.

If you try to think of it as anything other than the semantics indicated by Perl's grammar, you'll confuse yourself. Because that is all it is.

Cheers,
Ben

Post #186,064

12/5/04 5:14:47 PM

In the context of grammar and syntax

What kind of syntactic structure is it? For example, in

@a = (1,2,3);

and in

myfunc(1,2,3)

are the (...) constructs syntactically different? They are in C++.

Also, what is "list"? I sort of understand "array". But that ephemeral "list" thing just throws me off.

Post #186,107

12/5/04 11:06:00 PM

Simple answer: there is no syntactic difference

In most code the two behave exactly the same. If you're trying to build a simple model of how Perl works, stop there.

Now for some of the gory details and minor quibbles. In function calls you can get slightly different behaviour, and it is a judgement call whether to think of this as a syntactic difference, or to think of the way it behaves in assignment as the same as the function case, just with the default behaviour. I'd lean towards saying that the syntax is the same, but only functions can specify non-default behaviour.

Explaining that will take a bit.

First of all what is a list? A list is what you'd guess naively, an ordered list of things. Think of shoving a bunch of stuff on the stack - that's a list. In fact I believe that that's how Perl actually works internally: it pushes pointers to the list's elements onto the stack, and calls an operation that is supposed to do something with those arguments. When the opcode finishes, the stack is cleared. This all happens at the C level; all actual Perl data structures are kept in the heap. They have to be, because their lifetime is indeterminate when the data is created. (Note that in Perl 6 even lists will be created in the heap to allow continuations to be added to the language. This is very similar to Python vs Stackless Python.)

Arrays and hashes and most functions will accept a list and do something intelligent with it. If you assign the list to an array, the array is filled with that list, element 0 goes to element 0, element 1 goes to element 1 and so on. If you assign the list to a hash, it is interpreted as a list of key/value pairs and the hash is set up accordingly. If you call a function the elements of the list are aliased to @_, and the function is supposed to get the arguments from there.
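A sketch of those three cases with one flat list; double_in_place is a hypothetical helper showing that @_ aliases the caller's variables.

```perl
use strict;
use warnings;

my @pairs = ("apple", 3, "pear", 5);   # one flat list

my @array = @pairs;                    # element 0 -> 0, 1 -> 1, ...
my %hash  = @pairs;                    # read as key/value pairs

sub double_in_place {
    $_ *= 2 for @_;                    # @_ aliases the caller's variables
}

my ($x, $y) = (1, 2);
double_in_place($x, $y);

print "$array[2] $hash{pear} $x $y\n";   # prints "pear 5 2 4"
```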

Hopefully the idea of a list is fairly clear - it is supposed to be straightforward.

Now we come to the non-straightforward part. Normally when you call a function, everything is expanded out. So if you see something like:

  foo(@stuff, @more_stuff, bar());

then foo will get a list containing everything in @stuff followed by everything in @more_stuff followed by what bar() returned. But there are exceptions. For instance:

  push(@stuff, @more_stuff, bar());

adds the contents of @more_stuff and the return of bar() to the end of @stuff. The push built-in has something (mis)named a prototype that causes the generated list to be (\@stuff, @more_stuff, bar()). You still get a list - just a slightly different one than you would expect.
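For illustration only (the post argues against doing this), a sketch of a user-defined sub with a push-like prototype. my_push is hypothetical, and it must be declared before the calls are compiled for the prototype to apply.

```perl
use strict;
use warnings;

# Hypothetical push-alike: \@ turns the first argument into an array reference.
sub my_push (\@@) {
    my ($aref, @items) = @_;
    push @$aref, @items;
    return scalar @$aref;
}

my @stuff      = (1, 2);
my @more_stuff = (3, 4);
my $size = my_push(@stuff, @more_stuff, 5);   # called as my_push(\@stuff, 3, 4, 5)

print "@stuff ($size elements)\n";   # prints "1 2 3 4 5 (5 elements)"
```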

This was originally shoehorned in for backwards compatibility. In Perl 1-4 these built-ins had special behaviour, and there was no good way to get the same behaviour for user-defined functions because you didn't have references. Over the course of Perl 5 the capability to write functions that behave the same way as these special built-ins has been added. Using this capability is generally a very, very bad idea. For a full explanation of how it works, what it does, and why it is a bad idea, read [link|http://library.n0i.net/programming/perl/articles/fm_prototypes/|FMTYEWTK About Prototypes].

Now that I've explained this whole prototype thing, you see that it is possible for functions to get a slightly different list than you'd expect just looking at the argument list. However the default behaviour is, "expand everything expandable out into a flat list". If you're assigning to an array you get this same default behaviour. With functions you have some control over how this list expansion works. With array assignment you don't.

Cheers,
Ben

Post #186,140

12/6/04 9:35:02 AM

OK, I think I get it.

Here goes:

Perl has scalars, arrays and hashes for data structures (scalars can be references). For passing a few things around, Perl uses lists. Lists are not goddamned arrays, goddamn your goddamned eyes, you goddamned sissy puspus programmer! Got it?

In all seriousness, I think I understand much better now. Thank you very much for taking your (even more valuable now) time and producing a lucid explanation. This deserves to be written in some kind of global FAQ place, if it's not already there.

Post #186,156

12/6/04 11:49:06 AM

Yup, sounds like you've got it

As for a global FAQ, the perldata manpage attempts to explain this, along with the full syntax involved. The information is there - but the problem for the reader is extracting the information that you need from the mass of information that you might possibly need. (A problem which is complicated by the fact that different people need different pieces of information.)

Cheers,
Ben

Post #185,007

by daemon

11/25/04 12:37:20 AM

let's try another viewpoint

nix vs frame or winders
perl acts on information presented in a variety of ways, there is no single correct way of doing things but a ton of wrong ways to do it. Having spent the best part of last week under the hood of perl I found:
Unix has a variety of ways to handle a string, as a number, date(pick a date type), alpha numeric, hex, decimal, ascii etc.

Windows and other systems arbitrarily assign what they think the data set should be representative of.

Perl is like nix, you must clearly decide ahead of time what a data set represents then manipulate using those rules.

Since I'm by no means a Perl programmer, feel free to disregard or correct.
regards,
daemon

that way too many Iraqis conceived of free society as little more than a mosh pit with grenades. ANDISHEH NOURAEE
clearwater highschool marching band [link|http://www.chstornadoband.org/|http://www.chstornadoband.org/]

Post #186,001

12/4/04 5:08:26 AM

Sorry for not responding in this thread earlier

I've been distracted...

Well I also once had a reply all typed up, and the power went out. :-(

As others have indicated, it is better to approach Perl thinking linguistically rather than in terms of data. In one line Perl's attitude can be summed up by, "Perl tries to DWIM." Here's a quick overview of Perl's overall attitude towards data:

A thing is a scalar. Scalars can hold any kind of thing, strings, floating point numbers, integers, references, etc. Perl may do implicit casts behind the scenes at any time.

You usually hold a list of things in an array. An array in scalar context gives you the number of elements in the array. An array in list context gives you the elements. You can access any one with subscripting. You can also push, pop, shift, unshift, and splice into the array.
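A sketch of those array operations; @queue is a made-up name.

```perl
use strict;
use warnings;

my @queue = (2, 3);
push    @queue, 4;           # (2, 3, 4)
unshift @queue, 1;           # (1, 2, 3, 4)
my $last  = pop   @queue;    # 4; queue is now (1, 2, 3)
my $first = shift @queue;    # 1; queue is now (2, 3)
splice @queue, 1, 0, 'x';    # insert 'x' at index 1: (2, 'x', 3)

my $count = @queue;          # scalar context: 3
print "$first $last $count @queue\n";   # prints "1 4 3 2 x 3"
```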

An "of relationship" is held in an associative array, which everyone just calls a hash (because that is how it is implemented).

The gruesome details can be found in [link|http://www.perldoc.com/perl5.8.0/pod/perldata.html|perldata].

There are corners of the language which don't fit in this simple model. For instance filehandles. Over time those are being cleaned up, for instance as of Perl 5.6 you can just use a private scalar as a filehandle and it just works.
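A sketch of a private scalar used as a filehandle. It reads from an in-memory string (an open mode that needs Perl 5.8 or later) so that it runs without a real file; $text is made-up data.

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\n";

open(my $fh, '<', \$text) or die "Cannot open: $!";
my @lines = <$fh>;           # list context: one element per line
close($fh);

chomp @lines;
print scalar(@lines), " lines: @lines\n";   # prints "2 lines: alpha beta"
```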

That said the issue isn't generally the data structure - that is pretty simple - it is knowing what Perl is planning to do with it. And that is all linguistic. Rather than give you a tutorial on that, I'll just respond to each of your posts that shows some confusion.

Cheers,
Ben

Post #186,014

12/4/04 11:52:30 AM

No worries.

When you did not respond in the first 24 hours, I was a bit worried.

Then your announcement came, and my conclusion was: this man has his priorities straight :)

Post #186,022