Erlang inventor: Why do we need modules and files? (erlang.org)
252 points by kungfooguru on May 24, 2011 | hide | past | favorite | 97 comments


You don't need both modules and files. Decades of Smalltalk development experience backs this up.

Files should be an implementation detail. It should be just one back-end option out of many for persisting code.

There needs to be some way of conceptualizing a set of code. The programmer needs to perform operations on sets of code. (Merging, diffs, patches, etc...) Programmers need to store code and share it with other programmers. Files are good at doing all of these things, but none of these have anything to do with some special inherent quality only files possess.

Smalltalkers have been operating at the granularity of individual methods for quite a while. It brings a lot of nice flexibility.

I would suggest that Namespaces have enough overlapping functionality with other "set of code" type entities, that they can just subsume the role of those other entities. The Java community has shown that it's workable for every project to have its own namespace. Namespaces also resolve a lot of name collision problems that keep cropping up. (Isn't that dandy?)

EDIT: The entire set of libraries for a variety of programming languages is readily available. It would be a cool project to re-granulize the commonly used library corpus for a particular programming language at the level of individual functions. There would be a lot of small-granularity dependency information that wouldn't be available but some of it could be inferred. Statically typed languages could have more of it inferred than dynamically typed languages, but there could still be a lot for the latter.
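To make the "re-granulize" idea concrete, here's a minimal sketch in Python (the helper name and example snippet are mine, not from the post): the standard ast module can extract, for each top-level function, the set of names it calls, which is exactly the small-granularity dependency information described above.

```python
import ast

def function_dependencies(source):
    """Map each top-level function name to the set of names it calls."""
    tree = ast.parse(source)
    deps = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            calls = {
                n.func.id
                for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            }
            deps[node.name] = calls
    return deps

# A tiny hypothetical "library corpus" to analyze:
example = """
def double(x):
    return x * 2

def quadruple(x):
    return double(double(x))
"""
```

Running `function_dependencies(example)` yields a per-function dependency map, with no module boundary in sight; a statically typed language could of course recover more than this.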


Plain text files are a communication protocol.

Files allow you to give code to your friends without extraordinary means, to use generalized programs like git and grep, and to post a copy up on the web.

Files also mean that your IDE doesn't have to try to be an OS. (Still can if it wants to, hello Emacs.)


Plain text files are a communication protocol.

That's just plain wrong. They are something you can send over communication protocols. They are a very useful tool for sharing, transmitting, and storing information. I'm fine with files as all of those things. It's when files subsume conceptual territory that inertia sets in.

It would be like people insisting that novels can only exist in the form of bound rectangular slabs of dead tree. Once you free concepts from implementation details, new flexibility is enabled.

Files allow you to give code to your friends without extraordinary means, to use generalized programs like git and grep, and to post a copy up on the web.

Yes, but to fulfill these functions, they do not have to stake out conceptual territory, like becoming the granularity at which your version control system must operate. Again, decades of Smalltalk development shows that you can use files to give code to friends, use programs like git (Monticello is git for Smalltalk) and grep, and post copies on the web. Smalltalkers do all of the above without files subsuming more conceptual function than just being files -- a particular means of storing and sharing information.


So what plain-text source encodings are is a serialization format. Now that's interesting, because running with that thought, what if there were multiple equivalent serializations for the same program? What if one were more convenient than another for certain transformations? Would that let us build better tools?

At some level, and in some domains, this already exists: the multiple serializations of RDF (N3, Turtle, RDF/XML, etc) plus OWL is very loosely equivalent to Prolog. There are very popular domain-specific non-textual programming environments too – the three obvious ones are Excel, LabVIEW and Max/MSP.

There's got to be something in that for someone.


So what plain-text source encodings are is a serialization format.

Thank you! Well put! Actually, source code is just a human-readable serialization format. There's a lot to be done with that idea as well.


Isn't a lot of this the whole Lisp thing again? (The human-readable serialization format for Lisp is, as I understand it, damn close to being an ASCII serialization of the AST for Lisp, hence...)


Well, yes and no. Lisp is an ASCII serialization of the Lisp AST. This isn't true for most other languages. There is no reason why source for most languages couldn't be serialized as a combination of the AST and the stream of lexical tokens. This would have a few interesting advantages. For example, everyone could have the source formatting of their preference.

EDIT: Another example -- someone could implement a language homomorphic with Python without significant whitespace and curly brackets.
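Here's a rough sketch of just the token-stream half of that idea, using Python's standard tokenize module (the function names are mine): comments and layout survive as tokens, so a tool could store the stream and re-render source in anyone's preferred formatting.

```python
import io
import tokenize

def to_token_stream(source):
    """Serialize source as a list of (token_type, token_string) pairs."""
    return [
        (tok.type, tok.string)
        for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    ]

def render(tokens):
    """Re-render a token stream as source text; exact spacing is regenerated."""
    return tokenize.untokenize(tokens)

code = "x = 1  # the answer, roughly\n"
rendered = render(to_token_stream(code))
```

The round trip keeps every token, comments included, while the whitespace between them is a rendering choice, which is exactly the "everyone gets their preferred formatting" property.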


Even for Lisp, you would have to define an extended syntax for it and modify the parser to additionally read and store whitespace, comments and newlines. This way you will be able to serialize your source to plain text without losing the original format. Non-text source code also allows for way smarter tooling: tree diffs on the AST, graphical REPLs and improved rendering of source code.


Even for Lisp, you would have to define an extended syntax for it and modify the parser to additionally read and store whitespace, comments and newlines.

Note I said AST + stream of lexical tokens. If you store the whitespace and comments as tokens, such information can be preserved, including formatting hints.


Minor pedantry: Most variants of Lisp are not precisely a serialization of the AST. The read engine and macro engines have to pass over the code before it is truly an AST. For Common Lisp I believe technically either can be invoked at any time, but I am not a Common Lisp language lawyer.


I took the "flat" approach in the Loom server code (see https://loom.cc/source). Formerly I used all the object-oriented module techniques, but finally I scrapped all that for direct calls to functions in a flat name space. I found it more flexible and easier to understand, and it didn't force me to think about artificial boundaries. This makes the Perl code similar to my C code. I don't even use many static functions in C, preferring everything to have a unique name.

I also wrote a functional language interpreter in C. The language is called Fexl (see http://fexl.com/code/). The C code obviously uses a uniquely named flat function space. However, the Fexl language itself has far more flexibility, since you can easily create functions within functions.

For example you can easily do things like this:

  \test_print =
  (
      \test1 = ...
      \test2 = ...
      ...
  )

  \test_read =
  (
      \test1 = ...
      \test2 = ...
      ...
  )
In that case you're not at all worried about the names test1 and test2 conflicting with anything else. It's very lightweight, just like the Fibonacci example shown on this thread.

If you really want to "export" functions declared inside a scope, you can do it like this:

  \handy_module =
  (
      ...
      \fun1 = ...
      \fun2 = ...

      \return return fun1 fun2
  )
Then to grab the functions you say:

  handy_module \fun1 \fun2 ... (now use fun1 and fun2)
You can change the names too, like this:

  handy_module \f1 \f2 ... (now use f1 and f2)
There's no extra magic in the language, you're just applying the module to a handler which grabs the exported functions.


> You don't need both modules and files. Decades of Smalltalk development experience backs this up.

While decades of elisp development shows us that without modules, nobody will want to use your language to write programs spanning more than one file. =\


Yes, you definitely need modules. But I think there are better ways of using modules than we have been.


I understand his frustration, but a single global namespace is not the solution. This would be a regression to the place that PHP has been trying to escape from.

If modules aren't available, then people will invent pseudo namespace qualifiers (e.g., misc_foo). A global namespace will become like the .com TLD--a few early landgrabbers with the cool names like foo and bar, then a bevy of latecomers with fooo and baaar67.

The module/namespace concept isn't a 100% solution, but it's sufficient. People understand hierarchy. It's simple and effective. And it also solves the global-visibility/encapsulation issue that he sidestepped.


I don't think it's really a single global 'namespace'.

It seems that instead of using actual names for the keys in the k/v database, you would use something closer to an IP address: give the function a unique ID and a really good description. To find the function, you search the descriptions, then reference the unique ID.

To reference it in your code, you would bind the unique id to the descriptive name.

What I want to know is how you prevent massive duplication of functionality in this k/v database (with slightly different argument conventions or implementation details, for example).


You'd have the duplicate functionality problem anyhow, as function tables essentially work as k/v databases already. The keys are the method signatures (e.g. name, parameters, return type) and the value is the actual implementation.

Unless your language syntax specifies that argument conventions (e.g. names or ordering of parameters) are part of the method signature, you're always going to have to deal with the problem of foo(int a, char b) mapping differently from foo(int i, char c) or foo(char b, int a). You could do this, but it's not really widespread as far as I know. I know of no language that deduplicates functions based on implementation details.
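A toy sketch of that view in Python (the define/call helpers are invented for illustration): the function table is literally a k/v database keyed by name plus parameter types, so foo(int, str) and foo(str, int) map to different implementations.

```python
# A function table as a k/v database keyed by (name, parameter types).
table = {}

def define(name, param_types, impl):
    """Register an implementation under its signature key."""
    table[(name, tuple(param_types))] = impl

def call(name, *args):
    """Dispatch on the runtime types of the arguments."""
    key = (name, tuple(type(a) for a in args))
    return table[key](*args)

# Two "foo"s that differ only in argument order coexist under distinct keys:
define("foo", (int, str), lambda a, b: f"{a}:{b}")
define("foo", (str, int), lambda b, a: f"{a}:{b}")
```

Whether argument names should also be part of the key is exactly the design question raised above; this sketch keys on types only.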


Furthermore, it's undoable in general, equivalence of lambda terms being undecidable and so on.

Point being, at some point functions become sufficiently different that it's hard to tell. What if you write one md5 function in Erlang, another in whatever that Ruby syntax for Erlang is called. I'm sure the Erlang they both generate, or bytecode, or what-have-you (I don't know how BEAM or HIPE work on the inside) are very different, to the point that checking that they are truly equivalent is going to be near-impossible, even for a very strictly defined program like MD5.


Is misc.foo better than misc_foo?

My point - one also raised by Joe - is that just "adding dots" doesn't solve anything.


I would rather have misc.foo (or in python terms, from misc import foo), than this_is_a_unique_function_name_that_does_foo, or '0000100302foo' with a ton of metadata to explain what it does and why it's better than every other similar function.

I'm trying to imagine visually parsing a program where the function names are merely unique identifiers that don't necessarily relate to their function, and it's not going well.


From what I understand, it'd be more:

from global_database import foo291 as foo

and you'd have lots of meta information associated to foo291 so you could easily find it by doing:

db-search blah


<quote> I'm trying to imagine visually parsing a program where the function names are merely unique identifiers that don't necessarily relate to their function... </quote>

Just to be pedantic: you're doing this already.

int add(int a, int b){ return a - b; }


The function names would still be user readable. The compiler and/or runtime would translate what you say to what you mean (think c++ mangling).
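A hedged sketch of that translation step (the scheme here is invented, loosely in the spirit of C++ name mangling): derive a stable unique ID from the readable signature, so humans keep saying foo while the runtime keys on the mangled form.

```python
import hashlib

def mangle(module, name, param_types):
    """Derive a stable, collision-resistant id from a readable signature."""
    sig = f"{module}.{name}({','.join(param_types)})"
    # Keep the human-readable name as a prefix; the hash disambiguates.
    return name + "_" + hashlib.sha1(sig.encode()).hexdigest()[:8]
```

The programmer writes `foo`; the toolchain translates what you say to what you mean, and two foos with different signatures get different keys.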


> My point being - a point also raised by Joe - is that just "adding dots" doesn't solve anything.

It's visually much easier to parse foo.bar_baz.quux than foo_bar_baz_quux.

Also, it lets symbols inside the module call their friends by quux instead of foo_bar_baz_quux.


> Also, It lets symbols inside the module call their friends by quux instead of foo_bar_baz_quux.

This is the main point in support of modules: they make you use names just like you are used to in real life. In a given context, shorter names suffice. When more contexts are involved, then you need qualifiers.


The global namespace in PHP is awesome. I don't have to learn a framework to solve a problem, there's a function for it.

Namespace qualifiers in names are a fine solution to the collision problem. That's effectively what's there already except every darn function has to be in a module.

Names don't have to be treated like property, you can invoke policy on them and change bad names. His dreams about rich metadata probably include version tags and hashes that would prevent this from silently breaking anything, or at least from breaking in a way that couldn't be repaired by a tool.

Modules don't solve the problem you think they do; they just make the names of all functions longer and harder to figure out. They're not hierarchical.

He addressed issues of visibility/encapsulation and gave a great example of where modules fail to solve the problem. Though I don't quite follow his suggested fix, it looks like he's using lexical scope to encapsulate fib/3, but my Erlang is actually a bit rusty.


> Names don't have to be treated like property, you can invoke policy on them and change bad names.

A lot of PHP projects involve working with existing components or apps. Some of these use the global namespace with very obvious names, e.g. phpBB's User, CodeIgniter's Session. Try to combine two components that use the same obvious names, and everything explodes in a giant mess of E_ERROR.

(It's the same deal with globally-scoped constants:

  define('ACTIVE',1);
  define('ACTIVE','active');
Makes life very interesting.)


OK, I might have to change my mind about that. I have been lucky enough not to have had to use multiple selfishly named libraries, I guess.


Where all these grand proposals break down is when you actually have to deal with real, domain-specific problems. It's not too hard to clearly express what the split_string() function or the multiply() function do, so namespace collisions aren't a problem. But what about the function that "assigns" an "activity" to a "user?" All of a sudden, every single person is going to mean something different by those three words, and it suddenly requires a huge amount of context to find that function.

Searching for something that assigns activities to users as some sort of global search is worthless; the only stuff you care about is the set of functions within your code base that deal with those concepts, or perhaps within some library that you understand and whose notions of users, activities, and assignment matches what you need to do. So you need some unit to pull in that's larger than just a bag of functions; you need a group of functions that collectively operate on the same sets of data in consistent ways.

And that's just a fundamentally hard problem; finding the right split so that you have a set of functions that can be used together, that are reusable, that don't rely on or expose additional concepts or libraries, that's all very difficult. Just flattening the function namespace, killing modules, and doing global searches doesn't do anything for that problem.

Again, as much as we all wish that the proper unit of reusability in programming is a single function, since it makes life easier, in most complicated cases that's simply not the case. That's why we have classes, or modules, or namespaces, or packages in the first place: they're all various attempts to group together functions that work together on the same kinds of data, that share the same understanding of that data and that work towards the same goal. Just punting on that encapsulation problem doesn't make it go away.


He is probably imagining that the name of that function would be "nameofproject_assign_user_activity/2", and that you would typically use it along with some others from the same source, following some conventions. Seems that obviates modules to me. In Erlang you give the name of the module with a function anyway, so names like that harm nothing. It would be real nice to type "map" instead of "lists:map" though.


Sure . . . I mean, C doesn't exactly have namespaces, and neither did PHP at first, and you can write code in them, so it's not that it's somehow not possible to uniquely name functions. But it's pretty annoying, and I'd say that many people who are used to modules and namespaces miss them when they move to a world that doesn't have those things. They exist as A) syntactic sugar, so that you can just say assign_user_activity() instead of my_company_my_project_assign_user_activity and B) as a strong organizing principle (potentially that has tooling around it). The former point is just annoying; my argument was that the latter is really the point of modules or namespaces.

Maybe I want to find all the functions related to assignment: how do I do that? Do I just look on disk? Do I use tags or metadata? Do I assume some naming convention? It's a problem you have to solve somehow, and solving it requires effort, because it's an explicit organizational effort. Modules or namespaces or packages or classes give you a way to say "this stuff goes together" and to encapsulate and abstract large units of related functionality, and to do so in a way that tools and programs understand. In a Java IDE, my IDE will auto-complete all the functions on a class; in a REPL I can just print out all the methods on an object. That's a useful organizing principle, and it's one that you lose if you totally flatten the namespace.

You could argue that organization isn't one-dimensional or hierarchical, which is what namespaces and such force on you, and it's a reasonable statement. But replacing it with nothing explicit and relying on extensive developer metadata and documentation so you can search for things? That seems even more idealistic. When was the last time you saw every single function, even private ones, documented and tagged so correctly and up-to-date on a real-world, large-scale project that you'd rely on them to serve as the only organizing principle in your application?


Just to play devil's advocate, you might be able to handle this by hacking types through Erlang's pattern matching syntax.

Thus, you might have a function assign_activity(("KeeferUser", user_no), ("KeeferActivity", activity_no)) and rely on the compiler to match the "type" in the first field of each tuple you pass. It's not pretty, but it is possible.
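For illustration, the same tagged-tuple trick in Python (the type names are borrowed from the comment above; the function body is made up): check the "type" tag in the first field of each tuple before using the payload.

```python
def assign_activity(user, activity):
    """Accept only tuples tagged with the expected hypothetical type names."""
    tag_u, user_no = user
    tag_a, activity_no = activity
    if (tag_u, tag_a) != ("KeeferUser", "KeeferActivity"):
        raise TypeError("expected a KeeferUser and a KeeferActivity")
    return {"user": user_no, "activity": activity_no}
```

It's not pretty here either, but as in the Erlang version, mismatched tags fail loudly instead of silently mixing up users and activities.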


There are plenty of ways to skin that particular cat, and languages exist and are successful without modules or namespaces or classes (like C). It's just a question of whether or not the proposal in the article is an improvement upon anything, and I think it's reasonable to argue that no, it's not; people added those features to more recent languages because they solved real problems. Replacing explicit, required, rigid categorization with optional, flexible categorization isn't necessarily a win: it could just mean that people don't actually think about the problem very hard, and you end up being unable to find all the things related to X when you actually decide you need to understand how X works, because people didn't bother adding in the "X" tag in the function metadata to everything related to X.


C has modules all right. A .c/.h file pair is essentially what many other languages call a module, and has all the problems that the modules in Joe Armstrong's Fibonacci example also have.


This is significantly similar to part of a note I wrote a month or two ago, about 'Software as web' http://www.hxa.name/notes/note-hxa7241-20110314T2011Z.html

Functions calling (linking) each other, and being built out of available pieces, ideally at a global scale -- this is just like the structure of the web.

* Functions should have URLs

* Functions should be augmented with metadata, like their language/platform

* There should be more and more fully defined IMT/MIME data types for lots of data

(Although I am more casual about this idea, and am only thinking half-seriously.)


So you'll start prefixing all names with a string so they don't clash. Then you'll notice all your related code uses these strings, what a waste, so you'll create a "module" in order to be able to drop the string in a defined context. Thus namespaces/modules are reinvented. Function drives form.


Let me rephrase this: search, don't sort.

There cannot be a single organizational hierarchy that is intuitive to all users.

So let's try a flat namespace where every class and method can be found by name, metadata, comments, etc. Think the Chrome universal search autocomplete.

Content should be addressable by more than any one namespace!
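A small sketch of "search, don't sort" in Python (registry/publish/search are invented names): every function lives in one flat registry and is findable by name, docstring, or metadata tags rather than by its position in a hierarchy.

```python
# A flat namespace, searched by name, metadata and docstring rather than browsed.
registry = []

def publish(fn, tags):
    """Add a function to the flat registry with searchable metadata."""
    registry.append({
        "name": fn.__name__,
        "doc": fn.__doc__ or "",
        "tags": {t.lower() for t in tags},
        "fn": fn,
    })

def search(query):
    """Return names of entries matching the query in name, doc, or tags."""
    q = query.lower()
    return [e["name"] for e in registry
            if q in e["name"].lower() or q in e["doc"].lower() or q in e["tags"]]

def md5_file(path):
    "Hash a file's contents with MD5."

publish(md5_file, ["hash", "checksum"])
```

Content becomes addressable along several axes at once, which is the point: no single hierarchy has to be the one everyone finds intuitive.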


Isn't the problem more about access than storage?

We naturally recoil from the global namespace idea because we (rightly) anticipate huge issues with organization and duplication, so we want a hierarchical structure (files and modules) to keep our functions organized.

How about leaving our storage hierarchical for organizational purposes, but streamlining access? Instead of always using tedious import and require statements, simply call/use your functions, objects, gems, plugins, etc. directly and let the compiler and/or runtime infer from your usage which you are referring to in the case of ambiguity. Only if the ambiguity cannot be resolved in this manner would the programmer need to be explicit. Intelligent metadata and indexing could also add a lot of power to this sort of system.

Currently even our best languages require a large amount of cruft and legwork that is only really necessary in those 5% of cases where ambiguity can't be automatically inferred away. Seems like optimizing for the edge case if I've ever seen it.
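A sketch of that usage-driven resolution in Python (resolve and SEARCH_PATH are hypothetical; real implementations would need much richer indexing): look a bare name up across known modules and only demand explicitness when more than one match exists.

```python
import math
import os.path

# The set of places the compiler/runtime would search; tiny here for illustration.
SEARCH_PATH = [math, os.path]

def resolve(name):
    """Return the unique binding for `name`, or fail if it's ambiguous/unknown."""
    hits = [m for m in SEARCH_PATH if hasattr(m, name)]
    if len(hits) == 1:
        return getattr(hits[0], name)
    raise LookupError(f"{name!r} is ambiguous or unknown: {[m.__name__ for m in hits]}")
```

The explicit import then only shows up in the minority of cases where resolve raises, instead of at the top of every single file.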


No one has mentioned that modules/packages hold together code that shares concepts.

Surely getting rid of Java packages and keeping the classes would be far less extreme than what Joe is suggesting for Erlang -- yet then we would have a zillion "meanings" of things like Image, Server, PDFFile, etc.

What about the Erlang functions that aren't "file2md5" and "downcase_char"? Is he way over-generalizing, or do Erlang programs typically just munge data in obvious ways?


You can package code without making the package a formal part of the namespace. Having a global namespace doesn't preclude packaging code, even if they are traditionally mixed as one thing. Many Smalltalks have a single global namespace, and it works pretty well.


Modules are units of encapsulation.

In my experience, if you have trouble deciding where a function "foo" needs to go to -- you probably have divided your code across arbitrary/poor boundaries.

Modules should have clean APIs that do not leak implementation details, and, of course, the implementation itself.

If you break down the barriers between modules, you remain with only implementation details.

A well-written module that's divided along meaningful boundaries can usually be read separately from the project it is a part of and actually understood as a whole.

All that said, I think storage of code could be done far better than serialization in text files, but that's another matter, and whatever code editing form we use, we should still have "modules".


Sounds like you didn't read the article, he specifically addressed encapsulation and an alternative way of doing it.


I read the article, and I couldn't find any explanation of how to regain the lost encapsulation.

I see him mentioning anonymous inner functions and that "reusing modules" is hard. Did you mean to refer to this?

Anonymous inner functions are harder to share across multiple components that share intricate knowledge of the implementation details, i.e. the kind of components that would be stored together in a module. I do not think they are a viable replacement for an explicit boundary around multiple components.

About reuse of modules being hard -- I think this again stems from poorly designed module APIs and semantics. Well-designed modules (e.g: STL's modules, Haskell's data structure modules) are very easy to re-use.

I completely agree with him on the difficulty/arbitrariness of module naming, though. I don't think modules should be given a made-up name, but rather found by more informative "meta-data" and addressed by auto-generated unique names.


I'm referring to the section where he talks about a new syntax for hiding helper functions so as not to expose them to the outside world. That was specifically about addressing your encapsulation concern.


From a collaboration point of view, the thought of programming with single-function modules sounds like a nightmare. Now you don't have to find and install matching versions of five libraries, you have to find suitable versions of a hundred functions.

Perhaps nice people step in to stop the suffering and provide packages of function versions which are mutually compatible and which are usually used together -- but that's modules again.


Perhaps nice people step in to stop the suffering and provide packages of function versions which are mutually compatible and which are usually used together -- but that's modules again

Not exactly. You've sort of moved the large globs of stuff from the back-end to the front-end, where the grouping is more useful.

I'm reminded of difficulties, like the kind faced when purchasing 1.5" electrical tape. That stuff's mostly used only by pro electricians, so you won't find it retail, and it's even hard to find a particular brand from a contractor's supply outfit, and when you do, you have to buy it in big lots, like a whole box of 10 rolls. What if you just want to fix a drum and you need a particular make of tape made by Scotch and you only want one roll? Out of luck.

So say you need a particular function. You end up importing this entire module, which has dependencies that also are defined in large-granularity terms (other modules) that have their own dependencies. So to use one function, you get saddled with a whole heap of dependency overhead. You're really paying the price for "a whole box" where what you really need is just one particular thing.

But if everything was stored in easily accessible public repositories at the granularity of individual functions, this wouldn't be the case. You'd be able to pull the particular version of the function you need, and it would just pull the particular versions of the functions it depends on, and so on.

Things would be a whole lot more memory-efficient. Another way to think of it: modules are a lot less modular than they really should be.
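Here's a toy version of function-granularity pulling in Python (the repo contents and the pull helper are invented): fetching one function transitively fetches only the functions it depends on, never a whole module's worth.

```python
# A toy public repository at the granularity of individual functions.
# Each entry lists its own small-granularity dependencies.
REPO = {
    "sq":        {"deps": [],     "src": "def sq(x): return x * x"},
    "hyp":       {"deps": ["sq"], "src": "def hyp(a, b): return (sq(a) + sq(b)) ** 0.5"},
    "unrelated": {"deps": [],     "src": "def unrelated(): pass"},
}

def pull(name, into=None):
    """Recursively fetch `name` and only its dependencies into one namespace."""
    into = into if into is not None else {}
    if name not in into:
        for dep in REPO[name]["deps"]:
            pull(dep, into)
        exec(REPO[name]["src"], into)  # define the function in the shared namespace
    return into

ns = pull("hyp")
```

Pulling `hyp` brings in `sq` and nothing else; the "whole box of 10 rolls" (here, `unrelated`) stays on the shelf.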


So say you need a particular function. You end up importing this entire module, which has dependencies that also are defined in large-granularity terms (other modules) that have their own dependencies. So to use one function, you get saddled with a whole heap of dependency overhead.

Just playing the devil's advocate here: in this use case, why not just write the function yourself?


Because, if there was a way to quickly find and import the right function (with small-granularity dependencies), it would be much faster in a great many cases.


> The more I think about it the more I think program development should viewed as changing the state of a Key-Value database.

This is pretty much how I view my programming in Common Lisp.


The more I think about it the more I think program development should viewed as changing the state of a Key-Value database.

This is also how a lot of Smalltalkers operate.


Yea, Joe is slowly coming around to the idea of image based development instead of file based development. He's a few decades too late to be original, but it's still a great idea.

Of course, throwing out files has consequences for the tools you use, big time. Ask any Smalltalker.


Of course, throwing out files has consequences for the tools you use, big time. Ask any Smalltalker.

You can ask me. Who said anything about throwing out files? I use file-based tools all the time. I've grepped files and diffed them. I've exported a Class as a file, done some operation on it with another tool, then filed it back into the image. Whatever tool is best for the job. There's in-image tools, but I'm not limited to them.

Not being shackled to files has tremendous positive consequences. For example, I am free to code in the debugger almost 100% of the time. Even irb and iPython can look a bit restricted in comparison. If I do something esoteric to low-level code and crash the image, there's a transactional log of my code changes I can recover from, almost with impunity.

The choice of text editors for main development is rather restricted, though. (But with a pattern of very short methods as the norm, and the ability to customize the browsers at the level of individual methods, this isn't that big a deal.)


I'll venture a guess that you didn't read my profile.


Nope. Didn't.


Can you say more about that or do you have a blog post on this somewhere that you can link to?


Common Lisp can be compiled one function at a time, and functions belong to packages, which are pretty much key-value databases you can modify, where keys are strings (name of the function) and the values are the functions themselves. So if there is something wrong with some function, I just update the value in the "database" and all callers will call the newly defined function from now on. Nothing magical really. This workflow is supported by the CL IDEs such as SLIME.
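In Python terms, a toy version of that workflow (package/defun are invented names echoing CL): callers look the function up by key at call time, so updating the value in the "database" immediately changes what every caller gets.

```python
# A "package" as a k/v database from names to functions, with late binding.
package = {}

def defun(name, fn):
    """Install (or replace) a function under its name in the package."""
    package[name] = fn

defun("greet", lambda who: f"hello, {who}")
# greet_twice resolves "greet" through the package at call time, not at definition.
defun("greet_twice", lambda who: package["greet"](who) + " " + package["greet"](who))

before = package["greet_twice"]("joe")
defun("greet", lambda who: f"hej, {who}")  # fix the function in place
after = package["greet_twice"]("joe")
```

Because the caller goes through the key rather than holding the old value, the redefinition propagates instantly, which is the interactive-update property SLIME-style workflows rely on.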


I think this just exacerbates the problem with Erlang's flat namespace. Right now it is impossible to have two versions of a library inside the same VM, because they both exist inside the same namespace. It is insane how many people have had very obscure errors because they happened to call a module "http.erl".

I would appreciate a first-class solution 'in code', as opposed to some special case with module loaders, but I would like to see people talking about the problems they are solving before pontificating about solutions.


This is kinda off the point, but I'm surprised at the spelling and punctuation mistakes in that mail--generally high-profile hackers have excellent written English.

e.g. "but their isn't", "Do we need module's at all?", "do suggest alternative syntax's here."


> spelling and punctuation mistakes

> excellent written English

Erlang's syntax, naming, and abbreviation conventions are straight schizophrenic.

Keywords, directives, method/module names, variables, and arguments are randomly spelled out or shortened, with underscores or CamelCase if you're lucky; there are some familiar C-style conventions but they aren't widespread; directory structure seems to be highly project-dependent, etc. - the list goes on.

It's a very "cluttered" language and the lack of a strong proficiency in written English as you mention clearly shows (to me) in the language/framework.


Keyword/directive spelling is British, and any resemblance to C-style conventions is a coincidence, because nobody cares about them in the Erlang world. Yes, Erlang has a weird syntax. Get over it.

I do agree to the point that standard libraries could have a more unified style. But the language is simple and concise.


He may be dyslexic or he may just have rushed this to print and to hell with the typos.


I'm not sure his native language is english; he may be swedish.


I'm almost positive he's British, or at least he has a British accent of sorts: http://www.infoq.com/interviews/Erlang-Joe-Armstrong

Also, regarding his writing, he did at least preface his post with "This is a brain-dump-stream-of-consciousness-thing." Despite being a stickler for grammar rules myself, if I don't proofread my writing, there are bound to be plenty of grammar errors and typos that pop up, especially if my brain is really flying with ideas and my fingers are struggling to keep up. His post had that feel to it as well: a bunch of semi-related ideas, bounced around haphazardly.


I went to a talk he gave a year or so ago on a set of libraries he'd written. He was just like that in person. A complete mad professor.

He showed us his early design documents of erlang, including original compile times and performance measurements. Awesome stuff.

He does sound British, but there's a certain foreign-ness about the way he speaks (probably picked up from Ericsson).


His native language is English. He's just sloppy. He's been in Sweden for decades and speaks Swedish with a typical English accent.

Source 1: My opinion. I've met him many times.

Source 2: His thesis states that he started at the Ericsson CSLab in 1985. CSLab was in Sweden.

Source 3: This (http://www.cse.chalmers.se/~rjmh/Armstrong/bits.ps) set of slides states that he has a B.Sc. from UCL (London) from 1972 and started at CSLab in 1986 (not 1985; it seems the dates are a bit fluid).


Emacs uses a single namespace, and people just contribute .el scripts here and there (put in the right place, they get loaded automatically). (I'm just a user of these .el scripts, mainly for Lisp/Lua development.)

Also "C": a single namespace, and though sometimes very verbose, google a name and you'll find the result (this has saved me many times looking for the Win32 API, GTK, Cairo, the Lua API, etc.)

But what about data? Static data, vars, etc. Also some languages/systems have initialization/deinitialization of the module (register/unregister, etc.)

But in general I like the idea, and thought about it, now I'm even thinking more.

He talks about putting them in a database. Each database would have to have a name - maybe that's the name of the package (and you can rename it) - and you can merge them. And if the DB is, say, SQL, you can handle merging databases far better than with the methods of "ar", "lib" (MSVC), "ranlib", etc., or whatever the language/runtime provides.
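A minimal sketch of that "code in a database" idea, using Python and sqlite3 in place of Erlang; the schema, helper names, and exec-based loader are all invented for illustration:

```python
import sqlite3

# Toy "code database": one row per function, keyed by name.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE funs (name TEXT PRIMARY KEY, source TEXT)")

def store(name, source):
    db.execute("INSERT OR REPLACE INTO funs VALUES (?, ?)", (name, source))

def load(name):
    (source,) = db.execute(
        "SELECT source FROM funs WHERE name = ?", (name,)).fetchone()
    env = {}
    exec(source, env)        # compile the stored source on demand
    return env[name]

store("double", "def double(x):\n    return 2 * x")
print(load("double")(21))    # -> 42
```

Merging two such databases is then ordinary SQL (e.g. an INSERT ... SELECT from an attached database) rather than a bespoke archive tool.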


Erlang modules hold nothing but functions. In most popular languages there can be mutable data, too, either public or private; this extra complication is what would tip it over from interesting to silly in my mind. For Erlang it strikes me as worth exploring, and then maybe the result could be generalized to other languages.

He considers private implementation-only functions using letrec, but what about multiple public functions sharing the same privates? You can model them as an object (or rather a nullary function returning a tuple or some other representation of a table, since Erlang modules have functions only). Hurrah, we've reified modules, oh well.
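That reification is easy to sketch with closures (in Python rather than Erlang; all names are invented): a nullary constructor returns a tuple of the public functions, which share a helper invisible from outside.

```python
# Sketch: two public functions sharing one private helper, packaged
# as a "module" that is just a function returning a tuple.
def make_fib_module():
    def go(n, a, b):          # private: unreachable from outside
        return a if n < 2 else go(n - 1, a + b, a)

    def fib(n):
        return go(n, 1, 0)

    def fib_pair(n):          # second public function, same private
        return (fib(n), fib(n + 1))

    return fib, fib_pair      # the reified "module"

fib, fib_pair = make_fib_module()
print(fib(10))                # -> 55
```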

I'll bet these points are addressed in the long followup thread.

(Obligatory mention of Zooko's triangle: http://www.zooko.com/distnames.html )


This issue of where to place functions is something I have been curious about, too.

In the Lisp community, what are the standards for where to store your generic methods as opposed to defined classes they work with? Rephrased, if I define classes Foo and Bar, and I write generic method foobar (accepts as params instances of Foo and Bar), where do I put foobar?

In the C++ world, where should I put my friend functions that suffer a similar lack of obvious home?

I have often seen solutions where some package/class is chosen arbitrarily as the "proper" home for these cross-class communicators, but I have long felt like this is a compromise rather than good organization. And yes, I recognize that there are at least two kinds of organizations: In what file is my code? In what namespace is my code? I am concerned with the namespace aspect.


>Rephrased, if I define classes Foo and Bar, and I write generic method foobar (accepts as params instances of Foo and Bar), where do I put foobar?

This sounds similar to the expression problem.

http://www.infoq.com/presentations/Clojure-Expression-Proble...


Generics/friends/multimethods are one of the solutions identified on the slides you posted (and protocols are interesting), but I did not see mention of appropriate namespaces for these.


Erlang's module system isn't just a namespace mechanism; it really lends itself well to stuff like gen_server where you need to provide a set of callbacks. I'm surprised he didn't even touch on this in his post, as it definitely seems like one of Erlang's strengths to me....



Joe Armstrong is a really interesting guy. I think he was my favorite interview in Coders at Work.


Armstrong doesn't seem to be saying anything about one of the more damning problems he points out - encapsulation. All functions in a module are available to each other, but hidden from code not in that module.

His solution of simply storing all code in a key-value DB only makes this worse, in exchange for the dubious benefit of removing the question of "where do I put this new function?". Since he makes a big deal of encapsulation, his solution seems questionable.

EDIT - Please don't upvote this. I'm wrong in an incredibly silly way, and my wrongness doesn't need to be rewarded. If you feel generous, upvote ericflo instead for pointing out just how badly I missed the point.


Actually he addresses that and proposes syntax to address it.

    let fib = fun(N) -> fib(N, 1, 0) end
    in
       fib(N, A, B) when N < 2 -> A;
       fib(N, A, B) -> fib(N-1, A+B, A).
    end.
Where everything in the "in" block would be hidden.


You are absolutely right. When I read his post, that's squirreled away in a discussion about syntax and I missed it. I stand corrected.


Perhaps I misunderstand, but couldn't one just as easily provide package scoping in a name/value store if the key was a hierarchical dotted-identifier?

Edit: I think I just answered my own question...the question of what exactly that dotted-identifier should be has just as much burden as the current "where do I put this" question.


It seems that this might make code easier to write (the first time around) at the expense of making it more difficult to read. That's totally counter to the prevailing wisdom.

I'd also be curious to know what metadata the functions could be tagged with to make them sufficiently easy to find. The module and project that a function or type belongs to provides a lot of contextual information about it; coming up with metadata which captures that information without simply duplicating it (and therefore reinventing modules under another guise) sounds fairly non-trivial.


Intriguing idea.

Extending it to absurdity:

All functions should be stored in a searchable key/value database, where the key is a cryptographically signed canonical representation of the function source code. By "canonical representation" I mean a translation to a canonical form of the lambda calculus, so as to capture the complete meaning unambiguously.

This database would, of course, translate the lambda calculus to erlang, lisp, haskell, php, what-have-you, on demand at the view layer.

Metadata would need to be extensive, but we could crowd-source that Wikipedia-fashion.
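A crude version of that content-addressed store, with a hash standing in for both the signature and the canonical lambda-calculus form (the "normalization" here is deliberately trivial and purely illustrative):

```python
import hashlib

def canonical(source):
    # Assumption for the sketch: canonicalization = drop blank lines
    # and trailing whitespace. A real canonical lambda-calculus form
    # would be far harder to compute.
    return "\n".join(l.rstrip() for l in source.splitlines() if l.strip())

def key_of(source):
    return hashlib.sha256(canonical(source).encode()).hexdigest()

store = {}

def put(source):
    k = key_of(source)
    store[k] = canonical(source)
    return k

# Two superficially different texts of the same function share a key.
k1 = put("def inc(x):\n    return x + 1\n")
k2 = put("def inc(x):\n    return x + 1   \n\n")
print(k1 == k2)    # -> True
```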


The problem is exclusive containment. Folders/directories are an example of hierarchical exclusive containment. The author suggests avoiding the problems associated with this by getting rid of the hierarchy.

However, the author is onto something: storing functions in a database. Using the file system results in exclusive containment (a file can't be in two different directories - links don't count), which a database can avoid.

If you are lost, think of Gmail's tags with nesting. You can still have the hierarchy but data can be "located" under different tags. A lot of times the problem is using the file system as the back end. Python's modules, for instance, do not inherently enforce exclusive containment; the file system imposes this constraint.

I had a chat with Jeff Bone a year ago after I read his partially-related rant here: http://www.xent.com/pipermail/fork/Week-of-Mon-20091109/0545...

I wish I could paste everything he said to me in here; he pointed me to a lot of examples of good file system / graph database designs that get a lot of these things right. Quick further reading list: the original ReiserFS design, the BeFS design, FluidDB, LUFS, and Tanenbaum's design for the directory services and file system of Amoeba.
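The Gmail-tags point above is easy to sketch: in a database-backed store, one function can sit under several tags at once, which a directory tree cannot express without links. (Python; names are invented for illustration.)

```python
from collections import defaultdict

# Non-exclusive containment: a function name may live under many tags,
# unlike a file, which lives in exactly one directory.
by_tag = defaultdict(set)

def tag(name, *tags):
    for t in tags:
        by_tag[t].add(name)

tag("parse_date", "parsing", "dates")
tag("format_date", "formatting", "dates")

print(sorted(by_tag["dates"]))    # -> ['format_date', 'parse_date']
```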


I like this idea. The problem I see is when you have different, but equally valid implementations.

In this case, it would make sense to back the name up with the developer/company name. But with no further decoration, as discussed. That way, it's clear which function implementation you're referring to. You can still have the mainstream (popular? chosen by the project's team?) implementations be referenced by function name alone. Any other implementations would need the dev/company name. That way you can have different parts of the project still able to reference different implementations of the same function.

The alternative would be to decorate the function with the implementation detail. And I think that's going against the simplicity he's after. Worse still is you'd still need to know you'll be supporting multiple implementations up front.
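One way to sketch that author-qualified scheme (Python; the registry, the names, and the stand-in bodies are all hypothetical): a bare name resolves through a project-chosen default, while a qualified name picks a specific implementation.

```python
# Qualified keys disambiguate competing implementations; a bare name
# falls back to whichever one the project has chosen as its default.
registry = {
    "joe/quicksort":  lambda xs: sorted(xs),               # stand-in body
    "acme/quicksort": lambda xs: sorted(xs, reverse=True), # stand-in body
}
defaults = {"quicksort": "joe/quicksort"}

def lookup(name):
    return registry[name if "/" in name else defaults[name]]

print(lookup("quicksort")([3, 1, 2]))         # -> [1, 2, 3]
print(lookup("acme/quicksort")([3, 1, 2]))    # -> [3, 2, 1]
```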


You already have this with CommonJS packages and modules as used by Node, RingoJS, Akshell and others.

The latest additions to require() mean that if you do something like require('foo') and there's a directory 'foo' in your require path that contains a package.json file (i.e. if you have package foo installed via the likes of npm), then the main module mentioned in that package.json file (which contains the metadata Joe speaks of) is immediately available to you. The module isn't a function but an object, but then JavaScript isn't a purely functional language.

Now, admittedly, currently you need to pre install the packages, but there's no reason why require couldn't be modified to install them for you if they're unavailable or, if your code ran on a platform like Akshell, the packages wouldn't already be available to you.


Actually in node at least modules can export functions directly, just assign a function to `module.exports` and when you require() that module you'll get a function back. Plus you can add attributes to that function since functions are also objects.


That's cool, I had overlooked that. I guess the point is that there's no key value store for modules, whereas for packages there's npm.


My first thought after reading the beginnings of his bulleted list: PHP? Haha. Still reading. It feels almost surreal to be reading the brain dump // stream of consciousness of someone who's created something so great. He talks just like you or I. Very fascinating.


There's nothing special about getting something like that done. There's nothing special about people who spend time on that kind of stuff either, save for the fact that they spend time on it. It's not difficult to be at the top, what's difficult is getting there and having enough resolve and persistence to do it. Only change is painful.


Does anybody remember Apple Dylan (http://wiki.opendylan.org/wiki/view.dsp?title=Apple%20Dylan)? I never used it myself, but I remember reading a lot about it in the late 1990s because I was using Macintosh Common Lisp at the time. In the Apple Dylan IDE, every function, class, and module was an object in an object database; there were no files at all. The IDE was at least 10 years ahead of its time, and as a result the performance was terrible. But it's surprising that nobody has revived the idea of a source database.


The idea of storing code in a DB isn't new. There are a couple of "exotic" systems which do it, SAP R/3 being one of them.


Dependent typing has some interesting things to say about modules, or lack thereof. Cayenne[1] is a functional language that doesn't have modules as its type system is expressive enough to cope with the necessary encapsulation -- you can create the module structure just by using types.

Unfortunately, dependent types make typechecking undecidable so there are drawbacks to this approach...

[1] http://en.wikipedia.org/wiki/Cayenne_(programming_language)


This is how PHP works, and I think it's smart. But to avoid the pitfalls of PHP, the new global functions need to be named in a way that's logical and unlikely to step on any toes.


..Really? There's a reason they are and have been trying to move away from this script kiddie style of programming..


Wouldn't it be interesting to draw a parallel with natural languages? All English words belong to a single namespace, in a way. And even though the English language is highly homonymic, we don't have difficulty understanding it, for the most part. We figure out the meaning based on context. In this sense, a PL's namespaces seem analogous to natural-language context.


Many of these ideas are already present in PicoLisp.


Erlang meet Forth!!!


All machine state should be globally, transparently persisted like this if you ask me. Sort of like a persistent heap.


  #import stocklib as lib
  import fancylib as lib

  a = lib.Lib()


Or even more well recognised:

  try:
      import cStringIO as StringIO
  except ImportError:
      import StringIO
  
  strio = StringIO.StringIO()


Honestly the stupidest shit I have read this month. PHP did that too, and they always do smart things right?

The guy might have a PhD, but he knows nothing about the usability of languages. He also dissed OOP, but I guess having concepts that accurately represent data is shit too.



