Hacker News
Global Forecast System source code (noaa.gov)
88 points by geetee on Jan 31, 2015 | hide | past | favorite | 27 comments


Independently, I was just wondering tonight why weather forecasting doesn't use Machine Learning techniques, given the abundance of training data. Wouldn't any number of regression ML algorithms find a more accurate prediction model? If Google did weather....


ML techniques certainly are being used to predict the weather, maybe just not at NOAA or the Weather Channel. The weather derivative industry[0] is an $8 billion industry, making it larger than the entire rocket launch industry (public and private combined). Just as there are Elon Musks in the rocket launch industry, there are maverick hedge funds out there making a killing by predicting the weather using the latest Big Data tools.

[0] http://en.wikipedia.org/wiki/Weather_derivative


Sometimes Machine Learning is the wrong approach. Direct mathematical modeling, if the underlying causal phenomena are well known, bypasses the whole learning aspect of an explanatory system. Machine learning might help make sense of the residual noise that can't be accounted for by direct mathematical modeling. But at the level of sophistication that I've seen (predicting wind patterns using Finite Element Analysis methods mapped to topographical patterns fine-grained enough to model small buildings), I can't imagine any machine learning method that could make sense of the residual noise.


I agree that Machine Learning is not always the right approach, but wouldn't it be a relatively inexpensive experiment to see if it could be used for more accurate predictions? I'm surprised that topographical patterns are mapped that precisely when ASOS and AWOS stations are so far apart. Also, the weather data from commercial airplanes is so far above the surface that modeling small buildings would be meaningless.

I heard a talk from ex-Googler climate.com (now Monsanto) guys. Their forecasts were accurate enough to build a very profitable insurance business. I bet they used Bayesian rather than FEM techniques.


Is it that 'physics simulation' forecasting is best for short timescales (e.g. < 1 week in the UK), whilst ML can be better for estimating next year's weather statistics in some particular place? (Because the butterfly effect means that simulating the actual weather out to 1 year 'accurately' is not possible.)

Presumably any short-term ML forecasting would use the physics simulation as its main input, and then try to improve slightly on it (e.g. maybe you observe that in a particular small area, the rainfall is on average 10% more than the physics simulation predicts, so your ML method would add 10%, giving you a slight forecast improvement)?
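A minimal sketch of that kind of statistical post-processing (MOS-style bias correction). All names and numbers here are made up for illustration, not taken from any real forecasting system: learn a per-location multiplicative factor from historical forecast/observation pairs, then scale fresh model output by it.

```python
# Toy bias correction: learn a multiplicative factor for one grid cell
# from historical (forecast, observed) rainfall pairs, then apply it
# to a new physics-model forecast. Data is purely illustrative.

def fit_bias_factor(forecasts, observations):
    """Ratio of total observed to total forecast rainfall."""
    return sum(observations) / sum(forecasts)

def correct(forecast, factor):
    """Scale a raw model forecast by the learned factor."""
    return forecast * factor

# Hypothetical history: the model underpredicts rain here by ~10%.
past_forecasts = [10.0, 20.0, 5.0, 15.0]
past_observed  = [11.0, 22.0, 5.5, 16.5]

factor = fit_bias_factor(past_forecasts, past_observed)   # ~1.1
print(f"{correct(8.0, factor):.2f} mm")                   # raw 8.0 mm, corrected ~8.80 mm
```

Real MOS schemes fit far richer regressions (per station, per season, per lead time), but the structure is the same: the physics model does the heavy lifting and the statistics mop up its systematic errors.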


Probably because if it worked they'd already be doing it.


Do they also make raw sensor data available for free? I.e. if I wanted to build my own forecast system based on this code, I would still need all the data they collect with weather balloons, right?



To create a global NWP system from this code you would need observations (soundings, SST, satellite measurements) as well as a super computer.


WCOSS (Tide & Gyre): IBM iDataPlex/Intel Sandy Bridge/Linux

208 trillion calculations/sec; 10,048 processing cores; 2,590 trillion bytes of storage

[xref: http://www.emc.ncep.noaa.gov/GFS/doc.php#comopesys]


You can get 1.5% of that processing power in a single graphics card now. So in 7 years, we should be able to buy that much power for about $350 :)
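The back-of-envelope math behind that extrapolation, taking the 1.5% figure at face value and assuming GPU throughput per dollar doubles roughly every 14 months (an assumption, and an optimistic one next to the usual 18-24 month quotes):

```python
import math

gap = 1 / 0.015                 # need ~67x more throughput than one card
doublings = math.log2(gap)      # ~6.06 doublings to close the gap
months_per_doubling = 14        # assumed cadence, not a measured figure
years = doublings * months_per_doubling / 12
print(f"{gap:.1f}x gap, {doublings:.2f} doublings, ~{years:.1f} years")
```

With an 18-month doubling the same arithmetic gives roughly 9 years instead of 7, so the ":)" is doing some work.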


Here's a really interesting read about the trials of upgrading to WCOSS: http://www.washingtonpost.com/blogs/capital-weather-gang/wp/...


I'd assumed that 'trials' meant 'snafu in a big software project'.

But actually it's about the butterfly effect making it impossible to get exact agreement between the old and new systems for forecasts longer than about 5 days.
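That sensitivity shows up even in toy models. A sketch using the Lorenz-63 system (the classic chaotic "weather toy", nothing from GFS itself): two runs whose initial conditions differ by one part in a billion end up macroscopically different.

```python
# Lorenz-63 with a crude Euler integrator: two trajectories starting
# 1e-9 apart diverge to order-one separation within ~25 model time units.

def lorenz_step(x, y, z, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    return (x + dt * sigma * (y - x),
            y + dt * (x * (rho - z) - y),
            z + dt * (x * y - beta * z))

a = (1.0, 1.0, 20.0)
b = (1.0 + 1e-9, 1.0, 20.0)    # perturb one coordinate by a billionth
for _ in range(5000):          # 5000 steps of dt=0.005
    a = lorenz_step(*a)
    b = lorenz_step(*b)

sep = max(abs(p - q) for p, q in zip(a, b))
print(f"separation after 5000 steps: {sep:.3f}")
```

The same mechanism is why the old and new WCOSS systems, fed bit-for-bit identical inputs but using slightly different floating-point arithmetic, can't agree past a few days of forecast lead time.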



You don't want the "raw sensor data". You need it after it's been processed to the appropriate level, to retrieve geophysically-relevant quantities (like temperatures and wind speeds) from the data that is measured. That is not done by the forecasters.


Fortran 90 is still the language computational stuff is written in today. I'm not convinced, however, that that choice is really for performance reasons rather than cultural ones.

Still, Fortran 90 is a hell of a lot better than Fortran 77, which unfortunately is the language of choice for some of the people I'm being asked to work with.


I'm a part of this community. I assure you speed in number crunching is a huge reason this is written in F90.


Indeed, automatic (and largely implicit) use of SIMD and OpenMP by the compiler is very very good with F90 compared to almost all other widely used languages.


I'm curious (really curious, although this might sound like a troll), have you ever tried a benchmark test on this? Things like this [0] come up every so often, and make me wonder about the efficiency advantage for coding in Fortran.

This is also relevant and insightful [1]. The top answer talks about Fortran's strict aliasing semantics.

[0] http://unriskinsight.blogspot.com/2014/06/fast-functional-go...

[1] http://stackoverflow.com/questions/146159/is-fortran-faster-...


[0] is a classic example of comparing programming languages attacking a problem with different algorithms.

I see that the author insults Fortran a couple of times, and I don't see any indication that they tried to implement the better algorithm in Fortran. Did I miss something?


If you look at Julia[0] they have quite a lot of performance benchmarks and you will see that Fortran is always at or near the top. What is really important, from a numerical code point of view, is that with Fortran it is easy to be at the top by operating on matrices and vectors etc. You do not have to torture your mind and think too much to optimize your code, "remove the IF in the loops" is often just what you need.

I am writing code in Fortran 77/90 every day, and I really enjoy it. Calling your Fortran code from Python is also so easy that you can have a clean split between data management in Python and computational work in Fortran.

[0]: http://julialang.org/
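A tiny illustration of the "remove the IF in the loops" advice (in plain Python rather than Fortran, and purely schematic): a per-element branch blocks SIMD vectorization, while the same computation written as straight-line arithmetic does not.

```python
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]

# Branchy version: an IF inside the loop, hostile to vectorization.
branchy = []
for v in xs:
    if v > 0.0:
        branchy.append(v ** 2)
    else:
        branchy.append(0.0)

# Branchless version: same result as straight-line arithmetic, since
# max(v, 0)**2 equals v**2 when v > 0 and 0 otherwise. In Fortran you
# would reach for MERGE or a masked array assignment (WHERE).
branchless = [max(v, 0.0) ** 2 for v in xs]

assert branchy == branchless
```

In Fortran 90 the branchless form is a one-liner over whole arrays, which is exactly the shape compilers auto-vectorize well.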


Inertia would have something to do with it; there's an 'anal.sh' file which even made it into the documentation.

I can't seem to access the SVN repo, I wonder how/if they unit tested it all.


As somebody who used to work at NOAA (not at NCEP, but I worked with a lot of people at NCEP):

No, there is no unit testing!

anal.sh is short for analysis.


This is really cool. If I knew the slightest bit about Fortran, I'd be distracted for weeks.


imo, fortran90 is about the easiest language to follow - if language is all that is stopping you (and you know some other language already), then you should just pile straight in.

From looking at some of the code, what is hard for me to follow is the 'overall view'. The documentation (of the model) looks quite nicely done, but there must be a huge amount of fluid dynamics etc etc knowledge, that would take me years to learn.


Why don't they just release this on Github?

Also, it would be rather interesting for the open source community to rewrite this.


Your first sentence sniped me and I almost wrote a Big Response before I read the second line and realized you're obviously joking…



