convore.json/groups/inscight/episode-5-reproducibility-in-scientific-computing/messages.json
							[{"user_id": 10421, "stars": [], "topic_id": 11345, "date_created": 1299527210.052048, "message": "there should be no external scripts for producing plots. If you produce a plot, the script should be version controlled and include identification of the revision/inputs that produced it.", "group_id": 3435, "id": 290192}, {"user_id": 10421, "stars": [], "topic_id": 11345, "date_created": 1299527370.269733, "message": "How do people feel about the notion that you should be keeping lots and lots of data about your runs : (which version of the compiler you used, compiler flags, operating system version, versions of all the linked libraries, hardware IDs, other processes being run on your machine simultaneously... ) Some people claim these things are *totally necessary* .", "group_id": 3435, "id": 290242}, {"user_id": 10421, "stars": [], "topic_id": 11345, "date_created": 1299527142.461201, "message": "random number seeds and random numbers that seed more random numbers should be recorded with the output.", "group_id": 3435, "id": 290167}, {"user_id": 10411, "stars": [], "topic_id": 11345, "date_created": 1299526080.3392079, "message": "archive10 link from Puneet https://www.protogeni.net/trac/archive10/", "group_id": 3435, "id": 289925}, {"user_id": 10421, "stars": [], "topic_id": 11345, "date_created": 1299526906.3429799, "message": "I think VisTrails is interesting. 
http://www.vistrails.org/index.php/Main_Page", "group_id": 3435, "id": 290103}, {"user_id": 10421, "stars": [], "topic_id": 11345, "date_created": 1299527007.228014, "message": "Other tricks I've collected include: automating inclusion of revision numbers in every output file generated.", "group_id": 3435, "id": 290123}, {"user_id": 10421, "stars": [], "topic_id": 11345, "date_created": 1299527071.4319761, "message": "Including a full set of input data in the output is also helpful.", "group_id": 3435, "id": 290139}, {"user_id": 10611, "stars": [], "topic_id": 11345, "date_created": 1299528522.651423, "message": "I would also be interested in hearing Sumatra discussed (I've been meaning to incorporate it into my scientific code for a while, but never found the time):\n- http://neuralensemble.org/trac/sumatra/wiki", "group_id": 3435, "id": 290469}, {"user_id": 11246, "stars": [], "topic_id": 11345, "date_created": 1299565745.532517, "message": "worth a mention http://www.stanford.edu/~pgbovine/cde.html", "group_id": 3435, "id": 294660}, {"user_id": 21370, "stars": [], "topic_id": 11345, "date_created": 1300069583.6459601, "message": "You may find interesting this discussion in the NITRC forum http://www.nitrc.org/forum/forum.php?thread_id=2133&forum_id=2", "group_id": 3435, "id": 344623}, {"user_id": 21370, "stars": [], "topic_id": 11345, "date_created": 1300069792.375531, "message": "@katyhuff I agree, they are necessary. ...or... the code should be tested in many platforms and different configurations to demonstrate that the software behaves the same in all them. What we have seen usually in Dashboards is that versions and flags DO matter. 
See for example http://public.kitware.com/dashboard.php?name=itk", "group_id": 3435, "id": 344629}, {"user_id": 21370, "stars": [], "topic_id": 11345, "date_created": 1300464186.79884, "message": "We probably should talk about the Elsevier Executable Paper Challenge: http://www.executablepapers.com/", "group_id": 3435, "id": 383096}, {"user_id": 21370, "stars": [], "topic_id": 11345, "date_created": 1300466263.0739641, "message": "@spidr Thanks for posting about CDE, I was just looking for a tool like it.  This is Great !", "group_id": 3435, "id": 383449}, {"user_id": 11246, "stars": [], "topic_id": 11345, "date_created": 1300513985.3659, "message": "@luisibanez Great to hear you are going to make it on this episode.  Really enjoying your recent reproducibility blog posts.  I am surprised that Elsevier is sponsoring the Executable Paper Challenge.  CDE is really impressive.  It takes some disk space, but it is amazing.  There are not even issues with glibc versions.", "group_id": 3435, "id": 387099}, {"user_id": 21370, "stars": [], "topic_id": 11345, "date_created": 1300545857.1037631, "message": "@spidr I'm very interested in hearing more about CDE. It looks ideal for use along with the Insight Journal. We have been brainstorming that instead of sending source code, and data, along with the papers, the authors could prepare a virtual machine (VirtualBox or VMware) and ship the entire file of the virtual machine. Of course, the drawback is that such VMs will be rather large files (> 2 GB).... but ... with CDE... we could figure out the minimal VM that is required to run the programs related to the paper.   It will be like building customized computers fully dedicated to running the programs needed to replicate a paper.        Does that sound doable ? 
from what you have seen in CDE ?", "group_id": 3435, "id": 388507}, {"user_id": 10411, "stars": [], "topic_id": 11345, "date_created": 1300559055.3832819, "message": "@luisibanez For true provenance, I think that the VM idea is one of the better ways to go.  Yes it takes a lot of disk space now, but if OS requirements do not grow as fast as the hard drive space, then we have a win.\n\nThe thing that VMs fail to account for is custom architecture.", "group_id": 3435, "id": 389036}, {"user_id": 11246, "stars": [], "topic_id": 11345, "date_created": 1300574191.761714, "message": "@luisibanez, that is what it seems to be.  You get what you need and only what you need.  To use it in the future, you need a running linux-2.6 kernel, which I suppose will be around for a long time, and will be emulated or runnable in a VM for even longer.  Of course, if it doesn't run on Linux in the first place, then it doesn't help.", "group_id": 3435, "id": 389720}, {"user_id": 11246, "stars": [], "topic_id": 11345, "date_created": 1300574405.4855809, "message": "@scopatz, VMs are nicer, but OS bloat seems to scale with storage capacity.  And it is one thing to store locally, and another to have a bunch of people DL over the network.  BTW, I have heard lots of positive feedback on the THW 2011 use of the VM.  It was really the way to go.  Easy.  Everyone can do it.  Everyone is on the same page.", "group_id": 3435, "id": 389731}, {"user_id": 10411, "stars": [], "topic_id": 11345, "date_created": 1300580947.8888309, "message": "@spidr Yeah, the THW SCBC 2011 use of VMs was amazing.  I have suggested this at work because it was so successful.\n\nIt seems like there must be a way to have minimalist OSes that don't scale up with space.  
Like who needs a package manager for a provenance environment, or a GUI for most science work?", "group_id": 3435, "id": 389994}, {"user_id": 21370, "stars": [], "topic_id": 11345, "date_created": 1300668451.2607901, "message": "@scopatz Could you send me a link to SCBC ? I'm not aware of what they did with VMs.", "group_id": 3435, "id": 394607}, {"user_id": 10411, "stars": [], "topic_id": 11345, "date_created": 1300687764.966547, "message": "@luisibanez The info is at the bottom of this page (http://hackerwithin.org/thw/plugin_wiki/page/wilson_plenary), but I can't find the link to the VM anymore.  It is quite likely they took it down.", "group_id": 3435, "id": 395609}, {"user_id": 23187, "stars": [], "topic_id": 11345, "date_created": 1300730227.766705, "message": "This might be relevant to some of the points raised here: Provenance Aware Storage System (PASS)  http://www.eecs.harvard.edu/syrah/pass/  I don't have any experience with it, but the concept seems sound -- enlist the OS to do a lot of tracking while you're busy working.  A CS PhD student here brought it to my attention, so we may have a chance to experiment with it here in the near future.", "group_id": 3435, "id": 400672}, {"user_id": 10411, "stars": [], "topic_id": 11345, "date_created": 1300817582.8667979, "message": "@nbest937 I am skeptical at best of things like PASS.  It seems like they are reinventing the wheel in a lot of places that don't really require it.  I feel like provenance shouldn't be about recording every little thing that the computer does.    Maybe we can bank some of this discussion for the actual episode...", "group_id": 3435, "id": 409762}, {"user_id": 10421, "stars": [], "topic_id": 11345, "date_created": 1300829109.8039081, "message": "Oh, I missed all the stuff about THW and the VM until just now. It hasn't been taken down, but the link may be outdated. I'll double check. Here's one. We used Lubuntu, so it was about a gig.  
http://hackerwithin.org/thw/static/vm/", "group_id": 3435, "id": 411975}, {"user_id": 21370, "stars": [], "topic_id": 11345, "date_created": 1300833625.425822, "message": "http://reproducibleresearch.net/index.php/RR_links", "group_id": 3435, "id": 412507}, {"user_id": 23187, "stars": [], "topic_id": 11345, "date_created": 1300834364.834893, "message": "This may be of interest: http://www.jstatsoft.org/", "group_id": 3435, "id": 412587}, {"user_id": 21370, "stars": [], "topic_id": 11345, "date_created": 1300834973.3922801, "message": "From Victoria Stodden:\nhttp://www.ijclp.net/issue_13.html\nThe paper:\n\"Enabling Reproducible Research: Open Licensing for Scientific Innovation\"\nhttp://www.ijclp.net/files/ijclp_web-doc_1-13-2009.pdf\n", "group_id": 3435, "id": 412654}, {"user_id": 10421, "stars": [], "topic_id": 11345, "date_created": 1300847465.420583, "message": "@luisibanez thanks for the victoria stodden paper. I've just scanned it, but it looks like it provides a very coherent perspective on the hurdles to research reproducibility.", "group_id": 3435, "id": 413569}, {"user_id": 10411, "stars": [], "topic_id": 11345, "date_created": 1300864912.330514, "message": "Another open access journal: http://theoryofcomputing.org/", "group_id": 3435, "id": 415108}, {"user_id": 21370, "stars": [], "topic_id": 11345, "date_created": 1301154495.1089859, "message": "http://www.kitware.com/blog/home/post/105    http://blog.stodden.net/2011/03/19/a-case-study-in-the-need-for-open-data-and-code/    MUST WATCH VIDEO: http://videolectures.net/cancerbioinformatics2010_baggerly_irrh/", "group_id": 3435, "id": 446335}, {"user_id": 20326, "stars": [{"date_created": 1303104482.065258, "user_id": 10411}], "topic_id": 11345, "date_created": 1302752803.091521, "message": "I really wish this episode would have had someone with some level of skepticism about this approach to reproducibility. Anthony at times sort of played this role, but as moderator I think he stayed, well, moderate. 
\n\nI'm left wondering:\n - Isn't \"reproducibility\" shorthand for \"independent reproducibility\"? If we bundle everything into a VM we can (hopefully!) repeat computations and then get the same results\u2014this goes a long way to proving we didn't fabricate those results, but to what extent does this aid *reproducibility*, though? It seems like this could almost make us more likely to depend on environmental details. Reproducible results can work with alternate numerical details.\n - I guess we have to run the actual computation in the VM? Is the overhead worth it? If our resources are fixed and our computations expensive, do we have to give up expense other areas, like model size and verification procedures? Could that be a bigger obstacle to reproducibility?\n - Is the VM approach even amenable to how we have the ability to work on real-world high-performance computing resources?\n - Do VM software packages have mature hardware emulation for all hardware realistically used by scientists for computing? I'm thinking of GPUs especially.\n - When do we have to do this? Benchmark cases? Validation cases? Cases where we do science? Cases where we do engineering or consulting without doing new scientific research? All of the above?\n - Is the commercial software model completely incompatible with reproducible science? The proprietary software model?\n - Are these sorts of questions being addressed?\n\nDuring the podcast I felt like anyone who was skeptical of the proposed practices was being represented as motivated by stubbornness and counterproductive pride. I think there are actual conceptual and practical issues at hand which weren't addressed much in the episode and are important to how we do science.", "group_id": 3435, "id": 677588}, {"user_id": 21370, "stars": [{"date_created": 1303104521.1943901, "user_id": 10411}], "topic_id": 11345, "date_created": 1302786818.977771, "message": "There are enough skeptics (and apathetics) out there already. 
Their position is easy, they stay with the lazy option of \"It is too hard...\", \"we have done it this way for years...\", \"we are doing just fine...\". They are accustomed to the mediocrity of just publishing for their resumes, and have not much to contribute to the discussion.\n\n\nReproducibility stands for both \"Self\" reproducibility and \"independent reproducibility\". Today, we don't have either of them. The fact that Self reproducibility is absent speaks volumes about the lack of scientific education of many researchers, and the widespread practice of sloppy techniques.\n\n\n\"Reproducibility Starts at Home\":\n1) Can I repeat what I did yesterday ?\n2) How about what I did last year ?\n3) How about my paper of 5 years ago ?\n\n\nI would like to see a survey of how many researchers can honestly answer \"Yes\" to that question, and then back their answer with a demonstration where they run their data of 5 years ago, and show the same results.\n\nResearchers who practice reproducibility in their own labs will have no trouble then sharing the same recipe outside.\n\nThe VM option is a way of getting away from the EXCUSE that \"it is too hard to replicate something that was run in another environment\".  With a VM, you have \"the same computer\".   Is there an overhead ? yes. It is 15%.  Does it matter ? NO. Even if the overhead was 300%. It is a lot smaller than the infinite overhead of NOT being able to replicate.\n\nThe bigger obstacle to reproducibility is NOT TECHNICAL, it is not hardware, it is not software, it is not Data. It is the HUMAN INERTIA. It is the lazy attitude of naysayers that can easily find reasons \"why this will not work\" or reasons \"why this will be too hard\". Such people shouldn't be doing scientific research. That's the wrong mindset for this business.\n\nI can't imagine Michelson sitting at his desk whining \"It is too hard to measure the speed of light\"... 
and then lazily extending that argument to \"it can't be done\".\n\n\nHave VMs matured enough ? YES.\nThey run your bank account.\nThey run your medical records.\nThey run the largest databases in the world.\n\nCan they simulate GPUs ? maybe not. So what ?\nIs that a reason for the remaining 99% of computational research to get cozy in the lazy option of \"it is too hard...it can't be done...let's keep living in mediocrity...\" ?\n\n\nYou ask about cases where we do \"science\"...\nLet me repeat this again:\n\n   WE ONLY DO SCIENCE WHEN WE MANAGE TO REPLICATE.\n\nThat IS science. That's by definition the \"scientific\"\nactivity.  Science is not \"publishing\", science is not wearing white coats, not writing apparently complicated equations. Science is the systematic method of gathering knowledge by building up hypotheses and testing them in REPRODUCIBLE experiments.\n\nIF YOUR EXPERIMENTS DO NOT REPLICATE THEN YOU ARE NOT DOING SCIENCE.\n\nThe fact that people can still publish without replication is simply an indication of the decadent corruption of the publishing establishment, and a reason why we have to get rid of it.\n\nCommercial applications are perfectly fine, as long as abusive licenses do not get in the way of replication and free distribution of information. Have you read the license of the commercial applications that you use ? Did you notice the clause where you authorize them to inspect all your computers with a 10 day notice ? or the clause where you are required to ask for their permission before you publish benchmarking studies ?\n\n\nThe obstacle to reproducibility is not technical. It is simple lack of scientific education. 
Young researchers have been corrupted by the people in my generation, with the narrow-minded motto of \"Publish or Perish\", that serves only the economic interests of publishers.\n\nI have a new motto:\n\n      \"REPLICATE OR DISAPPEAR\"", "group_id": 3435, "id": 684325}, {"user_id": 11246, "stars": [], "topic_id": 11345, "date_created": 1302799837.350688, "message": "A higher level of reproducibility, as @mikegraham pines for, is desirable.  For\ninstance, code that is written closer to a library format is more easily\napplicable to new cases than code that is one-off with hard-coded parameters.\n\nBut, as @luisibanez states, there is a lot of low-hanging fruit that needs to\nbe picked before we reach that stage.  In my experience, the majority of\nresearchers are not capable of self-reproducibility.\n\nFrankly, there is a lot of dishonesty out there of one degree or another, and\nthis would shake out those practices.  There are also many inadvertent bugs.\nThese facts are what truly underlie the open data/reproducibility resistance.\n\nIt is also about progress.  For \"independent reproducibility\" to take place,\nreproducibility in the same environment is almost a necessity.  The \"publish or\nperish\" model is good at propagating careers, not at propagating knowledge.  The\ntrue impact of many works is transient at best.  Like the unequal lasting impact\nof open source software versus proprietary software, reproducible research has a\ngreater impact.\n\nTypical scenario: a graduate student comes into the lab and receives a pile of\nundocumented spaghetti code from the prior student.  They spend a couple of years\ntrying to reproduce the results from their predecessor to no avail.  They must\nwaste time redoing what was done, with the faint hope of getting it to work.\nOn top of that, they are charged with making improvements.  
However, there are\nno incentives to do it right, validate their results, make their work\nreproducible, because the time is not rewarded or required.  The cycle continues\nwith poor quality research and progress at a snail's pace compared with what it could be.\n\nA VM may be the way to go, but I have serious doubts over the viability of\ntransmission over the internet.  Someone may easily download 15 article PDFs in\na day, but downloading 15 VMs in a day is not feasible.  A layered model where one\ncan download the article, then the source code, then the data, then the VM based\non their interests and needs may work better.\n", "group_id": 3435, "id": 687214}, {"user_id": 21310, "stars": [], "topic_id": 11345, "date_created": 1302799676.7725451, "message": "I agree that a huge part of the problem is human inertia, and maybe a fear of \"complications\" -- having to admit there are issues with the code and having to spend more time debugging and figuring out a better way to do it, delaying the paper... And of course the rest is just inherited culture -- students learning from mentors who don't enforce reproducibility usually won't get into it by themselves, and I don't think they should be entirely blamed for that. You can argue that they should be thinking about how they're doing what they're doing, and not just miming their predecessors like monkeys. That it's a question of intellectual rigor vs. laziness. But give me a break. It's kind of hard being the one grad student in the lab who has to break the mold when you've got a professor breathing down your neck pressing you for results and paper drafts and everyone else is doing it \"the company way\". The only solution is usually to do it on your own time, but you're already doing loads of work on your own time. There's no \"own time\" left. 
\n\nAs a side note, I worked in experimental microbiology for almost ten years, and I can guarantee that even there, where you'd think reproducibility would be far more deeply ingrained, you will see reproducibility issues being hand-waved away so the paper can get done. Then when people have trouble reproducing results, the original authors will blame reagents, materials, different \"hands\"... And because those classically do account for a lot of differences in results, it's very difficult to evaluate where the truth lies. Plus there's all the little protocol tweaks that aren't described in the paper, which is sometimes done on purpose to slow down competitors. That is probably the worst aspect of it all...", "group_id": 3435, "id": 687186}, {"user_id": 21310, "stars": [], "topic_id": 11345, "date_created": 1302813178.1697969, "message": "@spidr I like the idea of a layered model a lot more than VMs. I feel that part of the problem with the VM approach (much as others have commented earlier) is that it allows complacency and poor coding practices in the sense that people can get away with doing things like hard-coding parameters and so on. It seems to me that the layered model would better enforce good practices like total separation of the logic and the content. It's not that difficult to keep things separate when you're writing the code; it's a lot harder to do after the fact when updating or improving the tool. The scenario of the poor grad student wrestling with spaghetti code is spot on in that regard.", "group_id": 3435, "id": 690288}, {"user_id": 21370, "stars": [], "topic_id": 11345, "date_created": 1302817304.8035009, "message": "@gglobster Even more reasons for enforcing reproducibility verification. A couple of publishers are waking up and raising up to the challenge. 
See for example, the brand new BioMed Central journal http://blogs.openaccesscentral.com/blogs/bmcblog/entry/open_network_biology_and_the", "group_id": 3435, "id": 691571}, {"user_id": 21370, "stars": [], "topic_id": 11345, "date_created": 1302818079.3282881, "message": "@spidr  Sure, VMs do not have to be the only solution. Again, the problem is\nnot the technical one. The problem is one of commitment to reproducibility.\nOnce we are committed to reproducibility, that goal can be achieved with BAT\nscripts, or bash scripts, or Python scripts, or R and LaTeX, or CMake and\nC++... Jon Claerbout did it 30 years ago with plain Make and C.\nhttp://sepwww.stanford.edu/data/media/public/sep//jon/reproducible.html.\nTechnology is not the problem.  The problem is to wake up the hordes of zombies\nthat have grown accustomed to publishing rubbish that no one can replicate, and who\nthink that such practice is acceptable.   They are surrounded by the complicit\nregard of reviewers and readers who settle for the mediocrity of an obsolete\npublishing system.\n\nI'll be the first one to celebrate the creation of common software platforms\nwhere clean, well designed and well documented code is available for all to\nshare and develop. However, I don't think we can use its absence as an excuse to\npostpone reproducibility.  Anyone can TODAY create reproducible reports with the standard tools available in a Linux machine. There is no excuse for not starting NOW. It takes 10 minutes to create a repository on GitHub. Then you can go and populate it with makefiles, LaTeX and C, or C++, Octave, R,....   We have been running the Insight Journal", "group_id": 3435, "id": 691721}, {"user_id": 21370, "stars": [], "topic_id": 11345, "date_created": 1302818176.6499851, "message": "@luisibanez for 5 years with technology that was available 10 years ago. http://www.insight-journal.org.  All the technology is out there.  
Educating researchers in the basics of the scientific method is what we are missing.", "group_id": 3435, "id": 691730}, {"user_id": 29606, "stars": [], "topic_id": 11345, "date_created": 1302821920.7652659, "message": "@spidr hi i'm the author of CDE, please email me personally if you want to chat more about possible applications.  i think it's quite usable for the goal you have in mind.  you can find my contact info at http://www.stanford.edu/~pgbovine/cde.html", "group_id": 3435, "id": 692206}, {"user_id": 29606, "stars": [], "topic_id": 11345, "date_created": 1302822439.8422799, "message": "@luisibanez please email me personally if you'd like to chat about possible applications of CDE.  thanks!", "group_id": 3435, "id": 692269}, {"user_id": 21370, "stars": [], "topic_id": 11345, "date_created": 1302978816.9910541, "message": "@pgbovine Excellent, I just emailed you directly. I'm looking forward to using CDE for the next generation of the Insight Journal.", "group_id": 3435, "id": 712913}, {"user_id": 21370, "stars": [{"date_created": 1303106245.3766551, "user_id": 10411}], "topic_id": 11345, "date_created": 1303050964.824219, "message": "Reproducible Research Workshop:\nTools and Strategies for Scientific Computing\nhttp://www.mitacs.ca/goto/amp_reproducible\n\nJuly 13-16, 2011\nUniversity of British Columbia in Vancouver.\n\n\nCommunity Forum on Reproducible Research Policies\nhttp://kingkong.amath.washington.edu/rrforum/\n\nDescription of the forum\n========================\nWe are inviting a number of people from the editorial boards of journals, leadership\npositions in professional societies, and program managers from funding\nagencies. 
Our goal is to facilitate a discussion of issues such as:\n\n* What is the meaning of \"reproducible research\" in computational science?\n* How is computational research best preserved?\n* What policies will encourage reproducibility?\n* What should journals require and/or provide to accompany computational publications?\n* Should funding agencies support public databases for code and data?\n\nMore at\nhttp://kingkong.amath.washington.edu/rrforum/", "group_id": 3435, "id": 716869}, {"user_id": 20326, "stars": [{"date_created": 1303104616.448612, "user_id": 10411}], "topic_id": 11345, "date_created": 1303071664.0595169, "message": "@luisibanez I really appreciate your time replying to me and your clear passion. As I said in my original post, I felt like any skepticism was met with negative characterization and perhaps even strawmanning. My questions were based on whether using the solutions presented in the podcast might be ineffective or counterproductive to having reproducibility; I feel like you've addressed mostly things I didn't say in your reply to me. In particular, it seems to me like you repeatedly respond to my questions of the tone \"Is this a good idea?\" with responses like, \"Why are you whining about how hard this is?\"\nIt seems evident that we think of the term \"reproducibility\" differently. Being able to re-do an exact thing you've done seems to be what I would call \"repeatability\" or something; I think of independent reproducibility (by yourself or others) to be what I call \"reproducibility\", and this idea to be the bigger hallmark of science. I apologize for any miscommunication our different terminology for these companion concepts have caused.\nI realize you will not agree with my values here, but I think your attitude to dismissing VM performance and compatibility issues is too dismissive. As a field, science needs to balance rigor and progress. 
Currently, I agree that the level of rigor is too low in the area of reproducibility, but I'm not sure this element of the solution best aids reproducibility, and if it does I find it *extremely* important to consider performance and compatibility. Scientists often operate at the edge of their resources and even at the edge of computing in general. (I know I regularly run simulations on some of the world's biggest supercomputers.) You cannot just pull out a number like \"15%\" and drop the issue or look at incompatible next-generation technology or computing resources and say \"forget them\". Discovery requires pragmatism.\nI think you might have misunderstood what I meant by \"do science\", so I apologize for not explaining myself better. I am an engineer, so I run simulations both in the pursuit of research (doing science) and in the process of synthesis (doing engineering). Different rules and types of rigor apply to these pursuits, and I was hoping for some logical way to think about how to approach the two.\nI'm actually excited about the prospect of improving reproducibility in scientific computing and raising standards in this area. I appreciate all the effort you seem to be expending for it.", "group_id": 3435, "id": 718385}, {"user_id": 10411, "stars": [], "topic_id": 11345, "date_created": 1303106354.855767, "message": "@luisibanez I also really appreciate you defending your opinions here and fielding mike's questions while I was out.  Would you be interested in continuing this conversation on an episode as well?", "group_id": 3435, "id": 723021}, {"user_id": 10411, "stars": [], "topic_id": 11345, "date_created": 1303106135.8799641, "message": "Jeeze, I leave for a week and this level of epic-ness happens.\n\n@mikegraham I like the distinction that you make between reproducibility and repeatability.   They are in fact two separate issues.   Nor do I believe that we are anywhere close to reaching palatable solutions to either.  
Frankly, I am not even certain that repeatability is the weaker condition than reproducibility, as you seem to imply.\n\nI personally have a couple of conflicting opinions that I think sum up the trouble in this space nicely:\n\n* Until the reproducibility/repeatability tools are ubiquitous (everyone is doing it) and invisible (no one thinks about doing it while it is happening), the tools will be too difficult or annoying to be used by most computational scientists.    Take version control and testing as examples of successes here.\n\n* Forcing people to use tools or a certain stack limits innovation.  I was jumped on during the episode for saying this.  If I hadn't practically invented things like version control, NumPy, and cloud systems on my own years ago, I would not have the same respect or understanding for these tools that I do now.  And every once in a while, someone comes up with something better than what is out there...\n\nIn response to your original post, I will say that the idea of proprietary code runs counter to the idea of science.  If something is closed source, it is not science.  If something is open source, it may be science.\n\nI feel the same way about proprietary journals.  It was not that long ago that the publisher actually needed to physically print the journal.  The physical object was a commodity that could be traded for money/goods/services.  Since not many people desired academic discourses on a particular topic, there was a premium.\n\nNow, however, electronic versions of code and journals are far more important.  In an economic sense they are worthless because they are effectively infinitely copyable.  Because an infinite number of copies of a journal article or a piece of code could be made, the per-unit value is zero!\n\nThe historical reason for the premium has disappeared.  Openness is far more valuable to the world than extortion by the few.   
\n\nAlong with reproducibility / repeatability comes the expectation of free exchange of information.  Technical achievements (and therefore some aspects of engineering) can be made in a closed fashion.  But it is not science in a strict sense if not everyone can go and attempt to verify the results.\n\nMy 200 cents...  I really appreciate your criticisms on this topic.  Would you like to do a show about it ;) ?", "group_id": 3435, "id": 722963}]