Monday, February 16, 2009

Method history - time dimension of code repositories

Recently I've been looking into how various version control systems integrate with IntelliJ IDEA, one of the best Java IDEs around. JetBrains did a great job with the new version in terms of performance and feature set, but I feel that its VCS support is conceptually on the same level where Visual Studio with the MS SCC API was a decade ago. Yes, JetBrains has added integration of changes between SVN branches, but frankly speaking, do you really need this in your IDE, or would an external tool work just as well?
I feel that we need fresher ideas in this space, something you typically don't expect from version control, like git-bisect. For instance, it would be much more interesting if I could view revision history at the level of individual methods in Java code, rather than at the level of the file. This way I could quickly answer questions like "who wrote this piece of s..t, and why?". This question comes to me very often even when I'm hacking on my own code, not only someone else's :) Git-blame is similar, but it does not understand the programming language (which IDEA does very well), so it operates on lines rather than on meaningful pieces of code.
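To make the idea a bit more concrete, here is a minimal sketch of method-level history. It is only an illustration under loud assumptions: it uses Python's standard `ast` module on Python source instead of Java, the revisions are passed in as plain strings rather than read from a real repository, and the names `method_fingerprint` and `method_history` are made up for this example.

```python
import ast


def method_fingerprint(source, method_name):
    """Return a formatting-independent dump of the named function, or None if it is absent."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.name == method_name:
            # ast.dump leaves out line numbers by default and the AST has no
            # comments, so moving a method or editing comments is not a change.
            return ast.dump(node)
    return None


def method_history(revisions, method_name):
    """Return the (revision_id, author) pairs that actually changed the method.

    revisions: list of (revision_id, author, source), oldest first (hypothetical input format).
    """
    history = []
    previous = None
    for revision_id, author, source in revisions:
        current = method_fingerprint(source, method_name)
        if current != previous:
            history.append((revision_id, author))
        previous = current
    return history
```

Revisions that touch the file but leave the method itself alone drop out of the result, which is exactly what a line-based blame cannot tell you.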

Telescopic text – a form of lightweight hypertext

I don’t like help files and software documentation. I only look there if there is no other choice. And I get really frustrated reading it, because the assumption that a technical writer makes about the reader’s knowledge is typically wrong when it’s applied to me (bad luck?). Either I know too little (so the text is cryptic), or they are trying to explain everything to me starting from the first grade of elementary school, so I spend ages scrolling through dozens of pages to find what I’m looking for.
Wouldn’t it be nice to have a system that shows you a condensed high-level description and lets you drill down wherever you don’t know something? Sounds like hypertext, and hasn’t Wikipedia been doing that for ages? Not exactly: when you click on a term, you leave the page you are reading at the moment. Typically the reader wants to stay on the same page and get more details right there in the text. Recently I found an example of such an approach called telescopic text, where you can click on a piece of text and "expand" it. So the technical writer would create a text with maximum detail, assuming that the reader is an idiot. After that, one can mark up some pieces of it as "collapsible" chunks. Well, it is probably not that simple, because the text has to read normally at every level, so some form of morphological analysis might be involved, but it should be doable at least for English. The system should be able to identify correlations in "expand" behavior and act proactively: if someone doesn’t know what a "database" is, we can automatically expand "first normal form" as well. Ideally the system should be able to learn the user's level of knowledge in a specific area and present the text in the form most convenient for her (e.g. with the minimum amount of expanding/collapsing needed).
Learning might finally get more fun. :)
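The collapsible chunks described above could be modeled roughly like this. It is a minimal sketch, not how any existing telescopic text tool works: the `Expandable` class, its `key` field, the `render` function and the hand-written `implies` table are all assumptions made for illustration, and the "learning" part is reduced to a fixed lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Expandable:
    key: str    # hypothetical identifier of the chunk
    short: str  # what the reader sees while the chunk is collapsed
    detail: list = field(default_factory=list)  # strings and nested Expandable nodes


def render(nodes, expanded):
    """Render a list of strings/Expandable nodes; `expanded` is the set of keys the reader clicked."""
    parts = []
    for node in nodes:
        if isinstance(node, str):
            parts.append(node)
        elif node.key in expanded:
            parts.append(render(node.detail, expanded))
        else:
            parts.append(node.short)
    return "".join(parts)


def with_implied(expanded, implies):
    """Add chunks that should open proactively (one step only, not transitive)."""
    result = set(expanded)
    for key in expanded:
        result |= implies.get(key, set())
    return result
```

Every level has to be written as normal text by hand here. The sketch does nothing about the grammar problem mentioned above, it only shows that the data structure itself is a small tree.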

Saturday, August 09, 2008

Web-based IDE for Ruby on Rails

I have been following a couple of web-based IDEs like CodeIDE and ECCO. All of them seem to be toys rather than actually useful tools. My understanding is that this is partly because of modest project goals and partly because of the limitations of the technology. Let me explain why I believe it can still work.
The advantage of a web-based IDE is obvious: anyone who does software development for a living knows how much you need to do before you can start coding. The idea of having web access to an already configured environment, where you can contribute anytime you want from any browser, looks attractive. But the limitations are obvious as well. Do you think it will work in the C/C++ world? I really doubt that. Even with modern web-based sharing solutions like WebEx, that does not sound realistic. Will it work for a huge Java EE project? Well, processing large amounts of text in Google Docs or Adobe Buzzword is still not comfortable (yet :). We need a technology that produces output viewable in a browser and does not require megabytes of code. Ruby on Rails looks like an ideal choice (especially keeping in mind that a good RoR IDE is still missing).
Imagine being able to issue all the rails commands from a nice and clean UI, having a nice refactoring code editor in Flash, and even being able to debug a RoR application. Of course, the UI is the trickiest part: it will not be possible to mimic a standalone IDE, so we need to do the same kind of rethinking that the GMail creators did with the classical email client's folder tree. But this is certainly doable, and it would lower the threshold for contributing to open source projects.
It is also very important to provide seamless integration with version control. In fact, the whole development process can be built around tasks, and task descriptions can act as descriptions for check-ins (and thus provide a meaningful history of changes).
I also assume that this would broaden the audience of developers, so we need to make sure that only quality code gets into the repository. We need good automatic checks built into the system that prevent check-in of bad code (similar to FxCop in TFS), and we need a way to manually review check-ins before integrating them into the main branch (that's why git might have an advantage over svn).

Sunday, May 25, 2008

Feature elimination in RapidMiner

Recently I discovered RapidMiner, a data mining application. It has quite a good but huge tutorial, where you can read sections 3 and 4 to get an impression of what this tool can do for you. Essentially, as input it takes sample data, where each instance is described by a vector of features (e.g. size, weight, color) and an indication of which class the instance belongs to. The processing typically consists of pre-processing (converting feature values into a more appropriate format, eliminating features that do not bring any additional information, etc.), machine learning (it has quite a few algorithms) and post-processing. In the software you have something similar to an IDE, where you can visually construct a process from existing building blocks. A typical problem with this approach is how to map the outputs of one block to the inputs of the next one. They seem to have chosen the simplest solution, assuming that every output has a certain type and that there can be only one input/output of a given type. Obviously they were targeting data mining specialists, so the user should not have to worry about such details.
I was trying to apply it to my task, which is essentially a subset of the whole process. I have a vector of features and I know for sure how to classify it: take the maximum value among the features (which is either -1, 0 or 1) and use it as the class. I need to get the smallest possible subset of features that still gives a 100% match with the initial classification. It appears that there are two ways of doing that in RapidMiner:

  • Directly alter the model file saved on disk. This is extremely difficult, because the model file is just a serialized XML dump of a Java class, so even the smallest one is 70K
  • Use the ability to generate new attributes based on the existing set as a model

And here I faced a problem that is typical for all WYSIWYG editors that help you build your program from building blocks. As long as you are doing samples, everything works fine. If you need to take one step off-road, you are in the dust. I can create a new attribute, but it will have type regular, while all validators expect to see labels, which have type predicted(label). And the user is not supposed to change the types of attributes, because it is dangerous.
I apologize if there is a simple way of solving my task, but I think it is a very good illustration of how programming works. If you are building a new development environment, even one targeted at non-developers, you need to make sure that there is an "advanced" way where you have no constraints. No one can predict all the ways your system will be used. Just hide the candy deep enough that only geeks can find it.
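For comparison, the task itself (find the smallest subset of features whose maximum still reproduces the class on every instance) fits in a few lines outside RapidMiner. This is only a brute-force sketch under my own assumptions: rows are plain tuples of -1, 0 or 1, the name `smallest_feature_subset` is made up, and the search is exponential in the number of features, so it is only practical for small vectors.

```python
from itertools import combinations


def smallest_feature_subset(rows):
    """Return indices of a smallest feature subset whose max matches the full max on every row.

    rows: non-empty list of equal-length feature vectors with values in {-1, 0, 1} (assumed format).
    """
    n_features = len(rows[0])
    # The reference classification: the class is the maximum over all features.
    labels = [max(row) for row in rows]
    # Try subsets in order of increasing size, so the first hit is a smallest one.
    # The full feature set always matches, so the search always returns.
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            if all(max(row[i] for i in subset) == label
                   for row, label in zip(rows, labels)):
                return subset
```

If several subsets of the same size work, this returns the first one in the order produced by `itertools.combinations`.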