Your code as a cloud, word cloud that is...

What if you needed to look at your code at a high level? To try to get a feel for what's there? What if you could take a "word/tag cloud" and apply it to your code?

Say, maybe the Kinect SDK samples?

image

Or maybe you'd like to see some text as a cloud?

Like the Kinect SDK Readme's?

image

And what if you wanted to get the code that does this for your code?

Source Code Word Cloud Generator

Generate a word cloud from your code to see what your code is about and what it does. A word cloud is a set of randomly arranged keywords, variable and class names etc. used in your code. The size and the color of each word express its usage frequency. Rarely used words are small and pale. It might give you a hint about how good or bad your code base is and how to improve it.

Currently supported languages: C#, Java, VB.NET. [GD: and the latest drop also included Text/TXT files]

image

Motivation

Recently, during a seminar, Kevlin Henney showed us a post from Phillip Calçado's blog, See How Noisy Your Code Is: http://fragmental.tw/2009/04/29/tag-clouds-see-how-noisy-your-code-is.

The idea behind it is very simple. A tag cloud (word cloud) is a visual representation for text data. Words are usually placed on some rectangular area and the importance of each tag is shown with font size and/or color. This format is useful for quickly perceiving the most prominent terms in analyzed text. Wordle http://www.wordle.net/ is one of the free tools to build such clouds. You can paste any text or a website URL and in a few seconds you get an idea what the website or text is about.

Reading the Tag Cloud of your Code Base

So if you take your code, remove the comments and literals, block some very common words (like your company name), and generate a word cloud from what's left, you will get an interesting picture to discuss with your colleagues in a coffee corner.
  • If the words "if", "then", "else", "switch" and "case" are the first thing you see, your code is sprinkled with conditionals!
  • Is "string" in your top 10 words? Congratulations if you write text processing software; otherwise it might be a bad smell.
  • Are you writing an API or a library? Then you should see the word "public" in the front rows. If you are not, a prominent "public" might be a signal to think about better protection.
  • Do you see your classes at first glance, or are they far away in the background, behind "int", "byte", "array" etc.? Is your code written in your domain language?
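The steps above (strip comments and literals, block common words, count what's left) can be sketched in a few lines of C#. This is only an illustration, not the project's own analyzer: the `CodeWordCounter` class, its `Blocked` list (including the made-up company name "contoso") and the regular expressions are all assumptions, and the comment/literal pattern ignores edge cases such as verbatim strings.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the Source Code Word Cloud Generator.
public static class CodeWordCounter
{
    // Made-up block list; put your own company name or noise words here.
    private static readonly HashSet<string> Blocked =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "using", "namespace", "contoso" };

    // Block comments, line comments, string literals and char literals.
    // Simplification: verbatim (@"...") strings are not handled.
    private static readonly Regex Noise = new Regex(
        @"/\*.*?\*/|//[^\r\n]*|""(\\.|[^""\\])*""|'(\\.|[^'\\])*'",
        RegexOptions.Singleline);

    private static readonly Regex Identifier = new Regex(@"[A-Za-z_][A-Za-z0-9_]*");

    // Returns how often each keyword or identifier occurs in the source text.
    public static Dictionary<string, int> Count(string source)
    {
        // Replace comments and literals with a space so neighbouring words don't merge.
        string stripped = Noise.Replace(source, " ");
        var counts = new Dictionary<string, int>();
        foreach (Match match in Identifier.Matches(stripped))
        {
            if (Blocked.Contains(match.Value)) continue;
            int n;
            counts.TryGetValue(match.Value, out n); // n stays 0 for a new word
            counts[match.Value] = n + 1;
        }
        return counts;
    }
}
```

Sorting the resulting dictionary by count, descending, gives the "top 10" list the questions above refer to.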

 

Usage note: If you're not getting much of a result and your word cloud seems pretty empty, try adjusting the max font size.

image

For example, here's the same code with 72 (the default):

image

And 24:

image

(The default value of 72, on the system I was using it on, threw me... I thought I was doing something stupid because the results were so limited... doh!)
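To get a feel for why the max font size matters so much, here is a minimal sketch of how a cloud might scale a word's font size from its count. It is an assumption about how such generators commonly work, not code from this project; `FontScale` and `SizeFor` are hypothetical names. With a large maximum, the most frequent words take up a lot of space, so presumably fewer of the remaining words fit in the drawing area.

```csharp
using System;

// Hypothetical helper, not part of the Source Code Word Cloud Generator.
public static class FontScale
{
    // Linearly maps a word's count onto the range [minSize, maxSize].
    public static float SizeFor(int count, int minCount, int maxCount, float minSize, float maxSize)
    {
        // All words equally frequent: nothing to scale, use the largest size.
        if (maxCount <= minCount) return maxSize;

        float t = (float)(count - minCount) / (maxCount - minCount);
        return minSize + t * (maxSize - minSize);
    }
}
```

For instance, with a minimum size of 8, a word used half as often as the most frequent one would be drawn at size 40 when the maximum is 72, but only at size 16 when the maximum is 24.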

There are two areas I found very interesting in the project: the text analysis and the cloud generation.

image

Here's a snip from the text extraction:

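using System.Collections.Generic;
using System.IO;

// Extracts words from plain text files, one file at a time.
public class TextExtractor : BaseExtractor
{
     public TextExtractor(IEnumerable<FileInfo> files, IProgressIndicator progressIndicator)
         : base(progressIndicator)
     {
         Files = files;
     }

     protected IEnumerable<FileInfo> Files { get; set; }

     // Lazily streams the words of every file; a file is only opened
     // once the caller's enumeration reaches it.
     public override IEnumerable<string> GetWords()
     {
         foreach (FileInfo fileInfo in Files)
         {
             // Report a shortened path as the current progress message.
             ProgressIndicator.SetMessage(Shorten(fileInfo.FullName, 60));
             using (StreamReader reader = fileInfo.OpenText())
             {
                 IEnumerable<string> words = GetWords(reader);
                 foreach (string word in words)
                 {
                     yield return word;
                 }
                 OnFileProcessed();
             }
         }
     }
}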

image
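The snip calls a `GetWords(reader)` overload that isn't shown here. As a rough idea of what a reader-based word splitter could look like, here is a self-contained sketch using the same lazy `yield return` style. `WordReader` and `ReadWords` are hypothetical names, and the rule for what counts as a word (letters, digits and underscores) is an assumption, not the project's actual logic.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, not part of the Source Code Word Cloud Generator.
public static class WordReader
{
    // Lazily yields runs of letters, digits and underscores from the reader.
    public static IEnumerable<string> ReadWords(TextReader reader)
    {
        var current = new StringBuilder();
        int c;
        while ((c = reader.Read()) != -1)
        {
            char ch = (char)c;
            if (char.IsLetterOrDigit(ch) || ch == '_')
            {
                current.Append(ch);
            }
            else if (current.Length > 0)
            {
                // A separator ends the current word.
                yield return current.ToString();
                current.Clear();
            }
        }

        // Emit the last word if the input didn't end with a separator.
        if (current.Length > 0)
        {
            yield return current.ToString();
        }
    }
}
```

Because the method is an iterator, nothing is read until the caller starts enumerating, for example with `foreach (string word in WordReader.ReadWords(new StringReader("hello, word cloud")))`.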

A snap of the entire Solution:

image

By the way, if you're interested in just the word cloud control itself, as a standalone item and/or the details and logic behind it, please check out Word Cloud (Tag Cloud) Generator Control for .NET Windows.Forms in C#.
