Google Verification

google-site-verification: googleebc67622dfac97fe.html

blogit ergo sum

25 December 2011

Verification

google-site-verification: googleebc67622dfac97fe.html

15 January 2011

My Web Browsing Turned into a Newspaper

My life is an open book.
What the machine learning people I follow on Twitter are reading.
And a custom paper.

18 October 2010

The Effect of Test Set Selection on Classification Accuracy

I was looking at some prediction results for the UCI Michalski and Chilausky soybean data set and wondered how they depended on test set selection. Some reported classification accuracies as high as 93.1% with a 25% training set and 97.1% with 290 training and 340 test instances.

A few weeks ago I had been asked to find the best classifier for the soybean data set based on a test set of 20% of the data. The remaining 80% could be used for training. That gave 306!/(245! x 61!) = about 1.4 x 10^65 possible splits of the 306 data points into training and test sets. Could some of these splits lead to better results than others for the classifiers I was about to use?

I used the WEKA data mining package for classification. WEKA has many classifiers that can be run on a data set and their performance compared.

WEKA also has a programming interface, so I used it to write some Jython tools to explore the performance of a range of classifiers.

I ran one of these tools on the soybean data to find the training/test splits with the best and worst classification accuracy. The results were:

Classifier                      Best accuracy   Worst accuracy
Naive Bayes                     100%            70.5%
Bayes Net                       100%            75.4%
J48 (C4.5)                      95%             69%
JRip (RIPPER)                   98.4%           70.5%
KStar                           96.7%           65.6%
Random Forest                   95%             62.3%
SMO (support vector machine)    96.7%           82%
MLP (neural network)            100%            77%
Fig 1. Best and worst accuracies for selected WEKA classifiers run on different training/test splits


That was quite a range of test set accuracies across training/test splits. My simple genetic algorithm may not have found the extremes of the distributions, so the actual range may have been even wider.

When I ran the test set selection script a second time (Fig 2), it found a 100% SMO accuracy. The second run was set up to find a single training/test split that gave the best results for all classifiers at once. It also used slightly different pre-processing: the 4 duplicate instances were removed and the troublesome single 2-4-5-t sample was left in. Therefore I expected it to give worse results than the pre-processing used for the results in Fig 1.

Classifier       Correct (out of 60)   Percent correct
Naive Bayes      57                    95%
Bayes Net        59                    98.3%
J48              58                    96.7%
JRip             60                    100%
KStar            60                    100%
Random Forest    59                    98.3%
SMO              60                    100%
MLP              60                    100%
Fig 2. Best accuracies for selected WEKA classifiers all run on the same training/test split

Both sets of results above used the default settings of each WEKA classifier. The WEKA classifiers all have parameters that can be tuned, and it is possible to select subsets of attributes, so they can give better or much worse results than the defaults. However, the default parameters are usually close to the best, so these results are probably good indicators of the best possible accuracies.

It appears that the choice of training/test split can change classification accuracy by more than 30 percentage points. This was observed on a well-known and widely used classification data set.

07 September 2010

13 August 2010

Wrote a Blog Post

I have not posted here recently, but I wrote a blog post for PaperCut last week.

20 March 2010

Blogger supports logical symbols

e.g. ¬(A ∨ B) ⇒ ¬A ∧ ¬B 


All symbols: ¬, ∧, ∨, ⇒, ⇔

04 January 2010

C++ Continues to Surprise

Someone was asking questions about const_cast<>() a few days ago. I was not quite sure how it would work, because I try to use as little of the C++ language as possible and it is possible to get by in C++ without const_cast<>(). To find out exactly how it worked I tried it out with a test case. The following code gave the same output with g++ on Vista and OS X.


#include <iostream>

int main() {
    int i = 3;
    const int* ptr = &i;
    *const_cast<int*>(ptr) = 11;  // write 11 through ptr with its const cast away
    if (&i == ptr && i != *ptr) {
        std::cout << "Cannot happen: &i=" << &i << " == ptr=" << ptr
                  << " but i=" << i << " != *ptr=" << *ptr << std::endl;
    }
}
The output in both cases was:

Cannot happen: &i=0x22fe6c == ptr=0x22fe6c but i=3 != *ptr=22

How can a single memory address hold two different values?

The disassembly was

push  %ebp
mov   %esp,%ebp
sub   $0x18,%esp
int i = 3;
movl  $0x3,0xfffffffc(%ebp)            (i in bp-4)
const int* ptr = &i;
lea   0xfffffffc(%ebp),%eax            (&i in eax)
mov   %eax,0xfffffff8(%ebp)            (ptr in bp-8)
*const_cast<int*>(ptr) = 11;
mov   0xfffffff8(%ebp),%eax            (ptr in eax)
movl  $0xb,(%eax)                      (*ptr set to 11)
if (&i == ptr && i != *ptr)
lea   0xfffffffc(%ebp),%eax            
cmp   0xfffffff8(%ebp),%eax
jne   0x403214 
mov   0xfffffff8(%ebp),%eax 
mov   (%eax),%eax
cmp   0xfffffffc(%ebp),%eax
je    0x403214

The disassembly matches the C++ code: i is stored at bp-4 and ptr is stored at bp-8, so the C++ code should work. The observed behaviour does not match the disassembly.

This cannot be right. I guess I found a bug in g++.

27 October 2009

My Time in Sweden

I lived in Sweden from 1988 to 1991. Here is a map showing where I lived, in Stortorget in Gamla Stan in Stockholm.




I lived in the red building in the two left photos below, which were taken from Stortorget. The photo on the right is of the same building taken from Kåkbrinken, the alley to the left of the red building.
  

In the photos below, the one on the left is the main street in Gamla Stan, the one in the centre is the Grand Hotel as seen from the shore of Gamla Stan, and the one on the right is Karolinska Hospital, where I worked.
  

After I left Stockholm I moved to Umeå which is shown on the left below. When I lived there I used to visit Vaasa in Finland shown on the right.
 

When I lived in Sweden I took vacations in Norway, including Lofoten (on the left) and Tromsø (in the centre) in the row of photos below. I also took the Hurtigruten.
  


Photo Credits
Milton CJ
StortTorget - Gamla Stan Originally uploaded by Roberto1956
Gamla Stan Originally uploaded by L e n o r a
Downtown Gamla Stan Originally uploaded by calvaryzone
2007_0331finlandsweden0120 Originally uploaded by Arend 
Karolinska Hospital park on a sunny morning Originally uploaded by tove!
Kuester
Lofoten ... fjord panorama (HDR) Originally uploaded by nigel_xf
From our hotel window - taken at midday (that's the moon!) Originally uploaded by Squiz1210 (probably back in January 09!)
Hurtigruten Originally uploaded by klutts
 Vaasa in december 1 Originally uploaded by Foide
djurgarden Originally uploaded by daniel dssd

13 October 2009

Machine Learning While I Work

I am setting up Postfix, so I have some spare time while I try things out. This post is about the things I am reading or watching in the background.

Taskforce on Context-Aware Computing
I went to a lecture called Open Mobile Miner (OMM): A System for Real Time Mobile Data Analysis. There is a video here, a description of OMM here and lecture slides here (pdf).

Shonali Krishnaswamy's group are making software that does some analysis of data on a smart phone before uploading it, reducing the phone's power consumption by reducing communications. Their examples include ECG output, traffic congestion metrics and taxi location data. The data in their examples is scalar and sampled at 0.5 Hz or less, so it is hard to see why a simple store-and-forward scheme would not achieve much the same thing. I guess I need to read their publications more deeply.



Statistical Learning as the Ultimate Agile Development Tool
by Peter Norvig is an overview of modern practical machine learning. In summary: focus on the data, not the code.



Learning Theory
by Mark Reid was an introduction to some theoretical aspects of machine learning presented in a summer school in Canberra in January 2009.


Now some videos of how machine learning can be applied to models of the face.

Changes in facial features along the dominance, trustworthiness and competence dimensions in a computer model developed by Oosterhof & Todorov (2008).


Now it is time to start watching a video on distributed computing:

Swarm: Distributed Computation in the Cloud from Ian Clarke on Vimeo.

18 September 2009

My First Upside Down Post

˙ʇǝsdıɥɔ pǝsɐq ɯɹɐ s’ǝןɐɔsǝǝɹɟ 'uoƃɐɹpdɐus ɯɯoɔןɐnb 'ɯoʇɐ ןǝʇuı ˙sɹɐʍ ɹossǝɔoɹd ˙sʞooqʇǝu puɐ sǝuoɥd ʇɹɐɯs ɟo ǝɔuǝƃɹǝʌuoɔ ǝןqıssod ˙ǝƃuɐɥɔ ǝʌıɹp ןןıʍ ǝɔuǝɹǝɟɟıp ǝɔıɹd 000'1$ ˙sʌ 052$ ǝɥʇ ˙pǝʇıns-ןןǝʍ os ʇou sǝop pɹoʍ ʇɟosoɹɔıɯ ǝןıɥʍ pnoןɔ ǝɥʇ ɯoɹɟ ןןǝʍ sʞɹoʍ ʎpɐǝɹןɐ ǝɹɐʍʇɟos ɹǝɥʇo puɐ uozɐɯɐ 'ɯoɔ˙ǝɔɹoɟsǝןɐs 'ǝןƃooƃ ˙ǝɹɐʍʇɟos ʇɟosoɹɔıɯ ǝɥʇ doʇdɐן ǝɔɐןdǝɹ ʎɐɯ pnoןɔ + ʞooqʇǝu os ƃuıʇndɯoɔ pnoןɔ oʇ pǝʇıns-ןןǝʍ ǝɹɐ sʞooqʇǝu ˙ʇɟosoɹɔıɯ puɐ ןǝʇuı oʇ ʇsoɔ ʇɐǝɹƃ ʇɐ sǝןɐs doʇdɐן 0001$sn ǝzıןɐqıuuɐɔ ʎɐɯ sʞooqʇǝu 052$sn ɯɹǝʇ ɹǝƃuoן ǝɥʇ uı ʇnq ǝıd ɔd ǝɥʇ ƃuıʍoɹƃ ǝɹɐ sʞooqʇǝu ʎןʇuǝɹɹnɔ ¿uıɐɥɔ ǝnןɐʌ doʇdɐן puɐ ɔd ǝɥʇ oʇ op sʞooqʇǝu ןןıʍ ʇɐɥʍ ˙sǝןɐs ƃuıʇsıxǝ ɟo uoıʇɐzıןɐqıuuɐɔ pnoןɔ ǝɥʇ oʇ ʇuǝɯǝʌoɯ ɟo sǝɔuǝnbǝsuoɔ ǝɯos sɹoʇıʇǝdɯoɔ ɹıǝɥʇ ɹǝʌo ǝƃɐʇuɐʌpɐ ƃuıɔıɹd ɐ ǝʌǝıɥɔɐ ןןıʍ sıɥʇ ǝpıʌoɹd uɐɔ oɥʍ sɹǝıɹɹɐɔ uoıʇɐɔıunɯɯoɔǝןǝʇ ǝɥʇ ˙suoıʇɔǝuuoɔ ʞɹoʍʇǝu ǝןqɐʇɹod puɐ ǝןqɐıןǝɹ 'ʇsɐɟ sǝɹınbǝɹ ƃuıʇndɯoɔ pnoןɔ ʎɥdɹnɯ ˙ɹɯ sʎɐs „'ɹǝʇʇǝq ʇnq ƃuıɥʇou uǝǝq s,ʇı 'dɯnɥ ƃuıuɹɐǝן ǝɥʇ ɹǝʌo ʇoƃ ǝʍ ǝɔuo ˙sǝƃɐʇuɐʌpɐsıp ןɐǝɹ ʎuɐ ɥʇıʍ dn ƃuıɯoɔ pǝssǝɹd-pɹɐɥ ǝq p,ı ˙ǝɹoɯ ʎuɐ suoıʇɐןןɐʇsuı ǝɹɐʍpɹɐɥ ןɐɔısʎɥd ɹoɟ ƃuıʇıɐʍ sʎɐp puɐ sɹnoɥ puǝds ʇ,uop ǝʍ„ ˙sɹǝʇndɯoɔ uʍo sʇı ɟo ǝuou ɥʇıʍ ʎuɐdɯoɔ ʇuǝɯdoןǝʌǝp qǝʍ ɐ ǝɯoɔǝq sɐɥ ʇı - ǝɹɐʍpɹɐɥ ןɐɔısʎɥd ɹoɟ ʇno ƃuıʞɹoɟ pǝddoʇs sɐɥ ʇı suɐǝɯ ɥɔıɥʍ 'ǝɔıʌɹǝs ןɐnʇɹıʌ ǝɥʇ ɹoɟ ɹnoɥ ɹǝd sʇuǝɔ 08 oʇ 01 sʎɐd ǝƃuɐɹo ʎɔınɾ ˙ǝɯoɔǝq sɐɥ ƃuıʇndɯoɔ pnoןɔ ɯɐǝɹʇsuıɐɯ ʍoɥ sǝʇɐɹʇsnןןı (9002 qǝɟ 02 ǝƃɐ ǝɥʇ) ɯɐǝɹʇsuıɐɯ ǝɥʇ spuǝɔsɐ ƃuıʇndɯoɔ pnoןɔ ”˙ƃuıʍoɹƃ ןןıʇs sı ןɐɹǝuǝƃ uı ƃ3“ ˙sʇdǝɔuoɔ pɹɐʍɹoɟ ʇɐ ןɐdıɔuıɹd 'ssnɐɹʇs ןןıʍ pıɐs ”'sʇods ʇɥƃıɹq ǝɹɐ ǝsǝɥʇ“˙˙˙ ˙pǝʌɹǝsqo ǝʌɐɥ sʇsʎןɐuɐ 'ǝɔuɐɥɔ ƃuıʇɥƃıɟ ɐ ǝʌɐɥ sɯǝpoɯ ƃ3 puɐ sdıɥɔ ɥʇooʇǝnןq puɐ ıɟ-ıʍ 'sdƃ ǝʞɐɯ oɥʍ sɹopuǝʌ puɐ — ɥʇʍoɹƃ ʇɐɥʇ ɟo ɥʇƃuǝɹʇs ǝɥʇ uo ʎɹɐʌ sʇsʎןɐuɐ — ɹɐǝʎ sıɥʇ ʍoɹƃ oʇ pǝʇɔǝɾoɹd ǝɹɐ sǝןɐs ǝuoɥdʇɹɐɯs ˙pɐǝɥɐ ɹɐǝʎ ǝɥʇ uı sʇods ʇɥƃıɹq ƃuıʞǝǝs puɐ sɥʇƃuǝɹʇs ǝɹoɔ oʇ ƃuıʞooן ǝɹɐ sɹopuǝʌ dıɥɔ ʇsoɯ 'ǝןıɥʍuɐǝɯ˙˙˙ ˙sʇǝʞɹɐɯ ʍǝu uǝdo oʇ ʎɐןd ǝuoɥd ʇɹɐɯs ɐ ƃuıɹɐdǝɹd ǝq oʇ pǝɹoɯnɹ sı ˙ɔuı ןןǝp ɹǝʞɐɯ ɔd ˙ǝuoɥdı s’˙ɔuı ǝןddɐ ʎq pǝʌɹǝs ʇǝʞɹɐɯ ǝɥʇ 
ɟo ǝɯos qɐɹƃ oʇ ƃuıdoɥ 'ʍoןs sǝʇɐɹ sǝןɐs ɔd sɐ spıɯ pǝʇǝƃɹɐʇ sɐɥ — ɹǝʞɐɯ dıɥɔ ʇsǝƃɹɐן s’pןɹoʍ ǝɥʇ — ˙dɹoɔ ןǝʇuı sɐ 'uoıʇıʇǝdɯoɔ ǝʌɐɥ ןןıʍ ɯɯoɔןɐnb˙˙˙spןǝɥpuɐɥ puɐ sdoʇdɐן uǝǝʍʇǝq dɐƃ ǝɔıɹd ǝɥʇ ǝƃpıɹq sʞooqʇǝu sɐ ǝƃɹns oʇ pǝʇɔǝɾoɹd ʎɹoƃǝʇɐɔ ɹǝʇʇɐן ǝɥʇ ɥʇıʍ 'sǝɔıʌǝp ʇǝuɹǝʇuı ǝןıqoɯ puɐ sʞooqʇǝu 'sʞooqǝʇou ɹoɟ ʇǝsdıɥɔ uoƃɐɹpdɐus sʇı uo sısɐɥdɯǝ pǝɔɐןd sɐɥ ɯɯoɔןɐnb 'ǝןıɥʍuɐǝɯ sdıɥɔ :ʇsɐɔǝɹoɟ ssǝןǝɹıʍ 9002 ssǝןǝɹıʍ ɹɔɹ sǝuıɥɔɐɯ dx sʍopuıʍ ɹo nʇunqn ʎןןɐnsn ǝɹɐ ʇɐɥʍ ƃuoɯɐ ǝɥɔıu ɐ puıɟ ןןıʍ ɯɹoɟʇɐןd ǝןıqoɯ ǝɔɹnos-uǝdo s’ǝןƃooƃ ʇɐɥʇ ƃuıʇʇǝq sı - ʎɐpoʇ ʇǝʞɹɐɯ ǝɥʇ uo sʞooqʇǝu ɟo ʎʇıɹoɾɐɯ ʇsɐʌ ǝɥʇ uı punoɟ ɹossǝɔoɹd 072u ɯoʇɐ ןǝʇuı ǝɥʇ ɹoɟ ǝןqısuodsǝɹ - ɹǝʞɐɯdıɥɔ ǝɥʇ ʇɐɥʇ sʇsǝƃƃns oɥʍ '”ǝɔɹnos ǝןqɐıןǝɹ“ s’ʇɐǝqǝɹnʇuǝʌ oʇ ƃuıpɹoɔɔɐ s’ʇɐɥʇ ˙ǝɹɐʍpɹɐɥ ʇǝsdıɥɔ ǝןqɐʇıns ɥʇıʍ sɹǝɹnʇɔɐɟnuɐɯ ʇɹoddns oʇ ƃuıɹɐdǝɹd sı puɐ '0102 ʇnoɥƃnoɹɥʇ puɐ 9002 ǝʇɐן uı sʞooqʇǝu pǝsɐq-pıoɹpuɐ ɟo ʎɹɹnןɟ ɐ ƃuıʇɔǝdxǝ sı ןǝʇuı sʞooqʇǝu pǝsɐq-pıoɹpuɐ ɟo ǝsıɹ ɹoɟ ƃuıʎpɐǝɹ ןǝʇuı ʞooqʇǝu pǝsɐq-pıoɹpuɐ ɟo ǝsıɹ ɹoɟ ƃuıʎpɐǝɹ ןǝʇuı ʎoʎ %6˙12– puɐ bob %7˙12– pǝuıןɔǝp sʇuǝɯdıɥs ʇıun ɹossǝɔoɹd ɔd ǝpıʍpןɹoʍ 'ɯoʇɐ ʇnoɥʇıʍ ˙ǝuıןɔǝp ɔıʇɐɯɐɹp pıoʌɐ ʇǝʞɹɐɯ ǝɥʇ dןǝɥ oʇ ɥƃnouǝ ʇou ʇnq ǝɔuɐɯɹoɟɹǝd ʇǝʞɹɐɯ ןןɐɹǝʌo ǝɥʇ uı ǝɔuǝɹǝɟɟıp ǝןqɐʇou ɐ ǝʞɐɯ oʇ pǝnuıʇuoɔ (,,sʞooqʇǝu,, sןןɐɔ ןǝʇuı ɥɔıɥʍ) sɔd ʞooqǝʇou-ıuıɯ ɹoɟ ɹossǝɔoɹd ɯoʇɐ s,ןǝʇuı ˙˙˙ ؛(ʎoʎ) ɹɐǝʎ ɹǝʌo ɹɐǝʎ %4˙11– puɐ (bob) ɹǝʇɹɐnb ɹǝʌo ɹǝʇɹɐnb %0˙71– pǝuıןɔǝp sʇuǝɯdıɥs ʇıun ɹossǝɔoɹd ɔd ǝpıʍpןɹoʍ '80b4 uı 80b4 uı sʞooqʇǝu oʇ sdoʇdɐן sʍopuıʍ ɯoɹɟ ǝʌoɯ sıɥʇ ʇɹoddns oʇ sǝıɹoʇs ǝɯos ǝɹɐ ǝɹǝɥ ˙ʎʇıןıqoɯ puɐ ʎɹʇsǝɔuɐ ǝuoɥd ɹıǝɥʇ ɯoɹɟ sǝɯoɔ ʇɐɥʇ uoıʇdɯnsuoɔ ɹǝʍod ʍoן ɟo sǝƃɐʇuɐʌpɐ ןɐuoıʇıppɐ ǝɥʇ ǝʌɐɥ ʎǝɥʇ ˙sƃuıɹǝɟɟo pnoןɔ ɹǝɥʇo puɐ uoıʇɐzıןɐnʇɹıʌ 'sɐɐs 'sǝɔıʌɹǝs qǝʍ ɹoɟ sʇuǝıןɔ ǝʇɐnbǝpɐ ǝʞɐɯ ǝuoɥd ʇɹɐɯs ǝןqɐdɐɔ ʎɹǝʌ puɐ sʞooqʇǝu ƃ3 ˙sǝıƃǝʇɐɹʇs ƃuıʇndɯoɔ doʇʞsǝp pǝnsɹnd ʇou ǝʌɐɥ puɐ sǝɔıʌɹǝs ɟo sǝdʎʇ ǝsǝɥʇ uo ʎןǝɹıʇuǝ sǝssǝuısnq ɹıǝɥʇ pǝsɐq ǝʌɐɥ oɥʍ ɯoɔ˙ǝɔɹoɟsǝןɐs puɐ ǝɹɐʍɯʌ 'ǝןƃooƃ sɐ ɥɔns sǝıuɐdɯoɔ ʎq uǝʌıɹp uǝǝq sɐɥ ʇuǝɯdoןǝʌǝp ɹıǝɥʇ ˙sɹɐǝʎ 01 ʇsɐן ǝɥʇ uı ǝʌıʇɔǝɟɟǝ ʎןɥƃıɥ ǝɯoɔǝq 
ǝʌɐɥ ǝsǝɥʇ ɟo ʇsoɯ ˙ suoıʇɐɔıןddɐ qǝʍ puɐ sɐɐs 'uoıʇɐzıןɐnʇɹıʌ ƃuıpnןɔuı sǝɔıʌɹǝs pnoןɔ ɟo sǝdʎʇ ʎuɐɯ ǝɹɐ ǝɹǝɥʇ ˙uoıʇɐzıuɐƃɹo uɐ ɥƃnoɹɥʇ pǝʇɐɔıןdǝɹ ǝq oʇ pǝǝu ʇou sǝop ǝƃɐɹoʇs ʞsıp ʎʇıןıqɐıןǝɹ-ɥƃıɥ sɐ ɥɔns ǝɹnʇɔnɹʇsɐɹɟuı ǝʌısuǝdxǝ ʇɐɥʇ ʇıɟǝuǝq ןɐuoıʇıppɐ ǝɥʇ sı ǝɹǝɥʇ ˙ʇuǝıɔıɟɟǝ ǝɹoɯ ɥɔnɯ sı ɹǝʌɹǝs ןɐɹʇuǝɔ ɐ uo ǝɹɐʍʇɟos ǝɥʇ ƃuıuunɹ ǝɹoɟǝɹǝɥʇ ˙%01 uɐɥʇ ssǝן ɥɔnɯ 'ʍoן ʎɹǝʌ sı ǝɹɐʍʇɟos sıɥʇ ɟo ǝƃɐsn ǝɔɹnosǝɹ ɹǝʇndɯoɔ ǝƃɐɹǝʌɐ ǝɥʇ ˙ǝʌısuodsǝɹ ǝq oʇ ǝɔɐɟɹǝʇuı ɹǝsn ǝɥʇ ʇuɐʍ noʎ puɐ suoıʇɐʇndɯoɔ ǝsuǝʇuı ǝɥʇ sǝop ʇı uǝɥʍ ʎɐp ɹǝd sǝʇnuıɯ ʍǝɟ ǝɥʇ 'ǝƃɐsn ʞɐǝd ʇɹoddns oʇ ƃuıʎɐd ǝɹɐ noʎ 'ʇı ʇɹoddns oʇ ǝɹɐʍpɹɐɥ ɔd ǝʌısuǝdxǝ ʎnq noʎ uǝɥʍ ˙ǝɯıʇ ǝɥʇ ɟo ʇsoɯ ƃuıɥʇou sǝop ʇı ˙sǝןɔʎɔ ʎʇnp ʍoן ʎɹǝʌ sɐɥ 'sɔıɥdɐɹƃ ʎʇıןɐnb ɥƃıɥ ǝʞıן ǝɹɐʍʇɟos ǝʌısuǝʇuı ʎןןɐuoıʇɐʇndɯoɔ uǝʌǝ 'ǝɹɐʍʇɟos ʇuǝıןɔ ʇsoɯ ˙sɹǝʇndɯoɔ ןɐuosɹǝd s,ǝןdoǝd ʎuɐɯ uo ʇı ƃuıop uɐɥʇ ɹǝısɐǝ sı uoıʇɐɔoן ןɐɔısʎɥd ǝuo uı ƃuıuunɹ ǝɹɐʍʇɟos ƃuıpɐɹƃdn puɐ ƃuıuıɐʇuıɐɯ ǝsnɐɔǝq uoıʇɐzıןɐɹʇuǝɔ ɥʇıʍ ʎןןɐɔıʇɐɯɐɹp sǝsɐǝɹɔǝp (oɔʇ) dıɥsɹǝuʍo ɟo ʇsoɔ ןɐʇoʇ ˙sʇuǝıןɔ ɹǝןןɐɯs ǝʌɐɥ puɐ ƃuıʇndɯoɔ ǝzıןɐɹʇuǝɔ-ǝɹ oʇ sı ʇı ǝsıɹdɹǝʇuǝ uı puǝɹʇ ɹoɾɐɯ ʇuǝɹɹnɔ ɐ

14 September 2009

What I need from a 3G Netbook

I could use one now for:

  • working on the train
  • working at cafes while waiting for the kids
  • working in the country while visiting friends and family
My PC and laptop seem like overkill for researching on the web, emailing, writing reports and building a few models in a spreadsheet. They also use a lot of electricity and take up space.
To be an effective replacement for the PC and laptop, a netbook would need to:
  1. Be reasonably priced. $200 would be nice.
  2. Have a reasonably priced connection plan.
  3. Have access to cheap or free software:

    1. Web browser
    2. Word processor
    3. Drawing tools
    4. Spreadsheet
  4. Have a reliable connection with good coverage.
  5. Have good battery life. 6 hours would be nice.
Basic set of applications:
  1. Gmail with tasks
  2. Google Calendar
  3. Google Docs
  4. Git for source code management
  5. yUML
  6. Ubuntu or Windows XP with Cygwin
  7. GNU tools
  8. VNC or WRD
  9. ssh
That would get me started. It would be nice to have Eclipse and a local word processor, but running these over a remote shell would be more than adequate. I worked that way with all my heavy tools on VMware instances for years and it worked well. The VMware instances were hosted in a data center and backed up regularly.

10 September 2009

Electronic Medical Records Bonanza?

Big Bucks in Health IT!, quoting from http://www.healthcareitnews.com/news/global-market-hospital-it-systems-pegged-35b-2015 , says

SAN JOSE, CA – The global hospital information systems market will climb past $35 billion by 2015, according to a new forecast by Global Industry Analysts. The United States represents the largest market in the world. The U.S. hospital information system market is experiencing an increase in acceptance of customized technology such as laboratory information systems and radiology information systems, the report notes. The market is also a promising ground for electronic medical record systems.
The Asia-Pacific region (excluding Japan) represents the fastest growing hospital information systems market, exhibiting a compounded annual growth rate of 11.5 percent over the next few years, according to analysts. Despite being a smaller market in terms of revenue, the Asia-Pacific promises excellent growth opportunities for hospital information systems, they said.
The global vendors profiled in the report include McKesson , Cerner , Allscripts-Misys Healthcare Solutions, Eclipsys, Computer Programs and Systems, Siemens Medical Solutions USA, QuadraMed, Medical Information Technology, Healthland, GE Healthcare, iSOFT Group, Agfa-Gevaert, Brunie-Software, IBA Health and Integrated Medical Systems. 
The full release is here: Global Hospital Information Systems Market to Cross $35 Billion by 2015, According to New Report by Global Industry Analysts, Inc. Increasing awareness among medical service patrons on the benefits of using Information Technology in the healthcare sector, coupled with growing demand for affordable-yet quality healthcare services is forcing hospitals and other medical centers to adopt IT in their daily operations. Subsequently, Healthcare IT systems such as the Hospital Information Systems witnessed a great demand in the healthcare services sector. Adoption of HIS in hospitals is increasingly being encouraged and promoted by the Governments world over. http://www.prweb.com/releases/2009/02/prweb2021984.htm

09 September 2009

bit.ly custom URLs

Where does http://bit.ly/peterwilliams direct to?

Is it the same web page as http://linkd.in/PeterWilliams  ?

16 August 2009

Open Government Made Simple

There has been a lot of talk about Open Government recently, including this from Peter Williams:

"Australian governments should adopt international standards of open publishing as far as possible. Material released for public information by Australian governments should be released under a creative commons licence." or in simple terms "make public data open and free".
That is useful, clear and straightforward.

So how to do it?

My experience in organising company data over the last 10 years is that the teams I have worked in have tried many content management systems (CMSs), and none of them were satisfying to use (though some were interesting to implement). Inevitably, the document taxonomies that made sense to the site administrators did not work for most of the users, and the users soon gave up trying to find things through the CMS.

Then one day someone in the company I was working at purchased a Google Search Appliance (GSA) and indexed most of our intranet with it. After that, everybody could find all the documents they knew existed on the intranet and discovered useful ones they did not know existed.

To be fair, things were not quite that simple. Most companies need reliable storage, decent version tracking, access control and many other things that CMSs provide. However people need to be able to find documents much more than they need these other things. Very few people need version tracked, access controlled documents that they cannot find in the first place.

So why don't governments just make their data visible to internet search engines, store it somewhere secure with some simple versioning system now, and then do the fancy stuff later? Why are they investing in CMSs like SharePoint?

The reason we did not do this in the companies I worked in was that many of the features in the CMSs we used were useful and the people who implemented the systems decided they needed all these features. Finding documents was just one of several check-boxes on their requirements documents. They were acting as implementers and experts, not users. The systems they ended up with made perfect sense to everyone except the users.

The interesting thing about this was that the implementers were users in most cases. They were aware of the limitations of CMSs, but they had to follow either the direction of their users, who had not used CMSs enough to understand how badly they would work in practice, or the direction of their managers and key stakeholders, who had heard that CMSs were good. The person who got the GSA was an IT guy who just went out and tried it without surveying users or bringing in CMS vendors to talk to his key stakeholders.

For a different perspective on Open Government, read some Tim O'Reilly.

16 July 2009

When Words Fail

A while back I worked at a company that made software+hardware products in a maturing market. The company found it needed to deliver higher-quality products with more features and was struggling to do so from an old codebase. It had become clear to the management team that late-stage serious defects were the major cause of schedule and quality issues, but they had not been able to fix this problem.

The codebase management team had a lot of ideas about what the causes were and how to fix them. They had discussed "technical debt", "silo-ing" and other causes. However, in the end they settled on two key priorities: taking extreme care with code changes, and sticking with established QA processes to minimise the number of introduced bugs.

Eventually the project was given to me to manage. One of the (many) things the development team had done well was to document each bug and cross-reference bug fixes against the source code. I analysed about 100 recently fixed serious software bugs, looked up their fixes in the SCM, and then looked up the dates on which the code changes that caused each bug were checked in. This showed that most of the bugs being found had been introduced months before they were discovered. It was clear that the late-stage defects were dominated by latent bugs being unmasked by changes, not by bugs introduced by changes.

Some changes to the development process were needed. The development group was responsible for creating code without introducing bugs, and the QA group was responsible for finding the bugs the development team missed. However, the QA process was unsuited to discovering latent bugs quickly because it had a long cycle based on testing user scenarios. Therefore I got small teams of developers and QAs to work closely together to find, fix and verify bugs, and I took some developers away from other work to develop a system to find and fix (and eventually prevent the introduction of more) latent bugs. This work is described here. With these changes in place, code stability improved rapidly and late-stage serious bugs essentially ceased to be found.

That was a fairly straightforward technical solution to a fairly straightforward technical problem. So why had the very capable management team, who had known the underlying causes (technical debt and silo-ing), not been able to fix the problem for so long?

Change is known to be difficult in organisations, and there is an industry built around dealing with this. However, our immediate problem was not an inability to persuade people to change. In fact, consultation and review had been distracting people from doing the experimentation required to find what the underlying causes of the problem were and how to fix them. The more people talked about the problem, the further they got from the solution (hence this post's title).

The situation reminded me of Uncle Bob Martin's Agile Smagile:

As I said before, going meta is a good thing. However, going meta requires experimental evidence. Unfortunately the industry has latched on to the word "Agile" and has begun to use it as a prefix that means "good". This is very unfortunate, and discerning software professionals should be very wary of any new concept that bears the "agile" prefix. The concept has been taken meta, but there is no experimental evidence that demonstrates that "agile", by itself, is good.
The deeply ingrained practices in the organisation I worked in had grown out of ideas that had worked well in the past. They had been good enough to cover a wide range of development scenarios for a long while and were clearly based on experimental evidence from past development. However, somewhere along the way people had stopped experimenting and modifying the rules, and started just following them. This is what Uncle Bob called "going meta". The problem for our organisation was that the set of rules it had arrived at when it stopped experimenting was not universally true; it was only true for the type of development the organisation was doing when it stopped changing the rules.

The changes I made to detect and fix latent bugs (high-coverage automated system testing, static analysis with Klocwork and refactoring with unit tests) were adopted across the development organisation and became part of the standard development process, at least for the time I was there. That was good, but I wondered whether those practices would become a fixed part of the new development process simply because they had worked at some time in the past. I also wondered whether they would prevent the company from addressing problems that arose in the future, just as the practices that had worked well in the past had come to do.

22 June 2009

Minimal Non-C++Programmer Bamboozling C++ Question

I recently read a stream of blog posts about why developers don't like C++ for general purpose programming. This post typifies much of the criticism of C++'s complexity. It includes an interview question about creating a C++ class that behaves like a class in a high level language such as Java. The author says that he uses this to weed out job applicants who haven't used C++ for real work.

It strikes me that tripping up developers with C++'s many oddnesses is much easier than that. Here is a simple question that I believe will confuse many non-C++ programmers:

What is the output of this program?


#include <string>
#include <iostream>
using namespace std;

class Parent {
public:
    string func() { return "parent"; }
    virtual string vfunc() { return "parent+virtual"; }
};

class Child : public Parent {
    string func() { return "child"; }
    virtual string vfunc() { return "child+virtual"; }
};

string test1(Parent parent) {
    return parent.func() + " - " + parent.vfunc();
}

string test2(Parent& parent) {
    return parent.func() + " - " + parent.vfunc();
}

int main() {
    Child child;
    cout << "test1: " << test1(child) << endl;
    cout << "test2: " << test2(child) << endl;
}

I have seen C++ interviewers ask questions like this but show only test1, then ask what is on the stack when test1 is invoked.

02 June 2009

Movement to the Cloud

A current major trend in enterprise IT is to re-centralize computing and have smaller clients. Total cost of ownership (TCO) decreases dramatically with centralization because maintaining and upgrading software running in one physical location is easier than doing it on many people's personal computers. 
Most client software, even computationally intensive software like high-quality graphics, has a very low duty cycle. It does nothing most of the time. When you buy expensive PC hardware to support it, you are paying to support peak usage: the few minutes per day when it does the intense computations and you want the user interface to be responsive. The average computer resource usage of this software is very low, much less than 10%. Therefore running the software on a central server is much more efficient. There is the additional benefit that expensive infrastructure such as high-reliability disk storage does not need to be replicated throughout an organization.
There are many types of cloud services, including virtualization, SaaS and web applications. Most of these have become highly effective in the last 10 years. Their development has been driven by companies such as Google, VMware and Salesforce.com, which have based their businesses entirely on these types of services and have not pursued desktop computing strategies.
3G netbooks and very capable smart phones make adequate clients for Web Services, SaaS, Virtualization and other cloud offerings. They have the additional advantages of mobility and of the low power consumption that comes from their phone ancestry.
Here are some stories to support this:
  1. Move from Windows Laptops to Netbooks in 4Q08 In 4Q08, worldwide PC processor unit shipments declined –17.0% quarter over quarter (QoQ) and –11.4% year over year (YoY); ... Intel's Atom processor for mini-notebook PCs (which Intel calls ''Netbooks'') continued to make a notable difference in the overall market performance but not enough to help the market avoid dramatic decline. Without Atom, worldwide PC processor unit shipments declined –21.7% QoQ and –21.6% YoY
  2. Intel readying for rise of Android-based netbooks. Intel is expecting a flurry of Android-based netbooks in late 2009 and throughout 2010, and is preparing to support manufacturers with suitable chipset hardware. That’s according to VentureBeat’s “reliable source”, who suggests that the chipmaker - responsible for the Intel Atom N270 processor found in the vast majority of netbooks on the market today - is betting that Google’s open-source mobile platform will find a niche among what are usually Ubuntu or Windows XP machines
  3. RCR Wireless 2009 Wireless Forecast: Chips Meanwhile, Qualcomm has placed emphasis on its Snapdragon chipset for notebooks, netbooks and mobile Internet devices, with the latter category projected to surge as netbooks bridge the price gap between laptops and handhelds...Qualcomm will have competition, as Intel Corp. — the world’s largest chip maker — has targeted MIDs as PC sales rates slow, hoping to grab some of the market served by Apple Inc.’s iPhone. PC maker Dell Inc. is rumored to be preparing a smart phone play to open new markets. ...Meanwhile, most chip vendors are looking to core strengths and seeking bright spots in the year ahead. Smartphone sales are projected to grow this year — analysts vary on the strength of that growth — and vendors who make GPS, Wi-Fi and Bluetooth chips and 3G modems have a fighting chance, analysts have observed. ...“These are bright spots,” said Will Strauss, principal at Forward Concepts. “3G in general is still growing.”
  4. Cloud computing ascends the mainstream (The Age 20 Feb 2009) illustrates how mainstream cloud computing has become. Juicy Orange pays 10 to 80 cents per hour for the virtual service, which means it has stopped forking out for physical hardware - it has become a web development company with none of its own computers. "We don't spend hours and days waiting for physical hardware installations any more. I'd be hard-pressed coming up with any real disadvantages. Once we got over the learning hump, it's been nothing but better," says Mr. Murphy
Cloud computing requires fast, reliable and portable network connections. The telecommunication carriers who can provide these will achieve a pricing advantage over their competitors.

Some Consequences of Movement to the Cloud

  1. Cannibalization of existing sales. What will netbooks do to the PC and laptop value chain? Currently netbooks are growing the PC pie, but in the longer term US$250 netbooks may cannibalize US$1000 laptop sales at great cost to Intel and Microsoft. Netbooks are well-suited to cloud computing, so netbook + cloud may replace laptop + Microsoft software. Google, salesforce.com, Amazon and other software already works well from the cloud, while Microsoft Word is not so well-suited. The $250 vs. $1,000 price difference will drive change.
  2. Possible convergence of smart phones and netbooks.
  3. Processor wars: Intel Atom, Qualcomm Snapdragon, Freescale’s ARM-based chipset.

05 May 2009

Software Engineers were on the Right Track

It is often sad when software engineers are shown to be correct. One of my first posts was on monetizing social search:

This innovative use of social networking could have a short-term payoff. It turns out to be one of the few successes to date in making money from social networking. When I ran a straw poll on making money from social networking with some software engineers, wannabe entrepreneurs and friends, they all came up with essentially the same idea: mining the users' personal details and finding some way to make them pay to keep this information confidential.

My panel of experts shared the cynicism of the sysadmins who say:
NEVER anthropomorphize lusers.

Intelius's recent acquisition of Spock appears to prove that my panel of experts was on the right path.

Michael Arrington warned last week:
But one company may have bitten and are close to buying the company. Sources are saying that the infamous Intelius (founded by the equally infamous Naveen Jain), a people search engine that charges users to access data, may be buying Spock soon. If these rumors are accurate, God help Spock. Not only is Intelius embroiled in all kinds of legal and ethical disputes, but they also have a shaky history when it comes to acquisitions.
Then Ajit Jaokar summed it up:
So, let me see if I get this right
a) Spock trawls the web looking for our data
b) It creates a profile about us in their site without approval
c) It encourages us to enrich that information
d) It charges us to access our own information
e) And ultimately .. it sells that same information to a background check company ..
For instance, my 'harnessed' profile (i.e. I did not create it) says .. Ajit Jaokar is an Indian-born British author and Web 2.0 specialist. He is the founder and CEO of the publishing company Futuretext. He is also the... and the rest you have to pay for :) (Don't bother .. that information is freely available on my blog .. and lots more .. so you can hopefully make your own judgements about me .. )
Can I delete my own information in Spock?.. Now it gets MUCH more interesting .. Spock says on deletion of information ..
If you'd like to remove yourself from Spock, please read the following information and click the link below.

Before requesting removal, please make sure the original source of the information Spock found for you has been removed or made private (MySpace, blog, Friendster, etc). This will prevent you from being re-indexed on the site. Please note that you can only request removal for your Spock search result.

When filling out your information please make sure to include your name, e-mail, a link to your Spock Search Result (i.e. http://www.spock.com/Tiger-Woods), and the reason why you'd like to be removed. The Spock Support Team will review your claim and get back to you within 24-48 hours.

So, I have to ensure that the Original sources of information that they got the profiles from should also be made private(i.e. my blog, my facebook profile etc etc) ..(else they will 'harness' me again!)

26 April 2009

The Future of ICT

It is worth thinking about the future from time to time. It helps us craft investment strategies and career paths that match the major trends in the world. So what is the future of ICT?

My guesses are:
  1. Simplification.
  2. Movement to the cloud.
  3. Fixed/mobile convergence.
  4. Integration of simple cloud services. (1+2)
threesixtyfive | day 244
Modern ICT systems are insanely complex while the most productive computer users I know all use simple tools. 
Most of the things we do with computers are much simpler than the popular packages are capable of. For example, editing some text does not require a full-blown desktop publishing program like MS Word, yet MS Word is the most popular text editor in the world. Likewise, keeping track of some customers and inventory does not require a gigantic package like SAP, yet SAP is the best-selling ERP software package in the world.
Modern Times
The costs of learning these immensely complex packages are considerable in terms of time lost. There is probably a much higher cost in working as slaves to these packages, which distracts from finding the best solutions to an enterprise's problems. A current trend in corporate ICT is to use "best of breed" packages with the minimum possible customization because the payback from customizing is much less than the cost. (BTW, this does not seem to be true for ERP.) This means that enterprises are paying the cost of not solving their ICT problems as well as they could. This cost has to be a significant fraction of their ICT budgets.
What is ERP anyway? (MS&T ERP Center, 01/29/2009)
This article explains why "best of breed" software packages sell well. It boils down to the promise of lower total cost of ownership (TCO) through using a single vendor for all services and a mega-brand that makes buyers feel safe. 
SAP ERP systems effectively implemented can have huge cost benefits. Integration is the key in this process. "Generally, a company's level of data integration is highest when the company uses one vendor to supply all of its modules." An out-of-box software package has some level of integration but it depends on the expertise of the company to install the system and how the package allows the users to integrate the different modules.
Movement to the Cloud
Centralized computing is much more efficient than desktop-centric computing. TCO decreases dramatically with centralization because maintaining and upgrading software running in one physical location is far easier than on many people's personal computers. 
Most client software, even computationally intensive software like high-quality graphics, has very low duty cycles. It does nothing most of the time. When you buy expensive PC hardware to support it, you are paying to support peak usage, the few minutes per day when it does the tricky computations and you want the user interface to be responsive. The average computer resource usage of this software is very low, much less than 10%. Therefore running the software on a central server shared by many users can be well over 10x more efficient.
Expensive infrastructure such as high-reliability disk storage does not need to be replicated throughout an organization. Virtualization, SaaS, etc. only became effective in the last few years, so many software and hardware vendors built their (then efficient) businesses around powerful client PCs running software locally.
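The duty-cycle argument can be checked with a back-of-the-envelope model. This is a minimal sketch under loudly stated assumptions, not a measurement: each user is busy independently with probability `duty_cycle` at any moment, a busy user needs one "peak-sized" unit of computing, and the shared server is sized so that demand exceeds its capacity at most 0.1% of the time. `shared_capacity` is a hypothetical helper written only for this illustration.

```python
import math


def shared_capacity(n_users: int, duty_cycle: float, coverage: float = 0.999) -> int:
    """Smallest number of peak-sized units a shared server needs so that
    P(more users busy than units) <= 1 - coverage.

    Assumption (illustrative only): users are busy independently with
    probability duty_cycle, so the number busy at once is
    Binomial(n_users, duty_cycle). Suitable for a few thousand users;
    for much larger pools (1 - duty_cycle) ** n_users underflows.
    """
    if not 0.0 < duty_cycle < 1.0:
        raise ValueError("duty_cycle must be strictly between 0 and 1")
    pmf = (1.0 - duty_cycle) ** n_users  # P(no users busy)
    cdf = pmf
    ratio = duty_cycle / (1.0 - duty_cycle)
    k = 0
    while cdf < coverage and k < n_users:
        # Binomial recurrence: P(k+1) = P(k) * (n - k) / (k + 1) * p / (1 - p)
        pmf *= (n_users - k) / (k + 1) * ratio
        k += 1
        cdf += pmf
    return max(k, 1)


if __name__ == "__main__":
    # Desktop model: one peak-sized machine per user, i.e. n_users units.
    for n in (10, 100, 1000):
        cap = shared_capacity(n, 0.05)
        print(f"{n:5d} users: desktop {n:5d} units, shared {cap:4d} units, "
              f"gain {n / cap:.1f}x")
```

Under these assumptions, 10 users at a 5% duty cycle still need 4 units (a gain of only 2.5x), while 1,000 users need fewer than 100 units. The consolidation gain approaches 1/duty_cycle only for large pools, which is one reason the efficiency favours big central servers and cloud providers.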

Fixed Mobile Convergence
Skype Crashing on iPhone Fix
When simple applications and cloud computing become dominant, the requirements for terminals become much less demanding. Smart phones and 3G netbooks are already very capable and are becoming more so. They also use little power and are portable.

The next level of usability is to have one device for fixed and mobile work. That device should be able to work with WiFi and 3G networks and move seamlessly between them. The technology for this is maturing.

From Wikipedia:
A clear trend is emerging in the form of fixed and mobile telephony convergence (FMC). The aim is to provide both services with a single phone, which could switch between networks ad hoc. Several industry standardisation activities have been completed in this area such as the Voice call continuity (VCC) specifications defined by the 3GPP. Typically, these services rely on Dual Mode Handsets, where the customers' mobile terminal can support both the wide-area (cellular) access and the local-area technology (for VoIP). However, an alternative approach achieves FMC over 3G mobile networks - eliminating the requirement for Dual Mode. This approach, broadly termed cellular FMC, is in trials by telecoms operators including BT.
An alternative approach to achieve similar benefits is that of femtocells.
Integration of simple cloud services
When cloud computing is widespread and simple cloud services are widely available, integration companies will be able to assemble tools to meet the needs of businesses. This should be a vast business since it competes with the mega-apps from Microsoft, Oracle, SAP, Siebel, etc. and mega-glue such as Tibco.

If the recent history of software development is a guide, nimble companies will start to build effective suites and grow rapidly to form a foundation for this industry. They will then be followed by specialist companies who will take care to make their software interoperable. This will evolve into a software ecosystem, and sales channels will emerge. With fixed/mobile convergence in the mix, application stores may be used for sales, removing the need for sales and marketing teams in the startups that create this new business category.

At that point, the setups of hyper-productive software users will be easy to replicate in the cloud. User applications will be available as simple services on simple devices like 3G netbooks. Enterprise applications will run in the cloud with simple interfaces. Business outsourcing will be simple because the software will run in the cloud with well-defined APIs.

The Consequences
  1. These changes will result in a dramatic increase in productivity that will boost economies world-wide.
  2. Software will be simple so ICT staff will not be slaves to the machines of gigantic software packages.
  3. This will free up ICT staff's time to add business value which will increase productivity even more.
Photo Credits
threesixtyfive | day 244 by Sybren A. Stüvel.
What is ERP anyway? (MS&T ERP Center, 01/29/2009) by MS&T Center for ERP.
Skype Crashing on iPhone Fix by theleetgeeks.
Modern Times by jampa.