Monday, June 22, 2009

Start Hacking Montezuma

It is too bad that I suspended my Lisp studies for some weeks. I hope I can concentrate on things that are meaningful and drop the meaningless ones, such as arguing in forums and reading entertainment stories.

Following Lesie's suggestion, I am preparing to hack on Montezuma. First, I read the paper <An Object-Oriented Architecture for Text Retrieval>. It is a great paper: it uses an elegant, simple approach to build a scalable, complex architecture. I also understand that Montezuma cannot use the paper's code samples.

I suppose fixing bugs is a good way to get involved in an open source project, :).

standard tokenizer hangs on some input

As Edi Weitz pointed out, the culprit is the complex regular expression (the token-regexp method) in standard-tokenizer.lisp. I have reduced the problem to a simple case:

CL-USER> (cl-ppcre:scan
          (cl-ppcre:create-scanner "(_\\w+)*\\@\\w+")
          "_______________________________________"
          :start 0)
;; Evaluation aborted.

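I suspect the bug arises because '\w' also matches the underscore, so the group (_\w+)* can split the same run of characters in a great many ways, and the scanner tries them all before it gives up on finding an '@'. Replacing the underscore with any other character matched by '\w' triggers it too.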

CL-USER> (cl-ppcre:scan
          (cl-ppcre:create-scanner "(a\\w+)*\\@\\w+")
          "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
          :start 0)
;; Evaluation aborted.

cl-ppcre is a Perl-compatible regular expression library, so I should check the same case in Perl. Maybe Perl is more efficient at regular expression operations: I raised the number of underscores, and it still returns without hanging.

$str = "john._______________________________________
__________________________________";

if ($str =~ m/(_*\w+)*\@\w+/)
{
   print "ok\n";
}

To conclude, it isn't Montezuma's bug but cl-ppcre's.

broken :must-not-occur or phrase query

I found that the query "html-template !\"edi weitz\"" works on my test corpus, but when I tried the query "html-template !edi !test", it told me:

"Invalid initialization argument: SCORER in call for class #<STANDARD-CLASS DISJUNCTION-SUM-SCORER>.".

Obviously there is no slot named scorer in the disjunction-sum-scorer class. It may be a typo: scorer should be replaced by sub-scorers. I changed it at line 199 of boolean-scorer.lisp; the query now works and all unit tests pass too.

Friday, May 15, 2009

ASDF-INSTALL and Cliki

This article should have been posted before "The annoyance of ASDF-INSTALL", but it was delayed by the timing of this blog's debut.

ASDF-INSTALL and Cliki work together as Common Lisp's answer to CPAN. I have used them to get Common Lisp libraries for a long time, but what I want to describe here is how to create your own package and announce it on Cliki.

  1. Make a gzipped tarball of the ASDF system, arranged to unpack into a subdirectory systemname_version.

    $ tar -cvzf systemname_version.tar.gz systemname_version/

  2. Cliki requires that all packages be accompanied by detached PGP signatures.

    $ gpg -b -a mail-reader_0.1.tar.gz

    Note: I have run Linux in a virtual machine for a long time, and gpg complained that it could not gather enough random bytes when I tried to generate a new key pair.

    $ gpg --gen-key

    Finally, I got enough random bytes by running this command in one PuTTY session and running

    $ cat /dev/random

    in eshell in Emacs in another PuTTY session. I don't know why, but it just works.

  3. Upload the package and signature file to a web server.
  4. Create a page on Cliki with the same name as the package. Its content should contain

    :(package "http://www.example.com/lisp/mail-reader_0.1.tar.gz")

    Note: you should not use "A N other" as your name when editing. It made me stumble until I realized it. But why is it the default?
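
Steps 1 and 2 can be wrapped in a small script. This is only a sketch: the function names make_tarball, sign_tarball and verify_tarball are made up for illustration, and it assumes that the systemname_version directory sits in the current directory and that a gpg key pair already exists.

```shell
#!/bin/sh
# Sketch only: the function names below are hypothetical helpers.

# Build NAME_VERSION.tar.gz so that it unpacks into a NAME_VERSION/ subdirectory.
make_tarball() {
    name=$1
    version=$2
    dir="${name}_${version}"
    tar -czf "$dir.tar.gz" "$dir"
}

# Create a detached, ASCII-armored signature next to the tarball (FILE.asc).
sign_tarball() {
    gpg -b -a "$1"
}

# Check the detached signature against the tarball before uploading both.
verify_tarball() {
    gpg --verify "$1.asc" "$1"
}
```

For example, after sourcing the file: make_tarball mail-reader 0.1, then sign_tarball mail-reader_0.1.tar.gz, then verify_tarball mail-reader_0.1.tar.gz.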

Thursday, April 30, 2009

Some experiences with Montezuma, cl-markdown and cl-html-diff

I used these three libraries recently and want to write something down, :).

Montezuma

Montezuma is a full-text index/search engine. What interests me is that it has a porting circle: Common Lisp -> Ruby -> Java -> Common Lisp. Why can't I use the original version?

Montezuma offers a search-each method for searching, but search-each only calls a callback function to handle each search result, so I have to mutate variables, which violates functional programming style. Another inconvenience is that there is no method to clear the whole index, only a delete-document method. Besides, it does not export a method to get the number of documents in the index, so I must use ::.

(montezuma::num-docs (montezuma::reader *wiki-index*))

cl-markdown

I use emacs-muse as the markup language for these posts; however, elisp is not popular, while Perl is. Markdown is a similar markup language implemented in Perl, and I found cl-markdown, a Common Lisp port of it.

cl-html-diff

It is great that Common Lisp always has the library I am looking for. cl-html-diff generates a human-readable diff of HTML documents; "human-readable" here means that it marks changes with two elements, del and ins. I do not know whether there are default styles for them, so I added two rules to my CSS file.

ins {
    color: blue;
    text-decoration: underline;
}
del {
    color: red;
    text-decoration: overline;
}

Monday, April 27, 2009

Learn the use of Mercurial (II)

It should be part 0 of the series, but it happened before I opened this blog, so it is delayed to part II, ;).

I said I have a flaky internet connection. My ISP claims it is very fast, but I believe the GFW uses up lots of bandwidth. I do not want to touch politics, but please let technology pass, OK?

People who live outside my country do not have this problem, therefore they design software without considering it, and that brings pain to me. I wanted to clone a repository that contains lots of changesets. I powered on my machine and let it start the clone in the morning; when I came back at night, I found it had stopped working very early, and my machine had just made me pay for electricity the rest of the time!

I sought help from Google, but there is only an awkward method: clone up to the nth revision, then incrementally pull m revisions at a time until the tip revision. I wrote a script to get rid of the type-and-wait loop.

#!/bin/sh
myvar=0
while [ $myvar -ne 11 ]
do
     myvar=$(( $myvar + 1 ))
     echo $((280 + $myvar * 25))
     hg pull -r $((280 + $myvar * 25))
done
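
A slightly more defensive variant of the loop, as a sketch: it retries each pull until hg reports success, so a dropped connection does not end the run, and it never asks for a revision past the tip. The function names and the 30-second pause are only illustrative, and the tip revision number has to be looked up beforehand (555 is the last revision the script above pulls).

```shell
#!/bin/sh
# Sketch only: revisions and pull_incrementally are hypothetical helper names.

# Print the revisions to pull, one per line: START+STEP, START+2*STEP, ...
# The last value is capped at TIP.
revisions() {
    rev=$1
    step=$2
    tip=$3
    while [ "$rev" -lt "$tip" ]; do
        rev=$((rev + step))
        if [ "$rev" -gt "$tip" ]; then
            rev=$tip
        fi
        echo "$rev"
    done
}

# Pull each revision in turn, retrying while hg exits with an error.
# Caveat: a revision that does not exist on the server is retried forever.
pull_incrementally() {
    for r in $(revisions "$1" "$2" "$3"); do
        until hg pull -r "$r"; do
            echo "pull of revision $r failed, retrying in 30 seconds" >&2
            sleep 30
        done
    done
}
```

With the numbers from the script above, the call would be pull_incrementally 280 25 555, run inside a clone that already holds revision 280.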

Wednesday, April 15, 2009

Learn the use of Mercurial (I)

I have used VSS, CVS and SVN as source control management systems. Most of the time I only use some simple commands like checkout, update, commit, diff and history. Now they are classified as centralized source control management systems and marked as old-fashioned [1], and many distributed source control management systems have emerged, like git, mercurial, darcs and so on. Now I am coming to Mercurial.

Mercurial uses hg as its command name: yes, Mercurial is Hg! I haven't read chemical symbols for quite a long time, :)

hg clone is similar to svn checkout; both are the starting point. It is easy to get familiar with commit, diff and history.

So what can I do differently with this fashionable product? Yes: transfer changesets among repositories. It does not need a central repository for syncing code between two development places, which is very useful when one of the places has no internet connection.

hg export and hg import are a pair of operations. hg export takes a revision number and exports the changeset corresponding to that revision as a patch. hg import then applies this changeset to another repository. hg bundle and hg unbundle handle a whole group of changesets in one file, while the previous pair works patch by patch, one changeset per patch.

When I work on a centralized repository, I am very careful with commits so that I do not break the whole repository, but now I am in my own repository: I can commit whenever I like and nobody will blame me. But when I am ready to push my work, in order to conceal my stupid mistakes, or to avoid messing up the other repository's log, or just to reduce network bandwidth, I need to destroy some commit history. Google told me to use hg strip, but I did not see it in hg help; then I tried hg rollback, but unfortunately it cannot roll back twice. Finally, I found that strip is provided by the MqExtension. Add

[extensions]
hgext.mq =

to .hgrc, and the strip command appears.
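
The same edit can be scripted, which helps when several machines need it. This is a sketch under assumptions: enable_mq is a made-up helper name, it only recognizes the exact key spelling hgext.mq, and it appends a new [extensions] section instead of editing an existing one (Mercurial accepts a section that is defined more than once).

```shell
#!/bin/sh
# Sketch only: enable_mq is a hypothetical helper.

# Append the mq extension to an hgrc file unless it is already enabled there.
# Defaults to ~/.hgrc when no file is given.
enable_mq() {
    hgrc=${1:-$HOME/.hgrc}
    if grep -q '^hgext\.mq[[:space:]]*=' "$hgrc" 2>/dev/null; then
        return 0
    fi
    printf '\n[extensions]\nhgext.mq =\n' >> "$hgrc"
}
```

After enable_mq has run, hg help strip should show the command, and hg strip REV removes that revision and all of its descendants from the local repository.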


1. I believe old does not mean bad, :)

Thursday, April 9, 2009

Study Selector widget (II)

With inspiration from the widget hierarchy, I found why my get-widget-for-tokens did not get the URI tokens: the on-demand-selector widget is first mapped to "main" by the navigation widget, so I should visit http://127.0.0.1:8080/main/asdf, not http://127.0.0.1:8080/asdf [1]. Besides, get-widget-for-tokens must return not only the widget but also the consumed tokens, otherwise there is a page-not-found error.

Let me clarify the whole update protocol for the selector widget:

  • handle-normal-request gets the URI tokens from the browser and calls update-widget-tree.
  • update-widget-tree calls update-children, which is specialized for the selector widget, so get-widget-for-tokens is called with the URI tokens.
  • if update-children does not signal an http-not-found error, the current widget is rendered.
  • make sure all tokens were consumed, or report a page-not-found error.
  • Specifically for on-demand-selector: it sets the widget returned by get-widget-for-tokens as its child and caches it, so it fits dynamic wiki-style content creation best.

1. asdf is an arbitrary URI token, used just for testing.

I found the reason why the dataform widget was not updating

It is not your fault; I'm sorry for modifying you to conceal my mistake. I redefined the with-widget-header method of the widget that contains you, but I did not do it the recommended way, :(.

Here is the with-widget-header documentation:

"Renders a header and footer for the widget and calls 'body-fn' within it. Specialize this function to provide customized headers for different widgets.

'widget-prefix-fn' and 'widget-suffix-fn' allow specifying functions that will be applied before and after the body is rendered."

What I need is a customized footer for the widget. I put my code into the body of the method arbitrarily, so you were not updated when your status changed. I should use the widget-suffix-fn parameter!

No hasty coding at all!!!!