...it's me playing in branches :)
Well... in fact it's me working on my Google Summer of Code project and specifically on KMail.
My Summer of Code project has the following goals:
- Help, and possibly complete, the port of KMail to native Qt4
- Replace the old Qt3-based KMHeaders with a super-cool "Message List View"
- Implement a view for "whole threads" of messages
Just to make you a bit curious about the sources i'll introduce an interesting problem encountered in the development: the message list view filling algorithm.
The Rich Message List View
When you click on a folder in any e-mail client you usually get its contents displayed in some window. The contents of a folder are obviously messages and you likely want the list to be displayed with some kind of "preferred arrangement". When you're searching for the latest messages received or sent you probably to want the messages to be sorted by date. When you're searching for a particular sender you might want the messages to be sorted by that field. When you're looking at a mailing list you usually want the messages to be arranged in trees which represent discussion threads. Many e-mail clients (soon including KMail) allow you to group the messages and threads by some kind of criteria. For example, you might want to group the messages by "smart" date ranges: "Messages received Today", "Messages received Yesterday", "Messages received Last Week", "Messages from Last Year"...
The mail folders can be very large. It's not unusual to find folders with more than 50000 messages inside (just stay subscribed to the linux-kernel mailing list - or kde-commits! - for some weeks and you'll end up having an example of such a folder). Scanning and organizing all the messages inside such a folder *will* be time consuming. The current KMail implementation is blocking in this sense and even if it uses clever optimisations it still tends to freeze for several seconds when opening a large folder. Freezing is a symptom we need to avoid.
So now you should be getting the big picture. Large folders but no freezing, grouping or no grouping, threading or no threading, sorting by that or by something else. How to approach such an algorithm?
Well.. the actual kmail-soc branch source code has an implementation for this :) I'll leave you the fun of finding the details by reading the comments in the source and i'll outline only the general idea.
First of all, we take care of the freezing problem. The idea is that the UI must be able to process events while the filling algorithm is running. KMail structure is not thread safe at all: multithreading is not an option. Calling "processEvents()" once in a while is also not very nice: it looks so "visual basic"-ish and may lead to a lot of unnecessary calls. So we approach the problem by idle time processing. This is a typical algorithm for UI-based programs.
We set up a zero-interval timer that the Qt core will trigger every time it has nothing else to do (that is, the event loop is "idle"). Once we're called we will be able to do a chunk of work. Idle time processing imposes an obvious restriction on our implementation: it *must* be interruptible. That is, it must be able to perform a part of the work, stop at a certain point and later resume exactly from that point.
The underlying folder is assumed to be a "flat" structure in that the messages can be retrieved by using a numeric index. The natural splitting approach is to cover a contiguous part of the message index space in each work chunk.
We start at the initial index and go upwards... well... the direction and order of the messages to access in the underlying storage is yet another part of the story but i'll omit it for now.
We actually define a "maximum chunk processing time" as a constant in the sources and while processing messages, once in a while, we check if we ran out of time. If we did, then we store our current index as initial index for the next job and return control to Qt. And this works nicely, even if the real-world implementation is really more complex than the one i've just described.
Next we take care of message threading. Have you ever wondered how your e-mail clients knows that a message is a reply to another message? The answer is, it either gets this information (and trusts) the sender's e-mail client or it attempts to guess. The basic idea is that each message should contain a "Message-Id" field with an unique ID string. Then every (direct) reply to that message should contain this unique ID in the "In-Reply-To" header field. This doesn't always work because of broken or featureless clients. There is also a problem when one of the thread messages is missing since it literally breaks the tree in two non-joinable pieces. So we have the "References-Id" field. Smart e-mail clients put there several unique identifiers of messages belonging to the same thread. There is a complex rationale for that we're using only one specific entry in that References-Id (see sources for more information). But anyway, we have two somewhat "direct" means for threading messages. Then there are hopeless e-mail clients which do not include these fields in replies. So we try to guess by looking at the subjects... but i'll leave the details here and go on with the algorithm.
Messages then "come in" from the folder in some order. The order does not necessarily need to be the "right one". That is, a reply to a message X may come in before the message X itself. This means that initially we'll miss the parent for the reply and *later* we have to find out that the original message is its (perfect?) parent. This complicates the things a bit since we can't just append the messages to the trees. We can append them when a perfect parent is found but when no parent is found we don't know if there will be one later. The current solution tries to keep the message without a parent unattacched as long as possible (during a single execution of the fill view algorithm) hoping that the parent will jump out. When we reach the end of the folder without finding the parent then the parentless child must be attacched to the view and shown to the user (even if an orphan, it's still a valid message).
But then the user presses the "check mail" button and new messages are downloaded from the server... and the missing parent suddently appears. Then we have an additional hazard: we must be able to change the hierarchy of the already displayed messages... and we must do it on-the-fly, possibly without bothering the user that might be actually reading *another* message in the same folder (and maybe, as per Murphy's Law, belonging to the very same thread).
Then we have message grouping. Determining the group that a *single* messsage belongs to is quite simple. A bigger problem comes out when we want to be able to group threads by something that is not the first e-mail in the thread. Maybe it's not totally obvious but think of a group of "all the discussions that had activity today". This is actually the group of threads for that the user has received at least one message today. To implement this, in certain configurations, groups must be held up until the end while in others they can be attached directly to the view.
Anyway, at the moment of writing, the sources in the kmail-soc branch have a nearly working 4 pass solution for the problem (which in fact has several additional hardcore details). It works but it's complex and hard to read. This might also mean that it will be hard to mantain in the future. Something must be done in this direction too.
Then there is the potential Akonadi port... but that's yet another part of the story. Ah, and skinning too! ...we have both eye-candy lovers and conservators that want the view "just like it was in KMail 1.0" :D
Well...I hope to have interested you at least a bit :)
I always try to make sure that the branch code compiles. You're encouraged to try it out and send feedback!
About Me
My name is Szymon Stefanek. I live in Italy and i'm studying Informatic Engineering at the University of Siena. Currently i've passed 29 of the 31 required exams and i'm planning to graduate in December 2008.
i'm an open source software enthusiast and I have written free code since 1998. i'm the leader of the KVIrc project that is actually in a mature state. Other developers are taking care of it and i'm trying to expand my horizons by joining another open source team.
I have used KDE since 1998. In the past, I have contributed some artwork and several pieces of my code have indirectly ended up in the KDE codebase.