Thursday, March 26, 2009

Profiling on the DS

Fast, small, or easy. On this project, I've learnt that for PDF rendering you can have only two of those.

Since the hardware doesn't really support more than 4 mb of rendering, small is a requirement. Slowly, I've been moving from it being easy on the code side to fast on the user side; which is good. A difficulty I've faced is that mupdf is a large library; spread over roughly three modules in many files. This makes assessing what is taking the most time non-trivial.

Luckily there is a very good function level profiler that is available for the DS written by simonjhall; the guy who ported Quake I and Quake II to the DS.

Function Level Profiling

This section talks about what function level profiling is doing, and why it is cool. If you don't care or know all of it already, feel free to skip it, the more instructional stuff is in the next section.

In case you aren't familiar with the concept, gcc and g++ offer this great feature where they insert a call to a specialy designated function on every function call, and also insert a call to another special function on every function return. It gives both functions the address of the called function, as well as the place its being called from.

So lets say we have a simple program

void c(){ /* something */}
void b(){ c(); }
void a(){ b(); c();}
int void main(int c, char* args){
a();
}
If we just ran this program, execution would go something like:

  • main()
  • a()
  • b()
  • c()
  • c returns
  • b returns
  • c()
  • c returns
  • a returns
  • main returns
And now, although this program runs instantly, we want to figure out how much time the code spends in each function. With the ability to insert functions before and after function calls, we can have the special entry function grab the time on entry, record it, and when the exit function is called, figure out what the difference is; and then we have the amount of time that function ran. So With special _entry(void* calledFunction) and _exit(void* calledFunction) functions defined and enabled for our program; the execution goes something like:
  • main()
  • _entry(*a())
  • a()
  • _entry(*b())
  • b()
  • _entry(*c())
  • c()
  • c returns
  • _exit(*c())
  • b returns
  • _exit(*b())
  • _entry(*c())
  • c()
  • c returns
  • _exit(*c())
  • a returns
  • _exit(*a())
  • main returns
We won't be too concerned here with the implementation of these functions -- we provide files later that handle this for us(note: if you want to do this in non-arm processors, use the -pg flag). It is worth noting a couple of things though. Primarily, as this is a debug tool, the amount of overhead this adds is very signifigant. It is possible for memory to be an issue; the special functions need to maintain some form of stack of called functions so it can keep the entry data for functions that call other functions. However, most likely you will notice that the profiler adds alot of time to your program, and the flags for it should only be enabled when its needed.

On a more technical note; in general, you can only rely on this kind of profiling so much. There are many places online detailing why; but given the simplicity of the DS (ie, no cacheing to worry about), most don't apply here (note: If you can think of a way this kind of profiling can lead one astray on the DS, please comment :)).

How to profile on the DS

So download these three files: cyg-profile.c, cyg-profile.h, and resolve.pl(That last one is just 'resolve' on the server. Change it to 'resolve.pl'). You should add the c file and the h file to your project. Make sure though, if the rest of your project is C++ to '#define __cplusplus' before you include the h file.

This code won't actually profile everything; it will rely on you enabling the profiler within your code. This allows you to focus on profiling only where you know the slowdown to be. If you want to profile everything of course, just do it right off :).

Initialize the profiler with cygprofile_begin().
Enable the profiler with cygprofile_enable().
Temporarily disable the profiler with cygprofile_disable().
Permenatly disable the profiler and write the result file out with cygprofile_end(). (Nothing will be written unless you call this function at the end.)

Now you need to change your build process a little. Append to your c flags these two things: '-finstrument-functions -mpoke-function-name'. The first one tells the compiler to use the special functions in cyg-profile.c; the second tells the compiler to poke the function name into the binary so we can get a readable version later on.

One more thing is necessary. There are no real easy ways to get a real good reading for the current time on the DS. So this code instead relies on counting hblanks to provide accurate time measurements. It does not handle counting hblanks, that is something you need to do. You do this by adding an hblank interrupt and simply incrementing a global integer named 'hblank'. If you are already handling this interrupt; go ahead and add the global definition anywhere, and at the top of the hblank handler throw in a 'hblanks++'. If you don't, just call this code in your initialization stuff:

irqEnable(IRQ_HBLANK);
irqSet(IRQ_HBLANK, hblankCount);

And then add this code above it somewhere in the file:

int hblanks = 0;
static void hblankCount (void)
__attribute__ ((no_instrument_function));

void hblankCount(){
hblanks++;
}

If you already have an hblank handler, like I said, use it. But modify its declaration to have the '__attribute__ ((no_instrument_function))' modifier. This tells the compiler to not hijack this functions entry/exit; which could be hazardous in something that runs as an interrupt. I'd actually recommend doing this to all your interrupts; unless you really want to profile them specifically.

Have it all set up? Run it! If everything goes right, you'll generate a file cyglog.txt(it is up to you to enable the FAT filesystem) ; which is generally unreadable until we run it through resolve.py. We do this next.

This next part requires you to have perl installed. If you are like me, you don't have it yet. Go get it, come back.

Copy this file from your DS to your harddrive. Then open up a terminal/command prompt. Navigate to wherever the cyglog.txt file is, and make sure that perl and the arm toolchain are on your path. You'll also need to find a copy of the elf file for whichever processor generated the cyglog file. That file will provide the resolver with all the nice names for the ugly function pointers that the profiler gives you.

Moment of truth: run 'perl resolve.pl {program.elf} cyglog.txt'. It should dump out a large table straight to standard out. Once you get it working, you'll want to dump it to another file to look at. The next section is going to pretend you named that file 'result.txt' ('perl resolve.pl {program.elf} cyglog.txt > result.txt'), but you of course, don't have to.

Interpretting the Results

The table output will look like this:

address calls ms ms/call ms/frame lowest stack highest stack function name

0x2004284 153 0.0636 0.000 0.000 15384 15424 bgGetGfxPtr
0x200422c 153 0.0000 0.000 0.000 15360 15400 bgGetMapBase
0x2016718 9269 185.4962 0.020 0.309 14048 14904 fz_concat

Its a real good idea right now to throw this output in some kind of spread sheet app. That way you can sort the whole thing by individual columns.

The first column is the actual memory address of the program. Not too useful.
The second column is calls. This is the straight number of times the function was called.
Third we have ms. This gives us, predictably, the total time spent in that function in milliseconds.
Fourth, we have ms/call. This is the average time spent in the function per call. Pay special attention here; the amount of time a function takes can vary wildly.
Fifth, we have ms/frame. This number is a bit of a red herring, and not really meaningful unless you are profiling code that needs to occur within one vblank. This number is simply the number of total milliseconds divided by the number of frames processed as measured by the hblank you are incrementing.
Sixth: Highest/Lowest stack. This, presumably, tells you the min and max location of the function on the call stack -- ie, how 'nested' your function is. I don't know when this information would be useful; if you do please tell me :). A reader notes below that it is useful sometimes to tell if your program is about to overflow the stack.
And of course, last but not least, the function name. If you are coding in C++, and using classes heavily, get used to its name mangling. A function named repairFrame in the class PdfApp that takes a single fz_irect_s argument for instance, comes out as '_ZN6PdfApp11repairFrameE10fz_irect_s'.

The first thing you want to do is sort it by total time taken. Likely, you'll find your best optimizations by optimizing the functions at the top of this list specifically. If that function is called alot, it means that small things can go a big way in that function. If it isn't called that much, checked to see if you are doing something necessarily costly in that function (Like... floating point math) or if its calling expensive children.

Understand that the MOST time saved you'll get from modifying any of these functions is the total time they take to execute. Prioritize based on this. Optimizing your code is an iterative process; try to only optimize one thing at a time, and kind of keep track of what each optimization gave you. Its a good habit to save all the result files along the way to consult.

Selective Profiling

Elhobbs had a useful tip for setting up an easy way to specify which files to profile and which to not. Add the following to your makefile after you include the normal ds_rules:

%.clk.o: %.clk.c 
   @echo $(notdir $<) 
   $(CC) -MMD -MP -MF $(DEPSDIR)/$*.d $(CFLAGS) -finstrument-functions -mpoke-function-name -c $< -o $@ 


Then whenever you want to profile a file, change its name to *.c to *.clk.c. You will still have to enable the profiler, but there is virtually no overhead if you initialize the profiler but never use it. When you view your report later on, you should only see functions profiled that were in that file. 

Twentyseven points out below that it can be dangerous to profile code that occurs in interrupts; that extra overhead for profiling could cause it to do really goofy things. You can choose which functions to profile with the '__attribute__ ((no_instrument_function))' modifier I mention above. If the code starts acting up only when the profiler is on, its a good indication that you should cut back on what you are profiling within interrupts. 

And I guess thats all I have to say about profiling. Questions? Concerns? Did I fuck something up really bad and you want to tell me? Leave comments :).


Tuesday, March 17, 2009

Ands-pdf Release 1.4

Sorry for the delay to v1.4; but to make up for it, I've included a lot of new features!

Binary (Note: You must put the fonts folder at the root level of your slot-1 device)
Source

Changelog from 1.3 to 1.4:

Features:
  • New viewing mode. This is the 'reader' mode. More on that later.
  • D-pad now pans the screen. I got alot of requests for this one!
  • File browser is now filtered; will only show PDF files. Note that it will show folders that don't have PDF files in them.
  • You can now adjust the backlit brightness by holding down Select + Up/Down arrows.
  • You can now cancel out of the file browser with 'B', or by clicking the file browser button again.
  • Rendering is getting progressively faster.
  • Mini-help screen and info screen on startup.
  • Added splash screen. If you find this annoying, remove the splash files in the FONTS folder, and it will bypass this step.
  • Fonts are external now. This gives us alot more memory to play with at the expense of higher load times on certain PDF's. (Note: I took out the dingbats font. If anyone as a PDF that has dingbats, I will hand out that file too... It is much larger though).
Bug Fixes:
  • You can no longer scroll off the bottom of the file browser.
  • Some memory issues resolved that led to instability.
  • The zoom-tool used to mess up if you clicked it more than once. Doesn't anymore.
  • Tons of other stuff I'm sure...
Reader Mode

There is a new icon on the tool bar that looks like a paper with a lower case 'r' on it. This button brings you to the reader mode.

The reader mode serves two purposes. First, it lets you quickly navigate through pages. Second, it lets you read the text of a PDF in a format appropriate for the NDS.

I'm really excited about this new feature, however it isn't as far along as I would hope. The long and short of it is that the way PDF files store their text is difficult to read since every PDF generator seems to do it a bit differently; and there are a number of different font formats as well. This effectively means that it won't work for all PDF's, and won't do what you expect it to do for many more. I considered not releasing until I had everything working, but figured people would want new features even if some of them were faulty.

Some PDF's will render fine in the text view. Some will render with no spaces. Others will not render at all (I haven't run into alot of those, but, unfortunately, I suspect that non-latin encodings suffer much more greatly from this.) I haven't completely nailed down the auto-formating, but I'm close. You'll probably notice some puncutation is goofed up; thats another thing I'm working on. Its a work in progress.

If you have a document that doesn't look right in the reader mode that you don't mind sharing, please please contact me either here or elsewhere. I probably won't be able to respond to everyone, but I will look at it, and will try to figure out how to make it work right. Thank you!

To the right is a typical screen shot of the reader mode. On the bottom screen are different page layouts. On the top screen is the text pulled from the page. You can scroll the text via the D-Pad up and down keys, and change pages by the right and left keys. Touching pages will also give them focus. The numbers are, intuitively enough, the page numbers for that page.

Controls
General:
  • Start opens a new file ('B' will cancel this operation.)
  • Select + Up/Down adjusts the screen brightness to the desired level
  • B button toggles view mode between normal document view mode and reader view mode.
Normal Viewer Mode:
  • D-Pad pans the document
  • X toggles between book mode and normal mode
  • Right Shoulder/A moves forward a page.
  • Left Shoulder/Y moves back a page.
Reader Mode:
  • Right/Left arrow moves page forward/backward
  • Up/Down arrow moves page text up or down.
Installation Notes

Unlike the previous versions, this one has font artifacts you need to keep on the flashdrive for some PDF's to load correctly. You must put the FONTS folder on your card at the root level. To do this you can simply unzip the contents of the binary zip file on to your slot-1 device. This will put the FONTS folder and the nds file on the device. You can move the nds file wherever you want, but you cannot move the fonts folder. I hope to make this configurable in the future, but we aren't there yet.

As always, please post any comments, questions, bug reports, etc etc :)

Monday, March 9, 2009

GUI Poll

I've gotten a number of reports that the comment system for blogspot is a bit buggy sometimes. This isn't something I have direct control over, so if you have an account at gbadev.org or gbatemp.net, go ahead and shoot me messages there; my nick on both sites is 'albinofrenchy'. Sorry for anyone who has had a hard time because of the website.

But the real reason I'm posting is because I'm putting polls up for user interface stuff. The first one is going to be for the d-pad, which alot of people want to be utilized but noone agrees on what it should do. Those three options aren't the only three I'll consider, if you have a better idea post it in the comments and I'll amend the poll. If you have good ideas about other buttons, give them to me too and if we have different opinions, I'll make another poll! 

Thursday, March 5, 2009

Request for help

If any homebrewers out there know how to integrate the extra ram from Slot-2 devices easily into the memory manager the DS uses (newlib I believe), please drop me a line. Thanks!

Release v1.3


A new release! This one is mostly a bug fix and a clean up of all the old functionality. This includes: 
  • Made the zoom functionality more intuitive based on feedback I got from alot of you. Now you draw a box around what you want to see on the DS. It will use the current orientation and come up with a zoom that fits that box to the DS screen. Try it out, if you don't get it or plain don't like it, leave me some feedback.
  • Made the screen dim whenever you go to open a new file or you open the mini document view.
  • Changed the old god-awful icon to the new god-awful icon for the mini document view. Its now a magnifying glass.
  • Made the current location on the page reset when you go forward or backward a page. 
  • The page number now appears in the little bar on the bottom/left.
Thats all the big stuff, and covers alot of ground in terms of the feedback I've been getting. 

The controls are largely unchanged, but I'll post them here for completeness sake:
  1. 'X' changes the orientation of the page. Starts off in 'book' style orientation.
  2. Right and Left shoulder go to the next or previous page.
  3. Start brings up the dialog to open  a new file.
  4. Select toggles whether or not to show debug messages. Most users won't want this.
  5. 'B' resets the zoom to the default value of 100%. 
One of the peices of feedback I've gotten has to do with this control layout. I want to get a good feel for what people think would be the best way to go on this before I just start changing stuff. What do you think the controls should be? What other controls would you want to see added? 

The next major feature, and major release, is going to involve a reader mode. As it is, this pdf reader is great for viewing and looking at PDF's, and perhaps even grabbing some key facts from it, but noone is going to read a 100 page e-book in it. You'd get arthritis from so much scrolling and the rendering is so slow as to make it really annoying. 

Instead, the plan is to allow you to be able to choose a viewing mode where you can display the textual blocks of a PDF document on its own screen. You'll lose the formatting from the PDF document, but then you'll be able to fit the text in the DS nicely. And if you want to see the formating, just pop back into the normal viewing mode and you'll have the experience that exists now! 

Would this be something you'd want? Leave a comment if you have ideas on how to make this feature work like you'd want or if you just think its a good idea. 

Tuesday, March 3, 2009

Sorry for the lack of updates!

For anyone following the project, the lack of updates is due to life getting busy. Its not abandoned. For the immediate future, I plan to have a new version out which has some bug fixes and suggested modifications, and then the next version should have some major overhauls. 

I've gotten alot of really great feedback though; thanks to everyone for that! I've got comment moderation for old posts turned on so I can filter out spam; but you no longer should have to do captchas (It apparently was broken the other day?). The turn around on comment moderation should be no longer than a day though in most cases. 

Tuesday, February 24, 2009

Screenshots

I've run off some screen shots to entice people to try it out.

Also, I've noticed some people have started to upload it to things like gbatemp.net and the like. I encourage this; the more attention the better :). 

However; I'd ask that you always include a link to this blog when you do. Its good to be able to see how many people are interested in this project, and its even better if they have an easy way to contact me.

Normal layout across two screens:


















Rotated layout across two screens (left to right):

Monday, February 23, 2009

Another release!

This release is coming much later than I wanted; but it has some very nice features. 

Here is the rom itself.
Here is the source.

First, on the PDF compatibility side, I've managed to solve the issue of non-embedded fonts not displaying. The program now uses a set of built-in fonts; so it might not always display exactly how the author wanted it to if the font is something obscure. Or dingbats, which I stripped out because it was simply too big to include.

Please please be sure to message me if PDF's don't work now, I have no test pdf's that break that don't have large images at this point.

On the features side, I've added a 'preview' button that displays a 'mini' view of the document. The button is the only filled in icon on the small status bar at the bottom; it looks like a bulls-eye. I'll change the icon later to something that makes sense.

If you just tap the mini-image, it will pan to put the point you clicked in the upper-left corner. 

If you draw a line with the stylus, it will adjust the zoom on the document to fill the screens with the size of the line you drew. Hopefully this is more intuitive in practice than it is in description. For instance, if you draw a line from the left to right of the document, it will zoom to fit the width of the document to the screen. If you draw a line from the top to the bottom of the document, it will zoom out to fit the height of the document to the screen. 

I'd really like comments on the GUI functionality. Does it do what you expect? Especially if it doesn't, tell me why and what you expect it to do. Ideally I'd have it so the user can not read any instructions and be up and running with this. 

I've also provided the ability to change files. This hasn't been extensively tested, and there may be memory leaks in it, so beware of that. If you hit 'start', it should pop up with the file browser.

Another new tool is the console, which you can layer over the screen with the 'Select' button. This should make the 'debug' builds not necessary so I will not release anymore of those. 

Lastly, rendering should be signifigantly quicker now. Still could be better, but for now it seems usable. 

Wednesday, February 11, 2009

New Release

The beta release is done! In this release I've worked only on basic navigability and rendering; I've got a lot to do in terms of PDF support and the like but I haven't had a ton of time to work on the project.


Here is what we got:
  • (Limited) Zooming in/out
  • Previous Page / Next Page
  • Support for vertical layout  or horizontal layout
  • Improved rendering
So far the zoom only aligns the PDF width with the width of the screen. Arbitrary zooming isn't difficult, but I'd rather solidify the GUI before I add features that should be part of the GUI. 

Funny things occasionally happen when flipping horizontal or vertical. You might have to scroll a bit to find the page. If you don't see it immediatly, zoom out to look :). 

The controls:
- Right shoulder: Next page
- Left shoulder: Prev page
- A: Zoom in
- B: Zoom out
- X: Rotate the page
- Select: Toggle rendering modes. The default one is bug free, but the secondary is quicker.   As soon as I figure out a few artifacting bugs, the secondary will be the only mode available. 
- You can still move around with the stylus. 

If you find a PDF that does NOT render, please leave a link in the comments to it. If the thing is massive or has a large image, that is probably the culprit, but leave it anyway. I hope to go through and vastly improve the percentage of PDF's this thing reads in the next release. 

Actually, if you find any bugs (and I'm sure there are plenty) or, better yet, have any ideas you think would work well, by all means comment!

Friday, February 6, 2009

Todo List

A feature list for the next pseudo release:
  • Next/Prev page (mapped to the shoulder keys) (done)
  • Open File command (mapped to 'Start') (done)
  • Determine rendering slowdowns and fix them. Might be able to do it with doing two small render's instead of one big one and use previous frame. 
  • Some kind of toolbar at the bottom; preferably to show page / page total and have a accessable file thing. 
  • Use color pallete so we can retain true color. I thought the color pallete mapped to 24 bit color. Oops. 
The first two items are actually done now as of an hour ago; I hope to be able to implement the next three by next week.