Thursday, March 26, 2009

Profiling on the DS

Fast, small, or easy. On this project, I've learnt that for PDF rendering you can have only two of those.

Since the hardware doesn't really support more than 4 mb of rendering, small is a requirement. Slowly, I've been moving from it being easy on the code side to fast on the user side; which is good. A difficulty I've faced is that mupdf is a large library; spread over roughly three modules in many files. This makes assessing what is taking the most time non-trivial.

Luckily there is a very good function level profiler that is available for the DS written by simonjhall; the guy who ported Quake I and Quake II to the DS.

Function Level Profiling

This section talks about what function level profiling is doing, and why it is cool. If you don't care or know all of it already, feel free to skip it, the more instructional stuff is in the next section.

In case you aren't familiar with the concept, gcc and g++ offer this great feature where they insert a call to a specialy designated function on every function call, and also insert a call to another special function on every function return. It gives both functions the address of the called function, as well as the place its being called from.

So lets say we have a simple program

void c(){ /* something */}
void b(){ c(); }
void a(){ b(); c();}
int void main(int c, char* args){
a();
}
If we just ran this program, execution would go something like:

  • main()
  • a()
  • b()
  • c()
  • c returns
  • b returns
  • c()
  • c returns
  • a returns
  • main returns
And now, although this program runs instantly, we want to figure out how much time the code spends in each function. With the ability to insert functions before and after function calls, we can have the special entry function grab the time on entry, record it, and when the exit function is called, figure out what the difference is; and then we have the amount of time that function ran. So With special _entry(void* calledFunction) and _exit(void* calledFunction) functions defined and enabled for our program; the execution goes something like:
  • main()
  • _entry(*a())
  • a()
  • _entry(*b())
  • b()
  • _entry(*c())
  • c()
  • c returns
  • _exit(*c())
  • b returns
  • _exit(*b())
  • _entry(*c())
  • c()
  • c returns
  • _exit(*c())
  • a returns
  • _exit(*a())
  • main returns
We won't be too concerned here with the implementation of these functions -- we provide files later that handle this for us(note: if you want to do this in non-arm processors, use the -pg flag). It is worth noting a couple of things though. Primarily, as this is a debug tool, the amount of overhead this adds is very signifigant. It is possible for memory to be an issue; the special functions need to maintain some form of stack of called functions so it can keep the entry data for functions that call other functions. However, most likely you will notice that the profiler adds alot of time to your program, and the flags for it should only be enabled when its needed.

On a more technical note; in general, you can only rely on this kind of profiling so much. There are many places online detailing why; but given the simplicity of the DS (ie, no cacheing to worry about), most don't apply here (note: If you can think of a way this kind of profiling can lead one astray on the DS, please comment :)).

How to profile on the DS

So download these three files: cyg-profile.c, cyg-profile.h, and resolve.pl(That last one is just 'resolve' on the server. Change it to 'resolve.pl'). You should add the c file and the h file to your project. Make sure though, if the rest of your project is C++ to '#define __cplusplus' before you include the h file.

This code won't actually profile everything; it will rely on you enabling the profiler within your code. This allows you to focus on profiling only where you know the slowdown to be. If you want to profile everything of course, just do it right off :).

Initialize the profiler with cygprofile_begin().
Enable the profiler with cygprofile_enable().
Temporarily disable the profiler with cygprofile_disable().
Permenatly disable the profiler and write the result file out with cygprofile_end(). (Nothing will be written unless you call this function at the end.)

Now you need to change your build process a little. Append to your c flags these two things: '-finstrument-functions -mpoke-function-name'. The first one tells the compiler to use the special functions in cyg-profile.c; the second tells the compiler to poke the function name into the binary so we can get a readable version later on.

One more thing is necessary. There are no real easy ways to get a real good reading for the current time on the DS. So this code instead relies on counting hblanks to provide accurate time measurements. It does not handle counting hblanks, that is something you need to do. You do this by adding an hblank interrupt and simply incrementing a global integer named 'hblank'. If you are already handling this interrupt; go ahead and add the global definition anywhere, and at the top of the hblank handler throw in a 'hblanks++'. If you don't, just call this code in your initialization stuff:

irqEnable(IRQ_HBLANK);
irqSet(IRQ_HBLANK, hblankCount);

And then add this code above it somewhere in the file:

int hblanks = 0;
static void hblankCount (void)
__attribute__ ((no_instrument_function));

void hblankCount(){
hblanks++;
}

If you already have an hblank handler, like I said, use it. But modify its declaration to have the '__attribute__ ((no_instrument_function))' modifier. This tells the compiler to not hijack this functions entry/exit; which could be hazardous in something that runs as an interrupt. I'd actually recommend doing this to all your interrupts; unless you really want to profile them specifically.

Have it all set up? Run it! If everything goes right, you'll generate a file cyglog.txt(it is up to you to enable the FAT filesystem) ; which is generally unreadable until we run it through resolve.py. We do this next.

This next part requires you to have perl installed. If you are like me, you don't have it yet. Go get it, come back.

Copy this file from your DS to your harddrive. Then open up a terminal/command prompt. Navigate to wherever the cyglog.txt file is, and make sure that perl and the arm toolchain are on your path. You'll also need to find a copy of the elf file for whichever processor generated the cyglog file. That file will provide the resolver with all the nice names for the ugly function pointers that the profiler gives you.

Moment of truth: run 'perl resolve.pl {program.elf} cyglog.txt'. It should dump out a large table straight to standard out. Once you get it working, you'll want to dump it to another file to look at. The next section is going to pretend you named that file 'result.txt' ('perl resolve.pl {program.elf} cyglog.txt > result.txt'), but you of course, don't have to.

Interpretting the Results

The table output will look like this:

address calls ms ms/call ms/frame lowest stack highest stack function name

0x2004284 153 0.0636 0.000 0.000 15384 15424 bgGetGfxPtr
0x200422c 153 0.0000 0.000 0.000 15360 15400 bgGetMapBase
0x2016718 9269 185.4962 0.020 0.309 14048 14904 fz_concat

Its a real good idea right now to throw this output in some kind of spread sheet app. That way you can sort the whole thing by individual columns.

The first column is the actual memory address of the program. Not too useful.
The second column is calls. This is the straight number of times the function was called.
Third we have ms. This gives us, predictably, the total time spent in that function in milliseconds.
Fourth, we have ms/call. This is the average time spent in the function per call. Pay special attention here; the amount of time a function takes can vary wildly.
Fifth, we have ms/frame. This number is a bit of a red herring, and not really meaningful unless you are profiling code that needs to occur within one vblank. This number is simply the number of total milliseconds divided by the number of frames processed as measured by the hblank you are incrementing.
Sixth: Highest/Lowest stack. This, presumably, tells you the min and max location of the function on the call stack -- ie, how 'nested' your function is. I don't know when this information would be useful; if you do please tell me :). A reader notes below that it is useful sometimes to tell if your program is about to overflow the stack.
And of course, last but not least, the function name. If you are coding in C++, and using classes heavily, get used to its name mangling. A function named repairFrame in the class PdfApp that takes a single fz_irect_s argument for instance, comes out as '_ZN6PdfApp11repairFrameE10fz_irect_s'.

The first thing you want to do is sort it by total time taken. Likely, you'll find your best optimizations by optimizing the functions at the top of this list specifically. If that function is called alot, it means that small things can go a big way in that function. If it isn't called that much, checked to see if you are doing something necessarily costly in that function (Like... floating point math) or if its calling expensive children.

Understand that the MOST time saved you'll get from modifying any of these functions is the total time they take to execute. Prioritize based on this. Optimizing your code is an iterative process; try to only optimize one thing at a time, and kind of keep track of what each optimization gave you. Its a good habit to save all the result files along the way to consult.

Selective Profiling

Elhobbs had a useful tip for setting up an easy way to specify which files to profile and which to not. Add the following to your makefile after you include the normal ds_rules:

%.clk.o: %.clk.c 
   @echo $(notdir $<) 
   $(CC) -MMD -MP -MF $(DEPSDIR)/$*.d $(CFLAGS) -finstrument-functions -mpoke-function-name -c $< -o $@ 


Then whenever you want to profile a file, change its name to *.c to *.clk.c. You will still have to enable the profiler, but there is virtually no overhead if you initialize the profiler but never use it. When you view your report later on, you should only see functions profiled that were in that file. 

Twentyseven points out below that it can be dangerous to profile code that occurs in interrupts; that extra overhead for profiling could cause it to do really goofy things. You can choose which functions to profile with the '__attribute__ ((no_instrument_function))' modifier I mention above. If the code starts acting up only when the profiler is on, its a good indication that you should cut back on what you are profiling within interrupts. 

And I guess thats all I have to say about profiling. Questions? Concerns? Did I fuck something up really bad and you want to tell me? Leave comments :).


Tuesday, March 17, 2009

Ands-pdf Release 1.4

Sorry for the delay to v1.4; but to make up for it, I've included a lot of new features!

Binary (Note: You must put the fonts folder at the root level of your slot-1 device)
Source

Changelog from 1.3 to 1.4:

Features:
  • New viewing mode. This is the 'reader' mode. More on that later.
  • D-pad now pans the screen. I got alot of requests for this one!
  • File browser is now filtered; will only show PDF files. Note that it will show folders that don't have PDF files in them.
  • You can now adjust the backlit brightness by holding down Select + Up/Down arrows.
  • You can now cancel out of the file browser with 'B', or by clicking the file browser button again.
  • Rendering is getting progressively faster.
  • Mini-help screen and info screen on startup.
  • Added splash screen. If you find this annoying, remove the splash files in the FONTS folder, and it will bypass this step.
  • Fonts are external now. This gives us alot more memory to play with at the expense of higher load times on certain PDF's. (Note: I took out the dingbats font. If anyone as a PDF that has dingbats, I will hand out that file too... It is much larger though).
Bug Fixes:
  • You can no longer scroll off the bottom of the file browser.
  • Some memory issues resolved that led to instability.
  • The zoom-tool used to mess up if you clicked it more than once. Doesn't anymore.
  • Tons of other stuff I'm sure...
Reader Mode

There is a new icon on the tool bar that looks like a paper with a lower case 'r' on it. This button brings you to the reader mode.

The reader mode serves two purposes. First, it lets you quickly navigate through pages. Second, it lets you read the text of a PDF in a format appropriate for the NDS.

I'm really excited about this new feature, however it isn't as far along as I would hope. The long and short of it is that the way PDF files store their text is difficult to read since every PDF generator seems to do it a bit differently; and there are a number of different font formats as well. This effectively means that it won't work for all PDF's, and won't do what you expect it to do for many more. I considered not releasing until I had everything working, but figured people would want new features even if some of them were faulty.

Some PDF's will render fine in the text view. Some will render with no spaces. Others will not render at all (I haven't run into alot of those, but, unfortunately, I suspect that non-latin encodings suffer much more greatly from this.) I haven't completely nailed down the auto-formating, but I'm close. You'll probably notice some puncutation is goofed up; thats another thing I'm working on. Its a work in progress.

If you have a document that doesn't look right in the reader mode that you don't mind sharing, please please contact me either here or elsewhere. I probably won't be able to respond to everyone, but I will look at it, and will try to figure out how to make it work right. Thank you!

To the right is a typical screen shot of the reader mode. On the bottom screen are different page layouts. On the top screen is the text pulled from the page. You can scroll the text via the D-Pad up and down keys, and change pages by the right and left keys. Touching pages will also give them focus. The numbers are, intuitively enough, the page numbers for that page.

Controls
General:
  • Start opens a new file ('B' will cancel this operation.)
  • Select + Up/Down adjusts the screen brightness to the desired level
  • B button toggles view mode between normal document view mode and reader view mode.
Normal Viewer Mode:
  • D-Pad pans the document
  • X toggles between book mode and normal mode
  • Right Shoulder/A moves forward a page.
  • Left Shoulder/Y moves back a page.
Reader Mode:
  • Right/Left arrow moves page forward/backward
  • Up/Down arrow moves page text up or down.
Installation Notes

Unlike the previous versions, this one has font artifacts you need to keep on the flashdrive for some PDF's to load correctly. You must put the FONTS folder on your card at the root level. To do this you can simply unzip the contents of the binary zip file on to your slot-1 device. This will put the FONTS folder and the nds file on the device. You can move the nds file wherever you want, but you cannot move the fonts folder. I hope to make this configurable in the future, but we aren't there yet.

As always, please post any comments, questions, bug reports, etc etc :)

Monday, March 9, 2009

GUI Poll

I've gotten a number of reports that the comment system for blogspot is a bit buggy sometimes. This isn't something I have direct control over, so if you have an account at gbadev.org or gbatemp.net, go ahead and shoot me messages there; my nick on both sites is 'albinofrenchy'. Sorry for anyone who has had a hard time because of the website.

But the real reason I'm posting is because I'm putting polls up for user interface stuff. The first one is going to be for the d-pad, which alot of people want to be utilized but noone agrees on what it should do. Those three options aren't the only three I'll consider, if you have a better idea post it in the comments and I'll amend the poll. If you have good ideas about other buttons, give them to me too and if we have different opinions, I'll make another poll! 

Thursday, March 5, 2009

Request for help

If any homebrewers out there know how to integrate the extra ram from Slot-2 devices easily into the memory manager the DS uses (newlib I believe), please drop me a line. Thanks!

Release v1.3


A new release! This one is mostly a bug fix and a clean up of all the old functionality. This includes: 
  • Made the zoom functionality more intuitive based on feedback I got from alot of you. Now you draw a box around what you want to see on the DS. It will use the current orientation and come up with a zoom that fits that box to the DS screen. Try it out, if you don't get it or plain don't like it, leave me some feedback.
  • Made the screen dim whenever you go to open a new file or you open the mini document view.
  • Changed the old god-awful icon to the new god-awful icon for the mini document view. Its now a magnifying glass.
  • Made the current location on the page reset when you go forward or backward a page. 
  • The page number now appears in the little bar on the bottom/left.
Thats all the big stuff, and covers alot of ground in terms of the feedback I've been getting. 

The controls are largely unchanged, but I'll post them here for completeness sake:
  1. 'X' changes the orientation of the page. Starts off in 'book' style orientation.
  2. Right and Left shoulder go to the next or previous page.
  3. Start brings up the dialog to open  a new file.
  4. Select toggles whether or not to show debug messages. Most users won't want this.
  5. 'B' resets the zoom to the default value of 100%. 
One of the peices of feedback I've gotten has to do with this control layout. I want to get a good feel for what people think would be the best way to go on this before I just start changing stuff. What do you think the controls should be? What other controls would you want to see added? 

The next major feature, and major release, is going to involve a reader mode. As it is, this pdf reader is great for viewing and looking at PDF's, and perhaps even grabbing some key facts from it, but noone is going to read a 100 page e-book in it. You'd get arthritis from so much scrolling and the rendering is so slow as to make it really annoying. 

Instead, the plan is to allow you to be able to choose a viewing mode where you can display the textual blocks of a PDF document on its own screen. You'll lose the formatting from the PDF document, but then you'll be able to fit the text in the DS nicely. And if you want to see the formating, just pop back into the normal viewing mode and you'll have the experience that exists now! 

Would this be something you'd want? Leave a comment if you have ideas on how to make this feature work like you'd want or if you just think its a good idea. 

Tuesday, March 3, 2009

Sorry for the lack of updates!

For anyone following the project, the lack of updates is due to life getting busy. Its not abandoned. For the immediate future, I plan to have a new version out which has some bug fixes and suggested modifications, and then the next version should have some major overhauls. 

I've gotten alot of really great feedback though; thanks to everyone for that! I've got comment moderation for old posts turned on so I can filter out spam; but you no longer should have to do captchas (It apparently was broken the other day?). The turn around on comment moderation should be no longer than a day though in most cases.