Our campus site uses an iframe, and we encourage other sites on campus to use it too, so that they share the university's identity. The iframe has links to the campus address, the campus online directory, the campus contact page, an index of the campus pages, Google search and, most importantly, a list box of quick links. The quicklinks information is held in a database, and unfortunately, when it was programmed, nobody thought about performance or optimization, I guess. The page accesses the database every time the iframe is loaded. Our main campus page gets 25,000 to 35,000 hits per day, and because we have a decentralized structure we can't control other departments' web sites, so we have no idea how many hits they get; my rough estimate is that the iframe gets about 45,000 hits a day. Guess what: as there is no caching or any other optimization, the web sites hit the database 45,000 times a day just to get the 80 or 85 rows of data that make up the quicklinks. It is not easy to add or delete a quicklink, as that requires meetings and management decisions, so I can say it is a mostly read-only table; it is updated once a month at most.
So why wasn't caching considered at the beginning? I have no idea, but whenever I see code that I think is really ugly, I try to fix it. What is the best fix for this? I can think of several different approaches:
- Read the quicklinks from the database, put them in a cache, and have a SqlCacheDependency monitor the table, so whenever there is an update the stale cache is not served but refreshed (Approach I)
- Read the quicklinks from the database and put them in a cache with a lifetime of 1 day (Approach II)
- Don't use the database at all; read from an XML file on every request, so the database is never hit, although this requires 45,000 disk accesses a day (Approach III)
- Don't use the database at all; use an XML file and read it once a day, so if there is an update it will be reflected in the quicklinks the next day (Approach IV)
- Like Approach IV, but instead of rereading the XML file the next day, check its last modified date, and reread the file only if that date differs from the one seen at the last read (Approach V)
- Hardcode the links in the code :) (Approach VI)
I eliminate Approach VI immediately, as I don't like putting data in the code itself. Approach I looked OK to me at first, but we are only talking about 80 rows, and the links are updated once a month; I don't think it is a good idea to poll the table for updates, so I eliminate this too. Approach II is basically like Approach I, except that we don't poll the database for updates; we just read the entire table again the next day (once again, we are talking about 80-odd rows). This is better than Approach I, as there will be 1 database hit per day. Compared to 45,000 hits per day, this is of course a much better approach, and if there is an update to the table, in the worst case it will show up on the site a day later.
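To make Approach II concrete, here is a minimal sketch of a cache with a one-day lifetime. It is written in Python purely as an illustration and is not the ASP.NET cache API; the names `TimedCache`, `loader` and `ttl_seconds` are made up for this example, and `loader` stands in for whatever function actually queries the quicklinks table.

```python
import threading
import time


class TimedCache:
    """Holds one value and reloads it only after ttl_seconds have passed."""

    def __init__(self, loader, ttl_seconds=24 * 60 * 60, clock=time.monotonic):
        self._loader = loader        # hypothetical: the function that hits the database
        self._ttl = ttl_seconds      # cache lifetime, one day by default
        self._clock = clock          # injectable so the expiry can be tested
        self._lock = threading.Lock()
        self._value = None
        self._loaded_at = None       # None means "never loaded"

    def get(self):
        with self._lock:
            now = self._clock()
            expired = self._loaded_at is None or now - self._loaded_at >= self._ttl
            if expired:
                # The only place the database is touched.
                self._value = self._loader()
                self._loaded_at = now
            return self._value
```

With something like this in front of the query, every page load calls `get()`, but `loader` runs only on the first request and then once per lifetime, no matter how many requests arrive in between.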
Approach III is eliminated, as it hits the disk 45,000 times a day. Approach IV is of course better, as it hits the disk once a day to read the XML into memory. I don't see any performance gain in implementing Approach V over Approach IV, because the XML file will be very small, and instead of checking the last modified date we could just as easily read the whole file.
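For comparison, the modification-date check in Approach V could look roughly like the sketch below. Again this is illustrative Python, not the site's code: the class name `FileBackedLinks`, the `reloads` counter and the XML layout (`<link title="..." url="..."/>` elements) are all assumptions made for the example.

```python
import os
import xml.etree.ElementTree as ET


class FileBackedLinks:
    """Keeps the parsed links in memory and reparses only when the file changes."""

    def __init__(self, path):
        self._path = path
        self._mtime = None   # modification time seen at the last read
        self._links = []
        self.reloads = 0     # counts how often the file was actually parsed

    def get(self):
        mtime = os.stat(self._path).st_mtime_ns
        if mtime != self._mtime:
            # File is new or was modified since the last read: parse it again.
            root = ET.parse(self._path).getroot()
            self._links = [(e.get("title"), e.get("url")) for e in root.iter("link")]
            self._mtime = mtime
            self.reloads += 1
        return self._links
```

Note that the check itself still asks the file system for the file's metadata each time it runs, which is part of why it buys so little over simply rereading an 80-line file.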
So what is the verdict, and why? I picked Approach II, as it hits the database only once a day and uses the cache to store the data. If there is an update to the table, it will be reflected on the site the next day. In fact, I was planning to implement Approach IV, as there would be no database hit at all and reading 80 lines of data from an XML file would be faster, but we might share or reuse this quicklinks table in other projects too; so rather than sharing an XML file between projects and ending up with different versions of it in the future, I decided on Approach II.
Have fun refactoring :)