DrupalBin
Submit Code
About
Recent Posts
Fix for Latest item + title list
Support Ticketing System module: code excerpt from 6.x-1.x-dev support.module lines 2843 to 2941
Fix for Create product reference field
Create product reference field
Re:
Fix for Code
Code
Code
bellHead's fields error
Code
more
User login
Log in using OpenID:
What is OpenID?
Username:
*
Password:
*
Log in using OpenID
Cancel OpenID login
Create new account
Request new password
Tags
CCK
drupal
fapi
jquery
menu
module
php
simpletest
test
theme
user
views
more tags
Home
›
Fix for ***This is not code*** Article for Blog and Planet-Soc
Fix for Fix for ***This is not code*** Article for Blog and Planet-Soc
View
Fix
Fixes are not saved to the database until you submit.
Summary:
Tags:
Any tags you'd like to associate with your code, delimitered by commas (example: Views, CCK, Module, etc).
Show summary in full view
Yesterday for me was the second day of Summer of Code for Drupal. My project is <a href="http://drupal.org/project/security_scanner">Security scanner component for SimpleTest module</a>. chx, my mentor, and I were working on a way to scan through all links on the site, checking for XSS vulnerabilities. In order to do this, we have to build up a crawler that catch all the links of the website and save them into a database. Then we have to use this links to inject xss or sql seed into the forms, in order to view if there are vulnerabilities or not. But now we're only at the first part of the project, so we're catching links. The basic idea is: make a scan of the page and save their links. Then extract the first link that has status field "0" (default status). Scan that url with a crawler and save its links again into the database. This means that database table "crawler_links" is such a queue of urls ready to be processed. After processing the page update the record of that page changing its status (move it to "1"). If you put this into code: <?php db_query("SELECT link_id, path FROM {crawler_links} WHERE status = 0 LIMIT 1"); ... process page db_query("UPDATE {crawler_links} SET status = 1 WHERE link_id = %d", $link_id); ?> However, if we want more than crawler working on at the same time, then this is a classic example of a "race condition": if another crawler is started while the "process page" part runs, then the second crawler will process the very same page. Not good. We can extend the meaning of the status field so that 1 means "processing began" and 2 means "processing finished", so: <?php db_query("SELECT link_id, path FROM {crawler_links} WHERE status = 0 LIMIT 1"); db_query("UPDATE {crawler_links} SET status = 1 WHERE link_id = %d AND status = 0", $link_id); ... process page db_query("UPDATE {crawler_links} SET status = 2 WHERE link_id = %d", $link_id); ?> Much better but if you are starting your crawlers from cron then it is still possible that two parallel SELECTs will run before one crawler has a chance to run an UPDATE. We need to make sure that this can't happen. However, the most popular database (MySQL with MyISAM is) does not support transactions. With transactions, the UPDATE would fail, so we would need to re-run our two queries until the UPDATE succeeds. Instead, we label our crawlers uniquely and do db_query("UPDATE {crawler_links} SET crawler_id = %d, status = 1 WHERE crawler_id = 0 LIMIT 1", $crawler_id); $page_to_visit = db_fetch_array(db_query( "SELECT id, path FROM {crawler_links} WHERE crawler_id = %d AND status = 1 LIMIT 1", $crawler_id)); ... process page db_query("UPDATE {crawler_links} SET status = 2 WHERE link_id = %d", $link_id); ?> How can we make sure the crawler_id is unique for sure? MySQL has autoincrement fields: specifically, we have a crawler table with a single "id" field. We INSERT into this table and the last insert id is going to be the $crawler_id. This permits the simultaneously cooperation of two or more crawlers, without overlap. And this is the beautiful environment of Summer of Code, learning to think, solving problems in a state-of-art way. Only for opensorce and for cotton =) !
Syntax highlighting mode:
ActionScript
ColdFusion
Diff
Drupal 5
Drupal 6
HTML
INI
Javascript
MySQL
PHP
Python
robots.txt
SQL
Text
Select the syntax highlighting mode to use.
See Also:
Order
Title:
URL:
-1
0
1
Title:
URL:
-1
0
1
Any links you'd like to have associated with the post (Drupal.org issue, Wikipedia article, etc).
File attachments
Changes made to the attachments are not permanent until you save this post. The first "listed" file will be included in RSS feeds.
Attach new file:
The maximum upload size is
1 MB
. Only files with the following extensions may be uploaded:
jpg jpeg gif png txt doc xls pdf ppt pps odt ods odp
.