Archive for March, 2009

 

Assignment 8: XML parsing?

o/p:

Europe ? #
Polska http://pl.engadget.com
Deutschland http://de.engadget.com
Asia ? #
???? http://chinese.engadget.com
???? http://cn.engadget.com
??? http://japanese.engadget.com/
???? http://kr.engadget.com/
Español http://es.engadget.com
HD http://www.engadgethd.com
Mobile http://www.engadgetmobile.com
Engadget http://www.engadget.com/
Engadget #
Web http://search.aol.com/aol/search?invocationType=wl-gadget&query=
Images http://search.aol.com/aol/image?invocationType=wl-gadget&query=
Video http://search.aol.com/aol/video?invocationType=wl-gadget&query=
News http://search.aol.com/aol/news?invocationType=wl-gadget&query=
Local http://local.aol.com/aol/local?invocationType=wl-gadget&query=
RSS Feed /rss.xml
Contact us /contact/comment/
Tip us on news! /contact/tips/
http://www.monoprice.com/products/subdepartment.asp?c_id=104&cp_id=10428
Permalink http://www.engadget.com/2009/03/30/mini-displayport-adapters-now-available-for-20/
Email this /forward/1502779/
31 Comments http://www.engadget.com/2009/03/30/mini-displayport-adapters-now-available-for-20/#comments
http://www.businesswire.com/portal/site/google/?ndmViewId=news_view&newsId=20090330006184&newsLang=en
Permalink http://www.engadget.com/2009/03/30/intels-xeon-3500-5500-series-officially-unveiled-for-servers-a/
Email this /forward/1502788/
17 Comments http://www.engadget.com/2009/03/30/intels-xeon-3500-5500-series-officially-unveiled-for-servers-a/#comments

For this assignment I tried to pull the Engadget headlines and shuffle the links so that the headlines and links no longer match up, but all I could parse was some links and some junk. I tried to use Getter.java and Homework.java to make this work. Later I found that Homework.java was written to parse HTML, not XML, so instead of feeding it Engadget's rss.xml link I fed it the HTML page directly. Since the HTML source was very messy, I could not separate out the elements I needed.

code:

import org.dom4j.Document;
import org.dom4j.DocumentFactory;
import org.dom4j.Element;
import org.dom4j.io.SAXReader;
import org.xml.sax.XMLReader;
import java.util.HashMap;
import java.util.List;

public class Getter1 {
  public static void main(String[] args) throws Exception {
    // Use the URL from the command line if one is given, otherwise Engadget's front page
    String url = args.length > 0 ? args[0] : "http://www.engadget.com";

    // Register the "xhtml" prefix so XPath can match the namespaced elements TagSoup produces
    HashMap<String, String> map = new HashMap<String, String>();
    map.put("xhtml", "http://www.w3.org/1999/xhtml");
    DocumentFactory factory = DocumentFactory.getInstance();
    factory.setXPathNamespaceURIs(map);

    // TagSoup turns messy real-world HTML into well-formed XHTML for dom4j
    XMLReader tagsoup = new org.ccil.cowan.tagsoup.Parser();
    SAXReader reader = new SAXReader(tagsoup);

    // EasyHTTPGet is the HTTP helper class from the course code
    EasyHTTPGet getter1 = new EasyHTTPGet(url);
    Document document = reader.read(getter1.responseAsInputStream());

    // Every <li> element on the page
    List listItems = document.selectNodes("//xhtml:li");

    for (Object o : listItems) {
      Element elem = (Element) o;

      // First link inside the list item; skip items that have no link
      Element anchor = (Element) elem.selectSingleNode("xhtml:a");
      if (anchor == null) {
        continue;
      }
      String project = anchor.getText().trim();
      String href = anchor.attributeValue("href");
      System.out.println(project + " " + href);
    }
  }
}
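Because Engadget's rss.xml feed is already well-formed XML, the original idea (keep the headlines, shuffle the links) could probably be done without TagSoup or HTML scraping at all. Below is a minimal sketch using only the JDK's built-in DOM parser. It assumes a standard RSS 2.0 feed with <item>, <title> and <link> elements, and assumes the feed lives at http://www.engadget.com/rss.xml (built from the /rss.xml link in the scraped output above). The class name RssShuffle and its methods are hypothetical, not part of the course code.

```java
import java.io.InputStream;
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RssShuffle {
    // Read every <item> of an RSS 2.0 feed into a {title, link} pair
    static List<String[]> parseItems(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        NodeList items = doc.getElementsByTagName("item");
        List<String[]> result = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            result.add(new String[] { childText(item, "title"), childText(item, "link") });
        }
        return result;
    }

    // Text of the first element with this tag inside the parent, or "" if there is none
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : "";
    }

    // Keep the titles in order but shuffle the links so they no longer match
    static List<String[]> shuffleLinks(List<String[]> items, Random rng) {
        List<String> links = new ArrayList<>();
        for (String[] item : items) {
            links.add(item[1]);
        }
        Collections.shuffle(links, rng);
        List<String[]> mixed = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            mixed.add(new String[] { items.get(i)[0], links.get(i) });
        }
        return mixed;
    }

    public static void main(String[] args) throws Exception {
        // Assumed feed location; pass a different URL as the first argument if it has moved
        String feed = args.length > 0 ? args[0] : "http://www.engadget.com/rss.xml";
        try (InputStream in = URI.create(feed).toURL().openStream()) {
            for (String[] item : shuffleLinks(parseItems(in), new Random())) {
                System.out.println(item[0] + " " + item[1]);
            }
        }
    }
}
```

Keeping parsing and shuffling in separate methods means the shuffle can be tried on a small hand-written feed before pointing it at the live one.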

Posted by admin under Uncategorized  •  No Comments

Context Free Grammar:

I decided to make a prescription/disease identifier: the script automatically assigns a patient a disease and prescribes a medication. For this I reused the context-free grammar code and wrote my own grammar.

o/p:

Too bad you are dying from Mucormycosis- Zygomycosiss Neurontin

Im sorry you are suffering from Mucormycosis- Zygomycosiss take Gabapentin

Too bad you are dying from Lymphogranuloma venereum take Gabapentin

Im sorry you are suffering from Lymphogranuloma venereum take Wellbutrin SR (Bupropion Hydrochloride Sustained-Release)

Im sorry you are suffering from Lymphogranuloma venereum take Gabapentin

Congratulations you are free from Lymphogranuloma venereum take Wellbutrin SR (Bupropion Hydrochloride Sustained-Release)

Im sorry you are suffering from Mucormycosis- Zygomycosiss take dog food laced with Imitrex Nasal Spray
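The post doesn't include the grammar itself. As a rough idea of how a context-free grammar produces sentences like the ones above, here is a minimal Java sketch: each nonterminal in angle brackets is replaced by a randomly chosen alternative until only plain words remain. The rules are a hypothetical grammar reconstructed from the sample output, not the actual grammar file, and the class name PrescriptionGrammar is made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class PrescriptionGrammar {
    // Hypothetical rules reconstructed from the sample output; each alternative is a space-separated token list
    static final Map<String, List<String[]>> RULES = new HashMap<>();

    static void rule(String lhs, String... alternatives) {
        List<String[]> list = new ArrayList<>();
        for (String alt : alternatives) {
            list.add(alt.split(" "));
        }
        RULES.put(lhs, list);
    }

    static {
        rule("<start>", "<opening> <disease> <advice>");
        rule("<opening>", "Im sorry you are suffering from", "Too bad you are dying from",
             "Congratulations you are free from");
        rule("<disease>", "Mucormycosis- Zygomycosis", "Lymphogranuloma venereum");
        rule("<advice>", "take <drug>", "take dog food laced with <drug>");
        rule("<drug>", "Gabapentin", "Neurontin", "Imitrex Nasal Spray",
             "Wellbutrin SR (Bupropion Hydrochloride Sustained-Release)");
    }

    // Expand a symbol: a nonterminal picks a random alternative, anything else is copied out as a word
    static void expand(String symbol, Random rng, List<String> out) {
        List<String[]> alternatives = RULES.get(symbol);
        if (alternatives == null) {
            out.add(symbol);
            return;
        }
        String[] choice = alternatives.get(rng.nextInt(alternatives.size()));
        for (String s : choice) {
            expand(s, rng, out);
        }
    }

    static String generate(Random rng) {
        List<String> words = new ArrayList<>();
        expand("<start>", rng, words);
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        Random rng = new Random();
        for (int i = 0; i < 7; i++) {
            System.out.println(generate(rng));
        }
    }
}
```

Passing a seeded Random to generate makes a run repeatable, which helps when tweaking the rules.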


Midterm:

My midterm project was built on the word cloud code plus a little PHP. Since this class deals with text, I decided to use the Twitter API to pull tweets from Twitter and use them as input for the word cloud program.

I used PHP to pull the tweets. Since the word cloud program takes a text file as input, I had to parse Twitter's XML feed, save the tweet text to a .txt file on the server, and feed that file to the word cloud.

The public timeline of Twitter in XML is at http://twitter.com/statuses/public_timeline.xml?count=200, which the script stores in $file.

The PHP code is as follows:

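<html>
<head>
<!-- Reload every 15 seconds so the script keeps appending fresh tweets -->
<META HTTP-EQUIV="refresh" CONTENT="15; URL=http://www.sanjaypapinazath.com/a2z/twit_pull6.php">
</head>
<body>
<?php

ini_set('display_errors', true);
ini_set('display_startup_errors', true);
error_reporting(E_ALL);

// XML feed of Twitter's public timeline
$file = "http://twitter.com/statuses/public_timeline.xml?count=200";

// Text file read by the word cloud; opened once, in append mode
$fp1 = fopen('/home/sanjaypa/public_html/a2z/twit_pull.txt', 'a');

$currentTag = "";

// Character data handler: keep only the tweet text and the user name
function contents($parser, $data) {
    global $currentTag, $fp1;
    if ($currentTag == 'TEXT' || $currentTag == 'NAME') {
        echo htmlspecialchars($data);
        fwrite($fp1, $data);
    }
}

// Tag names arrive upper-case because case folding is on by default
function startTag($parser, $name, $attrs) {
    global $currentTag;
    $currentTag = $name;
}

// End the line after each tweet or name, then stop collecting until the next tag opens
function endTag($parser, $name) {
    global $currentTag, $fp1;
    if ($name == 'TEXT' || $name == 'NAME') {
        echo "<br/>";
        fwrite($fp1, "\n");
    }
    $currentTag = "";
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startTag", "endTag");
xml_set_character_data_handler($xml_parser, "contents");

// Read the whole feed in chunks instead of a single fixed-size fread
$fp = fopen($file, "r");
while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die("Error on line " . xml_get_current_line_number($xml_parser) . ": "
            . xml_error_string(xml_get_error_code($xml_parser)));
    }
}

xml_parser_free($xml_parser);
fclose($fp);
fclose($fp1);
?>
</body>
</html>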
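The word cloud code itself isn't shown in the post. Word clouds generally size each word by how often it appears, so the first thing such a program does with twit_pull.txt is count words. A minimal Java sketch of that step, assuming a simple lower-cased split on anything that isn't an ASCII letter, digit, apostrophe, # or @ (non-Latin tweets would be dropped by this rule); the class name TweetWordCount and the tokenizing rule are assumptions, not the course's word cloud code.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TweetWordCount {
    // Count how often each lower-cased word appears in the text
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("[^a-z0-9'#@]+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // The n most frequent words, most frequent first, ties broken alphabetically
    static List<String> topWords(Map<String, Integer> counts, int n) {
        List<String> words = new ArrayList<>(counts.keySet());
        words.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return words.subList(0, Math.min(n, words.size()));
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical local copy of the twit_pull.txt file written by the PHP script
        String path = args.length > 0 ? args[0] : "twit_pull.txt";
        String text = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        Map<String, Integer> counts = countWords(text);
        for (String word : topWords(counts, 20)) {
            System.out.println(word + " " + counts.get(word));
        }
    }
}
```

For example, countWords("The cat and the hat. THE end") maps "the" to 3 and every other word to 1.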

In the Processing sketch I used the SMS library, which reads the laptop's built-in accelerometer.

So while the sketch is running, tilting the laptop upward makes the more frequently used words rise.
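The post doesn't say exactly how the tilt moves the words, so here is one possible mapping, sketched in plain Java under stated assumptions: the accelerometer reading has already been normalized to a tilt value between -1 and 1 (this sketch does not call the SMS library), and each word's upward shift is proportional to its count relative to the most frequent word. The class TiltOffset and all of its parameters are hypothetical.

```java
public class TiltOffset {
    // Vertical offset in pixels for one word (negative means up, as in Processing's y axis).
    // Assumes tilt is a normalized reading; values outside [-1, 1] are clamped.
    static float offsetFor(int count, int maxCount, float tilt, float maxShift) {
        if (maxCount <= 0) {
            return 0f;
        }
        float weight = (float) count / maxCount;      // 1.0 for the most frequent word
        float t = Math.max(-1f, Math.min(1f, tilt));  // clamp the tilt reading
        return -t * weight * maxShift;
    }

    public static void main(String[] args) {
        int[] counts = {12, 6, 1};
        float tilt = 0.5f; // hypothetical reading: laptop tilted halfway up
        for (int c : counts) {
            System.out.println(c + " uses -> offset " + offsetFor(c, 12, tilt, 100f));
        }
    }
}
```

With a tilt of 0.5 and a maximum shift of 100 pixels, the most frequent word moves up 50 pixels and a word with half as many uses moves up 25, so frequent words visibly lead the way.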

o/p: (screenshot: picture-1.png)
