My time at AT&T so far has been enjoyable. I have a program that has been running continuously for a few days, collecting data from news websites like CNN, MSNBC, Fox News, and ABC News. Basically, it downloads a list of RSS (Really Simple Syndication) feeds from these news sources and stores them in a database. The program then downloads the articles linked from each RSS feed.
An RSS feed gives you a regularly updated listing of recent posts from some data source, be it CNN or your favorite blog, as long as the source provides RSS support.
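To make the collection step concrete, here is a minimal sketch of how a feed might be parsed and its entries stored. It is not the actual script: the sample feed, the `articles` table layout, and the function names are all assumptions made up for illustration, and the download step is left out so the example runs without a network connection.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real run would download this text from a news site.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First story</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return a list of (title, link) pairs, one per <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        link = item.findtext("link", default="").strip()
        if link:  # an entry without a link gives us nothing to download
            items.append((title, link))
    return items


def store_items(conn, items):
    """Insert feed entries, skipping links that are already in the table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles "
        "(link TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT OR IGNORE INTO articles (link, title) VALUES (?, ?)",
        [(link, title) for title, link in items],
    )
    conn.commit()


def pending_links(conn):
    """Links whose article body has not been downloaded yet."""
    rows = conn.execute(
        "SELECT link FROM articles WHERE body IS NULL ORDER BY link"
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store_items(conn, parse_feed(SAMPLE_FEED))
    print(pending_links(conn))
```

Using the link as the primary key means the same feed can be polled over and over for days without storing duplicate entries, and `pending_links` tells a later step which articles still need to be fetched.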
I'm working on building various training models over the data collected by my script. The training is for capitalization of words. When the researchers at AT&T receive text created by an ASR (Automatic Speech Recognition) program, the text has no capitalization or punctuation. And in closed-captioned text, capitalization and punctuation are often done sloppily. So the goal is to automatically make the text look nice and normal. :-)
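As a rough idea of what training a capitalization model can mean, here is a very simple baseline sketch: count how each word is usually cased in properly written news text, then apply the most common casing to lowercase input. This is only an assumed illustration, not one of the models I'm actually building, and the function names and training sentences are made up. It also ignores punctuation entirely.

```python
from collections import Counter, defaultdict


def train_truecaser(sentences):
    """Learn the most frequent surface form of each word from cased text."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        # Skip the first token: it is capitalized because of its position,
        # which says nothing about how the word is normally written.
        for token in tokens[1:]:
            counts[token.lower()][token] += 1
    return {lower: forms.most_common(1)[0][0] for lower, forms in counts.items()}


def truecase(text, model):
    """Restore casing on lowercase text; unknown words stay lowercase."""
    tokens = [model.get(tok, tok) for tok in text.lower().split()]
    if tokens:
        # Capitalize the first letter of the sentence, keeping the rest as is.
        tokens[0] = tokens[0][0].upper() + tokens[0][1:]
    return " ".join(tokens)


if __name__ == "__main__":
    training = [
        "The president visited Atlanta on Monday",
        "Reporters in Atlanta asked the president about CNN",
        "On Monday the network CNN aired the speech",
    ]
    model = train_truecaser(training)
    print(truecase("cnn reporters visited atlanta on monday", model))
```

A word-by-word lookup like this cannot tell "apple" the fruit from "Apple" the company, which is one reason to try models that look at the surrounding words as well.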