wiki:PSPSSample
PSPS Help Sections Software Components PSPS User Tools Project Wikis Related sites * -- restricted access

Back to Main PSI Help Wiki

Sample Queries

THIS PAGE IS UNDERGOING SIGNIFICANT REVISION AT THIS TIME

On this page we illustrate how one can use the ODM database to answer some basic searches in the database. We will limit these searches for the moment to those tables that will be populated when the ODM opens for business, namely detections and the corresponding Frame and Image metadata. Almost all of these queries could be generated with the Query Builder in PSI but we show them here written out in Structured Query Language (SQL).

As new data types (e.g. stacks, difference images, object records) are added to the database the queries shown here will become more sophisticated as one will be able to make interesting joins between different tables. As the PSPS begins its roll out beta testing we are starting only with detection records and associated metadata so the queries are somewhat limited compared to what will be available in the future.

About SQL

This page is not intended to be a detailed tutorial in SQL, so for those who are unfamiliar or only slightly familiar with SQL there are many helpful tutorials available on the web. We have attached a pdf file (sql_intro.pdf) to this page for a SQL introduction does a good job of covering the basics. For those new to SQL or who only have a passing knowledge of the language, it would be a good idea to look at one of these on-line tutorials before going over these examples.

Unfortunately, the "S" in SQL doesn't stand for Standard. The while there are published standards for SQL it's not clear whether any single implementation meets them all. Further, the SQL dialect appropriate to different databases often introduce vendor specific "features." The used by the Microsoft database used for the ODM is known as T-SQL and there are numerous books describing its features and use. The SSDM, on the other hand, uses the popular MySQL database. One difference between the two versions can be seen in how one limits the number of rows returned as the results of a query. Suppose we want the first 100 entries of some table xyz. In T-SQL we'd submit the query

SELECT TOP 100 * 
FROM xyz

while for MySQL the query would be

SELECT * 
FROM xyz LIMIT 100

Since the data stores in the PSPS are read-only to the users no harm will occur if you enter an incorrect SQL command -- the worst you get is an annoying error message.

One last note on SQL has to do with the formatting of the commands. Basically, there is no standard formatting for how one types a SQL command. In the examples presented below we have tried to break the different parts of the query over several lines to illustrate what is being asked for, from what tables, and what constraints we want to apply to the query. In writing your own SQL feel free to add line breaks, indentation, and other spacing to make the query more readable.

Examples

The following examples in this section

  1. Find what frames have been loaded by looking at the FrameMeta table.
  1. Use the built in functions to search boxes or cones.
  1. Combine box/cone searches on the object table to gather the data to make color-magnitude or color-color plots.
  1. Extract light curve information for all sources measured more than some specified number of times in a given area.
  1. Extract 1000 random sources from the color-magnitude diagram created in example 3.
  1. Input a list of sources from an external file and search the database for objects that match, then find the detections for these objects.
  1. Find how many sources in a region are only seen in the image stacks and not in the single frame detections.

1. Finding loaded frames

When one first gets into the database it's natural to ask what data is available. In particular, what frames have been loaded into a particular data base?

To get a list of everything loaded the following simple query against the FrameMeta table will give you the answer:

SELECT frameID, frameName, raBore, decBore, filterID, expStart
FROM FrameMeta

This will return the frameID (number) the frameName (as used by the CAMERA and IPP), the telescope bore sight right ascension and declination, and the exposure starting time (MJD).

As the volume of data grows, you might want to restrict this query to a more specific piece of the sky so as not to be swamped by lists of frames you don't care about. This is accomplished by modifying the query to constrain the raBore and decBore attributes. Say we want to look at the region 2 degrees on a side on the celestial equator at 23 hours RA. Exposures in which the telescope boresight falls within this region can be found by executing the query

SELECT frameID, frameName, raBore, decBore, filterID, expStart
FROM FrameMeta
WHERE
     raBore BETWEEN  344. AND 346.
AND
     decBore BETWEEN -1. AND 1.0

Should you what to find frames that are partially overlapping with this region you can expand the lower and upper bounds in the BETWEEN statements to allow for the fact the telescope has a 3 degree diameter field of view. (This is an exercise left to the reader.)

This example prepared by Jim Heasley.


2. Cone and box searches

Here are some examples of spatial queries using the internal database functions. (The descriptions of the internal database functions can be found here). There are several built-in functions available to users that make spatial queries, i.e., those with coordinate cuts, much more efficient than simply including the coordinate constraints in the WHERE clause. Consider the same region in the previous example. These internal functions will return a table of the object IDs, ra, dec, and direction cosines of the sources inside the box. The cone search function returns an additional argument, distance, which is the distance in degrees of the object located from the reference position. For this example we make our queries against the 3 pi database.

First consider a rectangular search using straight coordinate constraints. We limit the output of these queries to the first 500 responses so we can run them within the fast queue without overflowing the output buffer space. You can get the entire result set by running this in the slow queue and having the output automatically redirected into your MyDB.

SELECT top 500 objID, ra, dec, cx, cy, cz FROM object 
WHERE 
     (ra between 344. and 346.)
AND 
     (dec between -1.0 and 1.0)

This query can be rewritten as follows to use the either of two internal function that returns a rectangular search area for a box. The function fGetObjFromRect(@ra1, @ra2, @dec1, @dec2) returns table of objects inside a rectangle defined by two (ra, dec) pairs. Note the order of the parameters: @ra1, @ra2, @dec1, @dec2. Assumes @dec1<@dec2. There is no limit on the number of objects returned.

SELECT top 500 * FROM dbo.fGetObjFromRect(344., 346., -1., 1.0)

The other function is fGetObjFromRectEq(@ra1, @dec1,@ra2,@dec2)) returns a table of objects inside the rectangle defined by two (ra, dec) pairs. This is a variant of fGetObjFromRect (actually calls it) that takes the input parameters in a more intuitive order (ra1, dec1, ra2, dec2). Assumes dec1<dec2. There is no limit on the number of objects returned.

SELECT top 500 * FROM dbo.fGetObjFromRectEq(344., -1.0, 346., 1.0) 

A cone (radial) search of 1. degree radius (@theta) for objects about the target position (@ra, @dec) = (345., 0.0) uses the function fGetNearbyObjEq(@ra, @dec, @theta) All the arguments are in degrees:

SELECT top 500 * FROM dbo.fGetNearbyObjEq(345., 0.0, 1.0) 

Because our internal functions return a table, we can use these in queries including to form joins with other tables, in particular, to restrict the spatial range of those other tables to the rectangle or cone specified in the functions.

This example prepared by Jim Heasley.


3. Color-magnitude and color-color plots

Get the information for a specific region on the sky to plot a color-magnitude diagram or generate a color-color plot. For this example we will use the SA3 database. The center of this field is at (333.74, 0.35). The query to generate the data for these plots is contained entirely in the object table. For this example, We will make a g vs g-r CMD and a g-r vs r-i color-color diagram. For purposes of the example we will use the magnitudes and colors created from the average properties determined from the individual exposure information. So, we will want go get the attributes gMeanMag, grMeanColor, and riMeanColor from the object table. As a first case, let's query over the entire SA3 database (which is small in comparison say to our 3 pi database). The query to get the data for the CMD is

SELECT top 500 objID, gMeanPSFMag, (gMeanPSFMag - rMeanPSFMag) as grMeanColor FROM  object 
     WHERE
     gMeanPSFMag <> -999 AND rMeanPSFMag <> -999
     AND 
     ng > 0 AND nr > 0

Note that for the query we have added the constraint that the color and magnitude not equal the default value of -999, i.e., we only want those objects for which we have both the color and magnitude actually evaluated. A quick style point here as well, namely one can specify the not equal condition either with the <> or using != (as in C or Perl). }}} The query to get the color-color plot data is almost identical:

SELECT top 500 objID, ra, dec, ng, nr, gMeanPSFMag, rMeanPSFMag, (gMeanPSFMag - rMeanPSFMag) as grMeanColor FROM  object 
     WHERE
     gMeanPSFMag <> -999 AND rMeanPSFMag <> -999
     AND 
     ng > 0 AND nr > 0
     AND 
     ra BETWEEN 330 AND 331
     AND
     dec BETWEEN 0 AND 1

To show how we can combine information for the object table with that from one of our spatial query functions, let's extract the objID, ra, dec, and CMD information for a box that is 2 degrees on a side centered on the middle of the SA3 region. We will want to perform a join between the output table created by fGetObjFromRectEq and the object table.

SELECT top 500 r.objID, r.ra, r.dec, o.gMeanPSFMag, (gMeanPSFMag - rMeanPSFMag) as grMeanColor FROM dbo.fGetObjFromRectEq(332.74, -0.65, 334.74,1.35) r
     JOIN object o ON o.objID = r.objID
     WHERE
     o.gMeanPSFMag <> -999. AND rMeanPSFMag <> -999

Of course, we could have written this query using the object table alone:

SELECT top 500 objID, ra, dec, gn, rn, gMeanPSFMag, rMeanPSFMag, (gMeanPSFMag - rMeanPSFMag) as grMeanColor FROM  object 
     WHERE
     gMeanPSFMag <> -999
     AND 
     ra BETWEEN 332.74 AND 334.74
     AND
     dec BETWEEN -0.65 AND 1.65

(Indeed, for this example the second form working on the object table alone runs faster than joining with the output of fGetObjFromRectEq.)

If you actually want to plot the results you would save the query results into a MyDB table (by running through the "slow" queue) and using the graphing menu in PSI.

This example prepared by Jim Heasley. Modified by Conrad Holmberg 04-13-2013


4. Light curves

Extract the light curve in the r filter information for all sources in the SA3 database which have more than 10 measurements in this filter. For this query we want to use information from the DetectionFull? view (PSF instrumental flux, estimated error in flux, and time of observation) and from the object table to find those sources that have 10 or more measures in the r band (nr >= 10).

SELECT objID, obsTime, psfFlux, psfFluxErr FROM DetectionFull
    into mydb.lightcurve
    WHERE
    filterID = 2
    AND
    objID IN (SELECT objID FROM object WHERE nr >= 10)
    ORDER BY objID, obsTime

(You will need to run this query in the slow queue!) The output was sent to the table lightcurve. To look at the results point the catalog/database to MyDB and run the query

SELECT top 1000 * FROM lightcurve

To get the information for a specific objeect, e.g., objID = 103463341756449716 you would run

SELECT * FROM lightcurve
     WHERE objID = 103463341756449716 	

This example prepared by Jim Heasley. Modified by Conrad Holmberg 04-13-2013


5. Extract random sources from the color-magnitude diagram created in example 3

Select a random sample of 1000 sources in the color-magnitude plot for the entire SA3 database. This is essentially our first query of example 3 with a slight modification to randomize the order using the newid function:

SELECT top 500 objID, (gMeanPSFMag - rMeanPSFMag) as grMeanColor, (rMeanPSFMag - iMeanPSFMag) as riMeanColor FROM object
     WHERE
     gMeanPSFMag  <> -999 AND rMeanPSFMag <> -999 AND iMeanPSFMag <> -999
     ORDER BY NEWID()

This example prepared by Jim Heasley. Modified by Conrad Holmberg 04-13-2013


6. Find detections for a list of sources in a file

From a list of coordinates input by the user, find which match with objects in the database to some spatial tolerance and then extract the detections for them. This query was originally proposed by Sarah Hutton of Durham University in trying to match up known QSOs with sources in the PSPS. There may be more efficient ways to perform this search, but this was the first one tried by the PSPS developer that workd!

The first step is to load your target coordinates into your MyDB. We assume that you are using a comma-separated variable file with a header line that identifies the columns. Here are the first few lines of Hutton's data that is stored in a file named targets.csv

id,ra,dec
2,8.602333,14.39825
3,8.642531,13.76937
4,8.837154,14.62506
6,16.41981,-0.553863
8,35.11477,1.233822
9,35.59353,-0.221799
10,35.62499,0.810419
11,35.94346,-0.652299
12,36.35896,1.023373
13,36.52647,-0.908066
14,36.66141,-0.099411
15,37.16094,0.555606
31,116.4966,22.44748
44,124.8812,5.923227
46,124.9213,5.828524
48,125.5691,6.062468
49,125.7334,6.091327
50,125.7377,7.017876

An additional 169 lines are not shown. You upload this in the MyDB using by clicking on the upload button to expand upload file form. Select the new table button and specify the database table name as "targets" then click "Upload Table" You should then go back to the Query Page and select the MyDB and run the command

SELECT * FROM targets

to confirm that the file has uploaded properly. (This will run nicely in the fast queue unless you have a very large target file.) After confirming your list has uploaded, you want to build a table in the MyDB to hold the results of any matches that the ODM makes with the sources in your list. In the Query Page, with the MyDB selected create the table matches using the following SQL:

CREATE TABLE matches(
    source_id   BIGINT,
    source_ra   REAL,
    source_dec  REAL,
    objID       BIGINT,
    ra          REAL,
    dec         REAL,
    dist        FLOAT
)  

This table will be populated with the input ID, ra, and dec from targets as whatever object ID and positional information your search finds as a match (including the distance between the original position and the closest match. On the Query Page, change the selection from MyDB to PS1_3PI (or 3PI_OLD). Now we do the actual search of the ODM database to see which of the uploaded targets have a corresponding object in the database. The SQL sequence below does this using a SQL cursor. This example uses use the targets and matches tables you set up in your MyDB. The specific example does a cone search of 2 arcseconds around the position you provide. You can change this by adjusting the value of the 3rd argument provided to the fGetNearestObjEQ function.

DECLARE @id int
DECLARE @ra float
DECLARE @dec float
DECLARE @objid bigint

DECLARE my_cursor CURSOR READ_ONLY
FOR
SELECT id, ra, [dec] FROM mydb.targets

OPEN my_cursor
  FETCH NEXT from my_cursor into @id, @ra, @dec

WHILE(@@fetch_status = 0)
BEGIN
  INSERT mydb.matches
  SELECT @id as source_id,@ra as source_ra,@dec as source_dec,
        objID, ra, [dec], distance
        FROM dbo.fGetNearestObjEq(@ra,@dec,2./3600.) -- search 2 arcsec radius
  FETCH NEXT from my_cursor into @id, @ra, @dec
END

CLOSE my_cursor
DEALLOCATE my_cursor

How long the query will take depends upon the size of the input target list and the search radius. To see what was found change the database table to MyDB and run the query

SELECT * FROM matches

and you'll get the list of matches that were found by your search.

Finally, you may want to see which specific detections have been attributed to these object. In the Query Page with the PS1_3PI (or PS1_OLD) database selected run the following query:

declare @tbl00 table(objid bigint not null)
insert @tbl00 select objid from mydb.matches

declare @tbl00 table(objid bigint not null)
insert @tbl00 select objid from mydb.matches

SELECT d.objID, d.detectID, d.filterID, (-2.5*log10(d.psfFlux)+d.zp) INTO mydb.detections_table
FROM DetectionFull d
join @tbl00 o on d.objID=o.objid

In the SQL above we create a temporary table on the ODM head node to hold the match results already stored in your MyDB matches table. One could reference the table mydb.matches in the query but in practice this results in a great deal of inter-node traffic on the cluster and very slow performance. Putting the information back into the head node (in one single, fast copy) results in much faster execution.

In the MyDB you can now run

SELECT * FROM detections_table

to get the list of actual detections found for each object.

This example prepared by Jim Heasley.


7. Count sources in a region only seen in stacks

Sources can be identified either from the individual frames or from the image stacks. One natural question to ask is how many objects are found in the stacks that are not seen in the individual frames? These are sources that are presumably too faint to be detected above the threshold in the individual frames. (This question was in fact posed by Ken Chambers.)

The answer to this question can be addressed by simply looking at the number of distinct objects found in both the Detections and StackDetections in the region of interest. Assume we are looking at the SA3 database so that we are considering a clearly defined region (without having to specify it explicitly in the queries). First, for the SA3 database we run the following queries to get the counts:

SELECT DISTINCT objID INTO mydb.detection FROM Detection

SELECT DISTINCT objID INTO mydb.stack FROM StackDetecion

Then, changing to your MyDB, the number of objects in the stack table that are not in the detection table can be found with the following query

SELECT COUNT(f.objID)FROM
(SELECT s.objID FROM stack s LEFT JOIN detection d ON d.objID = s.objID
WHERE
d.objID IS NULL) f

Another way to perform the actual count given the MyDB tables stack and detection can be written as

SELECT COUNT(f.objID) FROM 
(SELECT s.objID FROM stack s WHERE
s.objID NOT IN (SELECT d.objID FROM detection d)) f

The first form runs somewhat faster than the second.


This example prepared by Jim Heasley.


Other sample queries will be added as the data attributes in the ODM are populated by the IPP. Future queries will demonstrate selections of star and galaxy samples, proper motions and parallaxes, etc.


Last modified 4 years ago Last modified on 11/04/13 21:59:31

Attachments (1)

Download all attachments as: .zip