Home » Questions » Computers [ Ask a new question ]

Distribution of table in time

Distribution of table in time

"I have a MySQL table with approximately 3000 rows per user. One of the columns is a datetime field, which is mutable, so the rows aren't in chronological order.

I'd like to visualize the time distribution in a chart, so I need a number of individual datapoints. 20 datapoints would be enough.

I could do this:

select timefield from entries where uid = ? order by timefield;

and look at every 150th row.

Or I could do 20 separate queries and use limit 1 and offset.

But there must be a more efficient solution..."

Asked by: Guest | Views: 107
Total answers/comments: 4
Guest [Entry]

"Michal Sznajder almost had it, but you can't use column aliases in a WHERE clause in SQL. So you have to wrap it as a derived table. I tried this and it returns 20 rows:

SELECT @rownum:=@rownum+1 AS rownum, e.*
FROM (SELECT @rownum := 0) r, entries e) AS e2
WHERE uid = ? AND rownum % 150 = 0;"
Guest [Entry]

"Something like this came to my mind

select @rownum:=@rownum+1 rownum, entries.*
from (select @rownum:=0) r, entries
where uid = ? and rownum % 150 = 0

I don't have MySQL at my hand but maybe this will help ..."
Guest [Entry]

"As far as visualization, I know this is not the periodic sampling you are talking about, but I would look at all the rows for a user and choose an interval bucket, SUM within the buckets and show on a bar graph or similar. This would show a real ""distribution"", since many occurrences within a time frame may be significant.

SELECT DATEADD(day, DATEDIFF(day, 0, timefield), 0) AS bucket -- choose an appropriate granularity (days used here)
FROM entries
WHERE uid = ?
GROUP BY DATEADD(day, DATEDIFF(day, 0, timefield), 0)
ORDER BY DATEADD(day, DATEDIFF(day, 0, timefield), 0)

Or if you don't like the way you have to repeat yourself - or if you are playing with different buckets and want to analyze across many users in 3-D (measure in Z against x, y uid, bucket):

,COUNT(*) AS measure
,DATEADD(day, DATEDIFF(day, 0, timefield), 0) AS bucket
FROM entries
) AS buckets

If I wanted to plot in 3-D, I would probably determine a way to order users according to some meaningful overall metric for the user."
Guest [Entry]


For whatever reason, your example only works when the where @recnum uses a less than operator. I think when the where filters out a row, the rownum doesn't get incremented, and it can't match anything else.

If the original table has an auto incremented id column, and rows were inserted in chronological order, then this should work:

select timefield from entries
where uid = ? and id % 150 = 0 order by timefield;

Of course that doesn't work if there is no correlation between the id and the timefield, unless you don't actually care about getting evenly spaced timefields, just 20 random ones."