Gaussian Fat Regression

Apologies for the absence. A brand new research project is getting cranked up and I've been pretty short on time. As such the posting schedule may get a bit erratic. I'll do my best.

I'm also working on improving some other things. When I started college I somehow managed to avoid the freshman 15 (the weight lots of people gain during college), but nevertheless ended my four years about that much heavier. That wasn't a bad thing: lots of it was muscle from doing a bit of working out, and I was pretty slight going into college in the first place.

The problem is that going into grad school I did it again, minus the working out. That's not a trend I want to maintain, and so I've embarked on a project to reverse the process. Here are the results thus far. Y-axis in pounds, x-axis in days, day 1 is 8/12/09, with data taken first thing each morning for consistency:

[Figure: morning weight in pounds (y-axis) versus day (x-axis), with the linear best-fit line and its equation]

There's no special method or system I'm following, just the old calories in < calories out. To do that I've just traded in Coke for water, reduced the number of fast food meals (and I don't get fries when I do), followed a "no junk" rule when shopping for food, and stepped up the walking / stair climbing / pushups / situps / etc. Eventually I'm going to add running to that. I'm not looking forward to it, but it needs to be done.

If you Fourier transformed this data set you'd probably see a spike near a frequency of once per week, as healthy eating tends to bite the dust when doing weekend grilling with friends. But that's OK. The point is to get the average calorie burn rate above the average intake, and a weekend reversal of reasonable size will only dent the average a bit.

The average is what I track using the linear best-fit, shown by the line and its associated equation. The linear fit is one of the all-time most useful tools in the experimentalist's kit, allowing first-order relationships to be teased out of noisy data. The way it works is to draw a line and calculate the vertical distance between each point and the line, that is, the difference between each measured weight and the weight the line predicts for that day. Square those distances and add them up. The best-fit line is the one for which that sum is smallest. We don't have to do this by trial and error: there's a simple procedure due to Gauss (and Legendre, long story) which does the whole thing in one step. The R^2 value indicates how well the data points are fitted by the line; it is the fraction of the variation in the data that the line accounts for, so 1 is a perfect fit and 0 means the line explains nothing. The value here of 0.76 indicates a relatively good fit.
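Here is a minimal sketch of that one-step procedure, assuming the measurements are available as two lists (days and weights). The function name `linear_fit` is just for illustration, not what any spreadsheet uses internally. It applies the closed-form least-squares solution for the vertical residuals and computes R^2 as one minus the ratio of the residual sum of squares to the total sum of squares.

```python
def linear_fit(days, weights):
    """Ordinary least-squares fit of weight = slope * day + intercept.

    Minimizes the sum of squared *vertical* residuals (measured weight minus
    the line's prediction). Returns (slope, intercept, r_squared).
    """
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(weights) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    if sxx == 0:
        raise ValueError("need at least two distinct days")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, weights))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # best-fit line passes through the means
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(days, weights))
    ss_tot = sum((y - mean_y) ** 2 for y in weights)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared
```

Fed the logged data, `slope` would play the role of the -0.108 pounds-per-day figure discussed below.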

In this case the slope of the line (which represents the pounds-per-day change) is -0.108, corresponding to a loss of 0.108 pounds per day, or about 3/4 pound per week. Nothing dramatic at all, but then I'm not looking for dramatic. (For reference, by the way, I am 5'9".) One pound of fat corresponds to around 3500 food calories, so my daily calorie deficit averages about 0.108 × 3500 ≈ 378 calories. This is a bit of an indication of how fragile the weight loss process can be, since one extra slice of pizza can nuke a day's progress. On the other hand, it's also a sign of how easy it can be to get on the right track: cut out just a little and you're making good headway. I think charting it has helped me stick with things. Motivation is tough when visible indications of success are so slow, but the chart lets you see the progress through the daily variability.
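The unit conversions in that paragraph are simple enough to spell out. This sketch uses the post's 3500-calories-per-pound rule of thumb; the function name is hypothetical.

```python
KCAL_PER_POUND_FAT = 3500  # rule-of-thumb figure quoted in the post

def describe_trend(slope_lb_per_day):
    """Convert a fitted slope (pounds/day) into weekly loss and daily deficit."""
    loss_per_day = -slope_lb_per_day          # negative slope means weight lost
    loss_per_week = 7 * loss_per_day
    deficit_kcal_per_day = KCAL_PER_POUND_FAT * loss_per_day
    return loss_per_week, deficit_kcal_per_day

weekly, deficit = describe_trend(-0.108)
print(f"{weekly:.2f} lb/week, {deficit:.0f} kcal/day deficit")
# prints: 0.76 lb/week, 378 kcal/day deficit
```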

How far to take it? My original goal was "below 160". With current progress I'll be there soon - statistically, on the 26th of this month. Fingers crossed.
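The projected date comes from extending the best-fit line until it reaches the target and converting that day number back into a calendar date, with day 1 being 8/12/09. This sketch assumes you read the intercept off the equation on the chart (it isn't given in the text), and the function name is hypothetical. As a commenter notes below, this kind of straight-line extrapolation assumes the average trend simply continues.

```python
from datetime import date, timedelta

DAY_ONE = date(2009, 8, 12)  # day 1 of the chart, per the post

def crossing_date(slope, intercept, target):
    """Calendar date on which the line slope*day + intercept reaches `target`.

    Pure extrapolation of the fitted trend; real weight data need not follow it.
    """
    if slope >= 0:
        raise ValueError("a non-decreasing line never falls to the target")
    day = (target - intercept) / slope        # solve slope*day + intercept = target
    return DAY_ONE + timedelta(days=round(day) - 1)  # day 1 is DAY_ONE itself
```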

I hope you make your goal, though maybe not on your timetable: extrapolation in regression is always dangerous. :)

Why do you say 0.76 is a relatively good fit? I always thought that was a relatively poor fit. It's only good compared to random, but not so much compared to a straight line.

What constitutes a good fit is very context-sensitive. While 0.76 would be pretty bad for most physical science applications, it's a good indication that our rough average estimate of the weight/time relationship is more than noise.

Be careful. In approximately 4.3 years you will weigh nothing at all. Then how will you depress the keys on your keyboard?

By Rhinanthus (not verified) on 06 Oct 2009

This seems a good point to mention the Quantified Self group, http://www.meetup.com/quantifiedself/ . Most of them start out measuring weight or some athletic target, and some move on from there. Some are driven by pathologies, such as migraines or diabetes. Once you have data, there's plenty to learn from it.

By Nathan Myers (not verified) on 06 Oct 2009

The first 19 days could be loosely fit with a straight line having a slightly increasing slope. Then, after a mysterious drop of 4+ pounds at day 22, you appear to have a completely separate trend with a decreasing slope beginning around the 30th day.

Trying to force the data to fit a single straight line seems to miss the reality of what actually happened. But it's hard to say. Either way of looking at it has a lot of uncertainty.

Drastic weight loss (2+ lbs/week) can be accompanied by kidney stones. Binge dieting tells the body a famine is on and it cuts back metabolism, then rebounds. Fall is here - leave your jacket open when walking outside. A Calorie is a Calorie. Save the World! Be a poikilotherm.

Finally a post on a topic that I somewhat comprehend. I agree with Robert. Adding a dummy variable representing the first ~20 observations could provide you with a slope closer to the more recent pattern. You could also try a nonlinear relationship. Moreover, since this is time series data and the data can be assumed to be highly autocorrelated, you might want to utilize an autoregressive model. However, as an aspiring economist, I have to recommend that you develop a theory before you fit a line to the data.

By j barreca (not verified) on 06 Oct 2009
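The dummy-variable suggestion in the comment above amounts to fitting one common slope while letting the early period have its own intercept. A sketch, assuming a 0/1 dummy for days up to a chosen `cutoff` (a hypothetical parameter; the comment suggests roughly 20). With a single dummy, the least-squares slope equals the slope computed from deviations around each group's own mean, which avoids solving a matrix system.

```python
def fit_with_dummy(days, weights, cutoff):
    """Regress weight on day plus a 0/1 dummy for day <= cutoff.

    Returns (common_slope, intercept_for_later_days, dummy_shift), where
    dummy_shift is how much higher the early-period line sits.
    """
    groups = {True: [], False: []}
    for x, y in zip(days, weights):
        groups[x <= cutoff].append((x, y))
    if not groups[True] or not groups[False]:
        raise ValueError("cutoff must split the data into two non-empty groups")
    means = {}
    sxx = sxy = 0.0
    for key, pts in groups.items():
        mx = sum(x for x, _ in pts) / len(pts)
        my = sum(y for _, y in pts) / len(pts)
        means[key] = (mx, my)
        # pooled "within-group" sums: deviations from each group's own mean
        sxx += sum((x - mx) ** 2 for x, _ in pts)
        sxy += sum((x - mx) * (y - my) for x, y in pts)
    if sxx == 0:
        raise ValueError("each group needs at least two distinct days")
    slope = sxy / sxx
    intercept_late = means[False][1] - slope * means[False][0]
    intercept_early = means[True][1] - slope * means[True][0]
    return slope, intercept_late, intercept_early - intercept_late
```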

Hilarious. I wish you'd posted the Fourier transform too.

By Peeter Joot (not verified) on 06 Oct 2009
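For anyone curious what that Fourier transform would involve, here is a sketch using a plain discrete Fourier transform from the standard library. It assumes you pass in the residuals left after subtracting the linear fit, since a strong trend otherwise piles power into the lowest frequencies. The function name is hypothetical, and the pattern in the test (a two-day weekend bump repeated over four weeks) is synthetic, not the real measurements.

```python
import cmath

def dominant_period(values):
    """Period (in samples) of the strongest non-constant DFT component.

    Plain O(N^2) discrete Fourier transform. The mean is removed first so the
    zero-frequency term does not dominate; only frequencies up to Nyquist
    (k <= N // 2) are searched.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(values) / n
    centered = [v - mean for v in values]
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, c in enumerate(centered))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return n / best_k
```

With daily samples, a return value near 7 would be the once-per-week spike the post predicts.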

Actually I'm pretty sure observations 14-19 are sketchy enough to discount completely. Long and gross story short, there was likely, at minimum, moisture condensed in the mechanism (and this is a cheap electronic scale to begin with) due to what we might choose to call a bathroom fixture malfunction. The scale was also picked up and moved during that period.

Dropping those points would more-or-less bring the first 10 data points in line with the rest, but I'm following the scientific ethics rule of not chopping data that merely might be wrong.
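One way to keep every point while still seeing how much the questionable readings matter is to report the slope both ways. A sketch with hypothetical function names:

```python
def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx

def sensitivity(days, weights, suspect_days):
    """Slope using all data, and slope with the suspect days left out."""
    all_pts = list(zip(days, weights))
    kept = [(x, y) for x, y in all_pts if x not in suspect_days]
    return ols_slope(all_pts), ols_slope(kept)
```

For this data set the call would be `sensitivity(days, weights, set(range(14, 20)))`, where `days` and `weights` hold the logged measurements. If the two slopes are close, the suspect readings don't change the conclusion.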

Is this that quack/fad "eat less and exercise" weight-loss program? For shame! Next you'll be recommending that "eat moderate amounts of a variety of foods" dietary malarkey!

Anyway. Weight should be plotted logarithmically, of course. So should currency fluctuations, stock prices, the DOW and so on. But who listens to me?
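Plotting logarithmically corresponds to fitting a straight line to ln(weight), in which case the slope describes a constant percentage change per day rather than a constant number of pounds. A sketch of that fit, with a hypothetical function name:

```python
import math

def log_fit(days, weights):
    """Fit ln(weight) = a + b*day by least squares.

    Returns the implied fractional change per day, exp(b) - 1
    (e.g. -0.01 means losing 1% of body weight per day).
    """
    logs = [math.log(w) for w in weights]
    n = len(days)
    mx = sum(days) / n
    my = sum(logs) / n
    sxx = sum((x - mx) ** 2 for x in days)
    sxy = sum((x - mx) * (y - my) for x, y in zip(days, logs))
    b = sxy / sxx
    return math.exp(b) - 1.0
```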

Good luck on the weight loss! By the way, running isn't so bad once you get used to it. I used to hate running, but my wife got me doing it every other day (for the first month we would start out running 1 minute, then walking for 4, adding a running minute and decreasing a walking minute each week). Now that I've been doing it regularly for a while it's actually kind of enjoyable!

Oh, I should also say that my wife got me in the habit of jogging first thing in the morning, right after waking up, before I can start complaining that 1) I'm awake at 0530 and 2) I'm awake so that I can run.

Congrats. It looks like you're making good progress.

I was doing something like this with my running schedule and caloric intake as well as weight. I think just doing one thing and sticking to it like you have is much better.

And here's a quick quiz for you. How do you measure "distance from the line" so that you can minimize it? In fact, given that you did this in Excel, I'd lay good odds that your regression line isn't actually the one that minimizes the Euclidean distance of the points from the line. But the odds are pretty good that you did minimize the residuals in the y-dimension (which is reasonable for your chart).
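The quiz's distinction can be made concrete by comparing the ordinary least-squares slope (vertical residuals) with the orthogonal, or total least-squares, slope (perpendicular distance), using the standard closed form for the latter. This is a sketch; the function name is hypothetical.

```python
import math

def vertical_and_orthogonal_slopes(xs, ys):
    """Return (ols_slope, orthogonal_slope) for the same data.

    ols_slope minimizes squared vertical residuals; orthogonal_slope
    minimizes squared perpendicular (Euclidean) distances to the line.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxy == 0:
        raise ValueError("closed form below needs nonzero covariance")
    ols = sxy / sxx
    # Closed-form orthogonal regression slope (equal error variance in x and y)
    tls = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return ols, tls
```

One reason the vertical version suits a weight chart: perpendicular distance mixes pounds and days, so the orthogonal slope changes if you rescale an axis (say, weeks instead of days), while the vertical-residual fit describes the same line in any units.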

Hope you're measuring your weight at a consistent point in the day, such as before breakfast and after urination. When you're measuring weights to a tolerance of 1 kg, a full bladder or a recent big meal can have a significant effect.
