Often in science, we seek underlying explanations for behavior and propose that numerical measurements may be related to experimental variables through a mathematical equation. One common question is whether the behavior fits this mathematical model as we change a variable. Alternatively (as in the case of the Thermodynamics experiment), we are confident that the behavior should fit the model, but we need to extract reliable estimates of its parameters, usually the best estimates of a line's slope and y-intercept. Drawing the "best" line through scattered experimental data requires a technique called regression analysis. In general, regression can be applied to any function, but "linear regression" is the most straightforward.
If we have a simple linear model, we expect the data to behave according to
$$y=mx+b$$
where x is the "independent" variable that we select and set in some fashion, and y is the "dependent" observation that we measure. The parameters m (slope) and b (y-intercept) are usually physically significant quantities that our analysis attempts to measure quantitatively, so we need to apply statistical theory to establish confidence limits for m and b, whatever they might be. Normally, experimental uncertainty in x should be small and contribute little to the error (we can explore this via propagation of error; our treatment here assumes that random error in y is the primary contributor to uncertainty in m and b).
Practically, modern computer spreadsheets implement regression analysis and are sufficient for most applications in which all data are equally weighted. However, you should never treat such programs as "black boxes", since at some point you will unwittingly use them inappropriately.
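As a minimal sketch (not part of the formal treatment), the same unweighted fit that a spreadsheet's trendline or LINEST feature performs can be done in Python with NumPy; the data values below are invented purely for illustration:

```python
import numpy as np

# Hypothetical data: x set by the experimenter, y measured with scatter (values are invented).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Unweighted linear least squares; polyfit with deg=1 returns [slope, intercept] for y = m*x + b.
m, b = np.polyfit(x, y, deg=1)
print(f"slope m = {m:.4f}, intercept b = {b:.4f}")
```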
Let's define things in general. Our variable y is some generic function of x and the parameters α1, α2, ..., αn. For each point, assuming some best value for each parameter α, the error will be yi − f(α1, α2, ..., αn, xi). Since some values will deviate positively and some negatively, we square each error value; the sum of the squares gives us an aggregate measure of deviation from our "best fit" parameters:
$$\chi^2=\sum_i \frac{\left[y_i-f(\alpha_1,\alpha_2,\ldots,\alpha_n,x_i)\right]^2}{\sigma_i^2}$$
The σi values are weights; for the moment we will assume they are all equal (and thus equal to 1), so they disappear from further treatment.
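To make the quantity being minimized concrete, here is a short Python sketch that evaluates this sum of squares for the linear model f = mx + b; the σi handling and the sample numbers are illustrative assumptions, not prescribed by the text:

```python
import numpy as np

def chi_squared(m, b, x, y, sigma=None):
    """Sum of squared residuals for the linear model f = m*x + b,
    each term divided by sigma_i**2 (unit weights when sigma is None)."""
    if sigma is None:
        sigma = np.ones_like(y)      # equal (unit) weighting, as assumed above
    residuals = y - (m * x + b)      # y_i - f(alpha_1, ..., alpha_n, x_i)
    return np.sum(residuals**2 / sigma**2)

# Invented example data and trial parameter values:
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
print(chi_squared(2.0, 0.0, x, y))
```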
We now take the partial derivative of χ2 with respect to each parameter (α1, α2, ..., αn), set each derivative equal to zero, and solve for the parameters. The formal exercise is presented either in Garland, Nibler & Shoemaker, Chapter XXII, or in Shafer & Zhang, Ch. 10.
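For the simple two-parameter linear case with unit weights, this minimization step can be sketched symbolically; the SymPy snippet below is only an illustration of the procedure, not a substitute for the cited derivations:

```python
import sympy as sp

# Two fit parameters and a small set of generic data symbols (n is arbitrary here).
m, b = sp.symbols('m b')
n = 5
xs = sp.symbols(f'x0:{n}')
ys = sp.symbols(f'y0:{n}')

# Unit-weighted sum of squared residuals for the linear model y = m*x + b.
chi2 = sum((yi - (m * xi + b))**2 for xi, yi in zip(xs, ys))

# Set the partial derivatives with respect to m and b to zero and solve:
# these two "normal equations" give the best-fit slope and intercept.
best = sp.solve([sp.diff(chi2, m), sp.diff(chi2, b)], [m, b])
print(best[m])
print(best[b])
```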
For a simple two-parameter, single-variable linear equation with unit weighting, this procedure yields the following: