Graph: power (W) vs. time (weeks).
Window Size: 5
Prediction Size: 1
Neurons in the first layer: 10
I'm using the weeks to be predicted as inputs (as well as the power values themselves); each week number runs from 1 to 52 and then repeats (since a year has 52 weeks), and is encoded as a binary number. So, for this example, I have 5 inputs (the previous powers) + 6 inputs (the 6 bits of the week to be predicted).
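To make the encoding concrete, here is a minimal sketch of how one input/target pair is built (the function names and the index-to-week mapping are just illustrative):

```python
import numpy as np

def week_to_bits(week, n_bits=6):
    """Encode a week number (1..52) as a fixed-length binary vector."""
    return np.array([(week >> i) & 1 for i in range(n_bits)], dtype=float)

def build_sample(powers, t, window_size=5):
    """Inputs: the last `window_size` powers + the 6 bits of the target week.
    Target: the power at index t (assumed to fall on week (t % 52) + 1)."""
    window = powers[t - window_size:t]                 # 5 previous power values
    week = (t % 52) + 1                                # week number, 1..52, repeating
    x = np.concatenate([window, week_to_bits(week)])   # 5 + 6 = 11 inputs
    y = powers[t]
    return x, y
```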
Also, I would like to emphasize that when I test the neural network, I don't feed it the real data, only its own previous predictions.
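So the test is closed-loop: each prediction is pushed into the window and reused as an input for the next step. Roughly like this, for prediction size 1 (a sketch reusing `week_to_bits` from above; `model.predict` is a placeholder for the library's one-sample prediction call):

```python
def closed_loop_forecast(model, history, start_week, n_steps, window_size=5):
    """Predict n_steps ahead, feeding each prediction back as an input."""
    window = list(history[-window_size:])    # seed with the last real powers
    preds = []
    for k in range(n_steps):
        week = ((start_week + k - 1) % 52) + 1            # week being predicted
        x = np.concatenate([window, week_to_bits(week)])
        y_hat = float(model.predict(x))                   # placeholder call
        preds.append(y_hat)
        window = window[1:] + [y_hat]                     # drop oldest, append prediction
    return preds
```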
Here are some examples of the weird predictions:
Blue points: real training data.
Black points: real test data.
Red line: training curve.
Green line: test curve.
Window Size: 5
Prediction Size: 1

Window Size: 10
Prediction Size: 3

Window Size: 10
Prediction Size: 3

Window Size: 15
Prediction Size: 5

With another dataset, the test curve sometimes starts to increase and decrease periodically.
Then I tried using the difference between consecutive values as input, instead of the raw values themselves (see the sketch below). The result was better, but still not always good: sometimes the prediction decreased, sometimes it increased, and it looked random.
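Concretely, the differencing and the conversion back to absolute values work like this (a minimal sketch, assuming plain first differences):

```python
import numpy as np

def to_differences(powers):
    """First differences: d[t] = p[t+1] - p[t]; the network is trained on these."""
    return np.diff(powers)

def from_differences(last_real_value, predicted_diffs):
    """Cumulatively sum predicted differences onto the last known power."""
    return last_real_value + np.cumsum(predicted_diffs)

# Example: last real power 100.0, predicted differences [2.0, -1.0, 0.5]
# -> reconstructed powers [102.0, 101.0, 101.5]
print(from_differences(100.0, np.array([2.0, -1.0, 0.5])))
```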
Window Size: 5
Prediction Size: 1
(Difference)

Window Size: 5
Prediction Size: 1
(Converted)

Window Size: 5
Prediction Size: 1
(Converted)

Another attempt was switching the training algorithm to resilient backpropagation (Rprop), though I don't know much about it. Is it better? It improved the network's learning and was faster, but it still gave erratic results: some good, some bad.
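For reference, Rprop adapts a separate step size per weight and uses only the sign of the gradient, which is why it tends to learn faster. A minimal sketch of the same topology trained with it, here using PyTorch's `torch.optim.Rprop` (not my actual code, just to illustrate the setup):

```python
import torch
import torch.nn as nn

# Same topology as above: 11 inputs (5 powers + 6 week bits),
# 10 neurons in the hidden layer, 1 predicted power.
model = nn.Sequential(nn.Linear(11, 10), nn.Tanh(), nn.Linear(10, 1))
optimizer = torch.optim.Rprop(model.parameters())  # sign-based, per-weight step sizes
loss_fn = nn.MSELoss()

def train_epoch(X, Y):
    """One full-batch step; Rprop is meant for full-batch gradients."""
    optimizer.zero_grad()
    loss = loss_fn(model(X), Y)
    loss.backward()
    optimizer.step()
    return loss.item()
```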
Does anyone have any idea why this is happening? How can I avoid it? Is it just a configuration problem, or overfitting?
I'll take any help, even completely different ideas for improving the prediction.